Matt Aslett's Analyst Perspectives

Data Operations Improves Trust in Data for Analytics and AI

Written by Matt Aslett | Jun 12, 2024 10:00:00 AM

Enterprises are embracing the potential for artificial intelligence (AI) to deliver improvements in productivity and efficiency. As they move from initial pilots and trial projects to production deployment at scale, many are realizing the importance of agile and responsive data processes, as well as tools and platforms that facilitate data management, with the goal of improving trust in the data used to fuel analytics and AI. This has led to increased attention on data operations (DataOps) and its role in applying agile development, development operations (DevOps) and lean manufacturing principles to data engineering in support of data production. I assert that through 2026, more than one-half of enterprises will have adopted agile and collaborative DataOps practices to facilitate responsiveness, avoid repetitive tasks and deliver measurable data reliability improvements.

DataOps has been part of the lexicon of the data market for almost a decade, with the term describing the tools, practices and philosophy used to ensure the quality, flexibility and reliability of data and analytics initiatives. DataOps also encompasses the development, testing, deployment and orchestration of data integration and processing pipelines, along with improved data quality and validity via data monitoring and observability. These were the key capabilities we used to assess software providers in our 2023 Buyers Guide for DataOps, which evaluates how well providers’ offerings meet buyers’ requirements. The research comprised parallel evaluations of products addressing each of three core areas of functionality: data pipelines, data orchestration and data observability.

The development, testing and deployment of data pipelines is a fundamental accelerator of data-driven strategies, enabling enterprises to extract data from the operational applications and data platforms designed to run the business and load, integrate and transform it into the analytic data platforms and tools used to analyze the business. Healthy data pipelines are necessary to ensure data is integrated and processed in the sequence required to generate business intelligence (BI) and support the development and deployment of applications driven by AI. Traditionally, data pipelines have involved batch extract, transform and load (ETL) processes, but the need for real-time insight is driving demand for continuous data processing and more agile pipelines that adapt to changing business conditions and requirements, including the increased reliance on streaming data and events.
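To make the concept concrete, the sketch below shows a minimal batch ETL pipeline in Python: an extract step pulls records from an operational source, a transform step cleanses and standardizes them, and a load step appends them to an analytic target. The source records, field names and in-memory "warehouse" are hypothetical placeholders for illustration, not a reference to any particular product.

```python
# Minimal batch ETL sketch: extract from an operational source, transform,
# and load into an analytic store. All names and data are hypothetical.

from datetime import datetime, timezone


def extract():
    """Pull raw records from an operational system (stubbed here)."""
    return [
        {"order_id": 1, "amount": "19.99", "region": "emea"},
        {"order_id": 2, "amount": "5.00", "region": "amer"},
    ]


def transform(rows):
    """Cleanse and enrich records so they are ready for analytics."""
    for row in rows:
        yield {
            "order_id": row["order_id"],
            "amount": float(row["amount"]),           # normalize types
            "region": row["region"].upper(),          # standardize values
            "loaded_at": datetime.now(timezone.utc).isoformat(),
        }


def load(rows, target):
    """Append transformed records to the analytic target (a list here)."""
    target.extend(rows)


if __name__ == "__main__":
    warehouse = []  # stand-in for an analytic data platform
    load(transform(extract()), warehouse)
    print(f"Loaded {len(warehouse)} rows")
```

In a streaming scenario, the same transform and load logic would be applied continuously to events as they arrive rather than to periodic batches.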

Data orchestration enables the flow of data across the organization by automating and coordinating the creation, scheduling and monitoring of data pipelines. This is increasingly important given the growing complexity of evolving data sources and requirements. At the highest level of abstraction, data orchestration covers three key capabilities: data collection (including data ingestion, preparation and cleansing); data transformation (additionally including integration and enrichment); and data activation (making the results available to compute engines, analytics and data science tools or operational applications). Data orchestration has the potential to drive improved efficiency and agility in data and analytics projects, whether deployed stand-alone or embedded in larger data engineering platforms.
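As a rough illustration of the idea, the sketch below declares the three stages described above as tasks with dependencies and runs them in dependency order using Python's standard-library graphlib. The task names and print statements are hypothetical; a production orchestrator would add scheduling, retries, monitoring and alerting on top of this kind of dependency graph.

```python
# Minimal orchestration sketch: declare dependencies between the collection,
# transformation and activation stages and execute them in dependency order.

from graphlib import TopologicalSorter


def collect():
    print("collect: ingest, prepare and cleanse source data")


def transform():
    print("transform: integrate and enrich the collected data")


def activate():
    print("activate: publish results to analytics tools and applications")


# Each task maps to the set of tasks it depends on.
dag = {
    "collect": set(),
    "transform": {"collect"},
    "activate": {"transform"},
}

tasks = {"collect": collect, "transform": transform, "activate": activate}

if __name__ == "__main__":
    for name in TopologicalSorter(dag).static_order():
        tasks[name]()  # a real orchestrator would also schedule, retry and log
```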

Data observability software is also a critical aspect of data-driven decision-making, addressing one of the most significant impediments to generating value from data by providing an environment for monitoring the quality and reliability of data. Maintaining data quality and trust is a perennial data management challenge, often preventing enterprises from operating at the speed of business. Although almost two-thirds (64%) of participants in Ventana Research’s Analytics and Data Benchmark Research cite reviewing data for quality issues as the most time-consuming aspect of analytics initiatives, less than one-quarter (22%) are very confident in the quality of data generated by data preparation efforts. Data observability automates the monitoring of data freshness, distribution, volume, schema and lineage, as well as the reliability and health of the overall data environment.
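The sketch below illustrates the kind of automated checks data observability tools apply, here covering freshness, volume and schema for a single batch of records. The thresholds, field names and alert messages are assumptions for illustration only; commercial tools also monitor distribution, lineage and the health of the wider data environment.

```python
# Minimal data observability sketch: automated freshness, volume and schema
# checks on a batch of records. Thresholds and field names are hypothetical.

from datetime import datetime, timedelta, timezone

EXPECTED_SCHEMA = {"order_id", "amount", "region", "loaded_at"}
MIN_ROWS = 1                   # assumed minimum acceptable batch size
MAX_AGE = timedelta(hours=24)  # assumed freshness threshold


def check_batch(rows):
    """Return a list of observability alerts for a batch of records."""
    alerts = []

    # Volume: did we receive roughly the amount of data we expected?
    if len(rows) < MIN_ROWS:
        alerts.append(f"volume: only {len(rows)} rows received")

    now = datetime.now(timezone.utc)
    for row in rows:
        # Schema: are the expected fields present, with no surprises?
        if set(row) != EXPECTED_SCHEMA:
            alerts.append(f"schema drift in order {row.get('order_id')}")
            continue

        # Freshness: is the data recent enough to be trusted?
        loaded_at = datetime.fromisoformat(row["loaded_at"])
        if now - loaded_at > MAX_AGE:
            alerts.append(f"stale data in order {row.get('order_id')}")

    return alerts


if __name__ == "__main__":
    batch = [{
        "order_id": 1,
        "amount": 19.99,
        "region": "EMEA",
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }]
    print(check_batch(batch) or "all checks passed")
```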

Software providers with products that address at least two of these core areas of functionality were deemed to provide a superset of functionality to address DataOps overall. To get the most out of DataOps, I recommend that enterprises evaluate tools and technologies in relation to support for agile and collaborative practices, with an emphasis on continuous, measurable improvement as well as collaboration and automation. In comparison with traditional data management products, DataOps tools emphasize capabilities such as continuous delivery of analytic insights, process simplification, code generation, automation to avoid repeated errors and reduce repetitive tasks, the incorporation of stakeholder feedback, and measurable improvement in the efficient generation of insights from data. I also recommend that enterprises evaluate products in relation to the identification of data quality issues and the provision of recommendations for remediation and prevention. Enterprises should also be aware that products are only one aspect of delivering on the promise of DataOps. New approaches to people, processes and information are also required to deliver agile and collaborative development, testing and deployment of data and analytics workloads.

Regards,

Matt Aslett