Data Operations (DataOps) has been part of the lexicon of the data market for almost a decade, with the term used to describe products, practices and processes designed to support agile and continuous delivery of data analytics. DataOps takes inspiration from DevOps, which describes a set of tools, practices and philosophy used to support the continuous delivery of software applications in the face of constant changes. DataOps describes a set of tools, practices and philosophy used to ensure the quality, flexibility and reliability of data and analytics initiatives, with an emphasis on continuous measurable improvement, as well as agility, collaboration and automation. Interest in products and services that support DataOps is growing. I assert that by 2025, one-half of organizations will have adopted a DataOps approach to their data engineering processes, enabling them to be more flexible and agile.
It is the focus on agility, collaboration and automation that separates DataOps from traditional approaches to data management, which were typically based on tools and practices that were batch-based, manual and rigid. This distinction between DataOps and traditional data management tools seems clear in theory. In practice, however, the line has blurred, as traditional data management vendors have in recent years incorporated capabilities that make their products more automated, collaborative and agile. How agile, collaborative and automated must products be to be considered part of the DataOps category? Confusion over the definition of DataOps is not helped by multiple vendors in the industry using the term in a self-serving manner, including or excluding characteristics to the benefit of their specific product.
From our perspective, there are two categories of DataOps definitions: broad and narrow. The broad definition of DataOps focuses on the higher-level principles and describes DataOps as the combination of people, process and technology needed to automate the delivery of data to users in an organization and enable collaboration to facilitate data-driven decisions. This definition is broad enough that it could be interpreted to encompass all products and services that address data management and data governance, including many traditional batch-based, manual products, as well as agile, collaborative and automated tools.
A narrower definition of DataOps focuses on the practical application of agile development, DevOps and lean manufacturing to the tasks and skills employed by data engineering professionals in support of data analytics development and operations. This narrow definition emphasizes specific capabilities such as continuous delivery of analytic insight, process simplification, code generation, automation to avoid repeated errors and reduce repetitive tasks, incorporation of stakeholder feedback, and measurable improvement in the efficient generation of insight from data. As such, this narrow definition can be used to create a defined set of criteria for agile and collaborative practices that products and services can be measured against.
Ventana Research’s perspective, based on our interaction with the vendor and user communities, aligns with the narrow definition. While traditional data management and data governance are complementary, our DataOps coverage focuses specifically on the delivery of agile business intelligence (BI) and data science through the automation and orchestration of data integration and processing pipelines, incorporating improved data reliability and integrity via data monitoring and observability. To be more specific, we believe that DataOps products and services provide functionality that addresses a particular set of capabilities: agile and collaborative data operations; the development, testing and deployment of data and analytics workloads; data pipeline orchestration; and data pipeline observability. These are the key criteria that we will be using to assess DataOps products and services in our forthcoming 2023 DataOps Value Index study. This will consist of three parallel evaluations: data orchestration; data observability; and overall DataOps platforms. Additionally, we recognize that some DataOps products and services also address monitoring the underlying data infrastructure. We see this as being part of the overall data operations discipline but complementary to DataOps’ primary focus on data pipeline development, deployment, orchestration and observability.
I have written before about the importance of data pipeline orchestration to facilitating data-driven analytics. Data orchestration provides the capabilities to automate and accelerate the flow of data from multiple sources to support analytics initiatives and drive business value. At the highest level of abstraction, data orchestration covers three key capabilities: collection (including data ingestion, preparation and cleansing); transformation (additionally including integration and enrichment); and activation (making the results available to compute engines, analytics and data science tools, or operational applications). By 2026, more than one-half of organizations will adopt data orchestration technologies to automate and coordinate data workflows and increase efficiency and agility in data and analytics projects.
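To make the three orchestration capabilities more concrete, the sketch below models a pipeline with a collection, transformation and activation stage coordinated by a single orchestration function. The function names, fields and reference data are hypothetical illustrations of the pattern, not any specific vendor's API.

```python
def collect(raw_records):
    """Collection: ingest, prepare and cleanse raw records."""
    # Drop records missing an id and coerce amounts to numbers.
    return [
        {"id": r["id"], "amount": float(r.get("amount", 0))}
        for r in raw_records
        if r.get("id") is not None
    ]

def transform(records, region_lookup):
    """Transformation: integrate and enrich with reference data."""
    return [
        {**r, "region": region_lookup.get(r["id"], "unknown")}
        for r in records
    ]

def activate(records):
    """Activation: make results available to downstream tools."""
    # Here, simply aggregate amounts per region for a BI dashboard.
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

def run_pipeline(raw_records, region_lookup):
    """Orchestration: coordinate the stages in order."""
    return activate(transform(collect(raw_records), region_lookup))

if __name__ == "__main__":
    raw = [
        {"id": 1, "amount": "100.0"},
        {"id": 2, "amount": "50.5"},
        {"id": None, "amount": "9.9"},  # dropped during collection
    ]
    print(run_pipeline(raw, {1: "emea", 2: "amer"}))
```

In practice, orchestration tools add scheduling, dependency management and retries on top of this basic stage-coordination pattern.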
I have also previously written about the importance of data pipeline observability to ensuring healthy data pipelines and high data quality. Monitoring the quality and reliability of data used for analytics and governance projects is not new, but data pipeline observability utilizes machine learning (ML) to automate that monitoring, ensuring that data is complete, valid and consistent, as well as relevant and free from duplication. Data pipeline observability also addresses monitoring not just the data stored in an individual data warehouse or data lake, but also the associated upstream and downstream data pipelines.
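A minimal sketch of the kinds of checks an observability layer runs against each pipeline batch appears below, covering the dimensions mentioned above: completeness, validity and duplication. The field names and rules are hypothetical; production observability tools typically learn expected values and ranges with ML rather than hard-coding them.

```python
def check_batch(rows, required_fields=("id", "amount")):
    """Return a dict of data quality findings for one pipeline batch."""
    findings = {"incomplete": 0, "invalid": 0, "duplicates": 0}
    seen_ids = set()
    for row in rows:
        # Completeness: every required field must be present and non-null.
        if any(row.get(f) is None for f in required_fields):
            findings["incomplete"] += 1
            continue
        # Validity: amounts must be non-negative numbers.
        if not isinstance(row["amount"], (int, float)) or row["amount"] < 0:
            findings["invalid"] += 1
        # Duplication: flag repeated primary keys.
        if row["id"] in seen_ids:
            findings["duplicates"] += 1
        seen_ids.add(row["id"])
    return findings
```

Findings like these are typically emitted as metrics and compared against learned baselines, so that anomalies in upstream pipelines surface before they corrupt downstream analytics.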
Data orchestration and data observability products address two of the most significant impediments to generating value from data. Participants in Ventana Research’s Analytics and Data Benchmark Research cite preparing data for analysis (69%) and reviewing data for quality and consistency issues (64%) as the two most time-consuming tasks in analyzing data. As always, however, products are only one aspect of delivering on the promise of DataOps. New approaches to people, process and information are also required to deliver agile and collaborative development, testing and deployment of data and analytics workloads, as well as data operations. I recommend that organizations seeking to improve the value that they are generating from their analytics and data initiatives investigate the potential benefits of data pipeline orchestration and data pipeline observability products alongside processes and methodologies that support rapid innovation and experimentation, automation, collaboration, measurement and monitoring, and high data quality.