        Matt Aslett's Analyst Perspectives

        Soda Provides Collaborative Approach to Data Observability

        Data observability was a hot topic in 2022 and looks likely to be a continued area of focus for innovation in 2023 and beyond. As I have previously described, data observability software is designed to automate the monitoring of data platforms and data pipelines, as well as the detection and remediation of data quality and data reliability issues. There has been a Cambrian explosion of data observability software vendors in recent years, and while they have fundamental capabilities in common, there is also room for differentiation. One such vendor is Soda Data, which offers an open-source platform for self-service data observability that is focused on facilitating collaboration between business decision-makers and data teams responsible for generating and managing data to improve trust in data.

        Headquartered in Brussels, Belgium, Soda was founded in late 2018 by CEO Maarten Masschelein and CTO Tom Baeyens, seasoned executives in the European software sector, with experience at companies including Collibra, JBoss and Alfresco. The company is focused on a perennial data management challenge: reducing the amount of time that organizations spend dealing with data quality and reliability issues. Although manual software products have been used for many years to detect and fix data quality problems, these are no longer efficient given the increasing reliance of organizations on data pipelines to ensure data-driven decision-making. Almost two-thirds of participants (64%) in our Analytics and Data Benchmark Research cited reviewing data for quality issues as being one of the most time-consuming aspects of analytics initiatives, second only to preparing data for analysis. Data observability takes advantage of machine learning (ML) to automate the monitoring and remediation of data quality issues. It is a key element of Data Operations (DataOps), alongside data orchestration. Like all data observability products, Soda’s offering is designed to enable organizations to detect and fix data quality issues. The company has differentiated its offering by focusing on the needs of data consumers, including data analysts and business decision-makers, as well as data producers, including data engineers and IT teams. Soda is particularly focused on midsized customers that are mature in their use of data and are reliant on data and data teams. A prime example of its growing number of customers is HelloFresh. The company has also attracted the interest of investors, raising 11.5 million euros ($13.9 million) in its February 2021 Series A funding round, provided by Singular, Point Nine Capital, Hummingbird Ventures, DCF and angel investors.

        While data quality is a persistent data management challenge, Soda’s founders identified it as an increasingly critical problem based on the growing reliance of organizations on data engineering, as well as a lack of tooling available to software engineers to manage, monitor and fix data quality issues. Soda’s founders recognized that while engineers are typically responsible for maintaining data reliability, the tools available to them were primarily focused on testing and monitoring data ingestion and integration pipelines, rather than the quality of the data inputs and outputs. “Garbage in, garbage out” is an age-old concept in computer science. A data pipeline functioning as expected does not provide any guarantees as to whether the data generated by the pipeline can be relied upon for decision-making. As such, we are seeing increased interest in data observability to complement data pipeline orchestration. I assert that through 2025, 6 in 10 organizations will invest in data reliability initiatives to improve trust in data through automated data quality monitoring, alerts and resolution.

        Soda’s founders also identified that while data teams are responsible for maintaining data reliability, the arbiters of data quality in any organization are not data engineers but data consumers, such as data analysts and business decision-makers. The company’s approach to data observability, Soda Cloud, is designed to be a self-service platform that empowers and incentivizes everyone in an organization to participate in improving data quality. Soda Cloud provides an environment through which data consumers can set expectations for data quality by defining data quality agreements, take responsibility for automatically generated alerts related to data quality issues, and investigate and report incidents. Data producers can use Soda Cloud to prioritize and resolve incidents using root-cause analysis functionality, as well as take steps to prevent repetition via the implementation of circuit breakers within data pipelines. Circuit breakers are checks that stop the data pipeline in the event of failure until the related data has been reviewed.
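
To make the circuit breaker idea concrete, the following is a minimal Python sketch of the general pattern, not Soda's implementation. Every name in it (`CircuitBreakerTripped`, `no_missing`, `run_with_circuit_breaker`) is hypothetical and chosen only for illustration. It assumes rows are held in memory as dictionaries and that a check is any function that returns True when the data is acceptable.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]
CheckFn = Callable[[Rows], bool]


class CircuitBreakerTripped(RuntimeError):
    """Raised when a quality check fails, so downstream steps do not run."""


def no_missing(column: str) -> CheckFn:
    """Build a hypothetical check that passes only if `column` is never None."""
    def check(rows: Rows) -> bool:
        return all(row.get(column) is not None for row in rows)

    check.__name__ = f"no_missing_{column}"
    return check


def run_with_circuit_breaker(rows: Rows, checks: List[CheckFn],
                             downstream: Callable[[Rows], object]) -> object:
    """Run every check first; call `downstream` only if all of them pass."""
    failed = [check.__name__ for check in checks if not check(rows)]
    if failed:
        # Halting here keeps suspect data out of later pipeline stages
        # until someone has reviewed it.
        raise CircuitBreakerTripped(f"pipeline halted; failed checks: {failed}")
    return downstream(rows)


if __name__ == "__main__":
    batch = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
    try:
        run_with_circuit_breaker(batch, [no_missing("email")], len)
    except CircuitBreakerTripped as exc:
        print(exc)
```

The design choice worth noting is that the breaker fails closed: a failed check raises an exception rather than logging a warning, so the downstream step cannot run on unreviewed data by accident.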

        Specifically, Soda Cloud offers a low-code environment for data consumers to define data quality agreements and write data quality checks using SodaCL (Soda Checks Language), a human-readable, domain-specific language for data quality management. Another key element of Soda Cloud is Soda Core, an open-source command-line tool that connects to and scans the source data platforms (Amazon Athena, Amazon Redshift, Apache Spark, Databricks, Google BigQuery, PostgreSQL and Snowflake). Soda Core converts the quality checks written in SodaCL into SQL queries that are executed against the relevant datasets to identify invalid, missing or unexpected data. Soda Core is also responsible for sending the metadata related to issues identified by data quality checks to Soda Cloud, where it can be monitored and reviewed by users. Soda also provides Soda Agent, a container environment with an instance of Soda Core that can be installed in a customer’s own cloud environment. Soda Cloud provides integration with data catalogs (Alation, Amundsen, Collibra and Metaphor), data orchestration tools (Apache Airflow, Dagster, dbt and Prefect), incident management tools (Jira, Opsgenie, PagerDuty and ServiceNow), business intelligence dashboards (Google Looker, Microsoft Power BI and Salesforce Tableau), and collaborative communication applications (Microsoft Teams and Salesforce Slack). Licensing options for Soda Cloud include Soda Team for small groups, as well as Soda Enterprise, which provides availability for all users in an organization.
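
The general idea of turning a declarative quality check into a SQL query can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SodaCL syntax and not how Soda Core is implemented: the `Check` structure, the two metric names and the functions `to_sql` and `run_check` are all hypothetical, and SQLite from the Python standard library stands in for a real data platform.

```python
import operator
import sqlite3
from dataclasses import dataclass
from typing import Optional, Tuple

# Comparison operators a hypothetical check may use against its threshold.
OPERATORS = {">": operator.gt, ">=": operator.ge, "=": operator.eq,
             "<=": operator.le, "<": operator.lt}


@dataclass(frozen=True)
class Check:
    """A simplified, hypothetical data quality check (not real SodaCL)."""
    metric: str            # "row_count" or "missing_count"
    column: Optional[str]  # required for "missing_count"
    comparison: str        # a key of OPERATORS
    threshold: int


def _quote(name: str) -> str:
    """Quote an identifier, rejecting anything that is not a plain name."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def to_sql(check: Check, table: str) -> str:
    """Translate a check into the SQL query that measures its metric."""
    if check.metric == "row_count":
        return f"SELECT COUNT(*) FROM {_quote(table)}"
    if check.metric == "missing_count" and check.column:
        return (f"SELECT COUNT(*) FROM {_quote(table)} "
                f"WHERE {_quote(check.column)} IS NULL")
    raise ValueError(f"unsupported check: {check}")


def run_check(conn: sqlite3.Connection, table: str,
              check: Check) -> Tuple[int, bool]:
    """Execute the generated query; return (measured value, passed?)."""
    value = conn.execute(to_sql(check, table)).fetchone()[0]
    return value, OPERATORS[check.comparison](value, check.threshold)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, "a@example.com"), (2, None), (3, "c@example.com")])
    for check in (Check("row_count", None, ">", 0),
                  Check("missing_count", "email", "=", 0)):
        print(check.metric, run_check(conn, "orders", check))
```

Separating the check definition from the generated query is what lets a non-engineer state an expectation ("no missing emails") while the tooling decides how to measure it on each data platform.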

        We are still at the early stages of adoption of data observability technology, and while customer interest is growing, driven by an increased focus on data reliability as well as agile, automated DataOps tooling, so is the number of competing vendors. Soda’s focus on facilitating agreements between data consumers and data producers is a differentiator and reflects the importance of data users in identifying data quality concerns. While SodaCL is human-readable, there is the potential to lower barriers to adoption for data consumers with a more visual no-code approach. I recommend that organizations exploring approaches to improving data reliability evaluate the emerging data observability providers, including Soda, to understand how they can facilitate greater trust in data and accelerate data-driven business decisions.


        Matt Aslett
        Director of Research, Analytics and Data

        Matt Aslett leads the software research and advisory for Analytics and Data at Ventana Research, now part of ISG, covering software that improves the utilization and value of information. His focus areas of expertise and market coverage include analytics, data intelligence, data operations, data platforms, and streaming and events.


        Our Analyst Perspective Policy

        • Ventana Research’s Analyst Perspectives are fact-based analysis and guidance on business, industry and technology vendor trends. Each Analyst Perspective presents the view of the analyst who is an established subject matter expert on new developments, business and technology trends, findings from our research, or best practice insights.

          Each is prepared in accordance with Ventana Research’s strict standards for accuracy and objectivity and reviewed to ensure it delivers reliable and actionable insights. It is reviewed and edited by research management and is approved by the Chief Research Officer; no individual or organization outside of Ventana Research reviews any Analyst Perspective before it is published. If you have any issue with an Analyst Perspective, please email it to ChiefResearchOfficer@ventanaresearch.com