I have recently written about the importance of healthy data pipelines to ensure data is integrated and processed in the sequence required to generate business intelligence, and the need for data pipelines to be agile in the context of real-time data processing requirements. Data engineers, who are responsible for monitoring, managing and maintaining data pipelines, are under increasing pressure to deliver high-performance and flexible data integration and processing pipelines that are capable of handling the rising volume and frequency of data. Automation is a potential solution to this challenge, and several vendors, such as Ascend.io, have emerged in recent years to reduce the manual effort involved in data engineering.
Ascend.io was founded in 2015 to take the drudgery out of data engineering through automation. The resulting Ascend Data Automation Cloud delivers a single platform for data ingestion, transformation, orchestration, observability and delivery, with a declarative approach to data pipeline creation and management. The Ascend Data Automation Cloud can be deployed to Amazon Web Services, Microsoft Azure and Google Cloud Platform, and is available as a hosted service or can be deployed in a customer’s existing, self-managed cloud account. The overall goal is to improve the productivity of data engineers. This is achieved by providing an interface to develop, test and productionize data pipelines in coordination with continuous integration and continuous delivery development processes as part of a DataOps approach to data management.
DataOps encompasses automated data monitoring and the continuous delivery of data into operational and analytical processes, and is being rapidly adopted to deliver more agile data integration and preparation. I assert that through 2025, awareness of DataOps will continue to increase as organizations adapt data integration and engineering processes to meet the growing need for continuous and automated data ingestion, transformation and delivery. Ascend.io has attracted customers such as electric vehicle charging company Be Power, men’s grooming company Harry’s, workplace furnishings firm HNI, beauty product provider Mayvenn and newspaper publisher New York Post. The company has also attracted the attention of investors, and in April 2022 announced a $31 million series B funding round led by Tiger Global with Shasta Ventures and Accel. Ascend.io will use the latest funding round to accelerate its go-to-market capabilities and fund geographic expansion. The financing will also fund the further development of the Ascend Data Automation Cloud with a specific focus on data pipeline automation across multi-cloud data mesh environments.
Data pipelines are used to transport and transform data to support data processing and analytics requirements. Traditionally, data has been transported between operational data platforms and analytic data platforms via batch data management processes. However, data-driven organizations are increasingly treating the steps involved in extracting, integrating, aggregating, preparing, transforming and loading data as a continual process. Data pipelines enable the flow of information through the organization and are increasingly scheduled, automated and orchestrated by data engineers without the need for constant manual intervention. I assert that by 2024, six in 10 organizations will adopt data engineering processes that span data integration, transformation and preparation, producing repeatable data pipelines that create more agile information architectures.
Ascend.io’s Data Automation Cloud is primarily targeted at data engineers and is designed to enable them to define dataflows using SQL, Scala, Python or Java. These dataflows are directed acyclic graphs that specify data source(s), Apache Spark-based transformations and outputs for the transformed data (such as a data warehouse, data lake or analytics tool).
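To make the idea of a dataflow as a directed acyclic graph concrete, here is a minimal sketch in plain Python. It is not Ascend.io's API: the stage names and the `execution_order` helper are hypothetical, and the only library used is the standard `graphlib` module (Python 3.9 or later). Each stage lists the stages it reads from, and a topological sort yields an order in which every stage runs after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow: each stage maps to the set of stages it reads from.
dataflow = {
    "raw_orders": set(),                                       # data source
    "raw_customers": set(),                                    # data source
    "clean_orders": {"raw_orders"},                            # transformation
    "orders_by_customer": {"clean_orders", "raw_customers"},   # transformation
    "warehouse_table": {"orders_by_customer"},                 # output
}


def execution_order(graph):
    """Return stages in an order where every stage follows its inputs.

    Raises graphlib.CycleError if the graph contains a cycle,
    i.e. if it is not a valid directed acyclic graph.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dataflow)
print(order)
```

Because the graph only declares dependencies, the order of execution is derived rather than hand-written, which is the essence of a declarative pipeline definition.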
Ascend Data Automation Cloud addresses five key areas of functionality: data ingestion, transformation, orchestration, observability and delivery.
- For data ingestion, it offers connectivity to a variety of data sources, including data lakes, data warehouses, databases, application programming interfaces, event streaming and message queuing systems via a library of pre-built data connectors as well as the ability to create reusable custom code connectors using Python. The product also supports cross-cloud data replication with automated data detection, data profiling and data reformatting as well as incremental data propagation.
- For data transformation, Ascend Data Automation Cloud features a visual data pipeline builder as well as a software development toolkit and the ability to design pipelines with simple, declarative definitions. The declarative approach supports business agility by abstracting the pipeline from the underlying resources. Any stage of a data pipeline can be treated as a query-able table, enabling ad hoc queries to be run against pipeline stages as part of the prototyping process.
- Data orchestration capabilities include resource-aware pipeline orchestration, support for continuously running workloads and automated rollback and backfilling, while multi-DAG orchestration enables pipelines to be connected together with upstream changes automatically reflected in any dependent downstream pipelines.
- Data observability is critical in monitoring the quality and reliability of dataflows used for analytics and governance projects. Ascend.io provides real-time pipeline visibility and alerting, along with change detection, lineage tracking and automated lineage tracing.
- For data delivery, Ascend.io’s Data Automation Cloud provides connectors for Jupyter and Zeppelin notebooks as well as support for business intelligence and visualization tools, along with replicated delivery of data to multiple data platforms to support numerous applications.
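The orchestration point above, where upstream changes are automatically reflected in dependent downstream pipelines, can be illustrated with a short sketch. This is an assumption about the general technique, not a description of how Ascend.io implements it. The stage names, the `downstream` mapping and the `stages_to_refresh` function are all hypothetical. Given a stage that has changed, a breadth-first walk over its consumers finds every stage that would need to be recomputed, even when those stages live in different pipelines.

```python
from collections import deque

# Hypothetical mapping: stage -> stages that consume its output.
# The prefix before the dot stands for the pipeline a stage belongs to.
downstream = {
    "ingest.events": ["sessions.build"],
    "sessions.build": ["reports.daily", "ml.features"],
    "reports.daily": [],
    "ml.features": ["ml.scores"],
    "ml.scores": [],
}


def stages_to_refresh(changed, consumers):
    """Return every stage downstream of `changed`, in breadth-first order."""
    visited = {changed}
    affected = []
    queue = deque([changed])
    while queue:
        stage = queue.popleft()
        for consumer in consumers.get(stage, []):
            if consumer not in visited:
                visited.add(consumer)
                affected.append(consumer)
                queue.append(consumer)
    return affected


affected = stages_to_refresh("sessions.build", downstream)
print(affected)
```

Stages upstream of the change, such as `ingest.events` here, are left untouched, which is what makes incremental propagation cheaper than rerunning every pipeline.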
Ascend.io has a relatively low profile and could expand on nascent attempts to articulate the broader value of Ascend Data Automation Cloud to improve agility and accelerate the delivery of business value for data-driven organizations and departments. While automation is a key aspect of the offering, there may be the potential to add further value using machine learning. The company has, however, assembled a broad and deep portfolio of capabilities to address the requirements data engineers have in relation to managing and automating data pipelines, and I recommend that organizations evaluate Ascend.io and the Ascend Data Automation Cloud when exploring opportunities to improve data agility through automation and orchestration.