Matt Aslett's Analyst Perspectives

CelerData Enables Real-Time Analytics for the Data Lakehouse

Written by Matt Aslett | Jun 21, 2023 10:00:00 AM

Organizations increasingly rely on real-time analytics to make informed decisions and stay competitive in today’s data-driven business landscape. As the complexity of data grows with the continuous addition of diverse sources, customers and workers alike expect real-time responsiveness. Accelerated query performance is crucial to processing data and extracting valuable insights in a timely manner. Traditional analytics applications are often insufficient for managing the scale, velocity and variety of data that organizations generate across various platforms.

Our Analytics and Data Benchmark Research shows that fewer than one-quarter (22%) of organizations currently analyze data in real time. As a result, most businesses risk letting competitive opportunities go unnoticed. CelerData is an analytic data platform vendor that offers real-time support for lakehouse architecture.

CelerData’s high-performance analytical database supports real-time and high-concurrency workloads. The platform enables organizations to run real-time analytics by querying streaming and historical data without combining streaming data into batches. Its advanced query engine supports thousands of concurrent users at 10,000 queries per second.

The company has built its commercial data platform around the open-source project StarRocks, a massively parallel processing (MPP) online analytical processing database that enables real-time queries on analytics workloads. In February, CelerData announced that the StarRocks project is now part of the Linux Foundation. The move to the Linux Foundation signifies a commitment to growing StarRocks through community-driven development.

CelerData offers two commercial products built on StarRocks: CelerData Enterprise and CelerData Cloud. CelerData Enterprise provides a data platform for developing applications that rely on the ability to query data in real time. Central to this capability is StarRocks’ native vectorized query engine, which simplifies data ingestion pipelines, improving data freshness and potentially lowering extract, transform and load costs. Another key component is StarRocks’ cost-based optimization functionality with support for materialized views and multi-table joins. Above and beyond the core data processing functionality of StarRocks, CelerData Enterprise provides complementary security, auto-deployment and administration functionality.

CelerData Cloud is the company’s managed service offering that enables organizations to accelerate query performance and utilize cloud-native capabilities, including automated and elastic resource management and the separation of compute and storage. CelerData Enterprise and CelerData Cloud also take advantage of StarRocks’ industry-standard SQL and compatibility with the MySQL protocol, enabling users to make the most of existing business intelligence tools and applications by connecting through MySQL drivers.
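
To illustrate what MySQL protocol compatibility can mean in practice, here is a minimal Python sketch, not taken from CelerData documentation. It assumes the third-party PyMySQL driver, and the host name, account and database are hypothetical placeholders. Port 9030 is assumed to be the MySQL-protocol query port, which is the StarRocks default as I understand it, and may differ in any given deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLEndpoint:
    """Connection settings for a MySQL-protocol endpoint (all values hypothetical)."""

    host: str = "celerdata.example.internal"  # hypothetical host name
    port: int = 9030  # assumed default StarRocks MySQL-protocol query port
    user: str = "analyst"  # hypothetical account
    password: str = ""
    database: str = "sales"  # hypothetical database

    def as_driver_kwargs(self) -> dict:
        """Return keyword arguments in the shape PyMySQL's connect() accepts."""
        return {
            "host": self.host,
            "port": self.port,
            "user": self.user,
            "password": self.password,
            "database": self.database,
        }


def run_query(endpoint: MySQLEndpoint, sql: str) -> list:
    """Run one SQL statement through a standard MySQL driver and return all rows."""
    # Imported lazily so the sketch can be loaded without the driver installed.
    import pymysql

    connection = pymysql.connect(**endpoint.as_driver_kwargs())
    try:
        with connection.cursor() as cursor:
            cursor.execute(sql)
            return list(cursor.fetchall())
    finally:
        connection.close()
```

Calling run_query(MySQLEndpoint(), "SELECT 1") against a reachable endpoint would behave like any other MySQL client session, which is why existing business intelligence tools that already ship MySQL drivers can typically connect without a custom connector.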

CelerData recently announced the latest version of its enterprise analytics platform, CelerData Version 3. This release enables users to analyze data stored in data lakes via support for the Apache Hive, Apache Iceberg, Apache Hudi and Delta Lake table formats. Users can take advantage of this functionality to query external data sources, including Hadoop Distributed File System and Amazon Simple Storage Service, without moving the data. Version 3 also introduces ACID transaction control and data governance for handling batch and real-time data. Its cloud-native architecture uses cloud object storage to improve reliability and reduce storage cost.

Organizations mostly rely on data ingestion for analytics. The process involves importing large datasets from multiple sources into a single destination before analyzing them. CelerData simplifies the process by streamlining data pipelines and addressing challenges related to denormalized tables. This enables engineering teams and data users to make real-time analytics manageable.
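
Querying data lake tables in place, as described above, can be sketched as follows. This is an illustration under stated assumptions rather than CelerData’s documented procedure: the statements follow the external catalog syntax of open-source StarRocks as I understand it, and the catalog name, metastore address, database and table are all hypothetical.

```python
def create_hive_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Build a CREATE EXTERNAL CATALOG statement for a Hive metastore.

    The property names are assumed from open-source StarRocks and may
    differ by product version.
    """
    properties = {
        "type": "hive",
        "hive.metastore.uris": metastore_uri,
    }
    rendered = ",\n".join(
        f'    "{key}" = "{value}"' for key, value in properties.items()
    )
    return f"CREATE EXTERNAL CATALOG {catalog}\nPROPERTIES (\n{rendered}\n);"


def lake_query_sql(catalog: str, database: str, table: str, limit: int = 10) -> str:
    """Build a query that reads a lake table in place via a three-part name."""
    return f"SELECT * FROM {catalog}.{database}.{table} LIMIT {limit};"


if __name__ == "__main__":
    # Hypothetical catalog name and metastore address, for illustration only.
    print(create_hive_catalog_sql("lake_hive", "thrift://metastore.example.internal:9083"))
    print(lake_query_sql("lake_hive", "web", "clickstream", limit=5))
```

The point of the sketch is that the table is addressed by catalog, database and table name where it already lives, so no ingestion job has to copy it into the analytic database first.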

I recently explored how real-time analytic data platforms can be used to develop and support data-intensive operational applications. While it may seem counter-intuitive to use an analytic database to support an operational application, the level of intelligence that is integrated into the next generation of applications in the form of contextually relevant recommendations, predictions and forecasting means that many organizations are challenging prevailing assumptions. I assert that by 2026, more than one-half of organizations will adopt specialist real-time analytic data platforms to develop and support data-intensive operational applications.

By adopting streaming analytics, organizations can rapidly respond to emerging opportunities, capitalize on market trends, mitigate risks and proactively address potential threats. CelerData was formed in August 2022 and is still improving its product to meet the evolving needs of organizations. The company should continue to invest in developing its data analytics platform and automation capabilities. With CelerData, organizations can bring data analytics to a unified platform, enabling real-time insights on data lakehouse architecture. It offers the capability to conduct high-performance analytics without joining data, enhancing efficiency and productivity. I recommend that organizations looking for a flexible data analytics platform that queries operational applications and the data lakehouse in real time consider CelerData.

Regards,

Matt Aslett