I have written about the increased demand for data-intensive operational applications infused with the results of analytic processes, such as personalization and artificial intelligence-driven recommendations. I previously described the use of hybrid data processing to enable analytics on application data within operational data platforms. As is often the case in the data platforms sector, however, there is more than one way to peel an orange. Recent years have also seen the emergence of several analytic data platforms that deliver real-time analytic processing suitable for data-intensive operational applications.
The analytic data platforms sector is traditionally associated with specialist analytic databases that support applications used to analyze the business, including decision support, business intelligence, data science, and artificial intelligence and machine learning. These are typically deployed as data warehouses and data marts. More recently, data lakes and data lakehouses have emerged as a means of performing analytics on semi-structured and unstructured data that is unsuitable for storing and processing in a data warehouse. These data lakes and lakehouses are sometimes used to replace a data warehouse, although data lake and data warehouse environments coexist for almost three-quarters of participants (74%) in Ventana Research’s Data Lakes Dynamics Insights research.
Collectively, these analytic data platforms are designed to complement operational data platforms, which support applications used to run the business. To protect the performance of both operational and analytic workloads, traditional architectures have involved extracting, transforming and loading (ETL) data from one or more operational data platforms into a dedicated analytic data platform. This enables operational and analytic workloads to run concurrently without adversely affecting each other.
This approach of separating workloads is well aligned with the requirements of traditional transactional applications, as well as those of business intelligence reports, dashboards and exploratory data science, which have historically been kept distinct from one another. However, it is less well suited to the requirements of intelligent applications that, while operational in nature, rely on real-time analytic processing to deliver functionality such as contextually relevant recommendations, predictions and forecasts.
Real-time analysis of data is still relatively rare in most organizations. Fewer than one-quarter (22%) of participants in Ventana Research’s Analytics and Data Benchmark Research say their organizations analyze the data they collect in real time. However, the list of applications that require real-time processing of data is growing.
Consumers are increasingly engaged with data-driven services from the likes of Airbnb, DoorDash, ING Bank, Netflix, Spotify and Uber that are differentiated by personalization and contextually relevant recommendations. Worker-facing applications are also increasingly infused with personalization and contextually relevant information, targeting users based on their roles and responsibilities. The need to extract, transform and load operational data into an external data platform for analytic processing before returning the results to the operational application makes the traditional separated-workload approach unsuitable for operational applications that rely on real-time processing of data. My colleagues have written about the importance of real-time data processing and intelligence in customer experience, people analytics, finance analytics, and marketing and sales effectiveness. Other examples of these applications include fraud detection and prevention in credit approval processes; predictive maintenance and anomaly detection in manufacturing and IoT; and supply chain planning and forecasting in retail and logistics.
We know that operational data platforms with hybrid data processing capabilities can be used to develop and deploy these intelligent applications. An alternative is to use one of the new breed of real-time analytic databases targeted at the application developers responsible for building real-time intelligent applications. It may seem counterintuitive to use an analytic database to support an operational application, and adoption of these real-time database products is still at an early stage. But given the growing demand for applications that require real-time processing of data, I assert that by 2026, more than one-half of organizations will adopt specialist real-time analytic data platforms to develop and support data-intensive operational applications. Some significant organizations already use them. Between them, CelerData, ClickHouse, Imply, Rockset and StarTree can point to customers including Lenovo, Trip.com, eBay, Spotify, Uber, Netflix, TrueCar, Allianz, Seesaw, Just Eat and Stripe. Many more organizations use open source projects including Apache Druid, Apache Pinot and StarRocks.
These products share some similarities: They are all OLAP databases designed to deliver sub-second query latency at high concurrency, and they are all primarily available as managed database-as-a-service offerings. However, there are also some key differences, including support for real-time joins, streaming data ingestion, real-time data updates and deletes, and nested data, as well as elastic scalability, high-concurrency query performance, compute resource efficiency and development agility. Depending on the nature of the application being developed, these differences could have big implications. For example, the difference between sub-second query latency and sub-100-millisecond query latency is enormous. I recommend that organizations considering options for the development of intelligent applications include real-time analytic databases alongside operational databases in their evaluations, and pay close attention to whether a product can meet query performance requirements, regardless of whether it is considered an operational or an analytic database.