I have written recently about increased demand for data-intensive applications infused with the results of analytic processes, such as personalization and artificial intelligence (AI)-driven recommendations. Almost one-quarter of respondents (22%) to Ventana Research’s Analytics and Data Benchmark Research are currently analyzing data in real time, with an additional 10% analyzing data every hour. There are multiple data platform approaches to delivering real-time data processing and analytics and more agile data pipelines. These include the use of streaming and event data processing, as well as the use of hybrid data processing to enable analytics to be performed on application data within operational data platforms. Another approach, favored by a group of emerging vendors such as Rockset, is to develop these data-intensive applications on a specialist, real-time analytic data platform specifically designed to meet the performance and agility requirements of data-intensive applications.
I recently noted that as demand for real-time interactive applications becomes more pervasive, the use of streaming data is becoming more mainstream. Streaming data and event processing has been part of the data landscape for many decades, but for much of that time, data streaming was a niche activity. Although adopted in industry segments with high-performance, real-time data processing and analytics requirements such as financial services and telecommunications, data streaming was far less common elsewhere. That has changed significantly in recent years, fueled by the proliferation of open-source and cloud-based streaming data and event technologies that have lowered the cost and technical barriers to developing new applications able to take advantage of data in-motion. This is a trend we expect to continue, to the extent that streaming data and event processing becomes an integral part of mainstream data-processing architectures.
I have recently written about the importance of healthy data pipelines to ensure data is integrated and processed in the sequence required to generate business intelligence, and the need for data pipelines to be agile in the context of real-time data processing requirements. Data engineers, who are responsible for monitoring, managing and maintaining data pipelines, are under increasing pressure to deliver high-performance and flexible data integration and processing pipelines that are capable of handling the rising volume and frequency of data. Automation is a potential solution to this challenge, and several vendors, such as Ascend.io, have emerged in recent years to reduce the manual effort involved in data engineering.
I recently explained how emerging application requirements were expanding the range of use cases for NoSQL databases, increasing adoption based on the availability of enhanced functionality. These intelligent applications require a close relationship between operational data platforms and the output of data science and machine learning projects. This ensures that machine learning and predictive analytics initiatives are not only developed and trained based on the relationships inherent in operational applications, but also that the resulting intelligence is incorporated into the operational application in real time to support capabilities such as personalization, recommendations and fraud detection. Graph databases already support operational use cases such as social media, fraud detection, customer experience management and recommendation engines. Graph database vendors such as Neo4j are increasingly focused on the role that graph databases can play in supporting data scientists, enabling them to develop, train and run algorithms and machine learning models on graph data in the graph database, rather than extracting it into a separate environment.