I have recently written about the organizational and cultural aspects of being data-driven, and the potential advantages data-driven organizations stand to gain by responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. I have also explained that data-driven processes require more agile, continuous data processing, with an increased focus on extract, load and transform processes — as well as change data capture and automation and orchestration — as part of a DataOps approach to data management. Safeguarding the health of data pipelines is fundamental to ensuring data is integrated and processed in the sequence required to generate business intelligence. The significance of these data pipelines to delivering data-driven business strategies has led to the emergence of vendors, such as Astronomer, focused on enabling organizations to orchestrate data engineering pipelines and workflows.
The data catalog has become an integral component of organizational data strategies over the past decade, serving as a conduit for good data governance and facilitating self-service analytics initiatives. The data catalog has become so important, in fact, that it is easy to forget that just 10 years ago it did not exist in terms of a standalone product category. Metadata-based data management functionality has had a role to play within products for data governance and business intelligence for much longer than that, of course, but the emergence of the data catalog as a product category provided a platform for metadata-based data inventory and discovery that could span an entire organization, serving multiple departments, use cases and initiatives.
I recently wrote about the need for organizations to take a holistic approach to the management and governance of data in motion alongside data at rest. As adoption of streaming data and event processing increases, it is no longer sufficient for streaming data projects to exist in isolation. Data needs to be managed and governed regardless of whether it is processed in batch or as a stream of events. This requirement has resulted in established data management vendors increasing their focus on streaming data and event processing through product development as well as acquisitions. It has also resulted in streaming and event specialists, such as Confluent, adding centralized management and governance capabilities to their existing offerings as they seek to establish or reinforce the strategic importance of streaming data as part of a modern approach to data management.
I have written recently about increased demand for data-intensive applications infused with the results of analytic processes, such as personalization and artificial intelligence (AI)-driven recommendations. Almost one-quarter of respondents (22%) to Ventana Research’s Analytics and Data Benchmark Research are currently analyzing data in real time, with an additional 10% analyzing data every hour. There are multiple data platform approaches to delivering real-time data processing and analytics and more agile data pipelines. These include the use of streaming and event data processing, as well as the use of hybrid data processing to enable analytics to be performed on application data within operational data platforms. Another approach, favored by a group of emerging vendors such as Rockset, is to develop these data-intensive applications on a specialist, real-time analytic data platform specifically designed to meet the performance and agility requirements of data-intensive applications.
I recently noted that as demand for real-time interactive applications becomes more pervasive, the use of streaming data is becoming more mainstream. Streaming data and event processing has been part of the data landscape for many decades, but for much of that time, data streaming was a niche activity. Although adopted in industry segments with high-performance, real-time data processing and analytics requirements such as financial services and telecommunications, data streaming was far less common elsewhere. That has changed significantly in recent years, fueled by the proliferation of open-source and cloud-based streaming data and event technologies that have lowered the cost and technical barriers to developing new applications able to take advantage of data in-motion. This is a trend we expect to continue, to the extent that streaming data and event processing becomes an integral part of mainstream data-processing architectures.
I have recently written about the importance of healthy data pipelines to ensure data is integrated and processed in the sequence required to generate business intelligence, and the need for data pipelines to be agile in the context of real-time data processing requirements. Data engineers, who are responsible for monitoring, managing and maintaining data pipelines, are under increasing pressure to deliver high-performance and flexible data integration and processing pipelines that are capable of handling the rising volume and frequency of data. Automation is a potential solution to this challenge, and several vendors, such as Ascend.io, have emerged in recent years to reduce the manual effort involved in data engineering.
I recently explained how emerging application requirements were expanding the range of use cases for NoSQL databases, increasing adoption based on the availability of enhanced functionality. These intelligent applications require a close relationship between operational data platforms and the output of data science and machine learning projects. This ensures that machine learning and predictive analytics initiatives are not only developed and trained based on the relationships inherent in operational applications, but also that the resulting intelligence is incorporated into the operational application in real time to support capabilities such as personalization, recommendations and fraud detection. Graph databases already support operational use cases such as social media, fraud detection, customer experience management and recommendation engines. Graph database vendors such as Neo4j are increasingly focused on the role that graph databases can play in supporting data scientists, enabling them to develop, train and run algorithms and machine learning models on graph data in the graph database, rather than extracting it into a separate environment.
Streaming data has been part of the industry landscape for decades but has largely been focused on niche applications in segments with the highest real-time data processing and analytics performance requirements, such as financial services and telecommunications. As demand for real-time interactive applications becomes more pervasive, streaming data is becoming a more mainstream pursuit, aided by the proliferation of open-source streaming data and event technologies, which have lowered the cost and technical barriers to developing new applications that take advantage of data in motion. Ventana Research’s Streaming Data Dynamic Insights enables an organization to assess its relative maturity in achieving value from streaming data. I assert that by 2024, more than one-half of all organizations’ standard information architectures will include streaming data and event processing, allowing organizations to be more responsive and provide better customer experiences.
When joining Ventana Research, I noted that the need to be more data-driven has become a mantra among large and small organizations alike. Data-driven organizations stand to gain competitive advantage, responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. Being data-driven is clearly something to aspire to. However, it is also a somewhat vague concept without clear definition. We know data-driven organizations when we see them — the likes of Airbnb, DoorDash, ING Bank, Netflix, Spotify, and Uber are often cited as examples — but it is not necessarily clear what separates the data-driven from the rest. Data has been used in decision-making processes for thousands of years, and no business operates without some form of data processing and analytics. As such, although many organizations may aspire to be more data-driven, identifying and defining the steps required to achieve that goal are not necessarily easy. In this Analyst Perspective, I will outline the four key traits that I believe are required for a company to be considered data-driven.
Topics: embedded analytics, Analytics, Business Intelligence, Data Governance, Data Integration, Data, Digital Technology, natural language processing, data lakes, AI and Machine Learning, data operations, Streaming Analytics, digital business, data platforms, Analytics & Data, Streaming Data & Events
I recently wrote about the growing range of use cases for which NoSQL databases can be considered, given increased breadth and depth of functionality available from providers of the various non-relational data platforms. As I noted, one category of NoSQL databases — graph databases — are inherently suitable for use cases that rely on relationships, such as social media, fraud detection and recommendation engines, since the graph data model represents the entities and values and also the relationships between them. The native representation of relationships can also be significant in surfacing “features” for use in machine learning modeling. There has been a concerted effort in recent years by graph database providers, including TigerGraph, to encourage and facilitate the use of graph databases by data scientists to support the development, testing and deployment of machine learning models.