Almost all organizations are investing in data science, or planning to, as they seek to encourage experimentation and exploration to identify new business challenges and opportunities as part of the drive toward creating a more data-driven culture. My colleague, David Menninger, has written about how organizations using artificial intelligence and machine learning (AI/ML) report gaining competitive advantage, improving customer experiences, responding faster to opportunities and threats, and improving the bottom line with increased sales and lower costs. One-quarter of participants (25%) in Ventana Research’s Analytics and Data Benchmark Research are already using AI/ML, while more than one-third (34%) plan to do so in the next year, and more than one-quarter (28%) plan to do so eventually. As organizations adopt data science and expand their analytics initiatives, they face no shortage of options for AI/ML capabilities. Understanding which is the most appropriate approach to take could be the difference between success and failure. The cloud providers all offer services, including general-purpose ML environments, as well as dedicated services for specific use cases, such as image detection or language translation. Software vendors also provide a range of products, both on-premises and in the cloud, including general-purpose ML platforms and specialist applications. Meanwhile, analytic data platform providers are increasingly adding ML capabilities to their offerings to provide additional value to customers and differentiate themselves from their competitors. There is no simple answer as to which is the best approach, but it is worth weighing the relative benefits and challenges. Looking at the options from the perspective of our analytic data platform expertise, the key choice is between AI/ML capabilities provided on a standalone basis or integrated into a larger data platform.
I have written recently about the similarities and differences between data mesh and data fabric. The two are potentially complementary. Data mesh is an organizational and cultural approach to data ownership, access and governance. Data fabric is a technical approach to automating data management and data governance in a distributed architecture. There are various definitions of data fabric, but key elements include a data catalog for metadata-driven data governance and self-service, agile data integration.
In their pursuit to be data-driven, organizations are collecting and managing more data than ever before as they attempt to gain competitive advantage and respond faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. As data is increasingly spread across multiple data centers, clouds and regions, organizations need to manage data on multiple systems in different locations and bring it together for analysis. As the data volumes increase and more data sources and data types are introduced in the organization, it creates challenges to storing, managing, connecting and analyzing the huge set of information that is spread across multiple locations. Having a strong foundation and scalable data management architecture in place can help alleviate many of the challenges organizations face when they are scaling and adding more infrastructure. We have written about the potential for hybrid and multi-cloud platforms to safeguard data across heterogenous environments, which plays to the strengths of companies, such as Actian, that provide a single environment with the ability to integrate, manage and process data across multiple locations.
I have written a few times in recent months about vendors offering functionality that addresses data orchestration. This is a concept that has been growing in popularity in the past five years amid the rise of Data Operations (DataOps), which describes more agile approaches to data integration and data management. In a nutshell, data orchestration is the process of combining data from multiple operational data sources and preparing and transforming it for analysis. To those unfamiliar with the term, this may sound very much like the tasks that data management practitioners having been undertaking for decades. As such, it is fair to ask what separates data orchestration from traditional approaches to data management. Is it really something new that can deliver innovation and business value, or just the rebranding of existing practices designed to drive demand for products and services?
Ventana Research’s Data Lakes Dynamics Insights research illustrates that while data lakes are fulfilling their promise of enabling organizations to economically store and process large volumes of raw data, data lake environments continue to evolve. Data lakes were initially based primarily on Apache Hadoop deployed on-premises but are now increasingly based on cloud object storage. Adopters are also shifting from data lakes based on homegrown scripts and code to open standards and open formats, and they are beginning to embrace the structured data-processing functionality that supports data lakehouse capabilities. These trends are driving the evolution of vendor product offerings and strategies, as typified by Cloudera’s recent launch of Cloudera Data Platform (CDP) One, described as a data lakehouse software-as-a-service (SaaS) offering.
I have recently written about the organizational and cultural aspects of being data-driven, and the potential advantages data-driven organizations stand to gain by responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. I have also explained that data-driven processes require more agile, continuous data processing, with an increased focus on extract, load and transform processes — as well as change data capture and automation and orchestration — as part of a DataOps approach to data management. Safeguarding the health of data pipelines is fundamental to ensuring data is integrated and processed in the sequence required to generate business intelligence. The significance of these data pipelines to delivering data-driven business strategies has led to the emergence of vendors, such as Astronomer, focused on enabling organizations to orchestrate data engineering pipelines and workflows.
The data catalog has become an integral component of organizational data strategies over the past decade, serving as a conduit for good data governance and facilitating self-service analytics initiatives. The data catalog has become so important, in fact, that it is easy to forget that just 10 years ago it did not exist in terms of a standalone product category. Metadata-based data management functionality has had a role to play within products for data governance and business intelligence for much longer than that, of course, but the emergence of the data catalog as a product category provided a platform for metadata-based data inventory and discovery that could span an entire organization, serving multiple departments, use cases and initiatives.
I have recently written about the importance of healthy data pipelines to ensure data is integrated and processed in the sequence required to generate business intelligence, and the need for data pipelines to be agile in the context of real-time data processing requirements. Data engineers, who are responsible for monitoring, managing and maintaining data pipelines, are under increasing pressure to deliver high-performance and flexible data integration and processing pipelines that are capable of handling the rising volume and frequency of data. Automation is a potential solution to this challenge, and several vendors, such as Ascend.io, have emerged in recent years to reduce the manual effort involved in data engineering.
When joining Ventana Research, I noted that the need to be more data-driven has become a mantra among large and small organizations alike. Data-driven organizations stand to gain competitive advantage, responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. Being data-driven is clearly something to aspire to. However, it is also a somewhat vague concept without clear definition. We know data-driven organizations when we see them — the likes of Airbnb, DoorDash, ING Bank, Netflix, Spotify, and Uber are often cited as examples — but it is not necessarily clear what separates the data-driven from the rest. Data has been used in decision-making processes for thousands of years, and no business operates without some form of data processing and analytics. As such, although many organizations may aspire to be more data-driven, identifying and defining the steps required to achieve that goal are not necessarily easy. In this Analyst Perspective, I will outline the four key traits that I believe are required for a company to be considered data-driven.
Topics: embedded analytics, Analytics, Business Intelligence, Data Governance, Data Integration, Data, Digital Technology, natural language processing, data lakes, AI and Machine Learning, data operations, Streaming Analytics, digital business, data platforms, Analytics & Data, Streaming Data & Events
I previously described the concept of hydroanalytic data platforms, which combine the structured data processing and analytics acceleration capabilities associated with data warehousing with the low-cost and multi-structured data storage advantages of the data lake. One of the key enablers of this approach is interactive SQL query engine functionality, which facilitates the use of existing business intelligence (BI) and data science tools to analyze data in data lakes. Interactive SQL query engines have been in use for several years — many of the capabilities were initially used to accelerate analytics on Hadoop — but have evolved along with data lake initiatives to enable analysis of data in cloud object storage. The open source Presto project is one of the most prominent interactive SQL query engines and has been adopted by some of the largest digital-native organizations. Presto managed-services provider Ahana is on a mission to bring the advantages of Presto to the masses.