Data mesh is the latest trend to grip the data and analytics sector. The term has been rapidly adopted by numerous vendors — as well as a growing number of organizations — as a means of embracing distributed data processing. Understanding and adopting data mesh remains a challenge, however. Data mesh is not a product that can be acquired, or even a technical architecture that can be built. It is an organizational and cultural approach to data ownership, access and governance. Adopting data mesh therefore requires cultural and organizational change. Data mesh promises multiple benefits to organizations that embrace this change, but making the change may be far from easy.
Despite widespread and increasing use of the cloud for data and analytics workloads, it has become clear in recent years that, for most organizations, a proportion of data-processing workloads will remain on-premises in centralized data centers or distributed edge-processing infrastructure. As we recently noted, as compute and storage are distributed across a hybrid and multi-cloud architecture, so, too, is the data that those environments store and that organizations rely upon. This presents a challenge for organizations: identifying, managing and analyzing all the data that is available to them. It also presents opportunities for vendors to help alleviate that challenge. In particular, it opens a gap in the market for data-platform vendors to distinguish themselves from the various cloud providers with cloud-agnostic data platforms that can support data processing across hybrid IT, multi-cloud and edge environments (including Internet of Things devices, as well as servers and local data centers located close to the source of the data). Yellowbrick Data is one vendor that has seized upon that opportunity with its cloud data warehouse offering.
I recently examined how evolving functionality had fueled the adoption of NoSQL databases, recommending that organizations evaluate NoSQL databases when assessing options for data transformation and modernization efforts. This recommendation was based on the breadth and depth of functionality offered by NoSQL database providers today, which has expanded the range of use cases for which NoSQL databases are potentially viable. There remain a significant number of organizations that have not explored NoSQL databases, as well as several workloads for which NoSQL databases are assumed to be inherently unsuitable. Given the advances in functionality, organizations would be well-advised to maintain up-to-date knowledge of available products and services and an understanding of the range of use cases for which NoSQL databases are a valid option.
The various NoSQL databases have become a staple of the data platforms landscape since the term entered the IT industry lexicon in 2009 to describe a new generation of non-relational databases. While NoSQL began as a ragtag collection of loosely affiliated, open-source database projects, several commercial NoSQL database providers are now established as credible alternatives to the various relational database providers, while all the major cloud providers and relational database giants now also have NoSQL database offerings. Almost one-quarter (22%) of respondents to Ventana Research’s Analytics and Data Benchmark Research are using NoSQL databases in production today, and adoption is likely to continue to grow. More than one-third (34%) of respondents are planning to adopt NoSQL databases within two years (21%) or are evaluating their potential use (14%). Adoption has been accelerated by the evolving functionality offered by NoSQL products and services, the growing maturity of specialist NoSQL vendors, and new commercial offerings from cloud providers and established database providers alike. This evolution is exemplified by the changing meaning of the term NoSQL itself. While it was initially associated with a rejection of the relational database hegemony, it has retroactively been reinterpreted to mean “Not Only SQL,” reflecting the potential for these new databases to coexist with and complement established approaches.
As businesses become more data-driven, they are increasingly dependent on the quality of their data and the reliability of their data pipelines. Making decisions based on data does not guarantee success, especially if the business cannot ensure that the data is accurate and trustworthy. While there is potential value in capturing all data — good or bad — making decisions based on low-quality data may do more harm than good.
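The risk of acting on low-quality data can be made concrete with a toy validation check. The sketch below is purely illustrative — the `Record` schema and the validity rules (non-empty identifier, non-negative revenue) are hypothetical examples, not drawn from any particular product — and scores a batch of records by the fraction that pass basic checks before the data is used for decision-making:

```python
from dataclasses import dataclass

@dataclass
class Record:
    customer_id: str
    revenue: float

def quality_score(records):
    """Return the fraction of records passing basic validity checks.

    Illustrative checks only: a non-empty customer_id and
    a non-negative revenue figure.
    """
    if not records:
        return 0.0
    valid = sum(1 for r in records
                if r.customer_id and r.revenue >= 0)
    return valid / len(records)

# Two of these three records fail validation, so the batch scores 1/3.
records = [Record("a1", 100.0), Record("", 50.0), Record("a2", -5.0)]
print(quality_score(records))
```

A real data quality tool would apply far richer rules (referential integrity, format and range constraints, deduplication), but the principle — measure trustworthiness before consuming the data — is the same.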
I recently described the emergence of hydroanalytic data platforms, outlining how the processes involved in generating energy from a lake or reservoir were analogous to those required to generate intelligence from a data lake. I explained how structured data processing and analytics acceleration capabilities are the equivalent of turbines, generators and transformers in a hydroelectric power station. While these capabilities are more typically associated with data warehousing, they are now being applied to data lake environments as well. Structured data processing and analytics acceleration capabilities are not the only things required to generate insights from data, however, and the hydroelectric power station analogy further illustrates this. For example, generating hydroelectric power also relies on pipelines to ensure that the water is transported from the lake or reservoir at the appropriate volume to drive the turbines. Ensuring that a hydroelectric power station is operating efficiently also requires the collection, monitoring and analysis of telemetry data to confirm that the turbines, generators, transformers and pipelines are functioning correctly. Similarly, generating intelligence from data relies on data pipelines that ensure the data is integrated and processed in the correct sequence to generate the required intelligence. Likewise, the need to monitor the pipelines and processes in data-processing and analytics environments has driven the emergence of a new category of software: data observability.
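To illustrate the telemetry-monitoring idea behind data observability, the sketch below checks two signals commonly associated with the category: freshness of the last pipeline run and anomalies in the volume of rows delivered. The thresholds and signal names are assumptions for the example, not any vendor's implementation:

```python
import statistics
from datetime import datetime, timedelta, timezone

def check_pipeline_health(row_counts, last_run,
                          max_lag=timedelta(hours=1), z_threshold=3.0):
    """Flag two common data observability signals.

    Freshness: has the pipeline run within the expected window?
    Volume: is the latest row count a statistical outlier vs. history?
    Thresholds here are illustrative defaults.
    """
    issues = []
    if datetime.now(timezone.utc) - last_run > max_lag:
        issues.append("stale: pipeline has not run within the expected window")
    history, latest = row_counts[:-1], row_counts[-1]
    stdev = statistics.stdev(history) or 1.0  # avoid dividing by zero
    if abs(latest - statistics.mean(history)) / stdev > z_threshold:
        issues.append("volume anomaly: latest row count deviates from history")
    return issues

# A sudden drop in rows delivered is flagged as a volume anomaly.
print(check_pipeline_health([1000, 1010, 990, 1005, 10],
                            datetime.now(timezone.utc)))
```

Commercial data observability products track many more signals (schema changes, distribution drift, lineage), but freshness and volume checks of this kind are representative of the basic idea.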
As I stated when joining Ventana Research, the socioeconomic impacts of the pandemic and its aftereffects have highlighted more than ever the differences between organizations that can turn data into insights and are agile enough to act upon them and those that are incapable of seeing or responding to the need for change. Data-driven organizations stand to gain competitive advantage, responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences. One of the key methods of accelerating business decision-making is reducing the lag between data collection and data analysis.
I recently described how the data platforms landscape will remain divided between analytic and operational workloads for the foreseeable future. Analytic data platforms are designed to store, manage, process and analyze data, enabling organizations to maximize the value of data and operate with greater efficiency, while operational data platforms are designed to store, manage and process data to support worker-, customer- and partner-facing operational applications. At the same time, however, we see increased demand for intelligent applications infused with the results of analytic processes, such as personalization and artificial intelligence-driven recommendations. The need for real-time interactivity means that these applications cannot be served by traditional processes that rely on the batch extraction, transformation and loading of data from operational data platforms into analytic data platforms for analysis. Instead, they rely on analysis of data in the operational data platform itself via hybrid data processing capabilities to accelerate worker decision-making or improve customer experience.
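The hybrid pattern can be sketched in miniature. In the example below, an in-memory SQLite table with a made-up schema stands in for an operational database, and an analytic aggregate is run directly against that operational store — loosely illustrating analysis in place, as opposed to first batch-loading the data into a separate analytic platform:

```python
import sqlite3

# In-memory stand-in for an operational database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("alice", "widget", 3), ("alice", "gadget", 1), ("bob", "widget", 2),
])

# Analytic aggregate run directly against the operational store,
# rather than after a batch extract-transform-load into a warehouse.
top = conn.execute(
    "SELECT product, SUM(qty) AS total FROM orders "
    "GROUP BY product ORDER BY total DESC LIMIT 1"
).fetchone()
print(top)  # → ('widget', 5)
```

In practice, hybrid data processing platforms add dedicated analytic storage layouts and workload isolation so that such queries do not degrade transactional performance; the sketch only shows the access pattern.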
Ventana Research recently announced its 2022 Market Agenda for Data, continuing the guidance we have offered for nearly two decades to help organizations derive optimal value and improve business outcomes.
Few trends have had a bigger impact on the data platforms landscape than the emergence of cloud computing. The adoption of cloud computing infrastructure as an alternative to on-premises data centers has resulted in significant workloads being migrated to the cloud, displacing traditional server and storage vendors. Almost one-half (49%) of respondents to Ventana Research’s Analytics and Data Benchmark Research currently use cloud computing products for analytics and data, and a further one-quarter plan to do so. In addition to deploying data workloads on cloud infrastructure, many organizations have also adopted cloud data and analytics services offered by the same cloud providers, displacing traditional data platform vendors. Organizations now have greater choice of potential products and providers for data and analytics workloads, but also need to think about integrating services offered by cloud providers with established technology and processes. Having pioneered the concept, Amazon Web Services has arguably benefited more than most from the adoption of cloud computing, and is also in the process of expanding and adjusting its portfolio to alleviate challenges and encourage even greater adoption.