I previously described the concept of hydroanalytic data platforms, which combine the structured data processing and analytics acceleration capabilities associated with data warehousing with the low-cost and multi-structured data storage advantages of the data lake. One of the key enablers of this approach is interactive SQL query engine functionality, which facilitates the use of existing business intelligence (BI) and data science tools to analyze data in data lakes. Interactive SQL query engines have been in use for several years — many of the capabilities were initially used to accelerate analytics on Hadoop — but have evolved along with data lake initiatives to enable analysis of data in cloud object storage. The open source Presto project is one of the most prominent interactive SQL query engines and has been adopted by some of the largest digital-native organizations. Presto managed-services provider Ahana is on a mission to bring the advantages of Presto to the masses.
I recently wrote about the potential benefits of data mesh. As I noted, data mesh is not a product that can be acquired, or even a technical architecture that can be built. It’s an organizational and cultural approach to data ownership, access and governance. While the concept of data mesh is agnostic to the technology used to implement it, technology is clearly an enabler for data mesh. For many organizations, new technological investment and evolution will be required to facilitate adoption of data mesh. Meanwhile, the concept of the data fabric, a technology-driven approach to managing and governing data across distributed environments, is rising in popularity. Although I previously touched on some of the technologies that might be applicable to data mesh, it is worth diving deeper into the data architecture implications of data mesh, and the potential overlap with data fabric.
I recently described the use cases driving interest in hybrid data processing capabilities that enable analysis of data in an operational data platform without impacting operational application performance or requiring data to be extracted to an external analytic data platform. Hybrid data processing functionality is becoming increasingly attractive to aid the development of intelligent applications infused with personalization and artificial intelligence-driven recommendations. These applications can be used to improve customer service and engagement, detect and prevent fraud, and increase operational efficiency. Several database providers now offer hybrid data processing capabilities to support these application requirements. One of the vendors addressing this opportunity is SingleStore.
I recently described how the operational data platforms sector is in a state of flux. There are multiple trends at play, including the increasing need for hybrid and multicloud data platforms, the evolution of NoSQL database functionality and applicable use cases, and the drivers for hybrid data processing. The past decade has seen significant change, with the emergence of new vendors, data models and architectures, as well as new deployment and consumption approaches. As organizations adopted strategies to address these new options, a few things remained constant – one being the influence and importance of Oracle. The company’s database business continues to be a core focus of innovation, evolution and differentiation, even as it has expanded its portfolio to address cloud applications and infrastructure.
I recently wrote about the importance of data pipelines and the role they play in transporting data between the stages of data processing and analytics. Healthy data pipelines are necessary to ensure data is integrated and processed in the sequence required to generate business intelligence. The concept of the data pipeline is nothing new, of course, but it is becoming increasingly important as organizations adapt data management processes to be more data-driven.
The various NoSQL databases have become a staple of the data platforms landscape since the term entered the IT industry lexicon in 2009 to describe a new generation of non-relational databases. While NoSQL began as a ragtag collection of loosely affiliated, open-source database projects, several commercial NoSQL database providers are now established as credible alternatives to the various relational database providers, while all the major cloud providers and relational database giants now also have NoSQL database offerings. Almost one-quarter (22%) of respondents to Ventana Research’s Analytics and Data Benchmark Research are using NoSQL databases in production today, and adoption is likely to continue to grow. More than one-third (34%) of respondents are planning to adopt NoSQL databases within two years (21%) or are evaluating (14%) their potential use. Adoption has been accelerated by the evolving functionality offered by NoSQL products and services, the growing maturity of specialist NoSQL vendors, and new commercial offerings from cloud providers and established database providers alike. This evolution is exemplified by the changing meaning of the term NoSQL itself. While it was initially associated with a rejection of the relational database hegemony, it has retroactively been reinterpreted to mean “Not Only SQL,” reflecting the potential for these new databases to coexist with and complement established approaches.
I recently described how the data platforms landscape will remain divided between analytic and operational workloads for the foreseeable future. Analytic data platforms are designed to store, manage, process and analyze data, enabling organizations to maximize data to operate with greater efficiency, while operational data platforms are designed to store, manage and process data to support worker-, customer- and partner-facing operational applications. At the same time, however, we see increased demand for intelligent applications infused with the results of analytic processes, such as personalization and artificial intelligence-driven recommendations. The need for real-time interactivity means that these applications cannot be served by traditional processes that rely on the batch extraction, transformation and loading of data from operational data platforms into analytic data platforms for analysis. Instead, they rely on analysis of data in the operational data platform itself via hybrid data processing capabilities to accelerate worker decision-making or improve customer experience.
Data lakes have enormous potential as a source of business intelligence. However, many early adopters of data lakes have found that simply storing large amounts of data in a data lake environment is not enough to generate business intelligence from that data. Similarly, lakes and reservoirs have enormous potential as sources of energy. However, simply storing large amounts of water in a lake is not enough to generate energy from that water. A hydroelectric power station is required to harness and unleash the power-generating potential of a lake or reservoir, utilizing a combination of turbines, generators and transformers to convert the energy of the flowing water into electricity. A hydroanalytic data platform, the data equivalent of a hydroelectric power station, is required to harness and unleash the intelligence-generating potential of a data lake.
As I noted when joining Ventana Research, the range of options faced by organizations in relation to data processing and analytics can be bewildering. When it comes to data platforms, however, there is one fundamental consideration that comes before all others: Is the workload primarily operational or analytic? Although most database products can be used for operational or analytic workloads, the market has been segmented between products targeting operational workloads and those targeting analytic workloads for almost as long as there has been a database market.
Enterprises looking to adopt cloud-based data processing and analytics face a disorienting array of data storage, data processing, data management and analytics offerings. Departmental autonomy, shadow IT, mergers and acquisitions, and strategic choices mean that most enterprises now have the need to manage data across multiple locations, while each of the major cloud providers and data and analytics vendors has a portfolio of offerings that may or may not be available in any given location. As such, the ability to manage and process data across multiple clouds and data centers is a growing concern for large and small enterprises alike. Almost one-half (49%) of respondents to Ventana Research’s Analytics and Data Benchmark Research study are using cloud computing for analytics and data, of which 42% are currently using more than one cloud provider.