Matt Aslett's Analyst Perspectives

Cloudera Facilitates Versatility with Enterprise Data and AI

Written by Matt Aslett | Oct 3, 2023 10:00:00 AM

The data platforms market may appear to have little or nothing to do with haute couture, but it is one of the data sectors most strongly influenced by the fickle finger of fashion. In recent years, various architectural approaches to data storage and processing have enjoyed a phase in the limelight, including data warehouse, data mart, data hub, data lake, cloud data warehouse, object storage, data lakehouse, data fabric and data mesh. These approaches are often heralded as the next big thing, set to overshadow those that came before. In practice, they are typically complementary, with most organizations employing a combination of products with different architectures to achieve business goals. Data platform vendors such as Cloudera need to build on trusted customer relationships to sustain business, even as fashions come and go.

Cloudera was founded in 2008 to build a business around the Apache Hadoop data-processing framework. It enjoyed a rapid rise thanks to high levels of interest in the Hadoop project and big data, establishing itself as a primary data platform provider for Fortune 500 companies in financial services, retail, healthcare, telecommunications, manufacturing and energy/utilities along with government.

Cloudera made its debut on the New York Stock Exchange in 2017 and merged with its primary rival, Hortonworks, in 2019 amid market consolidation and a shift of trends toward cloud-based data processing and support for cloud object storage. Cloudera was acquired by investment firms Clayton, Dubilier & Rice and KKR for $5.3 billion in June 2021, providing the privacy to transition its customer base to public and private cloud offerings outside the glare of public markets.

The company has also highlighted the hybrid nature of its Cloudera Data Platform (CDP) offering, providing versatility to support multiple workloads, deployment locations and architectural approaches. CDP can serve as an operational and analytic data platform, with functionality to address data engineering, streaming data and analytics, and machine learning and artificial intelligence, including generative AI. It is available on-premises, in the cloud and across hybrid infrastructure. And while CDP might typically be deployed as a data lakehouse, the company is keen to highlight to existing and potential customers that its security, governance and management capabilities provide a unified data fabric that facilitates the adoption of CDP as part of a data mesh.

As one of the key vendors associated with Apache Hadoop and big data, Cloudera was undeniably a beneficiary of fashion in the data platform market during its early years. The trend toward big data enabled the company to establish itself as a primary data platform provider for numerous customers, especially large enterprises in financial services, healthcare, telecommunications, government, retail and utilities. Most of the company’s early deployments were on-premises on physical server architecture.

The shift to the cloud had a fundamental impact on Cloudera’s product and customer base: Today, most of Cloudera’s customers – and the vast majority of its recurring revenue – are associated with the CDP offering, which is available on private and public cloud infrastructure. CDP Private Cloud is available for deployment on virtual private cloud infrastructure, and comprises the CDP Private Cloud Base foundation layer and data services to address data engineering, data warehouse and machine learning workloads. CDP Public Cloud is available on Amazon Web Services, Microsoft Azure and Google Cloud Platform and offers additional services to address data movement, stream processing, data hub and operational database workloads, in addition to data engineering, data warehouse and machine learning.

Support for hybrid cloud architecture is important for Cloudera, with many of its rivals offering only public cloud services. More than one-half (52%) of participants in Ventana Research’s Analytics and Data Benchmark Research have a hybrid architecture involving on-premises and cloud infrastructure. Earlier this year, Cloudera enhanced CDP’s functionality for monitoring and managing data, applications and infrastructure across public and private clouds with the general availability of Cloudera Observability, which provides functionality for system and service monitoring as well as workload optimization and financial governance.

CDP is associated with the lakehouse architecture thanks to its support for object storage, data governance, table formats and query engine functionality. However, Cloudera’s underlying SDX security and governance technologies provide capabilities associated with a unified data fabric for managing and governing data across distributed environments, while the combination of SDX with data movement and data catalog capabilities supports the data mesh concept to facilitate the sharing and discovery of, and self-service access to, data products. I have described the similarities and differences between data fabric and data mesh and see growing interest in both. I assert that by 2026, more than one-half of organizations will adopt technologies to facilitate the delivery of data as a product while adapting cultural and organizational approaches to data ownership in the context of data mesh.

Cloudera has also recently taken steps to enhance CDP’s support for generative AI – in particular, the role enterprise data plays in improving trust in the output of large language models. The company has outlined its vision of layering support for generative AI models and applications on CDP for enhanced customer support. It recently launched LLM Chatbot Augmented with Enterprise Data as an addition to its catalog of Applied Machine Learning Prototypes, which provide a framework for accelerating the development, deployment and monitoring of enterprise ML applications.

Designer Coco Chanel is credited with saying, "Fashion changes, but style endures." It’s a concept that Cloudera and other data platform vendors are no doubt conscious of while navigating the multiple trends influencing the data sector. The fashion for Hadoop and big data established Cloudera as a major player in the data platform sector, but thanks to its embrace of the cloud and investment in management, security and governance technologies, CDP is well-placed to play an integral part in data fabric and data mesh. Cloudera has been less vocal to date than some of its competitors about generative AI, but the company is taking a mature and considered approach to the enterprise applicability of LLMs. I recommend that any organization looking for a trusted partner to develop a strategy to take advantage of generative AI include Cloudera in its evaluations.

Regards,

Matt Aslett