        Matt Aslett's Analyst Perspectives

        DataStax Adds Vector Search to Address Generative AI

        As I have previously explained, we expect an increased demand for intelligent operational applications infused with the results of analytic processes, such as personalization and artificial intelligence-driven recommendations. These systems rely on the analysis of data in the operational data platform to accelerate worker decision-making or improve customer experience.

        AI-driven intelligent applications require a new approach to data processing that enables real-time performance of machine learning on operational data to deliver instant, relevant information for accelerated decision-making. I explained last year how NoSQL database provider DataStax added streaming data capabilities to its portfolio to address the processing of data in motion and at rest and support the development of interactive, real-time, data-driven applications. Since then, the company has further expanded its portfolio with the addition of open-source temporal event processing for machine learning and vector search capabilities to support the development of generative AI applications.

        DataStax was founded in 2010 to build a business around the Apache Cassandra open-source distributed, non-relational database. The company is still best known as a provider of operational data platforms, both on-premises and in the cloud, and has continued to contribute to the development of Apache Cassandra in addition to its DataStax Enterprise distribution. DataStax has also acquired capabilities for graph data processing as well as cloud management services, and in 2020 launched the Astra DB managed database-as-a-service offering. The company has expanded its purview further through two recent acquisitions: DataStax acquired messaging and event-streaming cloud service provider Kesque in January 2021, followed by the acquisition of machine learning specialist Kaskada in January 2023. Both acquisitions addressed growing requirements for real-time data processing and intelligence through a combination of operational and analytic processing. I assert that through 2026, operational data platform providers will continue to invest in hybrid operational and analytic processing capabilities to support growing demand for data-intensive intelligent operational applications.

        As a result of acquiring Kesque, DataStax added the ability to process streaming data using the Apache Pulsar open-source project to support the development of interactive, real-time, data-driven applications. The addition of Kaskada enhanced DataStax’s ability to support customers in the development of real-time AI applications. The company subsequently relicensed its machine learning feature engine as open-source software. More recently DataStax responded to the popularization of large language models and generative AI with the addition of vector search capabilities to its Astra DB database-as-a-service to complement LLMs with approved enterprise content and data.

        Data platforms continue to be DataStax’s primary focus, with the company offering Luna for Apache Cassandra, a commercial support subscription for open-source Apache Cassandra. It also offers the DataStax Enterprise commercial distribution, which adds security and other enterprise features, and the Astra DB database-as-a-service. For stream and event processing, DataStax offers Luna Streaming, a commercial support offering for Apache Pulsar, and the Astra Streaming managed service.

        The addition of Kaskada to DataStax’s product portfolio enables organizations with real-time data to adopt real-time AI through a combination of data platform, data streaming and AI/ML products and services. Kaskada is a unified batch and event-stream processing engine with a declarative query language that performs aggregations, joins and windowing to support analytics applications, dashboards and machine learning. Kaskada enables the processing of temporal event data, facilitating the development of applications providing real-time ML on event data. Luna ML is a commercial support offering for Kaskada Open Source, with additional Real-Time AI functionality and services from DataStax to help customers develop and deploy applications that maximize predictive and generative AI. Real-Time AI from DataStax provides additional capabilities to enable the development of AI applications, including the recently introduced vector search capabilities to support generative AI applications based on LLMs.
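
        To make the idea of windowing over temporal event data concrete, the following is a minimal sketch in plain Python. It is not Kaskada's query language or API; the function name `sliding_window_counts`, the sample events and the 10-second window are all assumptions chosen for illustration. It computes, for each event, how many events the same entity produced in the trailing time window, which is the kind of feature a real-time ML model might consume.

```python
from collections import defaultdict, deque


def sliding_window_counts(events, window_seconds):
    """Return (timestamp, entity, count) for every event.

    count is the number of events for that entity whose timestamp falls in
    the half-open interval (timestamp - window_seconds, timestamp].
    """
    recent = defaultdict(deque)  # entity -> timestamps still inside the window
    features = []
    for timestamp, entity in sorted(events):
        window = recent[entity]
        window.append(timestamp)
        # Evict timestamps that have aged out of the trailing window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        features.append((timestamp, entity, len(window)))
    return features


# Hypothetical event stream: (seconds since start, entity id).
EVENTS = [(0, "alice"), (5, "alice"), (7, "bob"), (12, "alice"), (30, "alice")]

features = sliding_window_counts(EVENTS, window_seconds=10)
for row in features:
    print(row)
```

        A dedicated engine would express the same computation declaratively and apply it to both historical batches and live streams; the sketch only shows the per-entity, time-bounded aggregation itself.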

        Although we are at a very early stage of identifying enterprise use cases for generative AI, we expect adoption to grow rapidly. We assert that through 2025, one-quarter of organizations will deploy generative AI embedded in one or more software applications. The ability to trust the output of generative AI models will be critical to their adoption by enterprises. There are multiple approaches to reducing accuracy and trust concerns, one of which is using vector embeddings and vector search to augment generic models with enterprise information and data.

        Vector embeddings are multi-dimensional mathematical representations of features or attributes of raw data, which could include text, images, audio or video. Vector search utilizes vector embeddings to perform similarity searches by enabling rapid identification and retrieval of similar or related data. Potential applications for vector search include natural language processing and recommendation systems that find and recommend products similar in function or style, either visually or based on written descriptions. Vector embeddings and vector search complement large language models to reduce accuracy and trust concerns by incorporating embeddings that represent approved enterprise content and data. Astra Vector Search utilizes DataStax’s Storage Attached Indexing to enable the creation of multiple secondary indexes on Astra DB database tables. DataStax has also released CassIO, an open-source library to integrate the Cassandra database with frameworks such as LangChain, making it easier for developers to access Cassandra’s capabilities, including vector search.
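
        The similarity search described above can be sketched in a few lines of Python. This is an illustration only, not Astra Vector Search or CassIO: the three-dimensional embeddings are hand-made stand-ins for the output of a real embedding model, the names `cosine_similarity`, `vector_search` and `CATALOG` are hypothetical, and the search is an exact brute-force scan rather than the indexed search a database would use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query, index, top_k=2):
    """Return the top_k (item, score) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vector), item) for item, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(item, score) for score, item in scored[:top_k]]


# Hypothetical embeddings: in practice these come from an embedding model
# and have hundreds or thousands of dimensions.
CATALOG = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}

results = vector_search([1.0, 0.0, 0.0], CATALOG, top_k=2)
for item, score in results:
    print(f"{item}: {score:.3f}")
```

        In a generative AI application, the retrieved items would typically be approved enterprise documents, passed to the LLM as context alongside the user's question.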

        DataStax has expanded its addressable market considerably in recent years by adding streaming data and machine learning capabilities and expertise to its existing data platform focus. The company must still work to organize these capabilities into a combined offering, but the expanded portfolio puts DataStax in a stronger position to compete to support the next generation of intelligent operational applications. I recommend that any organization considering options for data platform, streaming and operational AI include DataStax in evaluations. The addition of vector search to address generative AI use cases illustrates how the company continues to adapt its platform in response to evolving requirements and use cases.

        Regards,

        Matt Aslett

        Authors:

        Matt Aslett
        Director of Research, Analytics and Data

        Matt Aslett leads the software research and advisory for Analytics and Data at Ventana Research, now part of ISG, covering software that improves the utilization and value of information. His focus areas of expertise and market coverage include analytics, data intelligence, data operations, data platforms, and streaming and events.
