
        Matt Aslett's Analyst Perspectives


        Vector Search and RAG Improve Trust in Generative AI

        I previously discussed the trust and accuracy limitations of large language models, suggesting that data and analytics vendors provide guidance about potentially inaccurate results and the risks of creating a misplaced level of trust. In the months that have followed, we are seeing greater clarity from these vendors about the approaches organizations can take to increase trust and accuracy when developing applications that incorporate generative AI, including fine-tuning and prompt engineering. It is clear that one of the most important approaches will be the use of vector search to augment generative AI with context from enterprise content and data via a concept known as retrieval-augmented generation (RAG).

        Vector search and RAG have taken the data platform market by storm as providers of both operational and analytic data platforms position products to benefit from the huge surge of interest in generative AI. This focus helps organizations develop applications that combine generative AI and enterprise data. Vendors already supporting vector search have accelerated marketing efforts, while others have fast-tracked capabilities to store and process vectors. Before explaining the technical aspects of vector search and RAG, it is worth recapping some of the previously mentioned limitations of LLMs to understand why vector search and RAG are so important to help overcome them.  

        As my colleague Dave Menninger previously explained, generative AI creates content such as text, digital images, audio, video or even computer programs and models with artificial intelligence. We expect the adoption of generative AI to grow rapidly, asserting that through 2025, one-quarter of organizations will deploy generative AI embedded in one or more software applications.

        The large language models that enable text-based generative AI can increase productivity by improving natural language processing. However, they are not without fault. LLMs generate content that is grammatically valid rather than factually accurate. As a result, the content generated by LLMs can include factual inaccuracies such as fictitious data and source references. The reason is that foundation models only have “knowledge” of the information they are trained on. This could be enormous amounts of public information, but public LLMs do not have access to an organization’s private data and content. A public LLM can provide accurate responses to generic questions for which there is a large corpus of freely available information, but ask it a question that requires private data it has not been trained on — for instance, about a particular company’s latest sales figures — and it will generate text that is plausible but has no basis in factual data.

        A useful analogy for thinking about the limitations of generative AI is human memory. Training and tuning a model’s foundational functionality are akin to creating the implicit memories humans use to carry out functions without conscious thought. An example is learning how to drive a car. Once the functional aspects of operating a vehicle have been embedded in implicit memory, people can drive a car without consciously thinking about how to do so.

        But implicit knowledge of how to operate a vehicle is not enough to complete a journey. Knowing how to drive a car does not equate to knowing which routes to avoid when traveling from point A to point B. This is the job of conscious, explicit memory, which provides the context to complete the journey without making a wrong turn. If a driver makes the same journey enough times, knowledge of the route becomes an implicit memory, and they can do so almost without thinking about it. This is equivalent to tuning a foundation model using private data. A model trained on private data can be extremely effective at a specific task, but the result is a model that is finely tuned yet limited in scope. Knowing implicitly how to drive from point A to point B is not much use when your destination is point C.

        What is required is to augment foundation models with real-life data and context from enterprise information. One way of doing this is via prompt engineering, a process of providing context to the question as it is asked. Prompts can require the model to provide a response that matches a desired format or to provide specific data or information to be used in the response.
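
        As a simple illustration, the sketch below shows what prompt engineering can look like in practice. It is a minimal example under stated assumptions, not any particular vendor's API: `build_prompt` and `call_llm` are hypothetical placeholders, and the sales figure is invented purely for illustration. The key point is that the private context travels inside the prompt itself.

```python
# Minimal prompt-engineering sketch. `call_llm` is a hypothetical placeholder
# for whichever LLM API an organization uses; the prompt construction is the point here.

def build_prompt(question: str, context: str) -> str:
    """Embed private, up-to-date context directly in the prompt text."""
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def call_llm(prompt: str) -> str:
    return "<model response>"  # placeholder for a real LLM API call

# Illustrative private data that a public LLM would not have been trained on.
context = "Example Corp Q3 revenue was $12.4M, up 8% year over year."
print(call_llm(build_prompt("What was Example Corp's Q3 revenue?", context)))
```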

        Although prompt engineering augments the information on which LLMs have been trained, the augmentation is temporary. Since LLMs are stateless, the information contained within the prompt is not retained and needs to be provided every time the question is asked. Prompt engineering provides short-term context within the bounded scope of a single interaction. As such, it can be thought of as the equivalent of short-term working memory – used by the brain to retain information for a short period and soon forgotten if not transferred to long-term, conscious, explicit memory. An analogous example would be a driver remembering where they left their car keys.

        The equivalent of conscious, explicit memory for generative AI can be provided by augmenting foundation models with real-life data and context from enterprise information via vector search and RAG. Vectors — or vector embeddings — are multi-dimensional mathematical representations of features or attributes of raw data, including text, images, audio or video. Vector search utilizes vector embeddings to perform similarity searches by enabling rapid identification and retrieval of similar or related data.
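
        To make that concrete, the following is a minimal sketch of a similarity search over vector embeddings using cosine similarity. The document snippets and random vectors are invented for illustration; in practice the embeddings would be produced by an embedding model and stored in a database with an approximate-nearest-neighbor index.

```python
import numpy as np

# Illustrative only: real embeddings come from an embedding model and are
# typically stored and indexed in a vector-capable database.
documents = [
    "Q3 revenue was $12.4M, up 8% year over year.",
    "The company opened a new office in Austin.",
    "Support tickets fell 15% after the latest release.",
]
rng = np.random.default_rng(0)
doc_vectors = rng.random((len(documents), 8))   # stand-ins for document embeddings
query_vector = rng.random(8)                    # stand-in for the query embedding

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query and return the closest match.
scores = [cosine_similarity(query_vector, v) for v in doc_vectors]
best = int(np.argmax(scores))
print(f"Most similar document: {documents[best]} (score {scores[best]:.2f})")
```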

        Vector search supports natural language processing and recommendation systems that find and recommend products similar in function or style, either visually or based on written descriptions. Vectors and vector search can also improve accuracy and trust with generative AI via RAG, which is the process of retrieving vector embeddings representing factually accurate and up-to-date information from a database and combining it with text automatically generated by the LLM. RAG provides an LLM with a constantly updated source of private data and information. It can provide the equivalent of knowing how to drive from point A to point B, plus any combination of routes.
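
        Putting the two ideas together, a simplified RAG flow retrieves the pieces of enterprise content most similar to the question and injects them into the prompt before generation. In the sketch below, `embed`, `vector_search` and `call_llm` are hypothetical stubs standing in for an embedding model, a vector index and an LLM API; it illustrates the flow rather than any particular product.

```python
import numpy as np

# Hypothetical stubs: a real implementation would call an embedding model,
# a vector database and an LLM provider's API.
documents = ["Q3 revenue was $12.4M.", "A new office opened in Austin.", "Tickets fell 15%."]
rng = np.random.default_rng(1)
doc_vectors = rng.random((len(documents), 8))   # pretend pre-computed embeddings

def embed(text: str) -> np.ndarray:
    return rng.random(8)                        # stub for an embedding model

def vector_search(query_vec: np.ndarray, top_k: int) -> list[str]:
    # Cosine similarity against every stored vector, then keep the top_k matches.
    sims = doc_vectors @ query_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vec)
    )
    return [documents[i] for i in np.argsort(sims)[::-1][:top_k]]

def call_llm(prompt: str) -> str:
    return "<answer grounded in the retrieved context>"  # stub for an LLM API call

def answer_with_rag(question: str) -> str:
    context = "\n".join(vector_search(embed(question), top_k=2))
    prompt = (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)

print(answer_with_rag("What was Q3 revenue?"))
```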

        Whether RAG is best performed using a specialist vector database or a general-purpose database capable of storing and processing vectors is a matter of debate and a subject I will return to in the future. Either way, I assert that through 2026, almost all organizations developing applications based on generative AI will explore vector search and retrieval-augmented generation to complement foundation models with proprietary data and content. It is likely that organizations will use a combination of approaches to improve trust and accuracy with generative AI, depending on the use case.

        I recommend that all organizations investigate the potential use cases for each approach and seek vendors that can assist in implementing fine-tuning, prompt engineering, vector search and RAG. Different tasks have different levels of reliance on long-term implicit memory, long-term explicit memory or short-term working memory. To complete a journey, a person needs to remember how to drive a car, where they put their keys and the best route to get to their destination. All of these are essential, but each is useless without the others.

        Regards,

        Matt Aslett

        Author:

        Matt Aslett
        Director of Research, Analytics and Data

        Matt Aslett leads the software research and advisory for Analytics and Data at Ventana Research, now part of ISG, covering software that improves the utilization and value of information. His focus areas of expertise and market coverage include analytics, data intelligence, data operations, data platforms, and streaming and events.
