Matt Aslett's Analyst Perspectives

Vector Search and RAG Improve Trust in Generative AI

Written by Matt Aslett | Oct 31, 2023 10:00:00 AM

I previously discussed the trust and accuracy limitations of large language models (LLMs), suggesting that data and analytics vendors provide guidance about potentially inaccurate results and the risks of creating a misplaced level of trust. In the months since, these vendors have begun to clarify the approaches organizations can take to increase trust and accuracy when developing applications that incorporate generative AI, including fine-tuning and prompt engineering. It is clear that one of the most important approaches will be the use of vector search to augment generative AI with context from enterprise content and data via a concept known as retrieval-augmented generation (RAG).

Vector search and RAG have taken the data platform market by storm as providers of both operational and analytic data platforms position products to benefit from the huge surge of interest in generative AI. This focus helps organizations develop applications that combine generative AI and enterprise data. Vendors already supporting vector search have accelerated marketing efforts, while others have fast-tracked capabilities to store and process vectors. Before explaining the technical aspects of vector search and RAG, it is worth recapping some of the previously mentioned limitations of LLMs to understand why vector search and RAG are so important to help overcome them.  

As my colleague Dave Menninger previously explained, generative AI creates content such as text, digital images, audio, video or even computer programs and models with artificial intelligence. We expect the adoption of generative AI to grow rapidly, asserting that through 2025, one-quarter of organizations will deploy generative AI embedded in one or more software applications.

The large language models that enable text-based generative AI can increase productivity by improving natural language processing. However, they are not without fault. LLMs generate content that is grammatically valid rather than factually accurate. As a result, the content generated by LLMs can include factual inaccuracies, such as fictitious data and source references. The reason is that foundation models only have “knowledge” of the information they are trained on. This could be enormous amounts of public information, but public LLMs do not have access to an organization’s private data and content. A public LLM can provide accurate responses to generic questions for which there is a large corpus of freely available information, but ask it a question that requires private data it has not been trained on — for instance, about a particular company’s latest sales figures — and it will generate text that is plausible but has no basis in factual data.

A useful analogy for thinking about the limitations of generative AI is human memory. Training and tuning a model’s foundational functionality are akin to creating the implicit memories humans use to carry out functions without conscious thought. An example is learning how to drive a car. Once the functional aspects of operating a vehicle have been embedded in implicit memory, people can drive a car without consciously thinking about how to do so.

But implicit knowledge of how to operate a vehicle is not enough to complete a journey. Knowing how to drive a car does not equate to knowing which routes to avoid when traveling from point A to point B. This is the job of conscious, explicit memory, which provides the context to complete the journey without making a wrong turn. If a driver makes the same journey enough times, knowledge of the route becomes an implicit memory, and they can do so almost without thinking about it. This is equivalent to tuning a foundation model using private data. A model trained on private data can be extremely effective at a specific task, but the result is a model that is finely tuned yet limited in scope. Knowing implicitly how to drive from point A to point B is not much use when your destination is point C.

What is required is to augment foundation models with real-life data and context from enterprise information. One way of doing this is via prompt engineering, a process of providing context to the question as it is asked. Prompts can require the model to provide a response that matches a desired format or to provide specific data or information to be used in the response.
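
To make this concrete, here is a minimal sketch of prompt engineering in Python. The question, the injected sales figures and the call_llm() function are all hypothetical placeholders; a real application would substitute its own data and model API.

```python
# A minimal sketch of prompt engineering: context and format requirements
# are embedded in the prompt itself. All names and figures are hypothetical.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a hosted chat-completion API)."""
    raise NotImplementedError

def build_prompt(question: str, private_data: str) -> str:
    # The prompt supplies private data the model was never trained on
    # and constrains the format of the response.
    return (
        "Answer using ONLY the data provided below, in a single sentence.\n\n"
        f"Data: {private_data}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    question="What were our latest quarterly sales?",
    private_data="Q3 sales totaled $4.2M, up 8% quarter over quarter.",  # hypothetical
)
print(prompt)  # answer = call_llm(prompt)
```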

Although prompt engineering augments the information on which LLMs have been trained, the augmentation is temporary. Since LLMs are stateless, the information contained within the prompt is not retained and must be provided every time the question is asked. Prompt engineering provides short-term context within the bounded scope of a single interaction. As such, it can be thought of as the equivalent of short-term working memory – which the brain uses to retain information for a short period, and which is soon lost if not transferred to long-term, conscious, explicit memory. An analogous example would be a driver remembering where they left their car keys.
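
A sketch of what statelessness means in practice, under the same assumptions as above (call_llm() is again a hypothetical stand-in for a real model API): because nothing survives between calls, the application must pack the full context and conversation history into every prompt.

```python
# LLMs retain nothing between requests, so all context -- including the
# conversation so far -- must be re-sent on every call.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical; replace with a real model call

history: list[str] = []  # held by the application, not by the model

def ask(question: str, context: str) -> str:
    history.append(f"User: {question}")
    # Omit the context or history from the prompt and the model "forgets"
    # everything said before -- the equivalent of working memory lapsing.
    prompt = f"{context}\n" + "\n".join(history) + "\nAssistant:"
    answer = call_llm(prompt)
    history.append(f"Assistant: {answer}")
    return answer
```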

The equivalent of conscious, explicit memory for generative AI can be provided by augmenting foundation models with real-life data and context from enterprise information via vector search and RAG. Vectors — or vector embeddings — are multi-dimensional mathematical representations of features or attributes of raw data, including text, images, audio or video. Vector search utilizes vector embeddings to perform similarity searches by enabling rapid identification and retrieval of similar or related data.
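
As a rough illustration, the sketch below ranks documents by the cosine similarity of their embedding vectors. The embed() function is a hypothetical placeholder for a real embedding model, and a production system would pre-compute and index document vectors rather than embedding them per query.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder: a real embedding model maps text
    (or images, audio, video) to a dense numeric vector."""
    raise NotImplementedError

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similar content yields vectors that point in similar directions.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    # Rank documents by how close their vectors are to the query vector.
    q = embed(query)
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, embed(d)),
        reverse=True,
    )
    return ranked[:top_k]
```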

Vector search supports natural language processing and recommendation systems that find and recommend products similar in function or style, either visually or based on written descriptions. Vectors and vector search can also improve accuracy and trust in generative AI via RAG, the process of retrieving vector embeddings representing factually accurate and up-to-date information from a database and supplying that information to the LLM as grounding for the text it generates. RAG provides an LLM with a constantly updated source of private data and information. In terms of the driving analogy, it provides the equivalent of knowing not only how to drive from point A to point B, but also the route between any other combination of points.
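
Putting the pieces together, a minimal RAG loop under the same assumptions (hypothetical embed() and call_llm() placeholders) looks something like this: retrieve the most relevant enterprise content, augment the prompt with it, then generate.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # hypothetical embedding model

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical LLM call

def rag_answer(question: str, corpus: list[str]) -> str:
    # 1. Retrieve: find the stored enterprise content whose vector is most
    #    similar to the question's vector, via cosine similarity.
    q = embed(question)
    def score(doc: str) -> float:
        v = embed(doc)
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    context = max(corpus, key=score)
    # 2. Augment: place the retrieved, up-to-date information in the prompt.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context: {context}\n\nQuestion: {question}"
    )
    # 3. Generate: the LLM composes a response grounded in retrieved data.
    return call_llm(prompt)
```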

Whether RAG is best performed using a specialist vector database or a general-purpose database capable of storing and processing vectors is a matter of debate and a subject I will return to in the future. Either way, I assert that through 2026, almost all organizations developing applications based on generative AI will explore vector search and retrieval-augmented generation to complement foundation models with proprietary data and content. It is likely that organizations will use a combination of approaches to improve trust and accuracy with generative AI, depending on the use case.

I recommend that all organizations investigate the potential use cases for each approach and seek vendors that can assist in implementing fine-tuning, prompt engineering, vector search and RAG. Different tasks have different levels of reliance on long-term implicit memory, long-term explicit memory or short-term working memory. To complete a journey, a person needs to remember how to drive a car, where they put their keys and the best route to get to their destination. All of these are essential, but each is useless without the others.

Regards,

Matt Aslett