Vector Stores & RAG

Retrieval-Augmented Generation (RAG) retrieves relevant documents from a data store based on a query and includes them in an LLM prompt, grounding responses in factual information. It reduces hallucinations and lets a model use data that isn’t in its training set. Rig provides RAG building blocks — embeddings, vector stores, and RAG-enabled agents — out of the box.

Two concepts underpin RAG: embeddings and cosine similarity.

  • Embeddings are numerical vectors that carry semantic meaning, produced by an embedding model. Because meaning becomes geometry, you can mathematically measure how related two texts are.
  • Cosine similarity is the usual metric for that measurement: it is the cosine of the angle between two vectors, ranging from -1 to 1, and a higher score means closer meaning. It’s cheap to compute and works well for finding related documents, which is why it’s a common choice for RAG, recommender, and hybrid-search systems.
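To make the metric concrete, here is a minimal sketch of cosine similarity in plain Rust, using only the standard library. The function name `cosine_similarity` and the three-dimensional toy vectors are illustrative assumptions; real embedding models produce vectors with hundreds or thousands of dimensions, and vector stores compute this for you.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns None when the lengths differ, the input is empty,
/// or either vector has zero magnitude (the angle is undefined).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy "embeddings" (hypothetical values, not produced by a real model).
    let query = [1.0_f32, 0.0, 1.0];
    let same_direction = [2.0_f32, 0.0, 2.0]; // longer, but points the same way
    let orthogonal = [0.0_f32, 1.0, 0.0]; // shares no component with the query

    println!("{:?}", cosine_similarity(&query, &same_direction));
    println!("{:?}", cosine_similarity(&query, &orthogonal));
}
```

Because the score depends only on direction, the first pair scores close to 1 even though the magnitudes differ, while the orthogonal pair scores 0.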

A RAG pipeline has two phases:

  1. Ingestion. Split documents into chunks (fixed token sizes like 512–1000, or semantic boundaries like paragraphs), embed each chunk, and insert the embeddings — along with each chunk’s metadata — into a vector store. A store can be a database with a vector plugin (like pgvector) or a dedicated vector database.
  2. Retrieval. When a user asks a question, embed the query with the same model used for the documents (vectors from different models aren’t comparable), run a vector search, and include the top results and their metadata in the LLM prompt.
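The chunking step of ingestion can be sketched in plain Rust. This example splits on paragraph boundaries and packs consecutive paragraphs into chunks up to a size limit. The function name `chunk_by_paragraph` is hypothetical, and it counts characters as a rough stand-in for tokens; a real pipeline would measure size with the embedding model's tokenizer.

```rust
/// Split text on blank lines, then pack consecutive paragraphs into chunks
/// of at most `max_chars` characters. A single paragraph longer than
/// `max_chars` is kept whole rather than cut mid-sentence.
fn chunk_by_paragraph(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();

    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // +2 accounts for the blank-line separator we re-insert below.
        let needed = current.chars().count() + para.chars().count() + 2;
        if !current.is_empty() && needed > max_chars {
            // The current chunk is full: emit it and start a new one.
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let doc = "Rig is a Rust library.\n\nIt ships an in-memory vector store.\n\nDurable stores are available too.";
    for (i, chunk) in chunk_by_paragraph(doc, 60).iter().enumerate() {
        // The index is the kind of metadata you would store next to the embedding.
        println!("chunk {i}: {chunk:?}");
    }
}
```

Each resulting chunk would then be embedded and inserted into the store together with its metadata, such as the source document and chunk index.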

If you’re building a support bot or chatbot grounded in documentation you own, RAG is essential: hoping the model “remembers” the right answer, rather than grounding it in up-to-date information, carries real reputational risk. For simple classification where the categories are already well known (“is this a cat or a dog?”), you probably don’t need it.

Rig exposes RAG through two traits:

  • VectorStoreIndex — search a store for documents relevant to a query.
  • InsertDocuments — insert embedded documents into a store.

Rig’s integrations primarily use cosine similarity to measure document similarity. An in-memory store ships by default (ideal for development and small-scale apps with no external dependencies); durable stores like LanceDB, MongoDB, Neo4j, PostgreSQL, Qdrant, and SurrealDB are available for production. See Vector Stores for the full list and setup details.

The quickest way to add RAG is to attach a vector store index to an agent with dynamic_context. The agent automatically retrieves relevant documents for each query and includes them in the context sent to the model.

use rig::client::{CompletionClient, EmbeddingsClient, ProviderClient};
use rig::completion::Prompt;
use rig::embeddings::EmbeddingsBuilder;
use rig::providers::openai::Client;
use rig::vector_store::in_memory_store::InMemoryVectorStore;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let openai_client = Client::from_env()?;
    let embed_model = openai_client.embedding_model("text-embedding-3-small");

    // Embed the source documents.
    let embeddings = EmbeddingsBuilder::new(embed_model.clone())
        .documents(vec![
            "Rig is a Rust library for building LLM-powered applications.",
            "RAG combines retrieval and generation for better accuracy.",
            "Vector stores enable semantic search over documents.",
        ])?
        .build()
        .await?;

    // Insert them into an in-memory store and build an index.
    let mut vector_store = InMemoryVectorStore::default();
    vector_store.add_documents(embeddings);
    let index = vector_store.index(embed_model);

    // Attach the index to an agent as dynamic context.
    let agent = openai_client
        .agent("gpt-5.5")
        .preamble("You are a helpful assistant that answers questions using the provided context.")
        .dynamic_context(2, index) // retrieve the top 2 relevant documents per query
        .build();

    let response = agent
        .prompt("What is Rig and how does it help with LLM applications?")
        .await?;
    println!("{response}");

    Ok(())
}

Rig is a Rust library for building LLM-powered applications. It helps by providing
ready-made building blocks — agents, embeddings, and vector stores — so you can add
capabilities like RAG (retrieval-augmented generation) without wiring the plumbing
yourself.

dynamic_context(n, index) configures the agent to retrieve n relevant documents for each query and inject them into the model’s context.

If you want the retrieved documents yourself rather than letting an agent inject them, query the index with a VectorSearchRequest and top_n:

use rig::vector_store::{VectorSearchRequest, VectorStoreIndex};

// Ask for the two documents most similar to the query.
let req = VectorSearchRequest::builder()
    .query("What is Rig?")
    .samples(2)
    .build();

let results = index.top_n::<String>(req).await?;
for (score, id, doc) in results {
    println!("score={score} id={id} doc={doc}");
}

score=0.71 id=doc0 doc=Rig is a Rust library for building LLM-powered applications.
score=0.28 id=doc1 doc=RAG combines retrieval and generation for better accuracy.

Each result is a (score, id, document) tuple. From here you can feed the documents into a completion request. See the RAG system guide for the full retrieve-then-generate flow.
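As a rough illustration of that hand-off, the sketch below turns retrieved tuples into a prompt string using plain Rust and no Rig calls. `build_prompt` and `min_score` are hypothetical names introduced here, the tuple type `(f64, String, String)` is an assumption about the result shape, and the sample scores are made up.

```rust
/// Assemble a grounded prompt from (score, id, document) tuples.
/// Results scoring below `min_score` are dropped so weak matches
/// do not take up context.
fn build_prompt(question: &str, results: &[(f64, String, String)], min_score: f64) -> String {
    let mut prompt = String::from("Answer using only the context below.\n\nContext:\n");
    for (_, id, doc) in results.iter().filter(|r| r.0 >= min_score) {
        // Tag each document with its id so the answer can be traced back to a source.
        prompt.push_str(&format!("[{id}] {doc}\n"));
    }
    prompt.push_str(&format!("\nQuestion: {question}\n"));
    prompt
}

fn main() {
    // Hypothetical search results; scores are illustrative only.
    let results = vec![
        (0.71, "doc0".to_string(), "Rig is a Rust library for building LLM-powered applications.".to_string()),
        (0.28, "doc1".to_string(), "RAG combines retrieval and generation for better accuracy.".to_string()),
    ];
    println!("{}", build_prompt("What is Rig?", &results, 0.5));
}
```

The resulting string can be sent as the prompt of an ordinary completion request. The score threshold is a design choice, not a requirement; a sensible value depends on the embedding model and the data.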

Modern agents can carry large tool lists, which wastes context and can degrade output. Tool RAG fixes this by storing tool definitions in a vector store and conditionally retrieving only the relevant ones at request time — saving both context budget and token cost.

Tools that should be retrievable implement the ToolEmbedding trait (in addition to Tool) and are registered as dynamic tools:

use rig::tool::ToolSet;

let toolset = ToolSet::builder().dynamic_tool(Adder).build();

// Embed the tool definitions so they can be searched semantically.
let embeddings = EmbeddingsBuilder::new(embed_model.clone())
    .documents(toolset.schemas()?)?
    .build()
    .await?;

let vector_store =
    InMemoryVectorStore::from_documents_with_id_f(embeddings, |tool| tool.name.clone());
let index = vector_store.index(embed_model);

let agent = openai_client
    .agent("gpt-5.5")
    .preamble("You are a calculator. Use the tools provided to answer the user's question.")
    .dynamic_tools(2, index, toolset)
    .build();

dynamic_tools(n, index, toolset) takes the max number of tools to retrieve, the index, and the toolset. At context-assembly time the agent uses RAG to fetch relevant tool definitions to send to the model; called tools are executed from the toolset.

Re-ranking re-orders initial search results for better relevance. Dedicated re-ranking models score each query-document pair more thoroughly than vector search alone, which often improves the relevance of the top results. The fastembed crate provides a TextRerank type to re-rank your results.

Semantic search can miss results that contain a target term but aren’t ranked as “relevant.” Hybrid search combines full-text and semantic search: store documents in a regular database as well as a vector store, query both at retrieval time, and merge the result lists with a method like Reciprocal Rank Fusion or weighted scoring.
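The merge step can be sketched with Reciprocal Rank Fusion, where each document earns 1 / (k + rank) for every list it appears in. This is a standalone illustration in plain Rust: the function name `reciprocal_rank_fusion`, the document ids, and the choice of k = 60 (a commonly used constant) are assumptions, not part of Rig.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of document ids.
/// Ranks are 1-based; `k` dampens the advantage of the very top ranks.
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical result lists from a full-text query and a vector query.
    let full_text = vec!["a", "b", "c"];
    let semantic = vec!["b", "d", "a"];

    for (id, score) in reciprocal_rank_fusion(&[full_text, semantic], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Documents that appear in both lists rise to the top, even if neither search ranked them first. RRF uses only ranks, so it sidesteps the problem of full-text and cosine scores living on different scales; weighted scoring would instead require normalizing them.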

RAG is a useful basis for agentic memory — you can store conversation summaries, user- or company-specific facts, and chunked documents, then retrieve them by relevance.

RAG is a cornerstone of LLM application development, but it needs careful design:

  • Split context. Relevant information can span multiple chunks. Use overlapping chunks (a 10–20% overlap captures boundary context) or parent-child chunking (retrieve small chunks but pass the larger parent to the LLM).
  • Contradictory data. Filter by metadata to retrieve only relevant data, apply recency-based weighting, and weight by source authority (prefer official docs over community forums).
  • Stale data. Track created_at / last_updated fields, use versioning and TTL mechanisms, and monitor source data for changes to trigger re-embedding.
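The overlapping-chunk approach to split context can be sketched as a sliding window. This example is plain Rust with a hypothetical function name, `chunk_with_overlap`, and it counts whitespace-separated words as a simple proxy for tokens.

```rust
/// Split `text` into windows of `size` words, where each window repeats
/// the last `overlap` words of the previous one.
/// Panics if `size` is zero or `overlap` is not smaller than `size`.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    let text = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9";
    // Windows of 5 words with 1 word (20%) shared between neighbours.
    for chunk in chunk_with_overlap(text, 5, 1) {
        println!("{chunk}");
    }
}
```

A sentence that straddles a boundary then appears intact in at least one chunk, at the cost of embedding and storing some text twice.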