Build a RAG system
Retrieval-augmented generation (RAG) grounds a language model in your own data: when a query arrives, you first retrieve the most relevant passages from a knowledge base, then hand them to the model alongside the question. The model answers from that context instead of relying solely on what it memorized during training, which reduces hallucination and lets it use up-to-date, private, or domain-specific information.
This guide builds a complete RAG system that extracts text from PDFs, embeds it, stores the vectors in memory, and wires the store into an agent as dynamic context — all in under 100 lines. For the concepts behind RAG (embeddings, similarity search, re-ranking, tool-RAG), see Vector Stores & RAG.
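The "retrieve, then hand the passages to the model" step can be pictured as plain string assembly. The sketch below is only an illustration of that idea, not how Rig builds its prompts: the `Passage` type, the `build_prompt` function, and the prompt wording are all hypothetical names introduced here.

```rust
/// A retrieved passage with its similarity score (hypothetical type, for illustration only).
struct Passage {
    score: f32,
    text: String,
}

/// Builds an augmented prompt: the `n` highest-scoring passages, followed by the question.
fn build_prompt(question: &str, mut passages: Vec<Passage>, n: usize) -> String {
    // Sort highest score first; total_cmp gives a total order over f32.
    passages.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut prompt = String::from("Answer using only the context below.\n\n");
    for (i, p) in passages.iter().take(n).enumerate() {
        // Number each passage so the model can tell them apart.
        prompt.push_str(&format!("[{}] {}\n", i + 1, p.text));
    }
    prompt.push_str(&format!("\nQuestion: {question}"));
    prompt
}
```

In the guide itself this assembly is handled by the agent, so a helper like this would only be needed if you were wiring retrieval to a model by hand.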
Project setup
Create a new project and add the dependencies:

    cargo new rag_system
    cd rag_system

    [dependencies]
    rig = "0.39.0"
    tokio = { version = "1", features = ["full"] }
    anyhow = "1"
    pdf-extract = "0.7"

- rig: the core Rig library.
- tokio: async runtime.
- anyhow: ergonomic error handling.
- pdf-extract: pulls plain text out of PDF files.
Set your OpenAI API key:
    export OPENAI_API_KEY=your_api_key_here

Extract text from PDFs
Rig works with plain text, so first turn each PDF into a String. A small helper wraps pdf_extract::extract_text and adds error context:

    use anyhow::{Context, Result};
    use pdf_extract::extract_text;
    use std::path::Path;

    fn load_pdf_content<P: AsRef<Path>>(file_path: P) -> Result<String> {
        extract_text(file_path.as_ref())
            .with_context(|| format!("Failed to extract text from PDF: {:?}", file_path.as_ref()))
    }

Embed documents and build the store
Create an embedding model, embed each document with EmbeddingsBuilder, and load the results into an InMemoryVectorStore. The in-memory store is ideal for small to medium collections; for larger corpora, swap in a persistent store such as LanceDB, MongoDB, Neo4j, or Qdrant.

    use rig::client::EmbeddingsClient;
    use rig::embeddings::EmbeddingsBuilder;
    use rig::providers::openai;
    use rig::vector_store::in_memory_store::InMemoryVectorStore;

    let openai_client = openai::Client::from_env()?;
    let embedding_model = openai_client.embedding_model("text-embedding-3-small");

    let pdf1 = load_pdf_content("documents/Moores_Law_for_Everything.pdf")?;
    let pdf2 = load_pdf_content("documents/The_Last_Question.pdf")?;

    let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
        .document(pdf1)?
        .document(pdf2)?
        .build()
        .await?;

    let vector_store = InMemoryVectorStore::from_documents(embeddings);

EmbeddingsBuilder::new(model).document(text)? queues a document for embedding; .build().await? calls the embedding API once for the whole batch. InMemoryVectorStore::from_documents builds the store directly from those embeddings.
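To make the idea of an in-memory vector store concrete, here is a minimal, std-only sketch of brute-force similarity search. It is not Rig's implementation: `ToyStore`, its `top_n` method, and the choice of cosine similarity as the metric are assumptions made for illustration.

```rust
/// Cosine similarity between two equal-length vectors; returns 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Toy store holding (document id, embedding) pairs in a Vec (hypothetical type).
struct ToyStore {
    docs: Vec<(String, Vec<f32>)>,
}

impl ToyStore {
    /// Scores every stored embedding against the query and returns
    /// up to `n` (score, id) pairs, best match first.
    fn top_n(&self, query: &[f32], n: usize) -> Vec<(f32, &str)> {
        let mut scored: Vec<(f32, &str)> = self
            .docs
            .iter()
            .map(|(id, emb)| (cosine_similarity(query, emb), id.as_str()))
            .collect();
        // Sort by descending score, then keep only the first n entries.
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
        scored.truncate(n);
        scored
    }
}
```

A linear scan like this touches every stored vector on each query, which is one way to see why small to medium collections suit an in-memory store while larger corpora are usually better served by a dedicated, indexed one.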
Build the RAG agent
Turn the store into a searchable index, then attach it to an agent as dynamic context. On every prompt, the agent runs a vector search and injects the top matches into the model’s context automatically:

    use rig::client::CompletionClient;

    let index = vector_store.index(embedding_model);

    let rag_agent = openai_client
        .agent("gpt-5.5")
        .preamble("You are a helpful assistant that answers questions using the provided PDF context.")
        .dynamic_context(2, index)
        .build();

dynamic_context(2, index) tells the agent to retrieve the two most relevant documents for each query. If you need to run searches yourself rather than through an agent, build a request and call top_n on the index:

    use rig::vector_store::VectorSearchRequest;
    use rig::vector_store::VectorStoreIndex;

    let req = VectorSearchRequest::builder()
        .query("what did Sam Altman write?")
        .samples(2)
        .build();

    // Returns (score, id, payload) tuples deserialized into your type.
    let hits = index.top_n::<String>(req).await?;

Run it in a REPL
Rig ships a cli_chatbot helper that wraps any agent in an interactive prompt loop with chat history:

    use rig::integrations::cli_chatbot::ChatBotBuilder;

    let chatbot = ChatBotBuilder::new().agent(rag_agent).build();
    chatbot.run().await?;

Full example
    use anyhow::{Context, Result};
    use pdf_extract::extract_text;
    use rig::integrations::cli_chatbot::ChatBotBuilder;
    use rig::client::{CompletionClient, EmbeddingsClient};
    use rig::embeddings::EmbeddingsBuilder;
    use rig::providers::openai;
    use rig::vector_store::in_memory_store::InMemoryVectorStore;
    use std::path::Path;

    fn load_pdf_content<P: AsRef<Path>>(file_path: P) -> Result<String> {
        extract_text(file_path.as_ref())
            .with_context(|| format!("Failed to extract text from PDF: {:?}", file_path.as_ref()))
    }

    #[tokio::main]
    async fn main() -> Result<()> {
        let openai_client = openai::Client::from_env()?;
        let embedding_model = openai_client.embedding_model("text-embedding-3-small");

        let pdf1 = load_pdf_content("documents/Moores_Law_for_Everything.pdf")?;
        let pdf2 = load_pdf_content("documents/The_Last_Question.pdf")?;

        let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
            .document(pdf1)?
            .document(pdf2)?
            .build()
            .await?;

        let vector_store = InMemoryVectorStore::from_documents(embeddings);
        let index = vector_store.index(embedding_model);

        let rag_agent = openai_client
            .agent("gpt-5.5")
            .preamble("You are a helpful assistant that answers questions using the provided PDF context.")
            .dynamic_context(2, index)
            .build();

        let chatbot = ChatBotBuilder::new().agent(rag_agent).build();
        chatbot.run().await?;

        Ok(())
    }

Place Moores_Law_for_Everything.pdf and The_Last_Question.pdf in a documents/ folder, then run:

    cargo run

You now have a chatbot that answers questions grounded in your PDFs, whether summarizing a single document, analyzing themes, or drawing connections across both, because each response is built from the passages the vector search surfaces for that query.
Going to production
- Persistent storage: replace InMemoryVectorStore with a dedicated vector store (LanceDB, MongoDB, Neo4j, Qdrant) for large collections.
- Chunking: split large documents into passages before embedding so retrieval is more precise; see Loaders.
- Model selection: gpt-5.5 gives stronger reasoning; use a cheaper model where quality allows.
- Observability: Rig integrates with OpenTelemetry and Langfuse; see Observability.
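As a rough illustration of the chunking point above, the following std-only sketch splits text into overlapping word windows. It is one simple strategy among many and does not use Rig's loaders; the function name `chunk_words` and its `size` and `overlap` parameters are hypothetical.

```rust
/// Splits `text` into chunks of at most `size` words, where consecutive
/// chunks share `overlap` words so a passage cut at a boundary keeps context.
fn chunk_words(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");

    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Each resulting chunk could then be queued as its own document with `.document(chunk)?` instead of embedding a whole PDF as a single entry, so that retrieval returns a focused passage rather than an entire file.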
See also
- Vector Stores & RAG: the concepts behind retrieval.
- Embeddings: how EmbeddingsBuilder and embedding models work.
- Vector Stores: persistent store integrations.
- rig on docs.rs: full API reference.
