Build a RAG system

Retrieval-augmented generation (RAG) grounds a language model in your own data: when a query arrives, you first retrieve the most relevant passages from a knowledge base, then hand them to the model alongside the question. The model answers from that context instead of relying solely on what it memorized during training, which reduces hallucination and lets it use up-to-date, private, or domain-specific information.

This guide builds a complete RAG system that extracts text from PDFs, embeds it, stores the vectors in memory, and wires the store into an agent as dynamic context — all in under 100 lines. For the concepts behind RAG (embeddings, similarity search, re-ranking, tool-RAG), see Vector Stores & RAG.
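The retrieve-then-answer flow described above comes down to placing the retrieved passages in front of the question before the model sees it. The sketch below illustrates that step with the standard library only. The name build_prompt and the prompt wording are hypothetical; Rig's agent performs its own context injection, and its exact format is not shown in this guide.

```rust
/// Hypothetical helper: combine retrieved passages and the user's question
/// into a single prompt string. Not part of Rig.
fn build_prompt(question: &str, passages: &[&str]) -> String {
    let mut prompt = String::from("Answer the question using only the context below.\n\n");
    // Number each passage so the model (and the reader) can refer back to it.
    for (i, passage) in passages.iter().enumerate() {
        prompt.push_str(&format!("[{}] {}\n", i + 1, passage));
    }
    prompt.push_str(&format!("\nQuestion: {}", question));
    prompt
}
```

The rest of this guide lets Rig handle this step, so a helper like this is only useful for understanding or debugging what the model receives.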

Project setup

Create a new project, then add the dependencies to Cargo.toml:

cargo new rag_system
cd rag_system

[dependencies]
rig = "0.39.0"
tokio = { version = "1", features = ["full"] }
anyhow = "1"
pdf-extract = "0.7"

rig — the core Rig library.
tokio — async runtime.
anyhow — ergonomic error handling.
pdf-extract — pulls plain text out of PDF files.

Set your OpenAI API key:

export OPENAI_API_KEY=your_api_key_here

Extract text from PDFs

Rig works with plain text, so first turn each PDF into a String. A small helper wraps pdf_extract::extract_text and adds error context:

use anyhow::{Context, Result};
use pdf_extract::extract_text;
use std::path::Path;

fn load_pdf_content<P: AsRef<Path>>(file_path: P) -> Result<String> {
    extract_text(file_path.as_ref())
        .with_context(|| format!("Failed to extract text from PDF: {:?}", file_path.as_ref()))
}
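When a folder holds more than a couple of PDFs, listing each path by hand gets tedious. As a sketch, a hypothetical find_pdfs helper (not part of Rig or pdf-extract) could collect every .pdf file directly inside a directory using only the standard library:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: list every `.pdf` file directly inside `dir`,
/// sorted by path. Subdirectories are not searched.
fn find_pdfs<P: AsRef<Path>>(dir: P) -> io::Result<Vec<PathBuf>> {
    let mut pdfs = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Match the extension case-insensitively so `.PDF` is also accepted.
        let is_pdf = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.eq_ignore_ascii_case("pdf"))
            .unwrap_or(false);
        if path.is_file() && is_pdf {
            pdfs.push(path);
        }
    }
    pdfs.sort();
    Ok(pdfs)
}
```

Each returned path can then be passed to load_pdf_content.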

Embed documents and build the store

Create an embedding model, embed each document with EmbeddingsBuilder, and load the results into an InMemoryVectorStore. The in-memory store is ideal for small to medium collections; for larger corpora, swap in a persistent store such as LanceDB, MongoDB, Neo4j, or Qdrant.

use rig::client::EmbeddingsClient;
use rig::embeddings::EmbeddingsBuilder;
use rig::providers::openai;
use rig::vector_store::in_memory_store::InMemoryVectorStore;

let openai_client = openai::Client::from_env()?;
let embedding_model = openai_client.embedding_model("text-embedding-3-small");

let pdf1 = load_pdf_content("documents/Moores_Law_for_Everything.pdf")?;
let pdf2 = load_pdf_content("documents/The_Last_Question.pdf")?;

let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
    .document(pdf1)?
    .document(pdf2)?
    .build()
    .await?;

let vector_store = InMemoryVectorStore::from_documents(embeddings);

EmbeddingsBuilder::new(model).document(text)? queues a document for embedding; .build().await? calls the embedding API once for the whole batch. InMemoryVectorStore::from_documents builds the store directly from those embeddings.
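Conceptually, searching an in-memory store means scoring every stored vector against the query vector and keeping the best matches. The following is a minimal sketch of that idea using cosine similarity, which is an assumption here. The names cosine_similarity and top_matches are illustrative, and this is not Rig's implementation.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every `(id, vector)` pair against `query` and return the `n`
/// highest-scoring `(score, id)` pairs, best first.
fn top_matches<'a>(
    query: &[f64],
    docs: &'a [(String, Vec<f64>)],
    n: usize,
) -> Vec<(f64, &'a str)> {
    let mut scored: Vec<(f64, &str)> = docs
        .iter()
        .map(|(id, vector)| (cosine_similarity(query, vector), id.as_str()))
        .collect();
    // Sort in descending order of score.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.truncate(n);
    scored
}
```

A brute-force scan like this is linear in the number of stored vectors, which is one reason dedicated vector stores become attractive as a corpus grows.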

Build the RAG agent

Turn the store into a searchable index, then attach it to an agent as dynamic context. On every prompt, the agent runs a vector search and injects the top matches into the model’s context automatically:

use rig::client::CompletionClient;

let index = vector_store.index(embedding_model);

let rag_agent = openai_client
    .agent("gpt-5.5")
    .preamble("You are a helpful assistant that answers questions using the provided PDF context.")
    .dynamic_context(2, index)
    .build();

dynamic_context(2, index) tells the agent to retrieve the two most relevant documents for each query. If you need to run searches yourself rather than through an agent, build a request and call top_n on the index:

use rig::vector_store::VectorSearchRequest;
use rig::vector_store::VectorStoreIndex;

let req = VectorSearchRequest::builder()
    .query("what did Sam Altman write?")
    .samples(2)
    .build();

// Returns (score, id, payload) tuples, with each payload deserialized into the requested type (String here).
let hits = index.top_n::<String>(req).await?;
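Assuming the hits have the (score, id, payload) shape noted in the comment above, with an f64 score and String id and payload (the concrete types are an assumption), one possible post-processing step is to drop weak matches before using them. The filter_hits helper and its threshold are hypothetical:

```rust
/// Hypothetical helper: keep only hits scoring at least `min_score`,
/// preserving their original order.
fn filter_hits(
    hits: Vec<(f64, String, String)>,
    min_score: f64,
) -> Vec<(f64, String, String)> {
    hits.into_iter()
        .filter(|(score, _, _)| *score >= min_score)
        .collect()
}
```

A suitable threshold depends on the embedding model and the data, so it is best chosen by inspecting real scores rather than fixed in advance.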

Run it in a REPL

Rig ships a cli_chatbot helper that wraps any agent in an interactive prompt loop with chat history:

use rig::integrations::cli_chatbot::ChatBotBuilder;

let chatbot = ChatBotBuilder::new().agent(rag_agent).build();
chatbot.run().await?;

Full example

use anyhow::{Context, Result};
use pdf_extract::extract_text;
use rig::integrations::cli_chatbot::ChatBotBuilder;
use rig::client::{CompletionClient, EmbeddingsClient};
use rig::embeddings::EmbeddingsBuilder;
use rig::providers::openai;
use rig::vector_store::in_memory_store::InMemoryVectorStore;
use std::path::Path;

fn load_pdf_content<P: AsRef<Path>>(file_path: P) -> Result<String> {
    extract_text(file_path.as_ref())
        .with_context(|| format!("Failed to extract text from PDF: {:?}", file_path.as_ref()))
}

#[tokio::main]
async fn main() -> Result<()> {
    // Create the OpenAI client from OPENAI_API_KEY and pick an embedding model.
    let openai_client = openai::Client::from_env()?;
    let embedding_model = openai_client.embedding_model("text-embedding-3-small");

    // Extract plain text from each PDF.
    let pdf1 = load_pdf_content("documents/Moores_Law_for_Everything.pdf")?;
    let pdf2 = load_pdf_content("documents/The_Last_Question.pdf")?;

    // Embed both documents.
    let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
        .document(pdf1)?
        .document(pdf2)?
        .build()
        .await?;

    // Load the embeddings into an in-memory store and expose it as a searchable index.
    let vector_store = InMemoryVectorStore::from_documents(embeddings);
    let index = vector_store.index(embedding_model);

    // Build an agent that retrieves the two most relevant documents for every prompt.
    let rag_agent = openai_client
        .agent("gpt-5.5")
        .preamble("You are a helpful assistant that answers questions using the provided PDF context.")
        .dynamic_context(2, index)
        .build();

    // Run the agent in an interactive REPL with chat history.
    let chatbot = ChatBotBuilder::new().agent(rag_agent).build();
    chatbot.run().await?;

    Ok(())
}

Place Moores_Law_for_Everything.pdf and The_Last_Question.pdf in a documents/ folder, then run:

cargo run

You now have a chatbot that answers questions grounded in your PDFs. It can summarize a single document, analyze themes, or draw connections across both, because each response is built from the content the vector search surfaces for that query.

Going to production

Persistent storage — replace InMemoryVectorStore with a dedicated vector store (LanceDB, MongoDB, Neo4j, Qdrant) for large collections.
Chunking — split large documents into passages before embedding so retrieval is more precise; see Loaders.
Model selection — gpt-5.5 gives stronger reasoning; use a cheaper model where quality allows.
Observability — Rig integrates with OpenTelemetry and Langfuse; see Observability.
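To make the chunking item above concrete, here is a minimal word-window splitter with overlap. It is a sketch using only the standard library. The name chunk_words, the window size, and the overlap are illustrative choices, and this is not Rig's Loaders API.

```rust
/// Split `text` into chunks of at most `chunk_size` words, where each chunk
/// repeats the last `overlap` words of the previous one.
/// Panics if `chunk_size` is zero or `overlap >= chunk_size`.
fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        // Stop once the final word has been emitted to avoid a redundant tail chunk.
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

Each chunk would then be embedded as its own document, so a search can return the relevant passage instead of an entire PDF. Splitting on words is the simplest option; splitting on sentences or by token count are common alternatives.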