Memory

Memory is an agent’s ability to retain and reuse information from earlier in a conversation (and across conversations). Without it, every prompt starts from scratch: a user asking “what’s the status of my order?” and then “cancel it” would leave the model with no idea what “it” refers to. Rig gives you both ends of the spectrum — attach a conversation-memory backend and history is loaded and saved for you, or own the Vec<Message> yourself and decide exactly what the model sees on each turn.

There are two layers to think about:

Conversation history (short-term) — the messages of the current conversation, passed back to the model on every turn and bounded so it doesn’t outgrow the context window.
Long-term memory — facts, observations, and user profiles that survive across sessions, usually stored in a database or vector store.

Automatic conversation memory

The lowest-friction option: give the agent a memory backend and a conversation id, and Rig loads the stored history before each prompt and appends the new turn — including any tool calls and their results — after it succeeds.

use rig::memory::InMemoryConversationMemory;

let agent = openai
    .agent("gpt-5.5")
    .preamble("You are a helpful assistant.")
    .memory(InMemoryConversationMemory::new())
    .build();

// Each conversation id keeps its own isolated history.
let _ = agent.prompt("My name is Ada.").conversation("user-42").await?;
let reply = agent.prompt("What's my name?").conversation("user-42").await?;
println!("{reply}");

Your name is Ada.

A few rules govern how memory interacts with the rest of the prompt API:

The conversation id can be set per request (.conversation("...")) or as a builder default (AgentBuilder::conversation_id(...)). No id, no memory.
Passing explicit history with .with_history(...) bypasses memory entirely for that request — nothing is loaded and nothing is saved.
.without_memory() disables memory for a single request.

InMemoryConversationMemory lives in process memory, so it’s ideal for tests and short-lived agents but forgets everything on restart. For durable sessions, implement the ConversationMemory trait over your own store. It has three async methods:

pub trait ConversationMemory {
    /// Load the full history for a conversation (empty Vec if none).
    async fn load(&self, conversation_id: &str) -> Result<Vec<Message>, MemoryError>;
    /// Append the new turn's messages after a successful prompt.
    async fn append(&self, conversation_id: &str, messages: Vec<Message>) -> Result<(), MemoryError>;
    /// Remove all stored messages for a conversation.
    async fn clear(&self, conversation_id: &str) -> Result<(), MemoryError>;
}

Message is Serialize/Deserialize, so persisting a conversation is ordinary serde work — a JSON column per conversation id is a perfectly good first backend. Keep append cheap: it runs inline before the agent returns its response.
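
To make the contract of the three methods concrete, here is a minimal std-only sketch. It does not implement Rig's trait: ToyMessage, ToyConversationMemory, and MapMemory are hypothetical stand-ins, the methods are synchronous, and a HashMap plays the role of a table with one row per conversation id. A durable backend would implement the async ConversationMemory trait and write serialized Message values to real storage instead.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for rig's `Message`.
#[derive(Clone, Debug, PartialEq)]
pub struct ToyMessage {
    pub role: String,
    pub text: String,
}

/// Hypothetical synchronous mirror of the load/append/clear shape.
pub trait ToyConversationMemory {
    fn load(&self, conversation_id: &str) -> Result<Vec<ToyMessage>, String>;
    fn append(&self, conversation_id: &str, messages: Vec<ToyMessage>) -> Result<(), String>;
    fn clear(&self, conversation_id: &str) -> Result<(), String>;
}

/// One entry per conversation id, like a JSON column keyed by id.
#[derive(Default)]
pub struct MapMemory {
    rows: Mutex<HashMap<String, Vec<ToyMessage>>>,
}

impl ToyConversationMemory for MapMemory {
    fn load(&self, conversation_id: &str) -> Result<Vec<ToyMessage>, String> {
        let rows = self.rows.lock().map_err(|e| e.to_string())?;
        // An unknown id yields an empty history rather than an error.
        Ok(rows.get(conversation_id).cloned().unwrap_or_default())
    }

    fn append(&self, conversation_id: &str, messages: Vec<ToyMessage>) -> Result<(), String> {
        let mut rows = self.rows.lock().map_err(|e| e.to_string())?;
        rows.entry(conversation_id.to_string())
            .or_default()
            .extend(messages);
        Ok(())
    }

    fn clear(&self, conversation_id: &str) -> Result<(), String> {
        let mut rows = self.rows.lock().map_err(|e| e.to_string())?;
        rows.remove(conversation_id);
        Ok(())
    }
}
```

Each conversation id maps to its own list, so appending to one id never affects what load returns for another.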

Bounding history with policies

Whatever the backend, raw history grows without bound — and long histories are worse than just expensive. Models get distracted by stale, off-topic content, and eventually the conversation exceeds the context window outright. The fix is managed forgetting: shape what load returns so the model sees a bounded, relevant window.

The rig-memory companion crate ships reusable policies — add it alongside rig with cargo add rig-memory. The simplest policy is a sliding window over the most recent messages:

use rig::memory::InMemoryConversationMemory;
use rig_memory::{IntoFilter, SlidingWindowMemory};

let memory = InMemoryConversationMemory::new()
    .with_filter(SlidingWindowMemory::last_messages(20).into_filter());

TokenWindowMemory bounds by estimated token cost instead of message count, which tracks what you actually pay for:

use rig::memory::InMemoryConversationMemory;
use rig_memory::{HeuristicTokenCounter, IntoFilter, TokenWindowMemory};

let memory = InMemoryConversationMemory::new().with_filter(
    TokenWindowMemory::new(4_000, HeuristicTokenCounter::openai()).into_filter(),
);
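
As a rough illustration of what bounding by token cost means, the std-only sketch below keeps the newest messages that fit a budget. It is not rig-memory's implementation: estimate_tokens and token_window are hypothetical names, messages are plain strings, and the four-characters-per-token estimate is an assumption rather than the heuristic HeuristicTokenCounter uses.

```rust
/// Assumed heuristic: roughly four characters per token, rounded up.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keep the most recent messages whose combined estimate fits `budget`.
pub fn token_window(history: &[String], budget: usize) -> Vec<String> {
    let mut used = 0;
    let mut kept = Vec::new();
    // Walk from newest to oldest and stop at the first message that
    // would push the total over the budget.
    for text in history.iter().rev() {
        let cost = estimate_tokens(text);
        if used + cost > budget {
            break;
        }
        used += cost;
        kept.push(text.clone());
    }
    // Restore chronological order.
    kept.reverse();
    kept
}
```

Stopping at the first message that does not fit keeps the window contiguous, so the model never sees a history with a gap in the middle.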

Both policies drop a leading orphaned tool result when its paired tool call is truncated away, since most providers reject unpaired tool results.
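
The orphan rule is easiest to see on a sliding window. The sketch below is a simplified illustration, not the rig-memory code: ToyMessage and sliding_window are hypothetical, and the history is assumed to be well formed (every tool result directly follows its tool call).

```rust
/// Hypothetical, simplified message kinds; rig's `Message` is richer.
#[derive(Clone, Debug, PartialEq)]
pub enum ToyMessage {
    User(String),
    Assistant(String),
    ToolCall { id: String },
    ToolResult { id: String },
}

/// Keep the `max` most recent messages, then drop any tool results left at
/// the front of the window, since their tool calls fall outside it.
pub fn sliding_window(history: &[ToyMessage], max: usize) -> Vec<ToyMessage> {
    let start = history.len().saturating_sub(max);
    history[start..]
        .iter()
        .skip_while(|m| matches!(m, ToyMessage::ToolResult { .. }))
        .cloned()
        .collect()
}
```

With a five-message history of user, tool call, tool result, assistant, user, a window of three would start on the tool result, so the sketch returns only the last two messages. A window of four keeps the call and its result together.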

Truncation silently discards the dropped turns. Two composing adapters in rig-memory turn that loss into something useful:

DemotingPolicyMemory hands evicted messages to a DemotionHook, so you can archive them into a long-tail store (a vector store for semantic recall, cold storage for audit) instead of losing them.
CompactingMemory replaces evicted messages with a summary artifact spliced back into the history, so the model keeps a compressed view of the whole conversation — the rolling-summary pattern:

use rig::memory::InMemoryConversationMemory;
use rig_memory::{CompactingMemory, SlidingWindowMemory, TemplateCompactor};

let memory = CompactingMemory::new(
    InMemoryConversationMemory::new(),
    SlidingWindowMemory::last_messages(20),
    TemplateCompactor::new(), // deterministic textual rollup
);

let agent = openai
    .agent("gpt-5.5")
    .preamble("You are a helpful assistant.")
    .memory(memory)
    .build();

TemplateCompactor produces a plain-text rollup without any model call. For higher-quality summaries, implement the Compactor trait with an LLM call — its carry_over parameter hands you the previous summary so each compaction can fold in what came before. Compactors run inline on the load path, so a slow one delays the agent’s next turn.
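
The rolling-summary idea, including the role of a carried-over summary, can be sketched without any Rig types. Everything here is hypothetical: rollup and compact are illustrative names, messages are plain strings, and the real Compactor trait's signature may differ.

```rust
/// Fold the previous summary (`carry_over`) and the newly evicted
/// messages into one deterministic text rollup.
pub fn rollup(evicted: &[String], carry_over: Option<&str>) -> String {
    let mut lines: Vec<String> = Vec::new();
    if let Some(previous) = carry_over {
        lines.push(previous.to_string());
    }
    lines.extend(evicted.iter().map(|m| format!("- {m}")));
    lines.join("\n")
}

/// Keep the last `window` messages and replace everything older with a
/// single summary message placed at the front.
pub fn compact(history: &[String], window: usize, carry_over: Option<&str>) -> Vec<String> {
    let split = history.len().saturating_sub(window);
    let (evicted, recent) = history.split_at(split);
    if evicted.is_empty() {
        // Nothing fell out of the window, so nothing is summarized.
        return recent.to_vec();
    }
    let mut out = vec![format!(
        "Summary of earlier turns:\n{}",
        rollup(evicted, carry_over)
    )];
    out.extend_from_slice(recent);
    out
}
```

Passing the previous summary back in means each compaction only has to process the newly evicted messages, rather than re-reading the whole conversation.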

Managing history by hand

When you want full control (custom storage, custom shaping, or no framework involvement at all), own the history yourself. The history is a Vec<Message>, with one Message per user or assistant turn:

use rig::completion::Message;
use rig::OneOrMany;
use rig::message::{AssistantContent, UserContent};

let mut conversation_history: Vec<Message> = Vec::new();

// Add a user message
conversation_history.push(Message::User {
    content: OneOrMany::one(UserContent::text(
        "Do you know what the weather is like today?",
    )),
});

// Add the assistant's response
conversation_history.push(Message::Assistant {
    id: None,
    content: OneOrMany::one(AssistantContent::text(
        "I don't have access to real-time weather data...",
    )),
});

Rig also exposes the Message::user and Message::assistant constructors for the common text-only case, which is what most conversation loops use:

conversation_history.push(Message::user("What is the Rust programming language?"));
conversation_history.push(Message::assistant("Rust is a systems programming language..."));

The helper below prompts an agent with the stored history and then records the new turn manually:

use rig::completion::{Message, Prompt}; // `Prompt` provides `.prompt(...)`
use rig::prelude::*;
use rig::providers::openai;

async fn call_agent_with_chat_history(
    prompt: &str,
    history: &mut Vec<Message>,
) -> Result<String, Box<dyn std::error::Error>> {
    let openai_client = openai::Client::from_env()?;

    let agent = openai_client
        .agent("gpt-5.5")
        .preamble("You are a helpful assistant. Be concise.")
        .name("Bob") // used in logging
        .build();

    let response_text = agent.prompt(prompt).with_history(history.iter()).await?;

    // As of Rig 0.38, `with_history` no longer appends the new turn to the
    // caller's message list, so we record it ourselves.
    history.push(Message::user(prompt));
    history.push(Message::assistant(&response_text));

    Ok(response_text)
}

For a complete interactive REPL built on this pattern, see Build a CLI chatbot.

Rolling your own compaction

The same summarize-and-replace idea behind CompactingMemory is easy to apply to a hand-managed history: once it grows past a threshold, ask the model for a summary and start a fresh history seeded with it.

use rig::agent::Text;
use rig::completion::{CompletionModel, Message};
use rig::message::{AssistantContent, UserContent};

/// Summarize `history` and return a fresh history seeded with the summary.
async fn compact_history<M: CompletionModel>(
    model: &M,
    history: &[Message],
) -> Result<Vec<Message>, Box<dyn std::error::Error>> {
    // Render the conversation as plain text for the summarizer.
    let transcript = history
        .iter()
        .filter_map(|msg| match msg {
            Message::User { content } => content.iter().find_map(|c| match c {
                UserContent::Text(Text { text, .. }) => Some(format!("User: {text}")),
                _ => None,
            }),
            Message::Assistant { content, .. } => content.iter().find_map(|c| match c {
                AssistantContent::Text(Text { text, .. }) => Some(format!("Assistant: {text}")),
                _ => None,
            }),
            _ => None,
        })
        .collect::<Vec<_>>()
        .join("\n");

    let summary_prompt = format!(
        "Provide a concise summary of the following conversation, capturing \
         key points, decisions, and context:\n\n{transcript}"
    );

    let response = model.completion_request(&summary_prompt).send().await?;
    let AssistantContent::Text(Text { text, .. }) = response.choice.first() else {
        return Err("Model returned non-text response".into());
    };

    Ok(vec![Message::user(format!(
        "Context from the previous conversation:\n{text}"
    ))])
}

Trigger compaction however you like: after every turn, on a schedule, or once a token budget is exceeded (which requires a tokenizer — or rig-memory’s HeuristicTokenCounter — to estimate costs). Alternatively, fold the summary into the system prompt instead of the message list; both work, but keep it to one place so the model doesn’t see the summary twice.
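
A budget-based trigger might look like the std-only sketch below. The names are hypothetical, messages are plain strings, the four-characters-per-token estimate is an assumption, and the summarize closure stands in for the model call made by compact_history.

```rust
/// Compact `history` in place once its estimated cost exceeds `budget`.
/// Returns true if compaction ran.
pub fn compact_if_over_budget<F>(history: &mut Vec<String>, budget: usize, summarize: F) -> bool
where
    F: FnOnce(&[String]) -> String,
{
    // Assumed heuristic: roughly four characters per token, rounded up.
    let estimated: usize = history
        .iter()
        .map(|m| (m.chars().count() + 3) / 4)
        .sum();
    if estimated <= budget {
        return false;
    }
    let summary = summarize(history.as_slice());
    // Start a fresh history seeded with the summary.
    *history = vec![format!(
        "Context from the previous conversation:\n{summary}"
    )];
    true
}
```

Calling this after every turn is cheap, because the summarizer only runs on the turns where the estimate actually crosses the budget.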

Long-term memory

Bounded history keeps a single conversation healthy, but many applications need memory that survives across sessions. The common strategies:

Conversation observations — insights extracted from an exchange (decisions made, open questions, topics of strong interest).
User observations / profile — persistent facts about the user (stated preferences, context like location, preferred communication style). Keep these separate from conversation history, update them incrementally, and re-verify before use.
Grounded facts — objectively verifiable data pulled from external sources during a session (retrieved documents, computed results, API responses), stored with their source and timestamp.

The mechanics are the same for all three: after a significant exchange, use an extractor or a plain prompt to distill the relevant information, then persist it. When a new conversation starts, retrieve the most relevant items and add them to the system prompt or the opening messages. For semantic retrieval — embedding memories and fetching the closest matches — use a vector store as described in Vector Stores & RAG. A DemotionHook (above) is a natural place to feed evicted conversation turns into such a store.
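
The retrieve-then-inject step can be sketched with plain Rust. This is an illustration only: MemoryItem, retrieve, and build_preamble are hypothetical names, and keyword overlap stands in for the embedding similarity a vector store would provide.

```rust
use std::collections::HashSet;

/// A stored memory with the source it came from.
#[derive(Clone, Debug, PartialEq)]
pub struct MemoryItem {
    pub text: String,
    pub source: String,
}

/// Lowercased alphanumeric words of `text`.
fn words(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Return up to `k` items sharing at least one word with `query`,
/// best match first.
pub fn retrieve<'a>(items: &'a [MemoryItem], query: &str, k: usize) -> Vec<&'a MemoryItem> {
    let query_words = words(query);
    let mut scored: Vec<(usize, &MemoryItem)> = items
        .iter()
        .map(|item| {
            let overlap = words(&item.text).intersection(&query_words).count();
            (overlap, item)
        })
        .filter(|(score, _)| *score > 0)
        .collect();
    // The sort is stable, so equal scores keep their stored order.
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.into_iter().take(k).map(|(_, item)| item).collect()
}

/// Append the retrieved memories to a base system prompt.
pub fn build_preamble(base: &str, memories: &[&MemoryItem]) -> String {
    let mut preamble = base.to_string();
    if !memories.is_empty() {
        preamble.push_str("\n\nKnown context:");
        for m in memories {
            preamble.push_str(&format!("\n- {} (source: {})", m.text, m.source));
        }
    }
    preamble
}
```

The resulting string can be passed to the agent builder's preamble. Keeping the source next to each item gives the model, and anyone reading logs, a way to judge how much to trust it.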

Next steps

Agents: Attach a memory backend to an agent and prompt it with conversation ids.

Vector Stores & RAG: Embed long-term memories and retrieve the most relevant ones semantically.

Testing: Mock completions to test compaction and history handling without live API calls.
