Memory
Memory is an agent’s ability to retain and reuse information from earlier in a
conversation (and across conversations). Without it, every prompt starts from
scratch: a user asking “what’s the status of my order?” and then “cancel it”
would leave the model with no idea what “it” refers to. Rig gives you both ends
of the spectrum — attach a conversation-memory backend and history is loaded and
saved for you, or own the Vec<Message> yourself and decide exactly what the
model sees on each turn.
There are two layers to think about:
- Conversation history (short-term) — the messages of the current conversation, passed back to the model on every turn and bounded so it doesn’t outgrow the context window.
- Long-term memory — facts, observations, and user profiles that survive across sessions, usually stored in a database or vector store.
Automatic conversation memory
The lowest-friction option: give the agent a memory backend and a conversation id, and Rig loads the stored history before each prompt and appends the new turn (including any tool calls and their results) after it succeeds.
    use rig::memory::InMemoryConversationMemory;

    // `openai` is an already-constructed provider client.
    let agent = openai
        .agent("gpt-5.5")
        .preamble("You are a helpful assistant.")
        .memory(InMemoryConversationMemory::new())
        .build();

    // Each conversation id keeps its own isolated history.
    let _ = agent.prompt("My name is Ada.").conversation("user-42").await?;
    let reply = agent.prompt("What's my name?").conversation("user-42").await?;
    println!("{reply}");

Output:

    Your name is Ada.

A few rules govern how memory interacts with the rest of the prompt API:
- The conversation id can be set per request (.conversation("...")) or as a builder default (AgentBuilder::conversation_id(...)). No id, no memory.
- Passing explicit history with .with_history(...) bypasses memory entirely for that request: nothing is loaded and nothing is saved.
- .without_memory() disables memory for a single request.
InMemoryConversationMemory lives in process memory, so it’s ideal for tests
and short-lived agents but forgets everything on restart. For durable sessions,
implement the ConversationMemory
trait over your own store. It has three async methods:
    pub trait ConversationMemory {
        /// Load the full history for a conversation (empty Vec if none).
        async fn load(&self, conversation_id: &str) -> Result<Vec<Message>, MemoryError>;

        /// Append the new turn's messages after a successful prompt.
        async fn append(&self, conversation_id: &str, messages: Vec<Message>) -> Result<(), MemoryError>;

        /// Remove all stored messages for a conversation.
        async fn clear(&self, conversation_id: &str) -> Result<(), MemoryError>;
    }

Message is Serialize/Deserialize, so persisting a conversation is ordinary
serde work — a JSON column per conversation id is a perfectly good first
backend. Keep append cheap: it runs inline before the agent returns its
response.
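To make the load/append/clear contract concrete, here is a minimal synchronous sketch in plain Rust. It is not Rig's trait: `SketchStore` is a hypothetical name, messages are plain strings instead of `Message` values, and a real backend would be async and return `MemoryError` on failure.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a conversation store; messages are plain strings here.
#[derive(Default)]
pub struct SketchStore {
    conversations: HashMap<String, Vec<String>>,
}

impl SketchStore {
    /// Return the full history for a conversation, or an empty Vec if none exists.
    pub fn load(&self, conversation_id: &str) -> Vec<String> {
        self.conversations
            .get(conversation_id)
            .cloned()
            .unwrap_or_default()
    }

    /// Append the messages of one completed turn to the conversation.
    pub fn append(&mut self, conversation_id: &str, messages: Vec<String>) {
        self.conversations
            .entry(conversation_id.to_string())
            .or_default()
            .extend(messages);
    }

    /// Forget everything stored for a conversation.
    pub fn clear(&mut self, conversation_id: &str) {
        self.conversations.remove(conversation_id);
    }
}
```

Each conversation id maps to its own vector, so appending to or clearing one conversation never touches another. A durable version would likely swap the `HashMap` for a database table keyed by conversation id.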
Bounding history with policies
Whatever the backend, raw history grows without bound, and long histories are
worse than just expensive. Models get distracted by stale, off-topic content,
and eventually the conversation exceeds the context window outright. The fix is
managed forgetting: shape what load returns so the model sees a bounded,
relevant window.
The rig-memory companion crate ships reusable
policies — add it alongside rig with cargo add rig-memory. The simplest
policy is a sliding window over the most recent messages:
    use rig::memory::InMemoryConversationMemory;
    use rig_memory::{IntoFilter, SlidingWindowMemory};

    let memory = InMemoryConversationMemory::new()
        .with_filter(SlidingWindowMemory::last_messages(20).into_filter());

TokenWindowMemory bounds by estimated token cost instead of message count,
which tracks what you actually pay for:
    use rig::memory::InMemoryConversationMemory;
    use rig_memory::{HeuristicTokenCounter, IntoFilter, TokenWindowMemory};

    let memory = InMemoryConversationMemory::new().with_filter(
        TokenWindowMemory::new(4_000, HeuristicTokenCounter::openai()).into_filter(),
    );

Both policies drop a leading orphaned tool result when its paired tool call is truncated away, since most providers reject unpaired tool results.
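The token-budget window and the orphaned-tool-result rule can be sketched in plain Rust. This is an illustration under stated assumptions, not rig-memory's implementation: `Msg`, `estimate_tokens`, and `token_window` are hypothetical names, and the four-characters-per-token estimate is an assumed heuristic rather than whatever `HeuristicTokenCounter` computes.

```rust
/// Simplified message type for illustration (not Rig's `Message`).
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    User(String),
    Assistant(String),
    ToolCall { id: String, name: String },
    ToolResult { id: String, output: String },
}

/// Rough token estimate: about four characters per token, at least one.
/// Assumed heuristic; not the formula used by `HeuristicTokenCounter`.
pub fn estimate_tokens(msg: &Msg) -> usize {
    let chars = match msg {
        Msg::User(text) | Msg::Assistant(text) => text.chars().count(),
        Msg::ToolCall { id, name } => id.chars().count() + name.chars().count(),
        Msg::ToolResult { id, output } => id.chars().count() + output.chars().count(),
    };
    (chars / 4).max(1)
}

/// Keep the newest messages whose estimated cost fits in `budget`, then drop
/// any tool results left at the front without their tool call.
pub fn token_window(history: &[Msg], budget: usize) -> Vec<Msg> {
    let mut used = 0;
    let mut start = history.len();
    // Walk from newest to oldest, extending the window while it still fits.
    for (index, msg) in history.iter().enumerate().rev() {
        let cost = estimate_tokens(msg);
        if used + cost > budget {
            break;
        }
        used += cost;
        start = index;
    }
    // A tool result at the front has lost its paired call to truncation.
    while start < history.len() && matches!(history[start], Msg::ToolResult { .. }) {
        start += 1;
    }
    history[start..].to_vec()
}
```

If the budget cuts between a tool call and its result, the result is dropped as well, so the window always begins on a message a provider will accept. The same trailing loop would apply unchanged to a message-count window.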
Truncation silently discards the dropped turns. Two composing adapters in
rig-memory turn that loss into something useful:
- DemotingPolicyMemory hands evicted messages to a DemotionHook, so you can archive them into a long-tail store (a vector store for semantic recall, cold storage for audit) instead of losing them.
- CompactingMemory replaces evicted messages with a summary artifact spliced back into the history, so the model keeps a compressed view of the whole conversation. This is the rolling-summary pattern:
    use rig::memory::InMemoryConversationMemory;
    use rig_memory::{CompactingMemory, SlidingWindowMemory, TemplateCompactor};

    let memory = CompactingMemory::new(
        InMemoryConversationMemory::new(),
        SlidingWindowMemory::last_messages(20),
        TemplateCompactor::new(), // deterministic textual rollup
    );

    let agent = openai
        .agent("gpt-5.5")
        .preamble("You are a helpful assistant.")
        .memory(memory)
        .build();

TemplateCompactor produces a plain-text rollup without any model call. For
higher-quality summaries, implement the
Compactor trait
with an LLM call — its carry_over parameter hands you the previous summary so
each compaction can fold in what came before. Compactors run inline on the load
path, so a slow one delays the agent’s next turn.
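The carry-over idea is easier to see in a stripped-down form. The sketch below uses plain strings and hypothetical function names (`compact`, `compact_window`); it does not reproduce the `Compactor` trait's actual signature, only the pattern of folding the previous summary into the next one.

```rust
/// Fold evicted messages into a plain-text summary, carrying the previous
/// summary forward so older context is not lost. A deterministic template;
/// an LLM-backed version would build a prompt from the same two inputs.
pub fn compact(evicted: &[String], carry_over: Option<&str>) -> String {
    let mut lines = Vec::new();
    if let Some(previous) = carry_over {
        lines.push(format!("Earlier summary: {previous}"));
    }
    for message in evicted {
        lines.push(format!("- {message}"));
    }
    lines.join("\n")
}

/// Split `history` into (summary, recent window): everything older than the
/// last `keep` messages is compacted, the rest is returned unchanged.
pub fn compact_window(
    history: &[String],
    keep: usize,
    carry_over: Option<&str>,
) -> (Option<String>, Vec<String>) {
    let cut = history.len().saturating_sub(keep);
    let (evicted, recent) = history.split_at(cut);
    let summary = if evicted.is_empty() {
        // Nothing was evicted, so the previous summary (if any) stands.
        carry_over.map(str::to_string)
    } else {
        Some(compact(evicted, carry_over))
    };
    (summary, recent.to_vec())
}
```

Passing each call's summary back in as the next call's `carry_over` means every compaction sees a compressed view of everything evicted so far, not just the latest batch.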
Managing history by hand
When you want full control (custom storage, custom shaping, or no framework
involvement at all), own the history yourself. The history is a Vec<Message>,
with one Message per user or assistant turn:
    use rig::completion::Message;
    use rig::OneOrMany;
    use rig::message::{AssistantContent, UserContent};

    let mut conversation_history: Vec<Message> = Vec::new();

    // Add a user message
    conversation_history.push(Message::User {
        content: OneOrMany::one(UserContent::text(
            "Do you know what the weather is like today?",
        )),
    });

    // Add the assistant's response
    conversation_history.push(Message::Assistant {
        id: None,
        content: OneOrMany::one(AssistantContent::text(
            "I don't have access to real-time weather data...",
        )),
    });

Rig also exposes the Message::user and Message::assistant constructors for
the common text-only case, which is what most conversation loops use:
    history.push(Message::user("What is the Rust programming language?"));
    history.push(Message::assistant("Rust is a systems programming language..."));

The helper below prompts an agent with the stored history and then records the new turn manually:
    use rig::completion::{Message, Prompt};
    use rig::prelude::*;
    use rig::providers::openai;

    async fn call_agent_with_chat_history(
        prompt: &str,
        history: &mut Vec<Message>,
    ) -> Result<String, Box<dyn std::error::Error>> {
        let openai_client = openai::Client::from_env()?;

        let agent = openai_client
            .agent("gpt-5.5")
            .preamble("You are a helpful assistant. Be concise.")
            .name("Bob") // used in logging
            .build();

        let response_text = agent.prompt(prompt).with_history(history.iter()).await?;

        // As of Rig 0.38, `with_history` no longer appends the new turn to the
        // caller's message list, so we record it ourselves.
        history.push(Message::user(prompt));
        history.push(Message::assistant(&response_text));

        Ok(response_text)
    }

For a complete interactive REPL built on this pattern, see Build a CLI chatbot.
Rolling your own compaction
The same summarize-and-replace idea behind CompactingMemory is easy to apply
to a hand-managed history: once it grows past a threshold, ask the model for a
summary and start a fresh history seeded with it.
    use rig::agent::Text;
    use rig::completion::{CompletionModel, Message};
    use rig::message::{AssistantContent, UserContent};

    /// Summarize `history` and return a fresh history seeded with the summary.
    async fn compact_history<M: CompletionModel>(
        model: &M,
        history: &[Message],
    ) -> Result<Vec<Message>, Box<dyn std::error::Error>> {
        // Render the conversation as plain text for the summarizer.
        let transcript = history
            .iter()
            .filter_map(|msg| match msg {
                Message::User { content } => content.iter().find_map(|c| match c {
                    UserContent::Text(Text { text, .. }) => Some(format!("User: {text}")),
                    _ => None,
                }),
                Message::Assistant { content, .. } => content.iter().find_map(|c| match c {
                    AssistantContent::Text(Text { text, .. }) => Some(format!("Assistant: {text}")),
                    _ => None,
                }),
                _ => None,
            })
            .collect::<Vec<_>>()
            .join("\n");

        let summary_prompt = format!(
            "Provide a concise summary of the following conversation, capturing \
             key points, decisions, and context:\n\n{transcript}"
        );

        let response = model.completion_request(&summary_prompt).send().await?;
        let AssistantContent::Text(Text { text, .. }) = response.choice.first() else {
            return Err("Model returned non-text response".into());
        };

        Ok(vec![Message::user(format!(
            "Context from the previous conversation:\n{text}"
        ))])
    }

Trigger compaction however you like: after every turn, on a schedule, or once a
token budget is exceeded (which requires a tokenizer — or rig-memory’s
HeuristicTokenCounter — to estimate costs). Alternatively, fold the summary
into the system prompt instead of the message list; both work, but keep it to
one place so the model doesn’t see the summary twice.
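One way to wire up a threshold trigger is sketched below in plain Rust. The names are hypothetical, the history is a vector of strings rather than `Message` values, and the `summarize` closure stands in for the model call; a token-based trigger would replace the length check with a cost estimate.

```rust
/// Compact `history` in place once it holds more than `max_messages` entries.
/// `summarize` stands in for the model call. Returns true if compaction ran.
pub fn compact_if_needed<F>(history: &mut Vec<String>, max_messages: usize, summarize: F) -> bool
where
    F: FnOnce(&[String]) -> String,
{
    if history.len() <= max_messages {
        return false;
    }
    let summary = summarize(history.as_slice());
    // Start a fresh history seeded with the summary, as the only copy of it.
    history.clear();
    history.push(format!("Context from the previous conversation:\n{summary}"));
    true
}
```

Calling this after each turn keeps the check cheap: the summarizer only runs when the threshold is crossed, and the summary lives in exactly one place.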
Long-term memory
Bounded history keeps a single conversation healthy, but many applications need memory that survives across sessions. The common strategies:
- Conversation observations — insights extracted from an exchange (decisions made, open questions, topics of strong interest).
- User observations / profile — persistent facts about the user (stated preferences, context like location, preferred communication style). Keep these separate from conversation history, update them incrementally, and re-verify before use.
- Grounded facts — objectively verifiable data pulled from external sources during a session (retrieved documents, computed results, API responses), stored with their source and timestamp.
The mechanics are the same for all three: after a significant exchange, use an
extractor or a plain prompt to distill the relevant
information, then persist it. When a new conversation starts, retrieve the most
relevant items and add them to the system prompt or the opening messages. For
semantic retrieval — embedding memories and fetching the closest matches — use a
vector store as described in Vector Stores & RAG. A
DemotionHook (above) is a natural place to feed evicted conversation turns
into such a store.
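The store-then-retrieve cycle can be sketched without any external service. Everything here is an illustrative assumption: the `MemoryItem` fields, the function names, and especially the word-overlap scoring, which is a crude stand-in for the embedding similarity a vector store would provide.

```rust
/// One long-term memory entry; the field names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryItem {
    pub text: String,
    pub source: String,
    /// Unix timestamp (seconds) of when the item was recorded.
    pub recorded_at: u64,
}

/// Lowercase a string and split it into alphanumeric words.
fn words(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(str::to_string)
        .collect()
}

/// Return up to `limit` items sharing at least one word with `query`,
/// best match first. Word overlap stands in for semantic similarity.
pub fn retrieve<'a>(items: &'a [MemoryItem], query: &str, limit: usize) -> Vec<&'a MemoryItem> {
    let query_words = words(query);
    let mut scored: Vec<(usize, &MemoryItem)> = items
        .iter()
        .map(|item| {
            let item_words = words(&item.text);
            // Score = number of query words that also occur in the item.
            let score = query_words.iter().filter(|w| item_words.contains(*w)).count();
            (score, item)
        })
        .filter(|(score, _)| *score > 0)
        .collect();
    // Highest score first; newer items win ties.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then(b.1.recorded_at.cmp(&a.1.recorded_at)));
    scored.into_iter().take(limit).map(|(_, item)| item).collect()
}

/// Append retrieved memories, with their sources, to a base system prompt.
pub fn preamble_with_memories(base: &str, memories: &[&MemoryItem]) -> String {
    let mut preamble = base.to_string();
    if !memories.is_empty() {
        preamble.push_str("\n\nKnown context:");
        for item in memories {
            preamble.push_str(&format!("\n- {} (source: {})", item.text, item.source));
        }
    }
    preamble
}
```

Keeping the source and timestamp on each item makes it possible to show provenance in the prompt and to prefer or re-verify items by age. Swapping `retrieve` for a vector-store query leaves the rest of the flow unchanged.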
See also
- Agents: where memory, .with_history(...), and chat plug in.
- Vector Stores & RAG: retrieve long-term memories semantically.
- Build a CLI chatbot: a working conversational loop.
- Observability: trace what your agent remembers each turn.
