Completion & Generation

Completions are the layer beneath agents: the traits and types for sending a single request to a language model and handling what comes back. Rig layers this API so you can work at whatever altitude the task needs — one-line prompting at the top, full request control at the bottom — and every layer speaks the same Message and response types.

Start from what you need, not from the trait list:

| You want… | Use |
| --- | --- |
| a text answer to a one-off prompt | Prompt (.prompt(...)) |
| a conversation that carries history | Chat (.chat(...)) |
| a typed struct instead of a string | TypedPrompt (.prompt_typed(...)) |
| tokens as they arrive | Streaming equivalents |
| to configure the request before it's sent | Completion / CompletionRequestBuilder |
| to bypass the agent loop entirely | CompletionModel directly |

Most applications should reach for an Agent, which implements the high-level traits and runs the agent loop for you. Drop down to a bare CompletionModel when you need control over individual requests — for example, writing your own loop that decides per tool result whether to return it or feed it back to the model.

Prompt is the simplest interface: one prompt in, one String out.

async fn prompt(&self, prompt: &str) -> Result<String, PromptError>;

Chat is the conversation-aware variant: it takes the prior messages alongside the new prompt, so the model can answer in context, and appends the new turn to the history you pass. (See Memory for how this relates to .with_history(...) — which does not append — and for automatic conversation memory.)

async fn chat(&self, prompt: impl Into<Message>, chat_history: &mut Vec<Message>) -> Result<String, PromptError>;

TypedPrompt returns deserialized structured data instead of a string. Give it a target type that derives serde::Deserialize and schemars::JsonSchema, and Rig constrains the model to answer with JSON matching that type’s schema:

use rig::schemars::JsonSchema;
use serde::Deserialize;

#[derive(Deserialize, JsonSchema)]
struct SentimentAnalysis {
    /// The sentiment score from -1.0 to 1.0
    score: f64,
    /// The sentiment label
    label: String,
}

let result: SentimentAnalysis = agent
    .prompt_typed("Analyze the sentiment of: 'I love this product!'")
    .await?;

If structured extraction is the whole job rather than one step, use an Extractor instead — the Structured Output page compares the two. The full TypedPrompt signature lives in the API reference.

Prompt, Chat, and the Completion trait described next each have a streaming mirror (StreamingPrompt, StreamingChat, StreamingCompletion) that yields output incrementally; see Streaming.

Completion sits one level down: instead of sending a request, it hands you a CompletionRequestBuilder so you can adjust anything before dispatch. Fields pre-populated by the implementing type (an agent’s preamble, for instance) can be overwritten on the builder.

pub trait Completion<M: CompletionModel> {
    /// Generates a completion request builder for the given `prompt` and `chat_history`.
    fn completion(
        &self,
        prompt: &str,
        chat_history: Vec<Message>,
    ) -> impl Future<Output = Result<CompletionRequestBuilder<M>, CompletionError>> + Send;
}

CompletionModel is the provider interface itself — the trait each LLM backend implements with a completion method (and a stream counterpart). You rarely implement it, but calling one directly is how you take full control of a single request. Create the model from a client, build a request, and send it:

use rig::client::{CompletionClient, ProviderClient};
use rig::providers::openai::Client;

let openai_client = Client::from_env()?;
let model = openai_client.completion_model("gpt-5.5");

let response = model
    .completion_request("What is the Rust programming language?")
    .preamble("You are a helpful assistant.".to_string())
    .temperature(0.7)
    .max_tokens(1000)
    .send()
    .await?;

The builder also accepts context documents via documents(...) and tool definitions via tools(...). If you'd rather separate construction from dispatch, call .build() to get a CompletionRequest and pass it to CompletionModel::completion() yourself; .send() is just those two steps fused. The full trait (including the associated Response and StreamingResponse types providers must supply) is in the API reference.
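The relationship between .build() and .send() can be sketched with a toy builder. This is a minimal, synchronous illustration using only the standard library: `Model`, `RequestBuilder`, and `EchoModel` are hypothetical stand-ins invented for this sketch, not Rig's real types, and the real API is async and carries many more fields.

```rust
// Hypothetical stand-ins for illustration only; these are NOT Rig's definitions.
#[derive(Debug, Clone, PartialEq)]
struct CompletionRequest {
    prompt: String,
    preamble: Option<String>,
    temperature: Option<f64>,
}

/// Simplified, synchronous stand-in for a provider model.
trait Model {
    fn completion(&self, request: CompletionRequest) -> Result<String, String>;
}

/// Toy builder that borrows the model it will eventually dispatch to.
struct RequestBuilder<'a, M: Model> {
    model: &'a M,
    request: CompletionRequest,
}

impl<'a, M: Model> RequestBuilder<'a, M> {
    fn new(model: &'a M, prompt: &str) -> Self {
        Self {
            model,
            request: CompletionRequest {
                prompt: prompt.to_string(),
                preamble: None,
                temperature: None,
            },
        }
    }

    fn preamble(mut self, preamble: String) -> Self {
        self.request.preamble = Some(preamble);
        self
    }

    fn temperature(mut self, temperature: f64) -> Self {
        self.request.temperature = Some(temperature);
        self
    }

    /// Construction only: hand back the finished request without sending it.
    fn build(self) -> CompletionRequest {
        self.request
    }

    /// Construction and dispatch fused: build, then pass to the model.
    fn send(self) -> Result<String, String> {
        let model = self.model; // shared reference, so copying it out is cheap
        model.completion(self.build())
    }
}

/// Toy model that echoes what it was asked, so both paths can be compared.
struct EchoModel;

impl Model for EchoModel {
    fn completion(&self, request: CompletionRequest) -> Result<String, String> {
        Ok(format!(
            "preamble={:?} temperature={:?} prompt={}",
            request.preamble, request.temperature, request.prompt
        ))
    }
}
```

Splitting the two steps is mostly useful when the request itself needs to be inspected, logged, or stored before dispatch; when it does not, the fused call is shorter and behaves the same in this sketch.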

A response wraps the model’s content along with the raw provider-specific payload:

pub struct CompletionResponse<T> {
    /// One or more assistant content items (text, tool calls, reasoning, etc.)
    pub choice: OneOrMany<AssistantContent>,
    /// The raw response from the provider
    pub raw_response: T,
}

raw_response keeps the provider’s untranslated payload — log it when debugging provider quirks that Rig’s normalized types don’t surface. (Models themselves are cheap handles over a shared client; build them once and reuse them across requests.)

choice holds AssistantContent values — the three things a model can answer with:

pub enum AssistantContent {
    /// Plain text response
    Text(Text),
    /// A tool call requested by the model
    ToolCall(ToolCall),
    /// Reasoning/chain-of-thought content (for models that support it)
    Reasoning(Reasoning),
}

Text wraps a string; ToolCall carries the call id plus the function name and JSON arguments the model chose. When you use an agent, tool calls are executed for you; at this layer you decide what to do with them.
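As a rough illustration of deciding what to do with the content items yourself, the sketch below walks a response's items and chooses between replying and running tools. The types are simplified stand-ins defined locally (plain strings instead of Rig's Text, ToolCall, and Reasoning structs, and a slice instead of OneOrMany), and the policy in the hypothetical `decide` function is only one possible choice.

```rust
// Simplified stand-ins for illustration; Rig's real types carry more structure.
#[derive(Debug, Clone, PartialEq)]
struct PendingCall {
    id: String,
    name: String,
    arguments: String, // JSON text, kept as a plain string in this sketch
}

#[derive(Debug, Clone, PartialEq)]
enum AssistantContent {
    Text(String),
    ToolCall(PendingCall),
    Reasoning(String),
}

/// What a hand-written loop might do after one model response.
#[derive(Debug, PartialEq)]
enum NextStep {
    /// No tools requested: return the concatenated text to the caller.
    Reply(String),
    /// Tools requested: execute them and feed the results back to the model.
    RunTools(Vec<PendingCall>),
}

/// Hypothetical policy: any tool call takes priority over text,
/// and reasoning content is skipped rather than shown to the user.
fn decide(choice: &[AssistantContent]) -> NextStep {
    let mut text = String::new();
    let mut calls = Vec::new();

    for item in choice {
        match item {
            AssistantContent::Text(t) => text.push_str(t),
            AssistantContent::ToolCall(call) => calls.push(call.clone()),
            AssistantContent::Reasoning(_) => {}
        }
    }

    if calls.is_empty() {
        NextStep::Reply(text)
    } else {
        NextStep::RunTools(calls)
    }
}
```

A custom loop would call something like `decide` after each response and stop once it gets a `Reply`; a cap on iterations is worth adding so a model that keeps requesting tools cannot loop forever.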

The Message enum represents conversation messages with rich content support:

pub enum Message {
    User { content: OneOrMany<UserContent> },
    Assistant { content: OneOrMany<AssistantContent> },
}

UserContent supports text, images, audio, documents, video, and tool results:

pub enum UserContent {
    Text(Text),
    ToolResult(ToolResult),
    Image(Image),
    Audio(Audio),
    Document(Document),
    Video(Video),
}

For the common text-only case, use the Message::user(...) and Message::assistant(...) constructors rather than building the enum by hand.

Rig provides a Usage struct and the GetTokenUsage trait for tracking token consumption. Every completion response carries one, and the agent loop aggregates them across turns:

pub struct Usage {
    /// Input ("prompt") tokens used by the request.
    pub input_tokens: u64,
    /// Output ("completion") tokens generated.
    pub output_tokens: u64,
    /// Total tokens, stored separately because some providers only report a total.
    pub total_tokens: u64,
    /// Input tokens read from a provider-managed prompt cache.
    pub cached_input_tokens: u64,
    /// Input tokens written to a provider-managed prompt cache.
    pub cache_creation_input_tokens: u64,
    /// Tokens spent on tool-use prompts.
    pub tool_use_prompt_tokens: u64,
    /// Tokens spent on internal reasoning by reasoning-capable models.
    pub reasoning_tokens: u64,
}

To read aggregated usage for a whole agent run (across every turn of the tool loop), call .extended_details() on a prompt request — see Token usage & run details on the Agents page. When implementing a provider, implement GetTokenUsage on your raw response type to expose these metrics.
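If you run your own loop over a bare model, you would have to do this aggregation yourself. The sketch below shows one way it could look, using a local stand-in `Usage` with only a subset of the fields above; the `aggregate` helper is hypothetical and is not part of Rig.

```rust
use std::ops::AddAssign;

// Local stand-in with a subset of the fields shown above; not Rig's own type.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    total_tokens: u64,
    cached_input_tokens: u64,
}

impl AddAssign for Usage {
    fn add_assign(&mut self, other: Self) {
        self.input_tokens += other.input_tokens;
        self.output_tokens += other.output_tokens;
        // Summed as reported rather than recomputed from input + output,
        // because some providers only report a total.
        self.total_tokens += other.total_tokens;
        self.cached_input_tokens += other.cached_input_tokens;
    }
}

/// Hypothetical helper: fold the per-turn usage of one run into a single value.
fn aggregate(turns: &[Usage]) -> Usage {
    let mut total = Usage::default();
    for turn in turns {
        total += *turn;
    }
    total
}
```

Keeping `total_tokens` as its own running sum means a turn that reports only a total still counts toward the run, at the cost that the aggregated total may exceed the sum of the aggregated input and output fields.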

Completion failures surface as CompletionError, which separates transport problems (HTTP, JSON) from provider-reported ones:

pub enum CompletionError {
    HttpError(reqwest::Error),
    JsonError(serde_json::Error),
    RequestError(Box<dyn Error>),
    ResponseError(String),
    ProviderError(String),
}

Typed-output paths add StructuredOutputError, which wraps a PromptError (itself carrying a CompletionError, MaxTurnsError, or tool failure) or a deserialization failure. Error Handling covers telling transient errors from fatal ones and retrying safely.

Adding a new backend means implementing CompletionModel (and friends) for your provider type — the Extending Rig guide walks through it end to end.