Completion & Generation
Completions are the layer beneath agents: the traits and types for sending a
single request to a language model and handling what comes back. Rig layers this API so you can work
at whatever altitude the task needs — one-line prompting at the top, full request control at the
bottom — and every layer speaks the same Message and response types.
Choosing an interface
Start from what you need, not from the trait list:
| You want… | Use |
|---|---|
| a text answer to a one-off prompt | Prompt (.prompt(...)) |
| a conversation that carries history | Chat (.chat(...)) |
| a typed struct instead of a string | TypedPrompt (.prompt_typed(...)) |
| tokens as they arrive | Streaming equivalents |
| to configure the request before it’s sent | Completion / CompletionRequestBuilder |
| to bypass the agent loop entirely | CompletionModel directly |
Most applications should reach for an Agent, which implements the high-level
traits and runs the agent loop for you. Drop down to a bare CompletionModel when you need
control over individual requests — for example, writing your own loop that decides per tool result
whether to return it or feed it back to the model.
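A hand-written loop of that kind can be sketched with std only. The sketch does not use Rig. Reply, Decision, run_loop, the scripted model closure, and the lookup_weather tool are all hypothetical stand-ins. It assumes a model either answers with text or requests one tool call per turn, and that the caller may return some tool results directly instead of feeding them back.

```rust
// Hypothetical stand-ins; none of these names are Rig APIs.

/// What the scripted model can answer with: text, or a request to call a tool.
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    Text(String),
    ToolCall { name: String, args: String },
}

/// What the caller decides to do with one tool result.
enum Decision {
    /// Stop and hand the raw tool output back to the caller.
    ReturnToCaller(String),
    /// Append the output to the transcript and ask the model again.
    FeedBack(String),
}

/// A hand-written loop: `model` reads the transcript and replies;
/// `handle_tool` runs the requested tool and decides, per result, what happens next.
fn run_loop<M, T>(mut model: M, mut handle_tool: T, prompt: &str, max_turns: usize) -> Result<String, String>
where
    M: FnMut(&[String]) -> Reply,
    T: FnMut(&str, &str) -> Decision,
{
    let mut transcript = vec![format!("user: {prompt}")];
    for _ in 0..max_turns {
        match model(transcript.as_slice()) {
            // Plain text ends the loop.
            Reply::Text(text) => return Ok(text),
            Reply::ToolCall { name, args } => {
                transcript.push(format!("tool_call: {name}({args})"));
                match handle_tool(name.as_str(), args.as_str()) {
                    Decision::ReturnToCaller(out) => return Ok(out),
                    Decision::FeedBack(out) => transcript.push(format!("tool_result: {out}")),
                }
            }
        }
    }
    Err(format!("no final answer after {max_turns} turns"))
}

fn main() {
    // Scripted model: request a tool once, then answer from the tool result.
    let model = |t: &[String]| {
        if t.iter().any(|m| m.starts_with("tool_result:")) {
            Reply::Text(format!("answer based on [{}]", t.last().unwrap()))
        } else {
            Reply::ToolCall { name: "lookup_weather".into(), args: "Paris".into() }
        }
    };
    let tool = |_name: &str, args: &str| Decision::FeedBack(format!("sunny in {args}"));
    let answer = run_loop(model, tool, "Weather in Paris?", 4).unwrap();
    println!("{answer}"); // answer based on [tool_result: sunny in Paris]
}
```

Passing the model as a closure keeps the sketch testable without a network. With Rig, that step would be a call on a real CompletionModel.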
The high-level traits
Prompt is the simplest interface: one prompt in, one String out.
```rust
async fn prompt(&self, prompt: &str) -> Result<String, PromptError>;
```

Chat is the conversation-aware variant: it takes the prior messages alongside the new prompt,
so the model can answer in context, and appends the new turn to the history you pass. (See
Memory for how this relates to .with_history(...) — which does not append —
and for automatic conversation memory.)
```rust
async fn chat(&self, prompt: impl Into<Message>, chat_history: &mut Vec<Message>) -> Result<String, PromptError>;
```

TypedPrompt returns deserialized structured data instead of a string. Give it a target type
that derives serde::Deserialize and schemars::JsonSchema, and Rig constrains the model to answer
with JSON matching that type’s schema:
```rust
use rig::schemars::JsonSchema;
use serde::Deserialize;

#[derive(Deserialize, JsonSchema)]
struct SentimentAnalysis {
    /// The sentiment score from -1.0 to 1.0
    score: f64,
    /// The sentiment label
    label: String,
}

// `agent` is an Agent built earlier; `?` needs an enclosing async fn returning a compatible Result.
let result: SentimentAnalysis = agent
    .prompt_typed("Analyze the sentiment of: 'I love this product!'")
    .await?;
```

If structured extraction is the whole job rather than one step, use an
Extractor instead — the
Structured Output page compares the two. The
full TypedPrompt signature lives in the
API reference.
Prompt, Chat, and the Completion trait described below each have a streaming mirror (StreamingPrompt, StreamingChat,
StreamingCompletion) that yields output incrementally; see Streaming.
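The history contract of Chat can be illustrated without a model: it reads the prior messages, then appends the new turn to the vector you passed. Msg, EchoChat, and its chat method are hypothetical, synchronous stand-ins that only mimic the signature shape above. The sketch assumes a turn is one user message followed by one assistant message.

```rust
// Hypothetical stand-ins; not Rig's Message type or Chat trait.

#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant(String),
}

/// A fake model that "answers" by reporting how much history it was shown.
struct EchoChat;

impl EchoChat {
    /// Mimics `chat`: reads the prior history, then appends the new turn to it.
    fn chat(&self, prompt: &str, history: &mut Vec<Msg>) -> Result<String, String> {
        let seen = history.len();
        let reply = format!("reply to '{prompt}' after {seen} prior messages");
        history.push(Msg::User(prompt.to_string()));
        history.push(Msg::Assistant(reply.clone()));
        Ok(reply)
    }
}

fn main() {
    let bot = EchoChat;
    let mut history = Vec::new();
    bot.chat("hello", &mut history).unwrap();
    // The second call sees the first turn because `chat` appended it.
    let second = bot.chat("and again", &mut history).unwrap();
    println!("{second}"); // reply to 'and again' after 2 prior messages
    println!("{} messages stored", history.len()); // 4 messages stored
}
```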
Low-level control
The Completion trait
Completion sits one level down: instead of sending a request, it hands you a
CompletionRequestBuilder so you can adjust anything before dispatch. Fields pre-populated by the
implementing type (an agent’s preamble, for instance) can be overwritten on the builder.
```rust
pub trait Completion<M: CompletionModel> {
    /// Generates a completion request builder for the given `prompt` and `chat_history`.
    fn completion(
        &self,
        prompt: &str,
        chat_history: Vec<Message>,
    ) -> impl Future<Output = Result<CompletionRequestBuilder<M>, CompletionError>> + Send;
}
```

Calling a CompletionModel directly
CompletionModel is the provider interface itself: the trait each LLM backend implements with a
completion method (and a stream counterpart). You rarely implement it, but calling one directly
is how you take full control of a single request. Create the model from a client, build a request,
and send it:
```rust
use rig::client::{CompletionClient, ProviderClient};
use rig::providers::openai::Client;

// Reads the provider API key from the environment.
let openai_client = Client::from_env()?;
let model = openai_client.completion_model("gpt-5.5");

let response = model
    .completion_request("What is the Rust programming language?")
    .preamble("You are a helpful assistant.".to_string())
    .temperature(0.7)
    .max_tokens(1000)
    .send()
    .await?;
```

The builder also accepts context documents via documents(...) and tool definitions via tools(...). If you’d rather
separate construction from dispatch, call .build() to get a CompletionRequest and pass it to
CompletionModel::completion() yourself — .send() is just those two steps fused. The full trait
(including the associated Response and StreamingResponse types providers must supply) is in the
API reference.
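The relationship between .build(), .send(), and pre-populated fields can be illustrated with a std-only mock. Request, MockModel, RequestBuilder, and their methods are hypothetical. They only echo the shape of Rig's CompletionRequestBuilder, run synchronously, and return a canned string. The API reference has the real signatures.

```rust
// Hypothetical stand-ins that only echo the shape of Rig's builder.

#[derive(Debug, Clone, PartialEq)]
struct Request {
    prompt: String,
    preamble: Option<String>,
    temperature: Option<f64>,
}

struct MockModel;

impl MockModel {
    /// Plays the role of `CompletionModel::completion`: dispatches a finished request.
    fn completion(&self, req: Request) -> String {
        let preamble = req.preamble.as_deref().unwrap_or("no preamble");
        format!("[{preamble}] {}", req.prompt)
    }
}

struct RequestBuilder<'a> {
    model: &'a MockModel,
    req: Request,
}

impl<'a> RequestBuilder<'a> {
    fn new(model: &'a MockModel, prompt: &str) -> Self {
        let req = Request { prompt: prompt.to_string(), preamble: None, temperature: None };
        Self { model, req }
    }
    /// Setting a field again overwrites whatever was there before.
    fn preamble(mut self, p: &str) -> Self {
        self.req.preamble = Some(p.to_string());
        self
    }
    fn temperature(mut self, t: f64) -> Self {
        self.req.temperature = Some(t);
        self
    }
    /// Construction only: returns the request without sending it.
    fn build(self) -> Request {
        self.req
    }
    /// `send` is `build` followed by `completion`.
    fn send(self) -> String {
        let model = self.model;
        model.completion(self.build())
    }
}

fn main() {
    let model = MockModel;
    // Pretend an agent pre-populated the preamble; the caller overwrites it.
    let req = RequestBuilder::new(&model, "What is Rust?")
        .preamble("agent default")
        .preamble("You are a helpful assistant.")
        .temperature(0.7)
        .build();
    // Dispatching the built request by hand gives the same result as the fused path.
    let manual = model.completion(req);
    let fused = RequestBuilder::new(&model, "What is Rust?")
        .preamble("You are a helpful assistant.")
        .temperature(0.7)
        .send();
    assert_eq!(manual, fused);
    println!("{fused}"); // [You are a helpful assistant.] What is Rust?
}
```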
Responses
CompletionResponse and AssistantContent
A response wraps the model’s content along with the raw provider-specific payload:
```rust
pub struct CompletionResponse<T> {
    /// One or more assistant content items (text, tool calls, reasoning, etc.)
    pub choice: OneOrMany<AssistantContent>,
    /// Token usage reported for this request
    pub usage: Usage,
    /// The raw response from the provider
    pub raw_response: T,
}
```

raw_response keeps the provider’s untranslated payload; log it when debugging provider quirks
that Rig’s normalized types don’t surface. (Models themselves are cheap handles over a shared
client; build them once and reuse them across requests.)
choice holds AssistantContent values — the three things a model can answer with:
```rust
pub enum AssistantContent {
    /// Plain text response
    Text(Text),
    /// A tool call requested by the model
    ToolCall(ToolCall),
    /// Reasoning/chain-of-thought content (for models that support it)
    Reasoning(Reasoning),
}
```

Text wraps a string; ToolCall carries the call id plus the function name and JSON arguments the
model chose. When you use an agent, tool calls are executed for you; at this layer you decide what to
do with them.
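Handling a response at this layer usually means walking choice and branching on each item. The sketch mirrors the three variants with simplified, hypothetical stand-ins so it compiles with std only: String payloads, and a slice in place of OneOrMany. Rig's real Text, ToolCall, and Reasoning types carry more structure. Skipping reasoning items is a choice made for this sketch, not a requirement.

```rust
// Simplified, hypothetical mirror of AssistantContent; not Rig's types.
#[derive(Debug, Clone, PartialEq)]
enum Content {
    Text(String),
    ToolCall { id: String, name: String, arguments: String },
    Reasoning(String),
}

/// Splits a response into its concatenated text and the tool calls
/// (id, name, JSON arguments) the caller must execute; reasoning is ignored.
fn split_choice(choice: &[Content]) -> (String, Vec<(String, String, String)>) {
    let mut text = String::new();
    let mut calls = Vec::new();
    for item in choice {
        match item {
            Content::Text(t) => text.push_str(t),
            Content::ToolCall { id, name, arguments } => {
                calls.push((id.clone(), name.clone(), arguments.clone()))
            }
            Content::Reasoning(_) => {}
        }
    }
    (text, calls)
}

fn main() {
    let choice = vec![
        Content::Reasoning("user wants the weather".into()),
        Content::Text("Let me check.".into()),
        Content::ToolCall {
            id: "call_1".into(),
            name: "get_weather".into(),
            arguments: r#"{"city":"Paris"}"#.into(),
        },
    ];
    let (text, calls) = split_choice(&choice);
    println!("text: {text}");
    for (id, name, args) in &calls {
        println!("run {name} ({id}) with {args}");
    }
}
```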
Messages
The Message enum represents conversation messages with rich content support:
```rust
pub enum Message {
    User { content: OneOrMany<UserContent> },
    Assistant { content: OneOrMany<AssistantContent> },
}
```

UserContent supports text, images, audio, documents, video, and tool results:
```rust
pub enum UserContent {
    Text(Text),
    ToolResult(ToolResult),
    Image(Image),
    Audio(Audio),
    Document(Document),
    Video(Video),
}
```

For the common text-only case, use the Message::user(...) and Message::assistant(...)
constructors rather than building the enum by hand.
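A std-only mock of the two enums shows what the constructors save you from writing. The types are simplified, hypothetical stand-ins: a Vec instead of OneOrMany, String for assistant content, and only two UserContent variants. The user and assistant helpers here only approximate Rig's Message::user and Message::assistant.

```rust
// Hypothetical, simplified mirror of Rig's message types.
#[derive(Debug, Clone, PartialEq)]
enum UserContent {
    Text(String),
    ToolResult { id: String, output: String },
}

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User { content: Vec<UserContent> },
    Assistant { content: Vec<String> },
}

impl Message {
    /// Shorthand for a user message holding one text item.
    fn user(text: impl Into<String>) -> Self {
        Message::User { content: vec![UserContent::Text(text.into())] }
    }
    /// Shorthand for an assistant message holding one text item.
    fn assistant(text: impl Into<String>) -> Self {
        Message::Assistant { content: vec![text.into()] }
    }
}

fn main() {
    let history = vec![
        Message::user("What is 2 + 2?"),
        Message::assistant("4"),
        // Non-text content still needs the explicit enum.
        Message::User {
            content: vec![UserContent::ToolResult { id: "call_1".into(), output: "4".into() }],
        },
    ];
    println!("{history:#?}");
}
```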
Token usage
Rig provides a Usage struct and the GetTokenUsage trait for tracking token consumption. Every
completion response carries one, and the agent loop aggregates them across turns:
```rust
pub struct Usage {
    /// Input ("prompt") tokens used by the request.
    pub input_tokens: u64,
    /// Output ("completion") tokens generated.
    pub output_tokens: u64,
    /// Stored separately — some providers only report a total.
    pub total_tokens: u64,
    /// Input tokens read from a provider-managed prompt cache.
    pub cached_input_tokens: u64,
    /// Input tokens written to a provider-managed prompt cache.
    pub cache_creation_input_tokens: u64,
    /// Tokens spent on tool-use prompts.
    pub tool_use_prompt_tokens: u64,
    /// Tokens spent on internal reasoning by reasoning-capable models.
    pub reasoning_tokens: u64,
}
```

To read aggregated usage for a whole agent run (across every turn of the tool loop), call
.extended_details() on a prompt request — see
Token usage & run details on the Agents page. When
implementing a provider, implement GetTokenUsage on your raw response type to expose these metrics.
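Aggregating usage across the turns of a tool loop comes down to field-by-field addition. The sketch uses a trimmed, hypothetical Usage with three of the fields above and a hand-written AddAssign. The token counts are made up. Check the API reference for whether Rig's own Usage implements the arithmetic operators.

```rust
use std::ops::AddAssign;

// Trimmed, hypothetical mirror of Rig's Usage.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    total_tokens: u64,
}

impl AddAssign for Usage {
    fn add_assign(&mut self, other: Self) {
        self.input_tokens += other.input_tokens;
        self.output_tokens += other.output_tokens;
        self.total_tokens += other.total_tokens;
    }
}

/// Sums per-turn usage, as an agent loop would across its turns.
fn aggregate(turns: &[Usage]) -> Usage {
    let mut total = Usage::default();
    for turn in turns {
        total += *turn;
    }
    total
}

fn main() {
    // Turn 1 requests a tool; turn 2 re-sends the history plus the tool result.
    let turns = [
        Usage { input_tokens: 120, output_tokens: 30, total_tokens: 150 },
        Usage { input_tokens: 180, output_tokens: 45, total_tokens: 225 },
    ];
    let run = aggregate(&turns);
    println!("{run:?}"); // Usage { input_tokens: 300, output_tokens: 75, total_tokens: 375 }
}
```

The sketch sums total_tokens rather than recomputing it from input and output. That matches the reason given above for storing it separately: some providers only report a total.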
Errors
Completion failures surface as CompletionError, which separates transport problems (HTTP, JSON)
from provider-reported ones:
```rust
pub enum CompletionError {
    HttpError(reqwest::Error),
    JsonError(serde_json::Error),
    RequestError(Box<dyn Error>),
    ResponseError(String),
    ProviderError(String),
}
```

Typed-output paths add StructuredOutputError, which wraps a PromptError (itself carrying a
CompletionError, MaxTurnsError, or tool failure) or a deserialization failure.
Error Handling covers telling transient errors from fatal ones and
retrying safely.
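One way to act on the split between transient and fatal errors is to retry only the ones that look transient. The sketch is std-only: CallError mirrors four of the variant names above with String payloads. The classification is an assumption for illustration, not Rig's policy: HTTP failures and rate-limit messages are retried, while JSON and response errors are fatal. The with_retries helper is likewise hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical mirror of some CompletionError variants, with String payloads.
#[derive(Debug, Clone, PartialEq)]
enum CallError {
    HttpError(String),
    JsonError(String),
    ResponseError(String),
    ProviderError(String),
}

/// Assumed policy: network failures and rate limits are worth retrying.
fn is_transient(err: &CallError) -> bool {
    match err {
        CallError::HttpError(_) => true,
        CallError::ProviderError(msg) => msg.to_lowercase().contains("rate limit"),
        CallError::JsonError(_) | CallError::ResponseError(_) => false,
    }
}

/// Calls `op` at least once and at most `max_attempts` times, doubling the
/// delay after each transient failure; fatal errors are returned immediately.
fn with_retries<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, CallError>,
) -> Result<T, CallError> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if is_transient(&err) && attempt < max_attempts => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = with_retries(3, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 {
            Err(CallError::HttpError("connection reset".into()))
        } else {
            Ok("done")
        }
    });
    println!("{result:?} after {calls} calls"); // Ok("done") after 3 calls
}
```

Production code would usually add jitter and a cap on the delay. The plain doubling here is only the simplest form of backoff.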
Provider integration
Adding a new backend means implementing CompletionModel (and friends) for your provider type — the
Extending Rig guide walks through it end to end.
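The overall shape of that work can be previewed with a std-only sketch: a provider trait with an associated raw-response type, and a usage accessor implemented on that raw type. Provider, TokenUsage, Completion, EchoProvider, and EchoRaw are hypothetical simplifications of CompletionModel, GetTokenUsage, and CompletionResponse, not their real definitions. The real CompletionModel is async and has more associated items.

```rust
// Hypothetical, synchronous simplifications; not Rig's traits.

#[derive(Debug, Clone, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

/// Stand-in for a usage accessor implemented on the raw response type.
trait TokenUsage {
    fn token_usage(&self) -> Option<Usage>;
}

/// Normalized text plus the untranslated payload, echoing CompletionResponse<T>.
struct Completion<T> {
    text: String,
    raw_response: T,
}

/// Stand-in for the provider trait: each backend names its own raw type.
trait Provider {
    type Raw: TokenUsage;
    fn completion(&self, prompt: &str) -> Result<Completion<Self::Raw>, String>;
}

/// Raw payload of a toy backend that echoes the prompt.
struct EchoRaw {
    echoed: String,
    prompt_words: u64,
}

impl TokenUsage for EchoRaw {
    fn token_usage(&self) -> Option<Usage> {
        // Toy accounting: one "token" per whitespace-separated word.
        let out = self.echoed.split_whitespace().count() as u64;
        Some(Usage { input_tokens: self.prompt_words, output_tokens: out })
    }
}

struct EchoProvider;

impl Provider for EchoProvider {
    type Raw = EchoRaw;
    fn completion(&self, prompt: &str) -> Result<Completion<EchoRaw>, String> {
        if prompt.is_empty() {
            return Err("empty prompt".to_string());
        }
        let echoed = format!("echo: {prompt}");
        let prompt_words = prompt.split_whitespace().count() as u64;
        let raw = EchoRaw { echoed: echoed.clone(), prompt_words };
        Ok(Completion { text: echoed, raw_response: raw })
    }
}

fn main() {
    let resp = EchoProvider.completion("hello there").unwrap();
    // Prints the normalized text, then the usage read from the raw payload.
    println!("{} {:?}", resp.text, resp.raw_response.token_usage());
}
```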
See also
- Agents: the layer above, with the agent loop
- Tools: what a ToolCall response turns into
- Streaming: the streaming mirrors of these traits
- Structured Output: extractors vs TypedPrompt
- Model Providers: supported backends
