Structured Output

An Extractor turns unstructured text into a strongly typed Rust value. You give it a target type, and Rig drives an LLM to parse text into that type with type-safe deserialization and almost no boilerplate. This is useful for pulling entities, fields, or records out of free-form input.

Minimal example

Any target type must derive serde::Deserialize, serde::Serialize, and schemars::JsonSchema. Build an extractor for that type from a client, then call extract.

use rig::client::ProviderClient;
use rig::providers::openai;

// Define the target structure
#[derive(serde::Deserialize, serde::Serialize, rig::schemars::JsonSchema)]
struct Person {
    name: Option<String>,
    age: Option<u8>,
    profession: Option<String>,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let openai = openai::Client::from_env()?;
    let extractor = openai.extractor::<Person>("gpt-5.5").build();

    let person = extractor
        .extract("John Doe is a 30 year old doctor.")
        .await?;

    println!(
        "{} is a {}",
        person.name.unwrap_or_default(),
        person.profession.unwrap_or_default()
    );
    Ok(())
}

Expected output:

John Doe is a doctor

How it works

Under the hood, an extractor combines an Agent with a private “submit” tool whose arguments are your target type. Rig generates a JSON schema from your struct (via schemars), the model calls the submit tool with data matching that schema, and Rig deserializes the tool arguments back into your type. Because the schema is derived from the same Rust type you deserialize into, schema generation is automatic and the rest of your code works with a value the compiler has type-checked. Whether the model's output actually fits the schema is only checked at run time, during deserialization.
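
To make that round trip concrete, here is a std-only toy model of it. This is a sketch, not Rig's implementation: SubmitArgs, person_schema, and person_from_args are hypothetical names standing in for the tool arguments, the schemars-generated schema, and the serde deserialization step.

```rust
use std::collections::HashMap;

/// Target type, mirroring the `Person` struct from the minimal example.
#[derive(Debug, PartialEq)]
struct Person {
    name: Option<String>,
    age: Option<u8>,
    profession: Option<String>,
}

/// Hypothetical stand-in for the arguments a model passes to the submit tool.
type SubmitArgs = HashMap<String, String>;

/// Hypothetical stand-in for the generated schema: (field name, JSON type).
fn person_schema() -> Vec<(&'static str, &'static str)> {
    vec![("name", "string"), ("age", "integer"), ("profession", "string")]
}

/// Hypothetical stand-in for deserialization: build a `Person` from tool arguments.
/// Missing keys become `None`; an `age` that does not fit in a `u8` is an error.
fn person_from_args(args: &SubmitArgs) -> Result<Person, String> {
    let age = match args.get("age") {
        Some(raw) => Some(
            raw.parse::<u8>()
                .map_err(|e| format!("invalid age {raw:?}: {e}"))?,
        ),
        None => None,
    };
    Ok(Person {
        name: args.get("name").cloned(),
        age,
        profession: args.get("profession").cloned(),
    })
}
```

The point of the sketch is the division of labor: the schema tells the model what to submit, and the conversion step is where a mismatch (here, an age that is not a valid u8) surfaces as an error instead of a malformed value.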

Adding context and instructions

The extractor builder lets you steer the model with a custom preamble and extra context before building:

let extractor = openai
    .extractor::<Person>("gpt-5.5")
    .preamble("Extract person details with high precision.")
    .context("Ages are given in years; ignore honorifics like 'Dr.'")
    .build();

Error handling

extract returns a Result whose error type is ExtractionError, which distinguishes the failure modes you’ll want to handle:

NoData — the model never called the submit tool, so nothing was extracted.
DeserializationError — the submitted JSON didn’t match your type.
PromptError — the underlying completion request failed.

use rig::extractor::ExtractionError;

match extractor.extract("...").await {
    Ok(person) => { /* use person */ }
    Err(ExtractionError::NoData) => {
        eprintln!("Model did not produce structured data");
    }
    Err(err) => return Err(err.into()),
}
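
If you want to treat each failure mode differently, one option is to map every variant to an explicit decision. The sketch below is std-only so it compiles without Rig: ExtractionFailure is a local stand-in for ExtractionError with simplified String payloads (an assumption), and NextStep and next_step are hypothetical names for one possible policy.

```rust
/// Local stand-in for `rig::extractor::ExtractionError` so the sketch compiles
/// without Rig. The `String` payloads are a simplifying assumption.
#[derive(Debug)]
enum ExtractionFailure {
    NoData,
    DeserializationError(String),
    PromptError(String),
}

/// Hypothetical policy: what to do with an input after extraction failed.
#[derive(Debug, PartialEq)]
enum NextStep {
    Skip,
    LogAndSkip,
    Abort,
}

/// Map each failure mode to a decision. The match is exhaustive, so adding a
/// variant to the enum forces this function to be updated.
fn next_step(err: &ExtractionFailure) -> NextStep {
    match err {
        // The model never called the submit tool: nothing to salvage.
        ExtractionFailure::NoData => NextStep::Skip,
        // The submitted JSON did not match the type: worth inspecting later.
        ExtractionFailure::DeserializationError(_) => NextStep::LogAndSkip,
        // The request itself failed, which may affect later inputs too.
        ExtractionFailure::PromptError(_) => NextStep::Abort,
    }
}
```

Which variant maps to which decision is a design choice for your application; the mapping shown here is only an illustration.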

Batch processing

Extractors are cheap to reuse across many inputs — build once, extract in a loop:

use rig::completion::CompletionModel;
use rig::extractor::{Extractor, ExtractionError};

async fn process_documents<M: CompletionModel, T>(
    extractor: &Extractor<M, T>,
    docs: Vec<String>,
) -> Vec<Result<T, ExtractionError>>
where
    T: serde::de::DeserializeOwned + serde::Serialize + rig::schemars::JsonSchema + Send + Sync,
{
    let mut results = Vec::new();
    for doc in docs {
        results.push(extractor.extract(&doc).await);
    }
    results
}
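
Because process_documents returns one Result per document, a caller usually needs to separate the successes from the failures afterwards. A minimal std-only sketch of that step follows; split_results is a hypothetical helper, not part of Rig, and it works for any value and error types.

```rust
/// Split per-document results into extracted values and failures,
/// tagging each with the index of the document it came from.
fn split_results<T, E>(results: Vec<Result<T, E>>) -> (Vec<(usize, T)>, Vec<(usize, E)>) {
    let mut extracted = Vec::new();
    let mut failed = Vec::new();
    for (index, result) in results.into_iter().enumerate() {
        match result {
            Ok(value) => extracted.push((index, value)),
            Err(err) => failed.push((index, err)),
        }
    }
    (extracted, failed)
}
```

Keeping the index alongside each entry lets you trace a failure back to the input document without cloning the document text.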

You can also feed extractors from document loaders, reading files and extracting structured records from each one:

use rig::loaders::FileLoader;

let docs = FileLoader::with_glob("*.txt")?.read().ignore_errors();
let extractor = openai.extractor::<Person>("gpt-5.5").build();

for doc in docs {
    let structured = extractor.extract(&doc).await?;
    // process structured
}

Extractor vs. TypedPrompt

An Extractor wraps an agent and a submit tool specifically for parsing text into a type. If you already have an Agent and just want a single structured response, the TypedPrompt trait gives you the same typed-output behavior directly on the agent. Reach for extractors when structured extraction is the whole job; reach for TypedPrompt when structured output is one step in a broader agent workflow.

Next steps

Tools: See how the submit tool underneath extractors works, and expose your own.

Evals: Measure extraction accuracy across a dataset instead of eyeballing single runs.

Text extraction & classification: A full guide to pulling structured data and labels out of raw text.
