Text extraction & classification

Classification assigns text to predefined categories (sentiment, topic, spam), while extraction pulls structured fields out of free-form text (named entities, key-value pairs). Both map cleanly onto Rig’s Extractor: you define a Rust type for the result, and Rig drives an LLM to produce a value of that type with type-safe deserialization.

This guide builds up from a sentiment classifier to a combined analyzer that classifies and extracts in a single call.

Project setup

cargo new text_analysis
cd text_analysis

[dependencies]
rig = "0.39.0"
tokio = { version = "1", features = ["full"] }
anyhow = "1"
serde = { version = "1", features = ["derive"] }
schemars = "0.8"

Set your OpenAI API key:

export OPENAI_API_KEY=your_api_key_here

Classify text

Any extractor target must derive serde::Deserialize, serde::Serialize, and schemars::JsonSchema. Rig turns the type into a JSON schema, the model fills it in, and Rig deserializes the result back into your struct.

use anyhow::Result;
use rig::client::ProviderClient;
use rig::providers::openai;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
enum Sentiment {
    Positive,
    Negative,
    Neutral,
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct SentimentClassification {
    sentiment: Sentiment,
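    /// Model-reported confidence, expected in 0.0..=1.0.
    /// schemars copies this doc comment into the schema as a description.
    confidence: f32,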
}

#[tokio::main]
async fn main() -> Result<()> {
    let openai_client = openai::Client::from_env()?;

    let classifier = openai_client
        .extractor::<SentimentClassification>("gpt-5.5")
        .preamble("You are a sentiment analysis AI. Classify the sentiment of the given text.")
        .build();

    let text = "I absolutely loved the new restaurant. The food was amazing!";
    let result = classifier.extract(text).await?;

    println!("Sentiment: {:?} ({:.2})", result.sentiment, result.confidence);
    Ok(())
}

extractor::<T>(model) returns a builder, .preamble(...) sets the model’s instructions, and .build() produces the extractor. .extract(text).await? runs it and returns a SentimentClassification.
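The confidence field is whatever number the model chooses to report, so it is worth checking before acting on the label. Below is a minimal sketch of such a check. The accept helper and its threshold parameter are hypothetical and not part of Rig, and the serde/schemars derives are left out so the snippet compiles with the standard library alone.

```rust
// Dependency-free sketch: the serde/schemars derives from the guide are
// omitted here so this compiles with only the standard library.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Sentiment {
    Positive,
    Negative,
    Neutral,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct SentimentClassification {
    sentiment: Sentiment,
    confidence: f32,
}

/// Hypothetical helper (not a Rig API): accept the label only when the
/// reported confidence is finite, lies in 0.0..=1.0, and is at least
/// `threshold`. Otherwise return None so the caller can fall back.
fn accept(result: &SentimentClassification, threshold: f32) -> Option<Sentiment> {
    let c = result.confidence;
    if c.is_finite() && (0.0..=1.0).contains(&c) && c >= threshold {
        Some(result.sentiment)
    } else {
        None
    }
}

fn main() {
    let confident = SentimentClassification { sentiment: Sentiment::Positive, confidence: 0.93 };
    let unsure = SentimentClassification { sentiment: Sentiment::Neutral, confidence: 0.41 };
    let malformed = SentimentClassification { sentiment: Sentiment::Negative, confidence: 1.7 };

    println!("{:?}", accept(&confident, 0.6)); // Some(Positive)
    println!("{:?}", accept(&unsure, 0.6)); // None: below the threshold
    println!("{:?}", accept(&malformed, 0.6)); // None: outside 0.0..=1.0
}
```

A None result can be routed to a human reviewer or a second pass. The 0.6 threshold is illustrative only and would need tuning against labeled data.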

Extract structured entities

The same abstraction handles extraction; only the target type changes. Here we pull named entities out of a sentence into a Vec<Entity>, wrapped in an ExtractedEntities struct:

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
enum EntityType {
    Person,
    Organization,
    Location,
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct Entity {
    text: String,
    entity_type: EntityType,
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct ExtractedEntities {
    entities: Vec<Entity>,
}

// Runs inside the async main from the previous section, reusing openai_client.
let ner = openai_client
    .extractor::<ExtractedEntities>("gpt-5.5")
    .preamble("Identify and extract people, organizations, and locations from the text.")
    .build();

let text = "Apple Inc., based in Cupertino, was founded by Steve Jobs and Steve Wozniak.";
let result = ner.extract(text).await?;

for entity in result.entities {
    println!("{:?}: {}", entity.entity_type, entity.text);
}
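Extracted entities are model output, so a light post-processing step can help before they reach downstream code. The sketch below keeps only entities whose text appears verbatim in the source, drops duplicates, and groups the rest by type. The group_grounded function is a hypothetical helper, not something Rig provides, and the types are redeclared without their serde/schemars derives so the snippet needs only the standard library.

```rust
use std::collections::BTreeMap;

// Redeclared without the serde/schemars derives to stay dependency-free.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EntityType {
    Person,
    Organization,
    Location,
}

#[derive(Debug, Clone, PartialEq)]
struct Entity {
    text: String,
    entity_type: EntityType,
}

/// Hypothetical helper (not a Rig API): keep entities whose text occurs
/// verbatim in `source`, drop repeats, and group the survivors by type.
fn group_grounded(source: &str, entities: Vec<Entity>) -> BTreeMap<EntityType, Vec<String>> {
    let mut grouped: BTreeMap<EntityType, Vec<String>> = BTreeMap::new();
    for entity in entities {
        // Skip anything the model produced that is not literally in the text.
        if !source.contains(entity.text.as_str()) {
            continue;
        }
        let bucket = grouped.entry(entity.entity_type).or_default();
        if !bucket.contains(&entity.text) {
            bucket.push(entity.text);
        }
    }
    grouped
}

fn main() {
    let source = "Apple Inc., based in Cupertino, was founded by Steve Jobs and Steve Wozniak.";
    let entities = vec![
        Entity { text: "Apple Inc.".into(), entity_type: EntityType::Organization },
        Entity { text: "Cupertino".into(), entity_type: EntityType::Location },
        Entity { text: "Steve Jobs".into(), entity_type: EntityType::Person },
        Entity { text: "Steve Jobs".into(), entity_type: EntityType::Person }, // duplicate
        Entity { text: "Tim Cook".into(), entity_type: EntityType::Person }, // not in source
    ];

    for (entity_type, names) in group_grounded(source, entities) {
        println!("{:?}: {}", entity_type, names.join(", "));
    }
}
```

The verbatim check is deliberately strict: it would also reject a legitimate normalization such as "Apple" for "Apple Inc.", so a looser match may suit some inputs better.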

Classify and extract together

Because the target is an ordinary Rust type, you can nest classification and extraction structs and get both from a single API call. One call typically costs less than two because the input text is sent only once, and the sentiment and entities come from the same reading of the text:

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct TextAnalysis {
    sentiment: SentimentClassification,
    entities: Vec<Entity>,
}

// Runs inside the same async main, reusing openai_client and the types above.
let analyzer = openai_client
    .extractor::<TextAnalysis>("gpt-5.5")
    .preamble(
        "For the given text: classify the overall sentiment with a confidence score, \
         and extract named entities (Person, Organization, Location).",
    )
    .build();

let text = "I had a great time visiting Google's headquarters in Mountain View. \
            Sundar Pichai's leadership has been impressive.";
let result = analyzer.extract(text).await?;

println!(
    "Sentiment: {:?} ({:.2})",
    result.sentiment.sentiment, result.sentiment.confidence
);
for entity in &result.entities {
    println!("- {:?}: {}", entity.entity_type, entity.text);
}

The pattern scales to richer schemas. To analyze a news article, add fields for topic and key points and the extractor fills them all in one pass:

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
enum Topic {
    Politics,
    Technology,
    Sports,
    Entertainment,
    Other(String),
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct NewsArticleAnalysis {
    topic: Topic,
    sentiment: SentimentClassification,
    entities: Vec<Entity>,
    key_points: Vec<String>,
}
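The Other(String) variant lets the model name a topic outside the fixed list, which means downstream code receives free-form text for that case. Below is a small sketch of normalizing it into a routing label. The topic_label function and the "other:" prefix are hypothetical choices for illustration, and the enum is redeclared without its serde/schemars derives so the snippet compiles on its own.

```rust
// Redeclared without the serde/schemars derives to stay dependency-free.
#[derive(Debug, Clone, PartialEq)]
enum Topic {
    Politics,
    Technology,
    Sports,
    Entertainment,
    Other(String),
}

/// Hypothetical helper (not a Rig API): map a topic to a stable label,
/// trimming and lowercasing whatever free-form text arrives in `Other`.
fn topic_label(topic: &Topic) -> String {
    match topic {
        Topic::Politics => "politics".to_string(),
        Topic::Technology => "technology".to_string(),
        Topic::Sports => "sports".to_string(),
        Topic::Entertainment => "entertainment".to_string(),
        Topic::Other(raw) => {
            let cleaned = raw.trim().to_lowercase();
            if cleaned.is_empty() {
                "other".to_string()
            } else {
                format!("other:{cleaned}")
            }
        }
    }
}

fn main() {
    let topics = [
        Topic::Politics,
        Topic::Technology,
        Topic::Sports,
        Topic::Entertainment,
        Topic::Other("  Public Health ".to_string()),
    ];
    for topic in &topics {
        println!("{:?} -> {}", topic, topic_label(topic));
    }
}
```

Matching exhaustively, with no wildcard arm, means the compiler flags this function whenever a new Topic variant is added.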

Tips

Use Option<T> for fields that may be absent so the model can omit them cleanly.
Keep structs focused. Smaller, well-named types extract more reliably than one sprawling schema.
Guide with examples. A couple of labeled examples in the preamble usually improve consistency.
Pick the right model. gpt-5.5 handles nuanced, multi-field analysis; a smaller model is fine for simple classification.
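As an illustration of the "guide with examples" tip, here is a minimal sketch that appends labeled examples to a base instruction. The few_shot_preamble function and its "Text:" / "Sentiment:" layout are assumptions made for this example, not a format that Rig requires.

```rust
/// Hypothetical helper (not a Rig API): build a preamble from a base
/// instruction plus (text, label) example pairs.
fn few_shot_preamble(instruction: &str, examples: &[(&str, &str)]) -> String {
    let mut preamble = String::from(instruction);
    if !examples.is_empty() {
        preamble.push_str("\n\nExamples:");
        for (text, label) in examples {
            preamble.push_str(&format!("\nText: {text}\nSentiment: {label}"));
        }
    }
    preamble
}

fn main() {
    // Labels match the Sentiment variant names used earlier in the guide,
    // so the examples agree with what deserialization expects.
    let preamble = few_shot_preamble(
        "You are a sentiment analysis AI. Classify the sentiment of the given text.",
        &[
            ("The checkout flow was quick and painless.", "Positive"),
            ("My order arrived two weeks late.", "Negative"),
            ("The package was delivered on Tuesday.", "Neutral"),
        ],
    );
    println!("{preamble}");
}
```

The resulting string can be passed to .preamble(...) in place of the literal used earlier in this guide.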