Text extraction & classification
Classification assigns text to predefined categories (sentiment, topic, spam), while extraction
pulls structured fields out of free-form text (named entities, key-value pairs). Both map cleanly
onto Rig’s Extractor: you define a Rust type for the result, and Rig
drives an LLM to produce a value of that type with type-safe deserialization.
This guide builds up from a sentiment classifier to a combined analyzer that classifies and extracts in a single call.
Project setup
    cargo new text_analysis
    cd text_analysis

Then add the dependencies to Cargo.toml:

    [dependencies]
    rig = "0.39.0"
    tokio = { version = "1", features = ["full"] }
    anyhow = "1"
    serde = { version = "1", features = ["derive"] }
    schemars = "0.8"

Set your OpenAI API key:

    export OPENAI_API_KEY=your_api_key_here

Classify text
Any extractor target must derive serde::Deserialize, serde::Serialize, and
schemars::JsonSchema. Rig turns the type into a JSON schema, the model fills it in, and Rig
deserializes the result back into your struct.

    use anyhow::Result;
    use rig::client::ProviderClient;
    use rig::providers::openai;
    use schemars::JsonSchema;
    use serde::{Deserialize, Serialize};

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    enum Sentiment {
        Positive,
        Negative,
        Neutral,
    }

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    struct SentimentClassification {
        sentiment: Sentiment,
        confidence: f32,
    }

    #[tokio::main]
    async fn main() -> Result<()> {
        // Reads OPENAI_API_KEY from the environment.
        let openai_client = openai::Client::from_env()?;

        // The type parameter is the struct the model's output is deserialized into.
        let classifier = openai_client
            .extractor::<SentimentClassification>("gpt-5.5")
            .preamble("You are a sentiment analysis AI. Classify the sentiment of the given text.")
            .build();

        let text = "I absolutely loved the new restaurant. The food was amazing!";
        let result = classifier.extract(text).await?;

        println!("Sentiment: {:?} ({:.2})", result.sentiment, result.confidence);
        Ok(())
    }

extractor::<T>(model) returns a builder, .preamble(...) sets the model’s instructions, and
.build() produces the extractor. Calling .extract(text).await? runs it and returns a
SentimentClassification.
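The confidence field is an ordinary f32 that the model fills in, so nothing in the type itself constrains its range. A minimal sketch of a post-extraction check, assuming you want a score in [0.0, 1.0] (the guide does not state a range) and using hypothetical helper names (validate_confidence, triage, Decision) that are not part of Rig:

```rust
/// What to do with a classification once its confidence has been checked.
/// Hypothetical type for illustration; not part of Rig.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Confidence met the threshold; use the result as-is.
    Accept,
    /// Confidence was valid but below the threshold; route to manual review.
    Review,
}

/// Returns the confidence unchanged if it is a finite value in [0.0, 1.0].
/// The [0.0, 1.0] range is an assumption about how the score should be read.
fn validate_confidence(confidence: f32) -> Result<f32, String> {
    if confidence.is_finite() && (0.0..=1.0).contains(&confidence) {
        Ok(confidence)
    } else {
        Err(format!("confidence out of range: {confidence}"))
    }
}

/// Validates the confidence, then compares it against a caller-chosen threshold.
fn triage(confidence: f32, threshold: f32) -> Result<Decision, String> {
    let confidence = validate_confidence(confidence)?;
    if confidence >= threshold {
        Ok(Decision::Accept)
    } else {
        Ok(Decision::Review)
    }
}
```

With the classifier above, this could be called as triage(result.confidence, 0.8) before acting on result.sentiment; the 0.8 threshold is only an example value.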
Extract structured entities
The same abstraction handles extraction; only the target type changes. Here we pull named
entities out of a sentence into a Vec<Entity>:

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    enum EntityType {
        Person,
        Organization,
        Location,
    }

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    struct Entity {
        text: String,
        entity_type: EntityType,
    }

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    struct ExtractedEntities {
        entities: Vec<Entity>,
    }

    // Inside main, reusing openai_client from the previous example.
    let ner = openai_client
        .extractor::<ExtractedEntities>("gpt-5.5")
        .preamble("Identify and extract people, organizations, and locations from the text.")
        .build();

    let text = "Apple Inc., based in Cupertino, was founded by Steve Jobs and Steve Wozniak.";
    let result = ner.extract(text).await?;

    for entity in result.entities {
        println!("{:?}: {}", entity.entity_type, entity.text);
    }

Classify and extract together
Because the target is an ordinary Rust type, you can nest classification and extraction structs and get both from a single API call. This costs one request instead of two, and both results come from the same model response, so they stay consistent with each other:

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    struct TextAnalysis {
        sentiment: SentimentClassification,
        entities: Vec<Entity>,
    }

    // Inside main, reusing openai_client and the types defined earlier.
    let analyzer = openai_client
        .extractor::<TextAnalysis>("gpt-5.5")
        .preamble(
            "For the given text: classify the overall sentiment with a confidence score, \
             and extract named entities (Person, Organization, Location).",
        )
        .build();

    let text = "I had a great time visiting Google's headquarters in Mountain View. \
                Sundar Pichai's leadership has been impressive.";
    let result = analyzer.extract(text).await?;

    println!(
        "Sentiment: {:?} ({:.2})",
        result.sentiment.sentiment, result.sentiment.confidence
    );
    for entity in &result.entities {
        println!("- {:?}: {}", entity.entity_type, entity.text);
    }

The pattern scales to richer schemas. To analyze a news article, add fields for topic and key points, and the extractor fills them all in one pass:

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    enum Topic {
        Politics,
        Technology,
        Sports,
        Entertainment,
        Other(String),
    }

    #[derive(Debug, Deserialize, Serialize, JsonSchema)]
    struct NewsArticleAnalysis {
        topic: Topic,
        sentiment: SentimentClassification,
        entities: Vec<Entity>,
        key_points: Vec<String>,
    }

- Use Option<T> for fields that may be absent so the model can omit them cleanly.
- Keep structs focused. Smaller, well-named types extract more reliably than one sprawling schema.
- Guide with examples. A couple of labeled examples in the preamble sharply improve consistency.
- Pick the right model. gpt-5.5 handles nuanced, multi-field analysis; a smaller model is fine for simple classification.
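
The "guide with examples" tip can be sketched as a small helper that appends labeled examples to the instructions before they are passed to .preamble(...). This is only an illustration: LabeledExample and build_preamble are hypothetical names, not part of Rig, and the "Text:" / "Sentiment:" layout is one arbitrary choice of prompt format.

```rust
/// One labeled example: an input text and the label the model should produce for it.
/// Hypothetical type for illustration; not part of Rig.
struct LabeledExample<'a> {
    text: &'a str,
    label: &'a str,
}

/// Builds a preamble from base instructions followed by the labeled examples.
/// Returns the instructions unchanged when no examples are given.
fn build_preamble(instructions: &str, examples: &[LabeledExample]) -> String {
    let mut preamble = String::from(instructions);
    if !examples.is_empty() {
        preamble.push_str("\n\nExamples:");
        for example in examples {
            // Each example becomes a two-line "Text / Sentiment" pair.
            preamble.push_str(&format!(
                "\nText: {}\nSentiment: {}",
                example.text, example.label
            ));
        }
    }
    preamble
}
```

The resulting String could then be handed to the builder, for example .preamble(&build_preamble(instructions, &examples)), assuming .preamble accepts a &str as in the snippets above.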
See also
- Structured Output: how the Extractor works under the hood.
- Agents: extractors are built on agents and tools.
- rig::extractor on docs.rs: full API reference.
