Text extraction & classification

Classification assigns text to predefined categories (sentiment, topic, spam), while extraction pulls structured fields out of free-form text (named entities, key-value pairs). Both map cleanly onto Rig’s Extractor: you define a Rust type for the result, and Rig drives an LLM to produce a value of that type with type-safe deserialization.

This guide builds up from a sentiment classifier to a combined analyzer that classifies and extracts in a single call.

Create a new project:

cargo new text_analysis
cd text_analysis

Then add the dependencies to Cargo.toml:

[dependencies]
rig = "0.39.0"
tokio = { version = "1", features = ["full"] }
anyhow = "1"
serde = { version = "1", features = ["derive"] }
schemars = "0.8"

Set your OpenAI API key:

export OPENAI_API_KEY=your_api_key_here

Any extractor target must derive serde::Deserialize, serde::Serialize, and schemars::JsonSchema. Rig turns the type into a JSON schema, the model fills it in, and Rig deserializes the result back into your struct.

use anyhow::Result;
use rig::client::ProviderClient;
use rig::providers::openai;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
enum Sentiment {
    Positive,
    Negative,
    Neutral,
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct SentimentClassification {
    sentiment: Sentiment,
    confidence: f32,
}

#[tokio::main]
async fn main() -> Result<()> {
    let openai_client = openai::Client::from_env()?;

    let classifier = openai_client
        .extractor::<SentimentClassification>("gpt-5.5")
        .preamble("You are a sentiment analysis AI. Classify the sentiment of the given text.")
        .build();

    let text = "I absolutely loved the new restaurant. The food was amazing!";
    let result = classifier.extract(text).await?;

    println!("Sentiment: {:?} ({:.2})", result.sentiment, result.confidence);
    Ok(())
}

extractor::<T>(model) returns a builder, .preamble(...) sets the model’s instructions, and .build() produces the extractor. .extract(text).await? runs it and returns a SentimentClassification.
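
The confidence field is whatever number the model reports, so it can help to check it before acting on a classification. The following is a minimal sketch, not part of Rig: the helper name accept and the [0, 1] range are assumptions, and the types mirror the ones above with the serde and schemars derives left out so the snippet compiles on its own.

```rust
/// Mirrors the guide's `Sentiment` enum; serde/schemars derives are omitted
/// here so the sketch compiles without external crates.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Sentiment {
    Positive,
    Negative,
    Neutral,
}

/// Mirrors the guide's `SentimentClassification` struct.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SentimentClassification {
    sentiment: Sentiment,
    confidence: f32,
}

/// Hypothetical helper: returns the label only when the reported confidence
/// is a finite number in [0, 1] (an assumed range) and at least `threshold`.
fn accept(result: &SentimentClassification, threshold: f32) -> Option<Sentiment> {
    let c = result.confidence;
    if c.is_finite() && (0.0..=1.0).contains(&c) && c >= threshold {
        Some(result.sentiment)
    } else {
        None
    }
}
```

In main, the value returned by classifier.extract(text).await? could be passed to accept(&result, 0.8), with None routed to a fallback such as manual review.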

The same abstraction handles extraction; only the target type changes. Here we pull named entities out of a sentence into an ExtractedEntities struct that wraps a Vec<Entity>:

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
enum EntityType {
    Person,
    Organization,
    Location,
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct Entity {
    text: String,
    entity_type: EntityType,
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct ExtractedEntities {
    entities: Vec<Entity>,
}

// Inside main(), reusing openai_client from the first example.
let ner = openai_client
    .extractor::<ExtractedEntities>("gpt-5.5")
    .preamble("Identify and extract people, organizations, and locations from the text.")
    .build();

let text = "Apple Inc., based in Cupertino, was founded by Steve Jobs and Steve Wozniak.";
let result = ner.extract(text).await?;

for entity in result.entities {
    println!("{:?}: {}", entity.entity_type, entity.text);
}
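
Once the entities are back as plain Rust values, ordinary collection code can post-process them. As an illustration only, the sketch below groups entity texts by type and drops exact duplicates. The function group_by_type is hypothetical, and the types mirror the ones above without the serde and schemars derives (plus ordering derives, added here so EntityType can be a map key).

```rust
use std::collections::BTreeMap;

/// Mirrors the guide's `EntityType`; the ordering derives are an addition
/// so the enum can be used as a `BTreeMap` key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EntityType {
    Person,
    Organization,
    Location,
}

/// Mirrors the guide's `Entity` struct.
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    text: String,
    entity_type: EntityType,
}

/// Hypothetical helper: groups entity texts by type, keeping first-seen
/// order within each type and skipping exact duplicate strings.
fn group_by_type(entities: &[Entity]) -> BTreeMap<EntityType, Vec<String>> {
    let mut grouped: BTreeMap<EntityType, Vec<String>> = BTreeMap::new();
    for entity in entities {
        let bucket = grouped.entry(entity.entity_type).or_default();
        if !bucket.contains(&entity.text) {
            bucket.push(entity.text.clone());
        }
    }
    grouped
}
```

Calling group_by_type(&result.entities) on the extractor output would give one list per entity type, which is convenient when the model repeats a name it saw twice in the text.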

Because the target is an ordinary Rust type, you can nest classification and extraction structs and get both from a single API call. This costs one request instead of two, and the sentiment and entities come from the same model pass, so they stay consistent with each other:

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct TextAnalysis {
    sentiment: SentimentClassification,
    entities: Vec<Entity>,
}

// Inside main(), reusing openai_client and the types defined earlier.
let analyzer = openai_client
    .extractor::<TextAnalysis>("gpt-5.5")
    .preamble(
        "For the given text: classify the overall sentiment with a confidence score, \
         and extract named entities (Person, Organization, Location).",
    )
    .build();

let text = "I had a great time visiting Google's headquarters in Mountain View. \
            Sundar Pichai's leadership has been impressive.";
let result = analyzer.extract(text).await?;

println!(
    "Sentiment: {:?} ({:.2})",
    result.sentiment.sentiment, result.sentiment.confidence
);
for entity in &result.entities {
    println!("- {:?}: {}", entity.entity_type, entity.text);
}

The pattern scales to richer schemas. To analyze a news article, add fields for topic and key points and the extractor fills them all in one pass:

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
enum Topic {
    Politics,
    Technology,
    Sports,
    Entertainment,
    Other(String),
}

#[derive(Debug, Deserialize, Serialize, JsonSchema)]
struct NewsArticleAnalysis {
    topic: Topic,
    sentiment: SentimentClassification,
    entities: Vec<Entity>,
    key_points: Vec<String>,
}

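
The Other(String) variant lets the model return a free-form topic, so downstream code usually wants to normalize it. A minimal sketch of one way to do that follows. The function topic_label and its lowercase labels are illustrative assumptions, and the enum mirrors the one above without the serde and schemars derives.

```rust
/// Mirrors the guide's `Topic` enum; derives for serde/schemars are omitted
/// so the sketch compiles without external crates.
#[derive(Debug, Clone, PartialEq)]
enum Topic {
    Politics,
    Technology,
    Sports,
    Entertainment,
    Other(String),
}

/// Hypothetical helper: maps a topic to a lowercase display label.
/// Free-form `Other` values are trimmed and lowercased, and fall back to
/// "other" when the model returned a blank string.
fn topic_label(topic: &Topic) -> String {
    match topic {
        Topic::Politics => "politics".to_string(),
        Topic::Technology => "technology".to_string(),
        Topic::Sports => "sports".to_string(),
        Topic::Entertainment => "entertainment".to_string(),
        Topic::Other(raw) => {
            let cleaned = raw.trim().to_lowercase();
            if cleaned.is_empty() {
                "other".to_string()
            } else {
                cleaned
            }
        }
    }
}
```

Matching exhaustively on the enum means the compiler flags this function whenever a new Topic variant is added to the schema.
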
  • Use Option<T> for fields that may be absent so the model can omit them cleanly.
  • Keep structs focused. Smaller, well-named types extract more reliably than one sprawling schema.
  • Guide with examples. A couple of labeled examples in the preamble sharply improve consistency.
  • Pick the right model. gpt-5.5 handles nuanced, multi-field analysis; a smaller model is fine for simple classification.
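
To make the "guide with examples" tip concrete, the sketch below assembles a preamble string from a base instruction and a few labeled examples. The helper few_shot_preamble and the Text/Label layout are assumptions chosen for illustration, not a format Rig requires.

```rust
/// Hypothetical helper: appends labeled examples to a base instruction.
/// Each example is a `(text, label)` pair; with no examples the trimmed
/// instruction is returned unchanged.
fn few_shot_preamble(instruction: &str, examples: &[(&str, &str)]) -> String {
    let mut preamble = String::from(instruction.trim_end());
    if !examples.is_empty() {
        preamble.push_str("\n\nExamples:");
        for (text, label) in examples {
            preamble.push_str(&format!("\nText: \"{text}\"\nLabel: {label}"));
        }
    }
    preamble
}
```

The resulting string can be passed to .preamble(...) in place of a literal, which keeps the examples in one place if several extractors share them.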