MongoDB

The rig-mongodb crate backs Rig’s vector store with MongoDB Atlas Vector Search. It runs similarity search natively inside MongoDB via an aggregation pipeline, giving you persistence, rich metadata filtering, and cursor-based streaming of results.

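[dependencies]
rig = "0.39.0"
rig-mongodb = "0.39.0"
mongodb = "3"
tokio = { version = "1", features = ["full"] }
# Needed by the examples on this page: derive(Deserialize) and anyhow::Error in main.
serde = { version = "1", features = ["derive"] }
anyhow = "1"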

You need a MongoDB Atlas cluster (Vector Search is an Atlas feature) and its connection string.

Derive Embed on your document type, mark the field to embed with #[embed], and include a field for the stored embedding vector. Map your identifier to Mongo’s _id.

use rig::Embed;
use serde::Deserialize;

#[derive(Embed, Clone, Deserialize, Debug)]
struct Document {
    #[serde(rename = "_id")]
    id: String,
    #[embed]
    content: String,
    embedding: Vec<f64>,
}

Before you can query, the collection must have a vector search index. Create it in the Atlas UI or via the API, matching numDimensions to your embedding model (1536 for text-embedding-3-small) and path to your embedding field:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
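
To make the "similarity": "cosine" setting concrete, here is a minimal standalone sketch of the measure it selects. This is not rig-mongodb or Atlas code: the function names cosine_similarity and normalized_score are hypothetical, and the (1 + cosine) / 2 mapping is the score normalization described in MongoDB's Vector Search documentation as best I understand it, so treat it as an assumption to check against your Atlas version.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns None when the lengths differ, the input is empty,
/// or either vector has zero norm (the ratio is undefined there).
fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Maps a cosine value in [-1, 1] onto [0, 1].
/// Assumption: this mirrors the documented Atlas score normalization.
fn normalized_score(cosine: f64) -> f64 {
    (1.0 + cosine) / 2.0
}
```

The length check in cosine_similarity is the same constraint the index enforces through numDimensions: a query vector and a stored vector can only be compared when they have the same number of components.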

MongoDbVectorIndex::new wraps a collection with an embedding model, the name of the Atlas vector index, and search parameters. It validates the index on construction (existence, dimensions, and similarity metric).

use rig_mongodb::{MongoDbVectorIndex, SearchParams};

let index = MongoDbVectorIndex::new(
    collection,      // mongodb::Collection<Document>
    model,           // an EmbeddingModel, used to embed queries
    "vector_index",  // the Atlas vector search index name
    SearchParams::new(),
)
.await?;

SearchParams lets you configure the embedding field name and the number of candidates the aggregation considers; see the docs.rs API for the options.

use std::env;

use mongodb::{bson, options::ClientOptions, Client as MongoClient, Collection};
use rig::Embed;
use rig::client::{EmbeddingsClient, ProviderClient};
use rig::providers::openai;
use rig::vector_store::{VectorStoreIndex, VectorSearchRequest};
use rig_mongodb::{MongoDbVectorIndex, SearchParams};
use serde::Deserialize;

#[derive(Embed, Clone, Deserialize, Debug)]
struct Document {
    #[serde(rename = "_id")]
    id: String,
    #[embed]
    content: String,
    embedding: Vec<f64>,
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Embedding model, used both to embed documents and to embed queries.
    let openai = openai::Client::from_env()?;
    let model = openai.embedding_model(openai::TEXT_EMBEDDING_3_SMALL);

    // Connect to MongoDB Atlas.
    let conn = env::var("MONGODB_CONNECTION_STRING")?;
    let options = ClientOptions::parse(conn).await?;
    let mongo = MongoClient::with_options(options)?;
    let collection: Collection<bson::Document> =
        mongo.database("knowledgebase").collection("context");

    // Wrap the collection in a Rig vector index.
    let index = MongoDbVectorIndex::new(
        collection,
        model,
        "vector_index",
        SearchParams::new(),
    )
    .await?;

    // Search.
    let req = VectorSearchRequest::builder()
        .query("What does \"glarb-glarb\" mean?")
        .samples(1)
        .build();
    let results = index.top_n::<Document>(req).await?;
    for (score, id, doc) in results {
        println!("{score:.3} {id}");
    }

    Ok(())
}

To insert documents, generate embeddings with EmbeddingsBuilder and write them to the collection (or use the shared InsertDocuments trait); see the overview.

Queries compile to a MongoDB aggregation pipeline with three stages: a $vectorSearch stage that finds nearest neighbors, a score stage that normalizes similarity scores, and a projection stage that shapes the returned documents. Errors surface as Rig’s VectorStoreError.
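
As a rough mental model of those three stages, the sketch below runs the same sequence over an in-memory slice. It is an analogy only, not the pipeline rig-mongodb builds: Stored, Hit, dot, and search are hypothetical names, the search is an exact brute-force scan rather than Atlas's index-backed nearest-neighbor lookup, embeddings are assumed to be unit-length so that the dot product equals cosine similarity, and the (1 + similarity) / 2 mapping is an assumed normalization.

```rust
/// A stored record: identifier, text, and its embedding (assumed unit-length).
#[derive(Debug, Clone)]
struct Stored {
    id: String,
    content: String,
    embedding: Vec<f64>,
}

/// What the caller gets back: a score plus the record without its embedding.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    score: f64,
    id: String,
    content: String,
}

/// Dot product; equals cosine similarity when both inputs are unit-length.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn search(docs: &[Stored], query: &[f64], limit: usize) -> Vec<Hit> {
    // Stage 1 (stand-in for $vectorSearch): rank every record by similarity
    // to the query and keep the `limit` nearest.
    let mut ranked: Vec<(f64, &Stored)> = docs
        .iter()
        .map(|d| (dot(&d.embedding, query), d))
        .collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0));
    ranked.truncate(limit);

    ranked
        .into_iter()
        .map(|(similarity, d)| Hit {
            // Stage 2 (score): map similarity from [-1, 1] onto [0, 1].
            score: (1.0 + similarity) / 2.0,
            // Stage 3 (projection): copy id and content, leave the embedding out.
            id: d.id.clone(),
            content: d.content.clone(),
        })
        .collect()
}
```

Leaving the embedding out of Hit reflects the purpose of the projection stage: a 1536-component vector per result is rarely useful to the caller and is costly to send back.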