MongoDB

The rig-mongodb crate backs Rig’s vector store with MongoDB Atlas Vector Search. It runs similarity search natively inside MongoDB via an aggregation pipeline, giving you persistence, rich metadata filtering, and cursor-based streaming of results.

Setup

[dependencies]
rig = "0.39.0"
rig-mongodb = "0.39.0"
mongodb = "3"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
anyhow = "1"

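You need a MongoDB Atlas cluster (Vector Search is an Atlas feature) and its connection string. The full example below reads the connection string from the MONGODB_CONNECTION_STRING environment variable, and openai::Client::from_env reads the OpenAI key from OPENAI_API_KEY.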

Document schema

Derive Embed on your document type, mark the field to embed with #[embed], and include a field for the stored embedding vector. Map your identifier to Mongo’s _id.

use rig::Embed;
use serde::Deserialize;

#[derive(Embed, Clone, Deserialize, Debug)]
struct Document {
    #[serde(rename = "_id")]
    id: String,
    #[embed]
    content: String,
    embedding: Vec<f64>,
}

Atlas vector search index

Before you can query, the collection must have a vector search index. Create it in the Atlas UI or via the API, matching numDimensions to your embedding model (1536 for text-embedding-3-small) and path to your embedding field:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
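To make the "similarity": "cosine" setting concrete, here is a minimal std-only sketch of the metric. It is not how Atlas or rig-mongodb compute it internally; the function names are hypothetical. The second helper applies the mapping that the Atlas documentation describes for cosine scores, (1 + cosine) / 2, which moves the result into the 0 to 1 range.

```rust
/// Cosine similarity between two vectors.
/// Returns None when the lengths differ (the analogue of a numDimensions
/// mismatch) or when either vector has zero magnitude.
fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Maps a cosine similarity in [-1, 1] to a score in [0, 1],
/// following the formula given in the Atlas Vector Search docs.
fn atlas_cosine_score(cosine: f64) -> f64 {
    (1.0 + cosine) / 2.0
}
```

Two vectors pointing the same way give a cosine of 1 and a score of 1; orthogonal vectors give a cosine of 0 and a score of 0.5. A query vector whose length differs from the indexed dimension count cannot be compared at all, which is why numDimensions has to match the embedding model.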

Creating the index

MongoDbVectorIndex::new wraps a collection with an embedding model, the name of the Atlas vector index, and search parameters. It validates the index on construction (existence, dimensions, and similarity metric).

use rig_mongodb::{MongoDbVectorIndex, SearchParams};

let index = MongoDbVectorIndex::new(
    collection,       // mongodb::Collection<bson::Document>
    model,            // an EmbeddingModel, used to embed queries
    "vector_index",   // the Atlas vector search index name
    SearchParams::new(),
)
.await?;

SearchParams lets you configure the embedding field name and the number of candidates the aggregation considers; see the docs.rs API for the options.
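When choosing a candidate count, it helps to know the bounds Atlas documents for the $vectorSearch numCandidates field: it may not be smaller than the number of results requested, and it may not exceed 10,000. The sketch below checks those two rules up front. It is std-only and does not use SearchParams; the type and function names are hypothetical.

```rust
/// Upper bound that the Atlas documentation gives for `numCandidates`.
const MAX_NUM_CANDIDATES: usize = 10_000;

#[derive(Debug, PartialEq)]
enum CandidateError {
    /// Fewer candidates than results requested.
    BelowLimit { num_candidates: usize, limit: usize },
    /// More candidates than Atlas accepts.
    AboveMaximum(usize),
}

/// Returns the candidate count unchanged if it satisfies both bounds.
fn check_num_candidates(num_candidates: usize, limit: usize) -> Result<usize, CandidateError> {
    if num_candidates < limit {
        return Err(CandidateError::BelowLimit { num_candidates, limit });
    }
    if num_candidates > MAX_NUM_CANDIDATES {
        return Err(CandidateError::AboveMaximum(num_candidates));
    }
    Ok(num_candidates)
}
```

A larger candidate pool generally improves recall of the approximate search at the cost of latency, so the value is a trade-off rather than a fixed constant.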

Connecting and querying

use std::env;
use mongodb::{bson, options::ClientOptions, Client as MongoClient, Collection};
use rig::Embed;
use rig::client::{EmbeddingsClient, ProviderClient};
use rig::providers::openai;
use rig::vector_store::{VectorStoreIndex, VectorSearchRequest};
use rig_mongodb::{MongoDbVectorIndex, SearchParams};
use serde::Deserialize;

#[derive(Embed, Clone, Deserialize, Debug)]
struct Document {
    #[serde(rename = "_id")]
    id: String,
    #[embed]
    content: String,
    embedding: Vec<f64>,
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Embedding model, used both to embed documents and to embed queries.
    let openai = openai::Client::from_env()?;
    let model = openai.embedding_model(openai::TEXT_EMBEDDING_3_SMALL);

    // Connect to MongoDB Atlas.
    let conn = env::var("MONGODB_CONNECTION_STRING")?;
    let options = ClientOptions::parse(conn).await?;
    let mongo = MongoClient::with_options(options)?;
    let collection: Collection<bson::Document> =
        mongo.database("knowledgebase").collection("context");

    // Wrap the collection in a Rig vector index.
    let index = MongoDbVectorIndex::new(
        collection,
        model,
        "vector_index",
        SearchParams::new(),
    )
    .await?;

    // Search.
    let req = VectorSearchRequest::builder()
        .query("What does \"glarb-glarb\" mean?")
        .samples(1)
        .build();

    let results = index.top_n::<Document>(req).await?;
    // Each result is a (similarity score, document id, deserialized document) tuple.
    for (score, id, doc) in results {
        println!("{score:.3} {id}: {}", doc.content);
    }

    Ok(())
}
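Because top_n always returns the closest matches even when nothing is truly relevant, callers often drop low-scoring results before using them. The helper below is a minimal std-only sketch of that step. It assumes the (score, id, document) tuple shape seen in the loop above, with an f64 score and a String id; the function name and any threshold you pass are illustrative choices, not part of Rig.

```rust
/// Keeps only results whose score is at least `min_score`,
/// preserving the order in which the search returned them.
fn filter_by_score<T>(results: Vec<(f64, String, T)>, min_score: f64) -> Vec<(f64, String, T)> {
    results
        .into_iter()
        .filter(|(score, _, _)| *score >= min_score)
        .collect()
}
```

A suitable cutoff depends on the embedding model and the similarity metric, so it is worth inspecting real scores from your data before fixing a value.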

To insert documents, generate embeddings with EmbeddingsBuilder and write them to the collection (or use the shared InsertDocuments trait); see the overview.

How the search runs

Queries compile to a MongoDB aggregation pipeline with three stages: a $vectorSearch stage that finds the nearest neighbors, a score stage that attaches each match’s similarity score, and a projection stage that shapes the returned documents. Errors surface as Rig’s VectorStoreError.
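For orientation, the sketch below renders what such a three-stage pipeline could look like as JSON. The $vectorSearch field names (index, path, queryVector, numCandidates, limit) come from the Atlas documentation. The use of $addFields with the vectorSearchScore metadata for the score stage, and of $project to drop the embedding field, is an assumption about the shape of the pipeline rather than a copy of rig-mongodb's output. The function name is hypothetical, and the code only builds a string; it does not talk to MongoDB.

```rust
/// Renders an illustrative three-stage vector search pipeline as JSON text.
/// Assumes every value in `query` is finite, since NaN and infinity
/// have no JSON representation.
fn sketch_pipeline(
    index: &str,
    path: &str,
    query: &[f64],
    limit: usize,
    num_candidates: usize,
) -> String {
    // Serialize the query embedding as a comma-separated JSON array body.
    let vector = query
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(", ");

    format!(
        r#"[
  {{ "$vectorSearch": {{ "index": "{index}", "path": "{path}", "queryVector": [{vector}], "numCandidates": {num_candidates}, "limit": {limit} }} }},
  {{ "$addFields": {{ "score": {{ "$meta": "vectorSearchScore" }} }} }},
  {{ "$project": {{ "{path}": 0 }} }}
]"#
    )
}
```

Calling sketch_pipeline("vector_index", "embedding", &query, 1, 10) yields a pipeline that asks Atlas for the single best match out of ten candidates, exposes its score, and omits the stored vector from the returned document.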