LanceDB
The rig-lancedb crate backs Rig’s vector store with LanceDB, a serverless vector database built on Apache Arrow. It stores embeddings in a columnar format and runs either embedded on local disk or against cloud object storage (S3, GCS, Azure).
    [dependencies]
    rig = "0.39.0"
    rig-lancedb = "0.39.0"
    lancedb = "0.30"
    tokio = { version = "1", features = ["full"] }

Connecting
lancedb::connect opens (or creates) a database at a URI. Use a path for local storage or an s3:// / gs:// / az:// URI for cloud storage.
    // Local, on-disk store.
    let db = lancedb::connect("data/lancedb-store").execute().await?;

    // Cloud storage on S3 (see the LanceDB storage guide for IAM requirements).
    let db = lancedb::connect("s3://my-lancedb-bucket").execute().await?;

Table schema
LanceDB tables are typed. Each table needs an id column, your document columns, and a fixed-size list column for the embedding whose length matches your model’s dimensions.
    use std::sync::Arc;
    use lancedb::arrow::arrow_schema::{DataType, Field, Fields, Schema};

    // `dims` must equal the embedding model's output dimension.
    fn schema(dims: usize) -> Schema {
        Schema::new(Fields::from(vec![
            Field::new("id", DataType::Utf8, false),
            Field::new("definition", DataType::Utf8, false),
            // One fixed-size list of `dims` floats per row.
            Field::new(
                "embedding",
                DataType::FixedSizeList(
                    Arc::new(Field::new("item", DataType::Float64, true)),
                    dims as i32,
                ),
                false,
            ),
        ]))
    }

Populate the table by converting your embedded documents into Arrow RecordBatches and calling LanceDB’s create_table / add. See the full local example in the repo for the batch-building code.
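Because the embedding column is a fixed-size list, every vector must have exactly the declared length before it goes into a batch. The sketch below is plain Rust with no Arrow dependency: a hypothetical helper, flatten_embeddings (not part of rig-lancedb or LanceDB), checks each vector against dims and concatenates them row by row, which is the flat layout (rows × dims values) that a fixed-size list column holds. The error type and message are assumptions made for illustration.

```rust
/// Hypothetical helper (not a rig-lancedb or LanceDB API): validates that
/// every embedding has exactly `dims` values and concatenates them row by row.
fn flatten_embeddings(rows: &[Vec<f64>], dims: usize) -> Result<Vec<f64>, String> {
    let mut flat = Vec::with_capacity(rows.len() * dims);
    for (i, row) in rows.iter().enumerate() {
        // A single mismatched row would make the whole column invalid.
        if row.len() != dims {
            return Err(format!(
                "row {i}: expected {dims} values, found {}",
                row.len()
            ));
        }
        flat.extend_from_slice(row);
    }
    Ok(flat)
}
```

Running a check like this before building the batch tends to surface a model/schema dimension mismatch as a readable error instead of a failure deeper in the Arrow layer.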
Creating the index
LanceDbVectorIndex::new wraps a LanceDB table together with an embedding model, the id column name, and search parameters.
    use rig_lancedb::{LanceDbVectorIndex, SearchParams};

    let table = db.open_table("documents").execute().await?;

    let index = LanceDbVectorIndex::new(
        table,
        model,                   // an EmbeddingModel, used to embed queries
        "id",                    // id column
        SearchParams::default(),
    ).await?;

Search parameters
SearchParams configures how queries run: the distance metric (Cosine, L2) and the number of candidates to consider. SearchParams::default() is a sensible starting point; see the docs.rs API for the full builder.
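To make the two metrics concrete, here is a dependency-free sketch of the textbook arithmetic behind them. The function names are illustrative only, not LanceDB or Rig APIs, and the library's own scoring may differ in detail (for example, in how it scales or reports L2 values).

```rust
/// Euclidean (L2) distance between two equal-length vectors.
fn l2_distance(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Cosine distance, defined here as 1 - cosine similarity.
/// Returns None when either vector has zero magnitude.
fn cosine_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

Cosine distance ignores vector length, so [1, 0] and [2, 0] are at distance 0, while L2 treats them as one unit apart. That difference is why the metric should match how the embeddings were produced.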
Index types
LanceDB supports two nearest-neighbor strategies:
- IVF-PQ (Inverted File with Product Quantization): approximate nearest-neighbor (ANN) search. Faster on large tables at some cost in accuracy; creating an IVF-PQ index requires at least 256 rows.
- Exact Nearest Neighbors (ENN): exact results, but slower. Good for small tables where an ANN index isn’t warranted.
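Conceptually, exact search is a full scan: score every row against the query and keep the best n. The sketch below shows that idea in plain Rust using squared L2 distance. exact_top_n is a hypothetical name, and this is not a description of LanceDB's internals, only an illustration of why exact search cost grows with table size.

```rust
/// Brute-force exact nearest neighbors (illustrative, not a LanceDB API).
/// Returns (row index, squared L2 distance) for the `n` rows closest to
/// `query`, nearest first. Assumes every row has the same length as `query`.
fn exact_top_n(rows: &[Vec<f64>], query: &[f64], n: usize) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| {
            let dist = row
                .iter()
                .zip(query)
                .map(|(x, y)| (x - y).powi(2))
                .sum::<f64>();
            (i, dist)
        })
        .collect();
    // total_cmp defines a total order on f64, so NaN values cannot panic the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(n);
    scored
}
```

Every query touches every row, which is fine for a few thousand vectors. An ANN index such as IVF-PQ avoids the full scan by searching only part of the data, which is where the accuracy trade-off comes from.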
Querying
Queries use the shared VectorSearchRequest and top_n:

    use rig::vector_store::{VectorStoreIndex, VectorSearchRequest};

    let req = VectorSearchRequest::builder()
        .query("search query")
        .samples(5)
        .build();

    // `Document` stands for your own deserializable type matching the table's columns.
    let results = index.top_n::<Document>(req).await?;
    for (score, id, _doc) in results {
        println!("{score:.3} {id}");
    }

LanceDB also supports metadata filtering; pass a Filter on the request to narrow results (see Filters).
- Distance metric — pick Cosine or L2 to match how your embeddings were produced.
- Index size — keep local storage for smaller datasets; move to cloud storage (S3/GCS/Azure) for large-scale deployments.
- ANN vs ENN — IVF-PQ trades a little accuracy for speed on large tables; ENN gives exact results on small ones.
See also
- Vector Stores overview: shared traits, VectorSearchRequest, filters.
- Deploy with LanceDB: a deployment walkthrough.
- LanceDB documentation: storage, indexing, and tuning.
