Embeddings are the bridge between text and vector search. An embedding model converts text into a fixed-size vector (a list of numbers) such that texts with similar meanings produce nearby vectors. The quality of your embeddings directly determines the quality of your retrieval.
How Embedding Models Work
Embedding models are neural networks trained on large collections of text pairs (question-answer, paraphrase, similar documents). During training they learn to map semantically similar text to nearby points in vector space. At inference time, they convert a short text to a vector, typically in milliseconds on modern hardware.
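To make "nearby points in vector space" concrete, here is a minimal sketch that ranks texts by cosine similarity. The 3-dimensional vectors are made up for illustration and do not come from any real model; the helper name cosine_similarity is also just a choice for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy vectors (an assumption for illustration only).
# A real embedding model would produce hundreds of dimensions.
toy_vectors = {
    "how do I reset my password": [0.9, 0.1, 0.2],
    "forgot my login credentials": [0.8, 0.2, 0.3],
    "best pizza toppings": [0.1, 0.9, 0.1],
}

query = toy_vectors["how do I reset my password"]
scores = {text: cosine_similarity(query, vec) for text, vec in toy_vectors.items()}

# Sort texts from most to least similar to the query.
ranked = sorted(scores, key=scores.get, reverse=True)
for text in ranked:
    print(f"{scores[text]:.3f}  {text}")
```

The two account-related texts score close to each other, while the unrelated text scores much lower. A vector search engine applies the same comparison at scale.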
Comparing Embedding Models
from sentence_transformers import SentenceTransformer

# Small and fast: good for prototyping
model_small = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dims, ~22M params
# Larger and generally more accurate: good for production
model_large = SentenceTransformer('all-mpnet-base-v2')  # 768 dims, ~109M params
# Other options (not loaded here):
# nomic-embed-text: strong general purpose
# voyage-3: high quality, API-based
# text-embedding-3-large: OpenAI, API-based, 3072 dims
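One practical way to compare these models is by what their dimensions cost to store. The sketch below estimates raw float32 storage for the dimension counts listed above. The corpus size of one million documents is a hypothetical figure, and the estimate ignores index overhead and metadata.

```python
BYTES_PER_FLOAT32 = 4

def index_size_gb(num_vectors, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw vector storage in gigabytes (10**9 bytes), ignoring index overhead."""
    return num_vectors * dims * bytes_per_value / 1e9

# Dimension counts taken from the model list above.
model_dims = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "text-embedding-3-large": 3072,
}

num_docs = 1_000_000  # hypothetical corpus size, one vector per document

sizes = {name: index_size_gb(num_docs, dims) for name, dims in model_dims.items()}
for name, size in sizes.items():
    print(f"{name}: {size:.2f} GB")
```

Storage grows linearly with dimension, so the 3072-dimension model needs eight times the space of the 384-dimension one for the same corpus.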
Embedding Optimization
- Dimensionality: Higher dims capture more nuance but cost more storage and compute
- Batch processing: Embed documents in batches for throughput
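- Caching: Cache embeddings and do not re-embed unchanged documents
- Quantization: Reduce vector precision (float32 to int8) for 4x storage savings, usually at a small cost in retrieval accuracy
- Matryoshka embeddings: Models trained so that a prefix of the vector is still a usable embedding (truncate to fewer dimensions for speed and storage, then re-normalize)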
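Several of the techniques in this list can be sketched in plain Python. The example below is illustrative only: fake_embed_batch is a placeholder standing in for a real model call, and embed_with_cache, quantize_int8, and truncate_and_normalize are hypothetical helper names, not part of any library. Truncation only preserves quality when the model was trained for it (Matryoshka-style); here it just demonstrates the mechanics.

```python
import hashlib
import math
from array import array

def embed_with_cache(texts, embed_batch, cache, batch_size=32):
    """Embed texts in batches, calling embed_batch only for unseen content."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    # Collect texts that are not cached yet, skipping duplicates.
    missing = {}
    for key, text in zip(keys, texts):
        if key not in cache and key not in missing:
            missing[key] = text
    pending = list(missing.items())
    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        vectors = embed_batch([text for _, text in chunk])
        for (key, _), vector in zip(chunk, vectors):
            cache[key] = vector
    return [cache[key] for key in keys]

def quantize_int8(vector):
    """Symmetric scalar quantization: floats become 1-byte ints in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return array("b", (round(x / scale) for x in vector)), scale

def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

def truncate_and_normalize(vector, dims):
    """Keep the first dims values and rescale the result to unit length."""
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

calls = []  # records the batch sizes sent to the placeholder model

def fake_embed_batch(texts):
    """Placeholder for a real model call; returns deterministic toy vectors."""
    calls.append(len(texts))
    return [[len(t) / 10.0, t.count(" ") / 5.0, 1.0, -0.5] for t in texts]

cache = {}
docs = ["first document", "second document", "first document"]
vectors = embed_with_cache(docs, fake_embed_batch, cache, batch_size=2)
vectors_again = embed_with_cache(docs, fake_embed_batch, cache)  # served from cache

codes, scale = quantize_int8(vectors[0])
restored = dequantize_int8(codes, scale)
short = truncate_and_normalize(vectors[0], 2)
```

Keying the cache on a hash of the content means an unchanged document is never sent to the model twice, even if it is renamed or moved. In a real system the cache would be a persistent store rather than an in-memory dict, and the quantized codes would typically live in the vector database's own int8 format.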