Vector Databases Explained: Embeddings, Similarity Search, and When You Need One

Traditional databases find exact matches: “find all users where email = alice@example.com.” Vector databases find similar matches: “find documents most similar to this question.” This capability powers most RAG pipelines, semantic search engines, recommendation systems, and image similarity features built with AI.

What Are Embeddings?

An embedding is a list of numbers (a vector) that represents the meaning of text, images, or any other data. Similar meanings produce similar vectors. The magic is that “How do I reset my password?” and “I forgot my login credentials” produce nearby vectors, even though they share almost no keywords.

# Generate embeddings with sentence-transformers
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

texts = [
    "How do I reset my password?",
    "I forgot my login credentials",
    "What is the weather today?",
]

embeddings = model.encode(texts)
# embeddings[0].shape == (384,)  # 384-dimensional vector

# Illustrative scores; exact values depend on the model
# Password vs. login question: ~0.85 (very similar)
# Password vs. weather question: ~0.12 (very different)
print(cosine_similarity([embeddings[0]], [embeddings[1]]))  # ~0.85
print(cosine_similarity([embeddings[0]], [embeddings[2]]))  # ~0.12
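
To make the similarity score less of a black box, here is a minimal sketch of what a cosine similarity function computes: the dot product of two vectors divided by the product of their lengths. The function name `cosine_similarity_manual` and the three-dimensional vectors are made up for illustration; real embeddings from the model above have 384 dimensions.

```python
import math


def cosine_similarity_manual(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", invented for this example only.
password_q = [0.9, 0.1, 0.0]
login_q = [0.8, 0.2, 0.1]
weather_q = [0.0, 0.1, 0.9]

same_topic = cosine_similarity_manual(password_q, login_q)
other_topic = cosine_similarity_manual(password_q, weather_q)
print(f"same topic: {same_topic:.2f}, other topic: {other_topic:.2f}")
```

Vectors pointing in nearly the same direction score close to 1, and nearly orthogonal vectors score close to 0, regardless of their lengths.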

How Similarity Search Works

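The task: given a query vector, find the K nearest vectors in a database of millions. The naive approach compares the query against every stored vector, which is O(n) per query and becomes too slow at that scale. Vector databases instead use approximate nearest neighbor (ANN) algorithms, which trade a small amount of accuracy for much faster search.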
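
For reference, the naive approach can be sketched in a few lines of plain Python. This is an illustration of the exhaustive O(n) scan, not how any particular database implements it; the helper names `normalize` and `brute_force_top_k` and the sample vectors are assumptions made for this example.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def brute_force_top_k(
    query: list[float], vectors: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Score every stored vector against the query and keep the k best.

    Returns (index, cosine similarity) pairs, most similar first.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(v))), idx)
        for idx, v in enumerate(vectors)
    )
    return [(idx, score) for score, idx in heapq.nlargest(k, scored)]


# Hypothetical 2-dimensional data set, for illustration only.
stored = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
top_two = brute_force_top_k([1.0, 0.05], stored, k=2)
print(top_two)
```

Every query touches every stored vector, so the cost grows linearly with the size of the collection. That is the cost ANN indexes are designed to avoid.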

HNSW: The Algorithm Behind Most Vector DBs

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each layer is progressively sparser. Search starts at the top layer (coarse navigation) and descends to lower layers (fine-grained search).

# Conceptual HNSW structure:
# Layer 2 (sparse):   A ---- D ---- G
# Layer 1 (medium):   A -- B -- D -- F -- G
# Layer 0 (dense):    A-B-C-D-E-F-G-H-I-J

# Search for a vector near E:
# 1. Start at layer 2: jump to closest node (D)
# 2. Drop to layer 1: navigate D -> F or D -> B
# 3. Drop to layer 0: navigate to E (found!)

# Time complexity: roughly O(log n) per query instead of O(n)
# Accuracy: typically 95-99% recall (misses ~1-5% of true nearest neighbors)
# Trade-off: higher recall needs more memory (more graph links per node)
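
The descent in the diagram can be sketched as a toy greedy search. This is a deliberately simplified illustration, not a real HNSW implementation: production HNSW assigns nodes to layers randomly, works in high-dimensional space, and keeps a list of candidates rather than a single current node. The one-dimensional positions (A=0 through J=9) and the names `POSITIONS`, `LAYERS`, `chain`, and `greedy_layered_search` are assumptions made for this example.

```python
# Toy 1-D coordinates for the nodes in the diagram: A=0.0, B=1.0, ... J=9.0.
POSITIONS = {name: float(i) for i, name in enumerate("ABCDEFGHIJ")}


def chain(names: str) -> dict[str, list[str]]:
    """Link each node to its left and right neighbor in the given order."""
    graph: dict[str, list[str]] = {n: [] for n in names}
    for left, right in zip(names, names[1:]):
        graph[left].append(right)
        graph[right].append(left)
    return graph


# First entry is the sparse top layer, last entry is the dense layer 0.
LAYERS = [chain("ADG"), chain("ABDFG"), chain("ABCDEFGHIJ")]


def greedy_layered_search(target: float, entry: str = "A") -> tuple[str, list[str]]:
    """Greedily walk toward target on each layer, then drop down a layer."""

    def dist(node: str) -> float:
        return abs(POSITIONS[node] - target)

    current = entry
    path = [current]
    for graph in LAYERS:
        while True:
            best = min(graph[current], key=dist)
            if dist(best) >= dist(current):
                break  # No neighbor on this layer is closer; descend.
            current = best
            path.append(current)
    return current, path


nearest, visited = greedy_layered_search(4.2)
print(nearest, visited)
```

With a target of 4.2 the walk visits A, D, F, and then E, matching the three steps listed above: a long jump on the sparse layer, a shorter hop on the middle layer, and a final step on the dense layer.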

Vector Database Options

Database	Type	Best For	Pricing
pgvector	PostgreSQL extension	Small-medium datasets, existing PG users	Free (self-hosted)
ChromaDB	Embedded / client-server	Prototyping, small RAG apps	Free (open source)
Pinecone	Managed cloud	Production at scale, zero ops	Pay per use
Weaviate	Self-hosted / cloud	Multi-modal (text + images)	Free (self-hosted) / paid cloud
Qdrant	Self-hosted / cloud	High performance, filtering	Free (self-hosted) / paid cloud
Milvus	Self-hosted / cloud	Billion-scale datasets	Free (self-hosted) / paid cloud

pgvector: Start Here

-- Enable the pgvector extension (it must already be installed on the server)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(384)   -- 384 dimensions
);

-- Insert a document with its embedding
INSERT INTO documents (title, content, embedding)
VALUES ('Password Reset', 'How to reset your password...',
        '[0.1, -0.3, 0.5, ...]');  -- 384 floats

-- Find the 5 most similar documents
SELECT id, title, embedding <=> '[0.2, -0.1, 0.4, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.2, -0.1, 0.4, ...]'  -- cosine distance
LIMIT 5;

-- Create an HNSW index for fast search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Rough figures for 1M vectors; actual latency depends on hardware and settings
-- With index: ~5ms per search
-- Without index (sequential scan): ~500ms per search
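
When the query runs from application code, the embedding has to be passed in as a parameter rather than pasted into the SQL. The sketch below formats a Python list into the bracketed text form used in the statements above and pairs it with a parameterized version of the similarity query. The names `to_pgvector_literal`, `NEAREST_SQL`, and `nearest_query_params` are hypothetical, and the `%s` placeholder style assumes a driver such as psycopg; other drivers use different placeholders.

```python
def to_pgvector_literal(vector: list[float]) -> str:
    """Format floats as the '[x,y,z]' text form used in the SQL above."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Parameterized form of the article's similarity query.
# The ORDER BY expression is repeated, as in the original statement.
NEAREST_SQL = """
SELECT id, title, embedding <=> %s::vector AS distance
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def nearest_query_params(query_embedding: list[float], limit: int = 5) -> tuple:
    """Build the parameter tuple that matches NEAREST_SQL's placeholders."""
    literal = to_pgvector_literal(query_embedding)
    return (literal, literal, limit)


params = nearest_query_params([0.2, -0.1, 0.4], limit=5)
print(params)
```

With a psycopg cursor, the call would look like `cur.execute(NEAREST_SQL, params)`. Passing the vector as a parameter keeps the query safe from injection and avoids building very long SQL strings by hand.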

Chunking Strategies for RAG

Documents must be split into chunks before embedding. Chunk size dramatically affects retrieval quality.

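# Strategy 1: Fixed-size chunks (simple, often good enough)
# Note: chunk_size and overlap are measured in characters here, not tokens
def chunk_by_size(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # Last chunk reached; avoid emitting a redundant tail
        start = end - overlap  # Overlap carries context across the boundary
    return chunks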

# Strategy 2: Semantic chunking (split on headings/paragraphs)
def chunk_by_structure(text: str) -> list[str]:
    # Split on markdown headers or double newlines
    import re
    sections = re.split(r'\n#{1,3} |\n\n', text)
    return [s.strip() for s in sections if len(s.strip()) > 50]

# Strategy 3: Recursive chunking (LangChain approach)
# Split on paragraphs first, then sentences, then words
# Keep chunks under max_size while preserving semantic boundaries

# Chunk size guidelines:
# Too small (< 100 tokens): loses context, retrieval misses meaning
# Too large (> 1000 tokens): dilutes relevance, wastes context window
# Sweet spot: 200-500 tokens with 10-20% overlap

Complete RAG Pipeline

import chromadb
from sentence_transformers import SentenceTransformer

# Setup
embedder = SentenceTransformer('all-MiniLM-L6-v2')
client = chromadb.PersistentClient(path="./vectordb")
collection = client.get_or_create_collection(
    "docs",
    metadata={"hnsw:space": "cosine"}
)

# Index documents
def index_documents(docs: list[dict]):
    for doc in docs:
        chunks = chunk_by_size(doc["content"])
        embeddings = embedder.encode(chunks).tolist()
        collection.add(
            ids=[f"{doc['id']}_chunk_{i}" for i in range(len(chunks))],
            embeddings=embeddings,
            documents=chunks,
            metadatas=[{"source": doc["title"], "chunk": i} for i in range(len(chunks))],
        )

# Query: find relevant chunks
def search(query: str, top_k: int = 5) -> list[str]:
    query_embedding = embedder.encode(query).tolist()
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
    )
    return results["documents"][0]

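# Generate answer with context
import anthropic

llm = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

def rag_answer(question: str) -> str:
    context_chunks = search(question, top_k=5)
    context = "\n\n".join(context_chunks)

    # Call the LLM client here; `client` above is the ChromaDB client
    response = llm.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,  # Required by the Messages API
        system="Answer using ONLY the provided context. Cite sources.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }],
    )
    return response.content[0].text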
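
The system prompt asks the model to cite sources, but the context built above contains only chunk text, so the model has nothing to cite. One possible fix, sketched below, is to label each chunk with the `source` value stored at indexing time. The helper `build_context` is hypothetical; it assumes the search step is extended to also return the metadata list, which Chroma includes in query results under the `"metadatas"` key.

```python
def build_context(chunks: list[str], metadatas: list[dict]) -> str:
    """Prefix each chunk with a numbered source label the model can cite."""
    if len(chunks) != len(metadatas):
        raise ValueError("chunks and metadatas must have the same length")
    blocks = []
    for n, (chunk, meta) in enumerate(zip(chunks, metadatas), start=1):
        source = meta.get("source", "unknown")
        blocks.append(f"[{n}] (source: {source})\n{chunk}")
    return "\n\n".join(blocks)


# Hypothetical retrieved chunks and their metadata, for illustration only.
labelled = build_context(
    ["Open Settings and choose Reset password.", "Reset links expire after one hour."],
    [{"source": "Password Reset", "chunk": 0}, {"source": "Security FAQ", "chunk": 3}],
)
print(labelled)
```

The labelled string can replace the plain `"\n\n".join(context_chunks)` context, so an answer can point back to markers such as `[1]` and the document title behind them.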

When You Need a Vector Database

RAG pipeline: Retrieve relevant documents to ground LLM responses
Semantic search: Search by meaning, not just keywords
Recommendation engine: Find similar products, articles, or users
Image similarity: Reverse image search, duplicate detection
Anomaly detection: Find data points that are far from any cluster

When You Do NOT Need One

Fewer than 10,000 documents: Brute-force cosine similarity in NumPy is fast enough
Keyword search is sufficient: Elasticsearch with BM25 handles keyword queries well
Exact match only: Regular database with full-text search
Already using PostgreSQL: pgvector extension avoids adding a new database

Key Takeaways

Embeddings convert meaning to numbers - similar meanings produce nearby vectors
HNSW is the dominant algorithm for approximate nearest neighbor search - roughly O(log n) with typically 95-99% recall
Start with pgvector if you already use PostgreSQL - it handles millions of vectors well
Chunk size matters for RAG: 200-500 tokens with overlap is the sweet spot
Use managed services (Pinecone) for production at scale - self-hosting vector databases requires tuning
You might not need a vector database - for small datasets, NumPy cosine similarity works fine
Combine vector search with keyword search (hybrid search) for best results
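
The last takeaway leaves open how the two result lists get merged. One common option is reciprocal rank fusion (RRF), sketched below. This is one technique among several, not something the pipeline above prescribes; the function name, the document IDs, and the constant `k = 60` (a conventional default for RRF) are assumptions made for this example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a keyword (BM25) search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF uses only rank positions, it sidesteps the problem that cosine distances and BM25 scores live on different scales. Documents that both searches rank highly rise to the top.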

Vector databases are infrastructure, not magic. They store numbers and find nearest neighbors efficiently. The magic is in the embeddings - how you convert your data into meaningful vectors. Get the embeddings and chunking right, and any vector database will serve you well. Get them wrong, and the fanciest database cannot save your search quality.
