Module 7 of 16

Advanced Retrieval Engineering

Hybrid search, reranking, query expansion, and Graph RAG for production-quality retrieval

4 hours | 2 labs | Free

Start here

Learning objectives

Implement hybrid search (BM25 + vector)
Add reranking with cross-encoder models
Design query expansion and transformation strategies
Understand Graph RAG for relationship-aware retrieval

Basic RAG uses single-mode retrieval. Production RAG adds hybrid search (BM25 + vectors), reranking (cross-encoder models), and query transformation. Depending on the corpus and the queries, these techniques can improve retrieval quality by 20-40%, and better retrieved context translates directly into better answers.

Hybrid Search

Combine keyword search (BM25) with vector search, then merge results using Reciprocal Rank Fusion (RRF). BM25 catches exact terms that vector search misses (product codes, acronyms). Vectors catch meaning that BM25 misses (synonyms, paraphrases).
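
The merge step described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the document IDs and the two input rankings are made up, and `k=60` is a commonly used smoothing constant rather than a value this module prescribes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Documents ranked well by several retrievers
    accumulate the highest totals.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a BM25 index and a vector index for one query
bm25_ranking = ["doc_sku", "doc_a", "doc_b"]
vector_ranking = ["doc_a", "doc_c", "doc_sku"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Because RRF works on ranks rather than raw scores, it needs no normalization between BM25 scores and vector similarities, which is one reason it is a convenient default for hybrid search.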

Reranking

Initial retrieval (BM25 + vector) is fast but coarse. A cross-encoder reranker takes the top-K results and reorders them by computing a relevance score using full cross-attention between query and document. This is slower than first-stage retrieval but much more accurate, which is why it is applied only to a short candidate list.

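from sentence_transformers import CrossEncoder

# Load a pretrained cross-encoder that scores (query, passage) pairs
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# `query` (str) and `initial_results` (objects with a `.content` string)
# are assumed to come from the first-stage hybrid retriever
candidates = initial_results[:20]
pairs = [(query, doc.content) for doc in candidates]
scores = reranker.predict(pairs)  # one relevance score per pair

# Rerank top-20 results to get top-5 as (doc, score) tuples, best first
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:5]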

Query Expansion

Sometimes the user query is ambiguous or too short. Query expansion generates multiple variations to improve recall: "python performance" might expand to "python performance optimization", "python speed improvement", "python profiling".
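
One way to wire expansion into retrieval is sketched below. Everything here is hypothetical scaffolding: `toy_expand` stands in for whatever generates variants (often an LLM call), `toy_search` stands in for the real retriever, and the scores are invented. The sketch keeps each document's best score across variants, which assumes scores are comparable between queries; fusing the per-variant rankings with RRF is a common alternative when they are not.

```python
def retrieve_with_expansion(query, expand, search, top_k=5):
    """Search with the original query plus its variants and merge the results.

    `expand(query)` returns a list of variant strings.
    `search(query)` returns a list of (doc_id, score) pairs.
    Each document is kept once, with its best score across all variants.
    """
    variants = [query] + [v for v in expand(query) if v != query]
    best = {}
    for variant in variants:
        for doc_id, score in search(variant):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical stand-ins for an expansion step and a retriever
def toy_expand(query):
    return [
        "python performance optimization",
        "python speed improvement",
        "python profiling",
    ]


TOY_INDEX = {
    "python performance": [("doc1", 0.70), ("doc2", 0.55)],
    "python performance optimization": [("doc1", 0.82), ("doc3", 0.60)],
    "python speed improvement": [("doc4", 0.65)],
    "python profiling": [("doc5", 0.75), ("doc2", 0.50)],
}


def toy_search(query):
    return TOY_INDEX.get(query, [])


results = retrieve_with_expansion("python performance", toy_expand, toy_search, top_k=3)
print(results)
```

The variants surface `doc4` and `doc5`, which the original query alone never retrieves; that is the recall gain expansion is meant to deliver.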

Graph RAG

Traditional RAG retrieves independent chunks. Graph RAG builds a knowledge graph of relationships between entities and concepts, enabling multi-hop reasoning: "What are the dependencies of Service A?" can follow relationship edges across the graph.
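
To make the multi-hop idea concrete, here is a toy sketch that answers the dependency question by walking relationship edges breadth-first. The graph, the entity names other than "Service A", and the `depends_on` relation are invented for illustration; a real Graph RAG system would extract the graph from documents and typically store it in a graph database.

```python
from collections import deque

# Hypothetical knowledge graph: entity -> list of (relation, target entity)
GRAPH = {
    "Service A": [("depends_on", "Service B"), ("depends_on", "Postgres")],
    "Service B": [("depends_on", "Redis")],
    "Postgres": [],
    "Redis": [],
}


def multi_hop(graph, start, relation, max_hops=2):
    """Return {entity: hop_count} for entities reachable from `start`
    by following only `relation` edges, up to `max_hops` hops away."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in graph.get(node, []):
            if rel == relation and neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the start entity is not its own dependency
    return seen


deps = multi_hop(GRAPH, "Service A", "depends_on")
print(deps)
```

Redis appears only at hop 2, through Service B. Chunk-based retrieval would surface it only if a single chunk happened to mention both Service A and Redis.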

Common mistakes

What usually breaks

Reranking all results (too slow) - rerank top-20 only
Not tuning the BM25/vector weight ratio for hybrid search
Using query expansion without controlling result diversity
Implementing Graph RAG before basic RAG is working well
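
For the weight-ratio point above, a weighted score fusion makes the tuning knob explicit. This is a sketch of one option, a convex combination of min-max normalized scores, and it is a different technique from RRF, which uses ranks only. The parameter name `alpha`, the document IDs, and the scores are all illustrative assumptions.

```python
def min_max(scores):
    """Scale a {doc_id: score} dict into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_hybrid(bm25_scores, vector_scores, alpha=0.5):
    """Rank documents by alpha * BM25 + (1 - alpha) * vector score.

    alpha = 1.0 is pure keyword search, alpha = 0.0 is pure vector search.
    A document missing from one retriever contributes 0 for that side.
    """
    bm25_n, vec_n = min_max(bm25_scores), min_max(vector_scores)
    combined = {
        doc_id: alpha * bm25_n.get(doc_id, 0.0) + (1 - alpha) * vec_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vec_n)
    }
    return sorted(combined, key=lambda doc_id: combined[doc_id], reverse=True)


# Made-up scores: BM25 favors the exact-match page, vectors favor the blog post
bm25 = {"sku_page": 12.0, "faq": 4.0, "blog": 2.0}
vec = {"blog": 0.90, "faq": 0.80, "sku_page": 0.40}

keyword_heavy = weighted_hybrid(bm25, vec, alpha=0.8)
semantic_heavy = weighted_hybrid(bm25, vec, alpha=0.2)
print(keyword_heavy, semantic_heavy)
```

The same two score sets produce opposite rankings at `alpha=0.8` and `alpha=0.2`, which is why the ratio should be tuned on a labeled set of test queries rather than left at a default.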

Key terms

Vocabulary used in this module

Hybrid Search

Combining keyword (BM25) and semantic (vector) search

RRF

Reciprocal Rank Fusion - merging ranked results from multiple sources

Cross-Encoder

Model that scores query-document relevance with full cross-attention

Graph RAG

Retrieval using knowledge graph relationships between entities

Labs

Hands-on labs

35 min | Intermediate

Implement Hybrid Retrieval

Combine BM25 and vector search with RRF.

Add BM25 index alongside vector index
Implement Reciprocal Rank Fusion
Compare hybrid vs single-mode on test queries
Measure precision/recall improvement
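
For the measurement step, precision@k and recall@k can be computed with a small helper like the one below. It is a sketch under assumptions: the document IDs and relevance labels are invented, and precision is divided by `k` (the usual convention) even when fewer than `k` results come back.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Compute (precision@k, recall@k) for one query.

    `retrieved` is the ranked list of document IDs from the retriever.
    `relevant` is the set of document IDs labeled relevant for the query.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical retriever output and ground-truth labels for one test query
retrieved = ["d1", "d7", "d3", "d9", "d2"]
relevant = {"d1", "d2", "d3"}

p3, r3 = precision_recall_at_k(retrieved, relevant, k=3)
p5, r5 = precision_recall_at_k(retrieved, relevant, k=5)
print(p3, r3, p5, r5)
```

Running this for hybrid and single-mode retrieval over the same test queries, then averaging per mode, gives a like-for-like comparison.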

30 min | Intermediate

Add Cross-Encoder Reranking

Rerank retrieval results for better relevance.

Load a cross-encoder reranking model
Rerank top-20 hybrid results to top-5
Compare answer quality with and without reranking
Measure latency impact

Recap

Key takeaways

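Hybrid search (BM25 + vectors + RRF) catches both exact terms and paraphrases; together with reranking and query transformation it can improve retrieval quality by 20-40%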
Cross-encoder reranking improves precision at the cost of latency
Query expansion handles ambiguous or short queries
Graph RAG enables multi-hop reasoning across entity relationships
Advanced retrieval is often one of the highest-ROI investments in RAG quality
