Module 7: Advanced Retrieval Engineering
Hybrid search, reranking, query expansion, and Graph RAG for production-quality retrieval
4 hours. 2 hands-on labs. Free course module.
Learning Objectives
- Implement hybrid search (BM25 + vector)
- Add reranking with cross-encoder models
- Design query expansion and transformation strategies
- Understand Graph RAG for relationship-aware retrieval
Why This Matters
This is where your RAG system goes from "works in demos" to "works in production." The difference between 70% and 90% retrieval accuracy is the difference between a useful tool and an unreliable one. Hybrid search plus reranking is a widely used production pattern.
Lesson Content
Basic RAG uses a single retrieval mode. Production RAG typically adds hybrid search (BM25 + vectors), reranking (cross-encoder models), and query transformation. These techniques can improve retrieval quality by 20-40%, depending on the corpus and query mix, and better retrieval translates directly into better answers.
Hybrid Search
Combine keyword search (BM25) with vector search, then merge results using Reciprocal Rank Fusion (RRF). BM25 catches exact terms that vector search misses (product codes, acronyms). Vectors catch meaning that BM25 misses (synonyms, paraphrases).
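A minimal, dependency-free sketch of the hybrid step follows. It assumes whitespace tokenization and stubs the vector retriever's output as a hypothetical ranked list (`vector_hits`), since no embedding model is loaded here. BM25 uses the common defaults `k1=1.5` and `b=0.75`. The optional `weights` argument to `rrf` is one assumed way to expose the BM25/vector balance for tuning. In practice you would more likely use a search engine's built-in BM25 or a library such as `rank_bm25` than this class.

```python
import math
from collections import Counter, defaultdict


class BM25:
    """Minimal Okapi BM25 over whitespace-tokenized, lowercased text."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b  # common default parameters
        self.tfs = [Counter(d.lower().split()) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(docs)
        n = len(docs)
        # Iterating a Counter yields each distinct term once, so this is document frequency
        df = Counter(term for tf in self.tfs for term in tf)
        # Non-negative IDF variant (the "+1" form used by Lucene)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for term in query.lower().split():
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        """Indices of documents with a positive score, best first."""
        scored = [(self.score(query, i), i) for i in range(len(self.tfs))]
        return [i for s, i in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]


def rrf(ranked_lists, k=60, weights=None):
    """Reciprocal Rank Fusion: score(d) = sum_i w_i / (k + rank_i(d)), ranks 1-based."""
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for w, ranking in zip(weights, ranked_lists):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += w / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = [
    "Fix error ERR-4012 by resetting the router",  # 0: contains the exact code
    "How to restart a network device",             # 1: paraphrase, no shared terms
    "Billing error codes and refunds",             # 2: unrelated
]
bm25_hits = BM25(docs).rank("ERR-4012 router")  # exact-term match finds doc 0
vector_hits = [1, 0]  # hypothetical embedding-search ranking for the same query

fused = rrf([bm25_hits, vector_hits])
print(fused)  # [0, 1]
```

Because RRF combines ranks rather than raw scores, BM25 scores and cosine similarities never need to be normalized onto a common scale. The constant `k=60` is a widely used default; larger values flatten the difference between top-ranked and lower-ranked documents.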
Reranking
Initial retrieval (BM25 + vector) is fast but coarse. A cross-encoder reranker takes the top-K results and reorders them by scoring each query-document pair jointly, with full cross-attention between query and document tokens. This is slower, because the model runs once per pair at query time, but much more accurate.
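from sentence_transformers import CrossEncoder

# Assumes `query` (str) and `initial_results` (ranked objects with a `.content` attribute) come from hybrid retrieval
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Score only the top-20 candidates: the cross-encoder runs once per (query, document) pair
candidates = initial_results[:20]
pairs = [(query, doc.content) for doc in candidates]
scores = reranker.predict(pairs)

# Keep the 5 highest-scoring documents
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:5]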
Query Expansion
Sometimes the user query is ambiguous or too short. Query expansion generates multiple variations to improve recall: "python performance" might expand to "python performance optimization", "python speed improvement", "python profiling".
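A minimal sketch of expansion followed by fusion, under stated assumptions: the `EXPANSIONS` table and `TOY_INDEX` are hypothetical stand-ins (real systems often generate variants with an LLM prompt or a domain thesaurus), and `search` is any function returning a ranked list of doc IDs. Results for each variant are merged with RRF rather than concatenated, and both the number of variants and the final list size are capped. This is one simple way to keep a drifting variant from flooding the results.

```python
from collections import defaultdict

# Hypothetical expansion table; variants would usually come from an LLM or thesaurus
EXPANSIONS = {
    "python performance": [
        "python performance optimization",
        "python speed improvement",
        "python profiling",
    ],
}


def expand_query(query, max_variants=3):
    """Return the original query plus up to `max_variants` rewrites."""
    variants = EXPANSIONS.get(query.lower().strip(), [])
    return [query] + variants[:max_variants]


def search_with_expansion(query, search, top_k=5, k=60):
    """Run `search` for each variant and fuse the rankings with RRF.

    Documents that several variants agree on rise to the top; `top_k`
    caps how many fused results are returned.
    """
    scores = defaultdict(float)
    for variant in expand_query(query):
        for rank, doc_id in enumerate(search(variant), start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


# Hypothetical toy index standing in for a real retriever
TOY_INDEX = {
    "python performance": ["perf_tips", "gil_overview"],
    "python performance optimization": ["perf_tips", "cython_guide"],
    "python speed improvement": ["cython_guide", "perf_tips"],
    "python profiling": ["cprofile_howto", "perf_tips"],
}

results = search_with_expansion("python performance", lambda q: TOY_INDEX.get(q, []))
print(results)  # ['perf_tips', 'cython_guide', 'cprofile_howto', 'gil_overview']
```

Note that `cprofile_howto` only surfaces because of the "python profiling" variant. This recall gain is the point of expansion, and also the reason it needs limits.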
Graph RAG
Traditional RAG retrieves independent chunks. Graph RAG builds a knowledge graph of relationships between entities and concepts, enabling multi-hop reasoning: "What are the dependencies of Service A?" can follow relationship edges across the graph.
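The sketch below covers only the multi-hop traversal step of Graph RAG. It assumes entity-relationship triples have already been extracted from the documents; the `TRIPLES` list and service names are hypothetical. A breadth-first walk along one relation type answers "What are the dependencies of Service A?" including transitive ones. In a full pipeline, the returned entities would then be used to fetch the chunks that mention them.

```python
from collections import deque

# Hypothetical (source, relation, target) triples extracted from documentation
TRIPLES = [
    ("Service A", "depends_on", "Service B"),
    ("Service A", "depends_on", "Auth Service"),
    ("Service B", "depends_on", "Postgres"),
    ("Auth Service", "depends_on", "Redis"),
    ("Service B", "owned_by", "Team Payments"),
]


def build_graph(triples):
    """Index triples as an adjacency list: node -> [(relation, neighbor), ...]."""
    graph = {}
    for src, rel, dst in triples:
        graph.setdefault(src, []).append((rel, dst))
    return graph


def multi_hop(graph, start, relation, max_hops=3):
    """Breadth-first walk along `relation` edges, at most `max_hops` from `start`.

    Returns (node, hops) pairs in discovery order, so direct neighbors
    come before transitive ones; each node is reported once.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in graph.get(node, []):
            if rel == relation and neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, hops + 1))
                queue.append((neighbor, hops + 1))
    return found


graph = build_graph(TRIPLES)
print(multi_hop(graph, "Service A", "depends_on"))
# [('Service B', 1), ('Auth Service', 1), ('Postgres', 2), ('Redis', 2)]
```

A plain chunk retriever would likely find the chunk saying Service A depends on Service B, but not the separate chunk linking Service B to Postgres. The edge walk connects them.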
Common Mistakes
- Reranking all results (too slow) — rerank top-20 only
- Not tuning the BM25/vector weight ratio for hybrid search
- Using query expansion without controlling result diversity
- Implementing Graph RAG before basic RAG is working well
Key Terms
- Hybrid Search: combining keyword (BM25) and semantic (vector) search
- RRF: Reciprocal Rank Fusion, a method for merging ranked results from multiple sources
- Cross-Encoder: a model that scores query-document relevance with full cross-attention
- Graph RAG: retrieval using knowledge-graph relationships between entities
Hands-On Labs
Implement Hybrid Retrieval
Combine BM25 and vector search with RRF.
35 min - Intermediate
- Add BM25 index alongside vector index
- Implement Reciprocal Rank Fusion
- Compare hybrid vs single-mode on test queries
- Measure precision/recall improvement
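For the last two lab steps, a minimal sketch of precision@k and recall@k averaged over a set of test queries. The relevance judgments (`qrels`) and both runs are hypothetical placeholders; in the lab they would come from your labeled test queries and from the single-mode and hybrid retrievers. Precision divides by `k` even when fewer than `k` documents come back, which is one common convention.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision@k and recall@k for one query.

    retrieved: ranked list of doc IDs from a retriever.
    relevant:  set of doc IDs judged relevant for the query.
    """
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_metrics(run, qrels, k):
    """Average precision@k and recall@k over all judged queries.

    run:   {query_id: ranked doc IDs}
    qrels: {query_id: set of relevant doc IDs}
    """
    pairs = [precision_recall_at_k(run.get(q, []), qrels[q], k) for q in qrels]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical judgments and retriever outputs for two test queries
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
vector_run = {"q1": ["d1", "d4", "d5"], "q2": ["d6", "d7", "d8"]}
hybrid_run = {"q1": ["d1", "d3", "d4"], "q2": ["d2", "d6", "d7"]}

for name, run in [("vector only", vector_run), ("hybrid", hybrid_run)]:
    p, r = mean_metrics(run, qrels, k=3)
    print(f"{name}: P@3={p:.2f} R@3={r:.2f}")
```

Keep `k` the same for both runs, and use the same query set, so the comparison isolates the retrieval change.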
Add Cross-Encoder Reranking
Rerank retrieval results for better relevance.
30 min - Intermediate
- Load a cross-encoder reranking model
- Rerank top-20 hybrid results to top-5
- Compare answer quality with and without reranking
- Measure latency impact
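A minimal timing harness for the latency step, sketched under assumptions: `retrieve_only` and `retrieve_and_rerank` are hypothetical stand-ins (the sleep merely simulates cross-encoder cost) to be replaced with your real pipelines. It reports median and p95 rather than a mean, since reranking latency matters most in the tail. A few warm-up calls run untimed so one-off costs such as model loading do not skew the numbers.

```python
import statistics
import time


def measure_latency(pipeline, queries, warmup=2):
    """Return (median_ms, p95_ms) for pipeline(query) over `queries` (needs >= 2)."""
    for q in queries[:warmup]:
        pipeline(q)  # untimed warm-up: model loading, caches
    timings_ms = []
    for q in queries:
        start = time.perf_counter()
        pipeline(q)
        timings_ms.append((time.perf_counter() - start) * 1000)
    # Last of 19 cut points for n=20 is the 95th percentile; "inclusive" stays within observed data
    p95 = statistics.quantiles(timings_ms, n=20, method="inclusive")[-1]
    return statistics.median(timings_ms), p95


# Hypothetical stand-ins for the real pipelines
def retrieve_only(query):
    return query.split()


def retrieve_and_rerank(query):
    time.sleep(0.005)  # simulated reranking cost, not a measured figure
    return query.split()


queries = [f"test query {i}" for i in range(20)]
for name, fn in [("hybrid", retrieve_only), ("hybrid + rerank", retrieve_and_rerank)]:
    median_ms, p95_ms = measure_latency(fn, queries)
    print(f"{name}: median={median_ms:.1f} ms, p95={p95_ms:.1f} ms")
```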
Key Takeaways
- Hybrid search (BM25 + vectors + RRF) can improve recall by 20-40%, depending on the corpus
- Cross-encoder reranking improves precision at the cost of latency
- Query expansion handles ambiguous or short queries
- Graph RAG enables multi-hop reasoning across entity relationships
- Advanced retrieval is often the highest-ROI investment in RAG quality