Module 7: Advanced Retrieval Engineering

Hybrid search, reranking, query expansion, and Graph RAG for production-quality retrieval

4 hours. 2 hands-on labs. Free course module.

Learning Objectives

  • Implement hybrid search (BM25 + vector)
  • Add reranking with cross-encoder models
  • Design query expansion and transformation strategies
  • Understand Graph RAG for relationship-aware retrieval

Why This Matters

This is where your RAG system goes from "works in demos" to "works in production." The difference between 70% and 90% retrieval accuracy is the difference between a useful tool and an unreliable one. Hybrid search plus reranking is a widely used baseline for production retrieval.

ADVANCED RETRIEVAL PIPELINE

Query → BM25 Search + Vector Search → RRF Merge (combine results) → Cross-Encoder (rerank by relevance) → Top Results

Why Hybrid + Reranking?

  • BM25 catches exact terms (product codes, names)
  • Vectors catch meaning (synonyms, paraphrases)
  • Reranker reorders by true relevance (cross-attention between query and document)

Architecture diagram for Module 7: Advanced Retrieval Engineering.

Lesson Content

Basic RAG uses single-mode retrieval. Production RAG uses hybrid search (BM25 + vectors), reranking (cross-encoder models), and query transformation. Together, these techniques can improve retrieval quality by 20-40%, which translates directly into better answers.

Hybrid Search

Combine keyword search (BM25) with vector search, then merge results using Reciprocal Rank Fusion (RRF). BM25 catches exact terms that vector search misses (product codes, acronyms). Vectors catch meaning that BM25 misses (synonyms, paraphrases).
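
The RRF merge step can be sketched in a few lines. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in, so a document ranked highly by either retriever rises toward the top. This is a minimal sketch, not the lab solution: the function name `reciprocal_rank_fusion`, the document IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is a common default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from the two retrievers, best match first
bm25_hits = ["doc_sku", "doc_a", "doc_b"]
vector_hits = ["doc_a", "doc_c", "doc_sku"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization: BM25 scores and cosine similarities live on different scales, but their rank positions are directly comparable.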

Reranking

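Initial retrieval (BM25 + vector) is fast but coarse. A cross-encoder reranker takes the top-K results and reorders them by computing a relevance score using full cross-attention between query and document. This is slower than first-stage retrieval but much more accurate, because the model reads the query and the document together instead of comparing precomputed embeddings.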

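from sentence_transformers import CrossEncoder

# Load a pretrained cross-encoder (weights are downloaded on first use)
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Rerank top-20 results to get top-5.
# `query` is the user's query string; `initial_results` is the candidate list
# from first-stage retrieval, each item exposing its text as `.content`.
candidates = initial_results[:20]
pairs = [(query, doc.content) for doc in candidates]
scores = reranker.predict(pairs)  # one relevance score per pair, higher is better
# Sort (doc, score) pairs by score, highest first, and keep the best 5
reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)[:5]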

Query Expansion

Sometimes the user query is ambiguous or too short. Query expansion generates multiple variations to improve recall: "python performance" might expand to "python performance optimization", "python speed improvement", "python profiling".
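
One way to use the expanded queries is to run each variant through the retriever and merge the hits, dropping duplicates so the same document does not crowd out others. The sketch below is illustrative only: `retrieve_with_expansion`, `fake_search`, and the toy index are hypothetical names and data, and the variants are hard-coded from the example above rather than generated (in practice they might come from an LLM or a synonym list).

```python
def retrieve_with_expansion(variants, search, top_k=5):
    """Run every query variant and merge the hits, dropping duplicates.

    `search` is a hypothetical callable: query string -> list of doc IDs,
    best match first. A document keeps the best (lowest) rank it achieved
    under any variant.
    """
    best_rank = {}
    for variant in variants:
        for rank, doc_id in enumerate(search(variant), start=1):
            if doc_id not in best_rank or rank < best_rank[doc_id]:
                best_rank[doc_id] = rank
    # Sort by best rank, then doc ID so ties are deterministic
    merged = sorted(best_rank, key=lambda doc_id: (best_rank[doc_id], doc_id))
    return merged[:top_k]

# Toy stand-in for a real retriever (illustrative data only)
FAKE_INDEX = {
    "python performance": ["doc_overview"],
    "python performance optimization": ["doc_optimize", "doc_overview"],
    "python speed improvement": ["doc_speed", "doc_optimize"],
    "python profiling": ["doc_profile"],
}

def fake_search(query):
    return FAKE_INDEX.get(query, [])

variants = [
    "python performance",
    "python performance optimization",
    "python speed improvement",
    "python profiling",
]
results = retrieve_with_expansion(variants, fake_search, top_k=5)
print(results)
```

The original query alone would return one document here; the expanded set surfaces four distinct documents, which is the recall gain query expansion aims for.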

Graph RAG

Traditional RAG retrieves independent chunks. Graph RAG builds a knowledge graph of relationships between entities and concepts, enabling multi-hop reasoning: "What are the dependencies of Service A?" can follow relationship edges across the graph.
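
The multi-hop idea can be illustrated with a tiny in-memory graph. This is only a sketch of the traversal step, assuming the entities and edges have already been extracted; the `DEPENDS_ON` data and the `dependencies` function are hypothetical, and a real Graph RAG system would typically use a graph store and attach retrieved text to each entity.

```python
from collections import deque

# Hypothetical dependency graph: each service maps to the services it depends on
DEPENDS_ON = {
    "Service A": ["Service B", "Service C"],
    "Service B": ["Database"],
    "Service C": ["Service B", "Cache"],
    "Database": [],
    "Cache": [],
}

def dependencies(graph, start, max_hops=None):
    """Breadth-first traversal returning (entity, hops) pairs reachable from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, hops + 1))
                queue.append((neighbor, hops + 1))
    return found

direct = dependencies(DEPENDS_ON, "Service A", max_hops=1)
transitive = dependencies(DEPENDS_ON, "Service A")
print(direct)
print(transitive)
```

A chunk-based retriever would need a single passage that happens to mention every dependency; following edges finds "Database" and "Cache" even though no one chunk links them to Service A directly.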

Common Mistakes

  • Reranking all results (too slow): rerank only the top-20 candidates
  • Not tuning the BM25/vector weight ratio for hybrid search
  • Using query expansion without controlling result diversity
  • Implementing Graph RAG before basic RAG is working well
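
To make the weight-ratio point concrete, here is a minimal sketch of score-level blending, an alternative to rank-based RRF. It assumes min-max normalization and a single weight `alpha`; the function names, the scores, and the alpha values are all hypothetical, and the right weight for a given corpus has to be found by evaluating on test queries.

```python
def min_max(scores):
    """Scale a {doc_id: score} dict to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_hybrid(bm25_scores, vector_scores, alpha=0.5):
    """Blend normalized scores: alpha weights BM25, (1 - alpha) weights vectors."""
    bm25_n, vec_n = min_max(bm25_scores), min_max(vector_scores)
    doc_ids = set(bm25_n) | set(vec_n)
    blended = {
        d: alpha * bm25_n.get(d, 0.0) + (1 - alpha) * vec_n.get(d, 0.0)
        for d in doc_ids
    }
    # Highest blended score first; ties broken by doc ID
    return sorted(blended, key=lambda d: (-blended[d], d))

# Illustrative raw scores on different scales (BM25 vs cosine similarity)
bm25_scores = {"doc_sku": 12.0, "doc_a": 4.0}
vector_scores = {"doc_a": 0.9, "doc_b": 0.8, "doc_sku": 0.1}

keyword_heavy = weighted_hybrid(bm25_scores, vector_scores, alpha=0.8)
semantic_heavy = weighted_hybrid(bm25_scores, vector_scores, alpha=0.2)
print(keyword_heavy)
print(semantic_heavy)
```

The same candidates produce different top results depending on `alpha`, which is why an untuned default can quietly favor the wrong retriever for your query mix.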

Key Terms

Hybrid Search: Combining keyword (BM25) and semantic (vector) search
RRF: Reciprocal Rank Fusion, which merges ranked results from multiple sources
Cross-Encoder: Model that scores query-document relevance with full cross-attention
Graph RAG: Retrieval using knowledge graph relationships between entities

Hands-On Labs

  1. Implement Hybrid Retrieval

    Combine BM25 and vector search with RRF.

    35 min - Intermediate

    • Add BM25 index alongside vector index
    • Implement Reciprocal Rank Fusion
    • Compare hybrid vs single-mode on test queries
    • Measure precision/recall improvement

    View lab files on GitHub

  2. Add Cross-Encoder Reranking

    Rerank retrieval results for better relevance.

    30 min - Intermediate

    • Load a cross-encoder reranking model
    • Rerank top-20 hybrid results to top-5
    • Compare answer quality with and without reranking
    • Measure latency impact

    View lab files on GitHub
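
For the measurement steps in both labs, a small helper like the following may be enough to get started. It is a sketch under assumptions, not the lab's evaluation harness: the function names, document IDs, and relevance judgments are hypothetical, and precision is computed as hits divided by k.

```python
import time

def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for one query.

    `retrieved` is a ranked list of doc IDs; `relevant` is the set of doc IDs
    judged relevant for the query.
    """
    if k <= 0:
        return 0.0, 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) to measure latency impact."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Illustrative judgments and rankings for a single test query
relevant = {"doc_a", "doc_sku"}
vector_only = ["doc_a", "doc_c", "doc_b", "doc_d", "doc_e"]
hybrid = ["doc_a", "doc_sku", "doc_c", "doc_b", "doc_d"]

p_vec, r_vec = precision_recall_at_k(vector_only, relevant, k=5)
p_hyb, r_hyb = precision_recall_at_k(hybrid, relevant, k=5)
_, elapsed_seconds = timed(precision_recall_at_k, hybrid, relevant, 5)
print(p_vec, r_vec, p_hyb, r_hyb)
```

Averaging these numbers over a set of test queries, and wrapping the retrieval and reranking calls in `timed`, gives a like-for-like comparison of quality against latency.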

Key Takeaways

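  • Hybrid search (BM25 + vectors, merged with RRF) catches both exact terms and meaning; with reranking and query transformation it can improve retrieval quality by 20-40%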
  • Cross-encoder reranking improves precision at the cost of latency
  • Query expansion handles ambiguous or short queries
  • Graph RAG enables multi-hop reasoning across entity relationships
  • Advanced retrieval is the highest-ROI investment in RAG quality