Module 6: Building Basic RAG Systems
The complete retrieve-augment-generate pipeline with source attribution and citations
3.5 hours. 2 hands-on labs. Free course module.
Learning Objectives
- Build a complete RAG pipeline from scratch
- Implement context injection and prompt augmentation
- Add source attribution and citations
- Handle edge cases: no results, conflicting sources, long context
Why This Matters
This is the core skill. RAG applications in customer support, legal AI, medical AI, and code assistance all build on this pipeline. Master it here, then optimize with advanced retrieval, agents, and production patterns in later modules.
Lesson Content
This is the module where everything comes together. You build the complete RAG pipeline: take a user question, embed it, retrieve relevant chunks, inject them into the prompt, and generate a grounded answer with citations.
The RAG Pipeline
import anthropic
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('all-MiniLM-L6-v2')
qdrant = QdrantClient(url="http://localhost:6333")
claude = anthropic.Anthropic()

def rag_answer(question: str) -> dict:
    # 1. Embed the query
    query_vector = embedder.encode(question).tolist()

    # 2. Retrieve relevant chunks
    results = qdrant.search(collection_name="docs", query_vector=query_vector, limit=5)

    # 3. Build context from retrieved chunks
    context_chunks = []
    for r in results:
        context_chunks.append(f"[Source: {r.payload['title']}]\n{r.payload['content']}")
    context = "\n\n---\n\n".join(context_chunks)

    # 4. Augment prompt with context
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Answer using ONLY the provided context. Cite sources. If the context does not contain the answer, say so.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )

    return {
        "answer": response.content[0].text,
        "sources": [r.payload['title'] for r in results],
    }
Source Attribution
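Production RAG must cite its sources: citations build user trust and let readers verify an answer against the original document. The pipeline above returns only source titles; where the metadata is available, also include page numbers and relevance scores in the response.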
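A minimal sketch of one way to carry attribution through the pipeline. It assumes each retrieved hit exposes a score and a payload with `title` and `content`, plus an optional `page` key. The `Hit` dataclass, the `page` key, and the helper names are hypothetical stand-ins for whatever your collection stores; they are not part of the Qdrant API.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for a retrieved result (score + payload)."""
    score: float
    payload: dict


def build_sources(hits: list[Hit]) -> list[dict]:
    """Turn retrieved hits into numbered source records for the response."""
    sources = []
    for number, hit in enumerate(hits, start=1):
        sources.append({
            "id": number,
            "title": hit.payload["title"],
            "page": hit.payload.get("page"),  # None when no page is stored
            "score": round(hit.score, 3),
        })
    return sources


def build_context(hits: list[Hit]) -> str:
    """Label each chunk with the same number used in the source records."""
    blocks = [
        f"[{number}] {hit.payload['title']}\n{hit.payload['content']}"
        for number, hit in enumerate(hits, start=1)
    ]
    return "\n\n---\n\n".join(blocks)


# Illustrative data only; real hits would come from the vector store.
hits = [
    Hit(0.82, {"title": "Refund Policy", "page": 3, "content": "Refunds within 30 days."}),
    Hit(0.64, {"title": "Shipping FAQ", "content": "Orders ship in 2 days."}),
]
sources = build_sources(hits)
context = build_context(hits)
```

Numbering the chunks in the prompt and reusing the same numbers in the returned records lets the answer refer to a source by a short marker that can be resolved back to title, page, and score.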
Edge Cases
- No relevant results: When retrieval returns nothing above the similarity threshold, say "I don't have information about this" instead of hallucinating
- Conflicting sources: When retrieved documents disagree, present both perspectives with citations
- Context overflow: When retrieved chunks exceed the context window, keep the highest-scoring chunks and drop the rest
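The three cases above can be sketched as a small selection step that runs between retrieval and generation. This is an illustration under stated assumptions, not a definitive implementation: the `0.5` threshold and the `6000`-character budget are placeholder values, character count is only a rough proxy for tokens, hits are simplified to `(score, text)` pairs, and every chunk is assumed to fit within the budget on its own. The function names and the extended system prompt are hypothetical.

```python
NO_ANSWER = "I don't have information about this."

# Conflicting sources are handled by instruction rather than by code.
SYSTEM_PROMPT = (
    "Answer using ONLY the provided context. Cite sources. "
    "If the sources disagree, present each position with its citation. "
    "If the context does not contain the answer, say so."
)


def select_chunks(hits, min_score=0.5, max_chars=6000):
    """Keep hits at or above min_score, best first, until the budget is used.

    `hits` is a list of (score, text) pairs. Both defaults are illustrative
    assumptions; tune them for your embedding model and context window.
    """
    relevant = sorted(
        (h for h in hits if h[0] >= min_score),
        key=lambda h: h[0],
        reverse=True,
    )
    selected, used = [], 0
    for score, text in relevant:
        if used + len(text) > max_chars:
            break  # everything after this point scores lower, so stop here
        selected.append((score, text))
        used += len(text)
    return selected


def answer_or_fallback(hits, generate):
    """Call `generate` only when something relevant was retrieved."""
    chunks = select_chunks(hits)
    if not chunks:
        return NO_ANSWER  # skip the LLM call instead of letting it guess
    return generate(chunks)
```

Here `generate` is any callable that takes the selected chunks and returns an answer string, for example a wrapper around the LLM call. Returning the fallback before calling the model means the "no results" case never depends on the model following instructions.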
Common Mistakes
- Not setting a similarity threshold: irrelevant chunks reach the prompt and degrade answer quality
- Including too many chunks: more context is not always better, because marginal chunks dilute the relevant ones
- Not instructing the model to cite sources: users cannot verify answers
- Forgetting to handle the "no results" case: the model may hallucinate to fill the gap
Key Terms
- Context Injection: Adding retrieved document chunks to the LLM prompt
- Source Attribution: Citing which documents the answer was based on
- Similarity Threshold: Minimum relevance score for a chunk to be included
Hands-On Labs
Build a Complete RAG Chatbot
Build an end-to-end RAG system with FastAPI.
40 min - Intermediate
- Ingest a document corpus into Qdrant
- Build the retrieve-augment-generate pipeline
- Expose as a FastAPI endpoint
- Test with domain-specific questions
Add Citations and Source Attribution
Make your RAG system cite its sources.
25 min - Intermediate
- Include source metadata in the prompt
- Parse citations from the LLM response
- Return sources with relevance scores
- Handle "no relevant information" gracefully
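For the citation-parsing step, one possible approach is a regular expression over the model's answer. The sketch below assumes the prompt asked the model to cite with bracketed numbers such as `[1]`, and that each source record carries a matching integer `id`. Both conventions, along with the `parse_citations` name, are assumptions for illustration rather than anything the lab prescribes.

```python
import re

# Matches bracketed integers such as [1] or [12].
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def parse_citations(answer: str, sources: list[dict]) -> list[dict]:
    """Return the sources the answer actually cites, in first-mention order.

    Assumes citations look like [1] and each source dict has an integer "id".
    Numbers with no matching source are ignored.
    """
    by_id = {source["id"]: source for source in sources}
    cited, seen = [], set()
    for match in CITATION_PATTERN.finditer(answer):
        number = int(match.group(1))
        if number in by_id and number not in seen:
            seen.add(number)
            cited.append(by_id[number])
    return cited


# Illustrative data only.
sources = [
    {"id": 1, "title": "Refund Policy", "score": 0.82},
    {"id": 2, "title": "Shipping FAQ", "score": 0.64},
]
answer = "Refunds are accepted within 30 days [1]. See also [1] and [7]."
cited = parse_citations(answer, sources)
```

Returning only the cited sources, rather than everything retrieved, keeps the response honest about what the answer relied on. A citation number with no matching source (like `[7]` here) is worth logging, since it suggests the model invented a reference.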
Key Takeaways
- RAG pipeline: embed query → retrieve chunks → augment prompt → generate answer
- Always include "answer ONLY from context" in the system prompt to reduce hallucination
- Source attribution builds trust — cite document title, section, and relevance score
- Handle edge cases: no results, conflicting sources, context overflow
- This basic pipeline is the foundation — advanced techniques (Module 7+) improve quality