Module 6 of 16

Building Basic RAG Systems

The complete retrieve-augment-generate pipeline with source attribution and citations

3.5 hours · 2 labs · Free


Start here

Learning objectives

Build a complete RAG pipeline from scratch
Implement context injection and prompt augmentation
Add source attribution and citations
Handle edge cases: no results, conflicting sources, long context

This is the module where everything comes together. You build the complete RAG pipeline: take a user question, embed it, retrieve relevant chunks, inject them into the prompt, and generate a grounded answer with citations.

The RAG Pipeline

import anthropic
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('all-MiniLM-L6-v2')
qdrant = QdrantClient(url="http://localhost:6333")
claude = anthropic.Anthropic()

def rag_answer(question: str) -> dict:
    # 1. Embed the query
    query_vector = embedder.encode(question).tolist()

    # 2. Retrieve relevant chunks
    results = qdrant.search(collection_name="docs", query_vector=query_vector, limit=5)

    # 3. Build context from retrieved chunks
    context_chunks = []
    for r in results:
        context_chunks.append(f"[Source: {r.payload['title']}]\n{r.payload['content']}")
    context = "\n\n---\n\n".join(context_chunks)

    # 4. Augment prompt with context
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Answer using ONLY the provided context. Cite sources. If the context does not contain the answer, say so.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )

    return {
        "answer": response.content[0].text,
        "sources": [r.payload['title'] for r in results],
    }

Source Attribution

Production RAG must cite its sources. This builds user trust and enables verification. Include source titles, page numbers, and relevance scores in the response.
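One way to carry that metadata through is sketched below. It assumes each retrieved hit exposes a `score` and a payload with `title` and `content` keys, plus an optional `page` key. The `Hit` class, the `page` key, and the function name are illustrative assumptions, not part of the pipeline shown earlier.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for a retrieved result: a score plus a payload."""
    score: float
    payload: dict


def build_context_and_sources(hits: list[Hit]) -> tuple[str, list[dict]]:
    """Number each chunk so the model can cite it as [1], [2], ..."""
    blocks, sources = [], []
    for i, hit in enumerate(hits, start=1):
        title = hit.payload["title"]
        page = hit.payload.get("page")  # assumed optional metadata
        label = f"[{i}] {title}" + (f", p. {page}" if page is not None else "")
        blocks.append(f"{label}\n{hit.payload['content']}")
        sources.append(
            {"id": i, "title": title, "page": page, "score": round(hit.score, 3)}
        )
    return "\n\n---\n\n".join(blocks), sources
```

The returned context string goes into the prompt, and the `sources` list can be returned to the caller alongside the answer so each citation number maps back to a title, page, and score.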

Edge Cases

No relevant results: When no retrieved chunk scores above the similarity threshold, respond with "I don't have information about this" instead of letting the model hallucinate
Conflicting sources: When retrieved documents disagree, present both positions and cite each one
Context overflow: When the retrieved chunks exceed the context window, keep the highest-scoring chunks and drop the rest
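The context-overflow rule can be sketched as a greedy packer: sort chunks by relevance score and add them until a budget is spent. The budget here is counted in characters as a rough stand-in for tokens. `pack_by_relevance` and `max_chars` are hypothetical names, and a real system would more likely count tokens with the model's tokenizer.

```python
def pack_by_relevance(chunks: list[tuple[float, str]], max_chars: int) -> list[str]:
    """Keep the highest-scoring chunks that fit within max_chars.

    Each chunk is a (score, text) pair; higher scores mean more relevant.
    """
    selected, used = [], 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        if used + len(text) > max_chars:
            continue  # does not fit; a smaller, lower-ranked chunk still might
        selected.append(text)
        used += len(text)
    return selected
```

Skipping an oversized chunk rather than stopping at it is a design choice: it fills the budget more fully, at the cost of occasionally including a lower-ranked chunk while a higher-ranked one is left out.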

Common mistakes

What usually breaks

Not setting a similarity threshold - returning irrelevant chunks degrades quality
Including too many chunks - more context is not always better; marginal chunks dilute focus
Not instructing the model to cite sources - users cannot verify answers
Forgetting to handle the "no results" case - the model will hallucinate to fill the gap
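The first and last mistakes can be guarded against together. The sketch below assumes hits arrive as (score, text) pairs where a higher score means more similar. The 0.5 cutoff is an arbitrary placeholder that would need tuning per embedding model and corpus, and `generate` is a hypothetical callable wrapping the LLM call.

```python
from typing import Callable

NO_ANSWER = "I don't have information about this."


def answer_or_decline(
    hits: list[tuple[float, str]],
    generate: Callable[[list[str]], str],
    min_score: float = 0.5,  # placeholder; tune for your embedder and data
) -> str:
    """Drop weak hits, and skip the LLM call entirely when nothing survives."""
    relevant = [text for score, text in hits if score >= min_score]
    if not relevant:
        return NO_ANSWER
    return generate(relevant)
```

Returning the fallback before calling the model means an empty context never reaches the prompt, so the model gets no opportunity to fill the gap from its own training data.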

Key terms

Vocabulary used in this module

Context Injection

Adding retrieved document chunks to the LLM prompt

Source Attribution

Citing which documents the answer was based on

Similarity Threshold

Minimum relevance score for a chunk to be included

Labs

Hands-on labs

40 min · Intermediate

Build a Complete RAG Chatbot

Build an end-to-end RAG system with FastAPI.

Ingest a document corpus into Qdrant
Build the retrieve-augment-generate pipeline
Expose as a FastAPI endpoint
Test with domain-specific questions


25 min · Intermediate

Add Citations and Source Attribution

Make your RAG system cite its sources.

Include source metadata in the prompt
Parse citations from the LLM response
Return sources with relevance scores
Handle "no relevant information" gracefully
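Parsing citations might look like the following sketch. It assumes the prompt asked the model to cite with bracketed numbers such as [1] that index into the list of retrieved sources. The marker format and the function name are assumptions for illustration, not something the lab prescribes.

```python
import re


def extract_cited_sources(answer: str, sources: list[str]) -> list[str]:
    """Return the sources the answer cites via [n] markers, in first-cited order."""
    cited = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        index = int(match.group(1)) - 1  # markers are 1-based
        # Ignore out-of-range markers and repeated citations.
        if 0 <= index < len(sources) and sources[index] not in cited:
            cited.append(sources[index])
    return cited
```

Ignoring out-of-range markers keeps a citation number the model invented from crashing the response; logging those cases would be a reasonable addition.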


Recap

Key takeaways

RAG pipeline: embed query → retrieve chunks → augment prompt → generate answer
Always include "answer ONLY from context" in the system prompt to reduce hallucination
Source attribution builds trust - cite document title, page or section, and relevance score
Handle edge cases: no results, conflicting sources, context overflow
This basic pipeline is the foundation - advanced techniques (Module 7+) improve quality
