
Module 6: Building Basic RAG Systems

The complete retrieve-augment-generate pipeline with source attribution and citations

3.5 hours. 2 hands-on labs. Free course module.

Learning Objectives

  • Build a complete RAG pipeline from scratch
  • Implement context injection and prompt augmentation
  • Add source attribution and citations
  • Handle edge cases: no results, conflicting sources, long context

Why This Matters

This is the core skill. RAG applications such as customer support, legal AI, medical AI, and code assistants all build on this pipeline. Master it here, then optimize it with advanced retrieval, agents, and production patterns in later modules.

RAG pipeline: retrieve → augment → generate

  1. User Query: natural language
  2. Embed Query: same model as docs
  3. Retrieve Top-K: vector DB search
  4. Augment Prompt: inject context
  5. Generate: LLM + citations

Result: an answer grounded in YOUR documents, with source citations. Not hallucinated. Verifiable. Domain-specific.
Architecture diagram for Module 6: Building Basic RAG Systems.

Lesson Content

This is the module where everything comes together. You build the complete RAG pipeline: take a user question, embed it, retrieve relevant chunks, inject them into the prompt, and generate a grounded answer with citations.

The RAG Pipeline

import anthropic
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('all-MiniLM-L6-v2')
qdrant = QdrantClient(url="http://localhost:6333")
claude = anthropic.Anthropic()

def rag_answer(question: str) -> dict:
    # 1. Embed the query
    query_vector = embedder.encode(question).tolist()

    # 2. Retrieve relevant chunks
    results = qdrant.search(collection_name="docs", query_vector=query_vector, limit=5)

    # 3. Build context from retrieved chunks
    context_chunks = []
    for r in results:
        context_chunks.append(f"[Source: {r.payload['title']}]\n{r.payload['content']}")
    context = "\n\n---\n\n".join(context_chunks)

    # 4. Augment the prompt: place the retrieved context ahead of the question
    user_message = f"Context:\n{context}\n\nQuestion: {question}"

    # 5. Generate an answer constrained to that context
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Answer using ONLY the provided context. Cite sources. If the context does not contain the answer, say so.",
        messages=[{"role": "user", "content": user_message}],
    )

    return {
        "answer": response.content[0].text,
        "sources": [r.payload['title'] for r in results],
    }

Source Attribution

Production RAG must cite its sources. This builds user trust and enables verification. Include source titles, page numbers, and relevance scores in the response.
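
One way to carry titles, page numbers, and relevance scores into the response is sketched below. It assumes each retrieved hit exposes a `score` and a `payload` dict (as Qdrant's scored points do) and that the payload has a `page` key, which the pipeline above does not show. `Hit`, `SourceRef`, and `build_sources` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class Hit:
    """Stand-in for a retrieved point: a relevance score plus a payload dict."""
    score: float
    payload: dict[str, Any]


@dataclass
class SourceRef:
    title: str
    page: Optional[int]
    score: float


def build_sources(hits: list[Hit]) -> list[dict[str, Any]]:
    """Return one citation record per distinct (title, page), best score first."""
    best: dict[tuple[str, Optional[int]], float] = {}
    for hit in hits:
        title = hit.payload.get("title", "Untitled")
        page = hit.payload.get("page")  # assumed payload key; may be absent
        key = (title, page)
        # Several chunks can come from the same page; keep the highest score.
        if key not in best or hit.score > best[key]:
            best[key] = hit.score
    refs = [SourceRef(title=t, page=p, score=round(s, 3)) for (t, p), s in best.items()]
    refs.sort(key=lambda ref: ref.score, reverse=True)
    return [asdict(ref) for ref in refs]
```

In `rag_answer`, the `"sources"` entry could then be `build_sources(results)` instead of a bare list of titles, so that duplicate titles collapse and each citation carries its score.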

Edge Cases

  • No relevant results: When retrieval returns nothing above the similarity threshold, say "I don't have information about this" instead of hallucinating
  • Conflicting sources: When retrieved documents disagree, present both perspectives with citations
  • Context overflow: When retrieved chunks exceed the context window, prioritize by relevance score
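
The "no relevant results" and "context overflow" cases above can be handled before the prompt is built. The following is a minimal sketch, not a definitive implementation: the threshold of 0.5 and the 4000-character budget are placeholder assumptions (character count is only a rough proxy for tokens), and `Chunk`, `select_context`, and `answer_or_fallback` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

NO_ANSWER = "I don't have information about this."


@dataclass
class Chunk:
    title: str
    content: str
    score: float


def select_context(
    chunks: list[Chunk],
    min_score: float = 0.5,   # assumed threshold; tune per embedding model
    max_chars: int = 4000,    # assumed budget; a rough stand-in for tokens
) -> list[Chunk]:
    """Drop low-relevance chunks, then pack the rest by score within the budget."""
    relevant = sorted(
        (c for c in chunks if c.score >= min_score),
        key=lambda c: c.score,
        reverse=True,
    )
    selected: list[Chunk] = []
    used = 0
    for chunk in relevant:
        cost = len(chunk.content)
        if used + cost > max_chars:
            continue  # too large for what is left; a smaller chunk may still fit
        selected.append(chunk)
        used += cost
    return selected


def answer_or_fallback(
    chunks: list[Chunk],
    generate: Callable[[list[Chunk]], str],
) -> str:
    """Skip the LLM call entirely when nothing clears the threshold."""
    selected = select_context(chunks)
    if not selected:
        return NO_ANSWER
    return generate(selected)
```

Returning the fallback without calling the model is a design choice: it avoids relying on the system prompt alone to prevent a made-up answer when the context is empty.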

Common Mistakes

  • Not setting a similarity threshold: returning irrelevant chunks degrades answer quality
  • Including too many chunks: more context is not always better, because it can dilute the model's focus
  • Not instructing the model to cite sources: users cannot verify answers
  • Forgetting to handle the "no results" case: the model may hallucinate to fill the gap

Key Terms

Context Injection
Adding retrieved document chunks to the LLM prompt
Source Attribution
Citing which documents the answer was based on
Similarity Threshold
Minimum relevance score for a chunk to be included

Hands-On Labs

  1. Build a Complete RAG Chatbot

    Build an end-to-end RAG system with FastAPI.

    40 min - Intermediate

    • Ingest a document corpus into Qdrant
    • Build the retrieve-augment-generate pipeline
    • Expose as a FastAPI endpoint
    • Test with domain-specific questions

    View lab files on GitHub

  2. Add Citations and Source Attribution

    Make your RAG system cite its sources.

    25 min - Intermediate

    • Include source metadata in the prompt
    • Parse citations from the LLM response
    • Return sources with relevance scores
    • Handle "no relevant information" gracefully
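
    As a starting point for the citation-parsing step, here is a minimal sketch. It assumes the model has been prompted to cite with the same `[Source: Title]` markers used when building the context; the lesson's system prompt only says "Cite sources", so that format is an assumption. `parse_citations` is a hypothetical helper name.

```python
import re

# Matches markers such as "[Source: Refund Policy]" and captures the title.
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+?)\s*\]")


def parse_citations(answer: str, known_titles: list[str]) -> dict[str, list[str]]:
    """Split cited titles into those that were retrieved and those that were not."""
    cited: list[str] = []
    for match in CITATION_RE.finditer(answer):
        title = match.group(1)
        if title not in cited:  # keep first-seen order, drop repeats
            cited.append(title)
    known = set(known_titles)
    return {
        "cited": [t for t in cited if t in known],
        # Titles the model cited that were never in the context; worth flagging.
        "unverified": [t for t in cited if t not in known],
    }
```

    Anything that lands in `unverified` was cited by the model but not retrieved, which is a useful signal to log or to strip before showing the answer.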

    View lab files on GitHub

Key Takeaways

  • RAG pipeline: embed query → retrieve chunks → augment prompt → generate answer
  • Always include "answer ONLY from context" in the system prompt to reduce hallucination
  • Source attribution builds trust — cite document title, section, and relevance score
  • Handle edge cases: no results, conflicting sources, context overflow
  • This basic pipeline is the foundation — advanced techniques (Module 7+) improve quality