Module 1 of 16

Introduction to AI & RAG Systems

LLM fundamentals, hallucinations, and why retrieval-augmented generation changes everything

3 hours2 labsFree

Start here

Learning objectives

  • Understand how LLMs work at a high level
  • Learn about tokens, context windows, and their limitations
  • Understand why LLMs hallucinate and how RAG solves it
  • Compare vanilla LLM vs RAG responses
VANILLA LLM vs RAGWithout RAGUser QueryLLMtraining data onlyHallucinated answeroutdated, inaccurate, genericWith RAGQueryRetrievevector DBLLMGrounded answeraccurate, cited, domain-specific

Large Language Models generate text by predicting the next token. They are remarkably capable but fundamentally limited: they can only draw from their training data, which is static, potentially outdated, and lacks your domain-specific knowledge.

Retrieval-Augmented Generation (RAG) solves this by retrieving relevant documents at query time and injecting them into the prompt. The model reads your data before answering — grounding its response in facts rather than patterns.

How LLMs Work (The 5-Minute Version)

An LLM is a neural network trained on massive text corpora. Given a sequence of tokens, it predicts the most likely next token. It does not "know" things — it has learned statistical patterns of how words relate. This is why it can write fluent text but also confidently state falsehoods.

Tokens and Context Windows

Text is split into tokens (roughly 3-4 characters each). Each model has a context window — the maximum tokens it can process in one request. GPT-4o has 128K tokens. Claude has 200K. Everything you send (system prompt, conversation history, retrieved documents, your question) must fit within this window.

Why LLMs Hallucinate

  • Knowledge cutoff: Training data is frozen at a point in time
  • No access to private data: The model does not know about your company docs
  • Statistical generation: It generates plausible-sounding text, not verified facts
  • No source grounding: Without retrieval, there is nothing to cite

What RAG Changes

RAG adds a retrieval step before generation. When a user asks a question, the system searches a vector database for relevant document chunks, injects them into the prompt as context, and the model generates an answer grounded in those documents. The result: accurate, citable, domain-specific answers instead of hallucinated guesses.

Types of RAG Systems

  • Naive RAG: Simple retrieve-and-generate. Adequate for demos, fragile in production.
  • Advanced RAG: Hybrid search, reranking, query transformation. Production-viable.
  • Agentic RAG: AI agents that decide what to retrieve, when, and how. Multi-step reasoning.

Real world

Where this shows up

  • Customer support bots answering from product documentation
  • Legal AI searching case law and contracts
  • Medical AI retrieving clinical guidelines
  • Code assistants searching internal repositories
  • Enterprise search across thousands of documents

Common mistakes

What usually breaks

  • Building a chatbot without RAG and hoping the LLM knows your domain
  • Stuffing the entire document into the prompt instead of retrieving relevant chunks
  • Ignoring context window limits — overfilling the prompt degrades quality
  • Not evaluating retrieval quality — bad retrieval means bad answers regardless of the model

Key terms

Vocabulary used in this module

LLM

Large Language Model — neural network trained on text to predict tokens

RAG

Retrieval-Augmented Generation — retrieve relevant docs before generating

Hallucination

When an LLM generates plausible but factually incorrect information

Context Window

Maximum tokens an LLM can process in one request

Token

Smallest unit of text processed by an LLM (~3-4 characters)

Labs

Hands-on labs

20 minBeginner

Run Your First LLM Application

Build a simple Python app that calls an LLM API.

  1. Install the Anthropic Python SDK
  2. Send a basic prompt to Claude
  3. Observe the response and token usage
  4. Ask a question about recent events and observe hallucination
View lab on GitHub
25 minBeginner

Compare Vanilla LLM vs RAG

See the difference RAG makes on answer quality.

  1. Ask the LLM a domain-specific question (without context)
  2. Provide the same question with relevant document context
  3. Compare accuracy, citations, and confidence
  4. Discuss when RAG is necessary vs when vanilla LLM suffices
View lab on GitHub

Recap

Key takeaways

  • LLMs predict tokens based on training data — they do not know facts
  • Hallucinations happen because the model generates plausible text without verification
  • RAG retrieves relevant documents and injects them into the prompt before generation
  • Context windows limit how much data you can include — retrieval selects the most relevant
  • Three RAG levels: naive (demo), advanced (production), agentic (autonomous)

Related resources

Keep learning across CodersSecret