Basic RAG retrieves and generates. Agentic RAG reasons about WHAT to retrieve, WHEN to use tools, and HOW to break complex questions into steps. The AI agent decides the retrieval strategy dynamically based on the query.
From Pipeline to Agent
In basic RAG, the pipeline is fixed: embed → retrieve → generate. In agentic RAG, the LLM decides: should I search the knowledge base? Query the database? Search the web? Calculate something? The agent orchestrates tools based on reasoning.
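The contrast can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `retrieve`, `generate`, and `decide_next_action` are hypothetical stand-ins (a real system would call a vector store and an LLM), and the keyword rule inside `decide_next_action` is an assumption that replaces the model's reasoning.

```python
# Hypothetical stand-ins: a real system would call a vector store and an LLM.
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]

def generate(query: str, context: list[str]) -> str:
    return f"Answer to '{query}' using {len(context)} source(s)"

def basic_rag(query: str) -> str:
    """Fixed pipeline: always retrieve, then generate."""
    return generate(query, retrieve(query))

def decide_next_action(query: str, context: list[str]) -> str:
    """Placeholder for the LLM's decision (assumption: a keyword rule)."""
    if not context and "policy" in query.lower():
        return "search_docs"
    return "answer"

def agentic_rag(query: str, max_steps: int = 3) -> str:
    """Agent loop: decide, act, observe, and stop when ready to answer."""
    context: list[str] = []
    for _ in range(max_steps):
        action = decide_next_action(query, context)
        if action == "answer":
            break
        # Only one retrieval tool exists in this sketch.
        context.extend(retrieve(query))
    return generate(query, context)
```

In this sketch the basic pipeline retrieves for every query, while the agent loop skips retrieval when the decision step judges it unnecessary.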
Tool Calling
tools = [
    {"name": "search_docs", "description": "Search internal documentation"},
    {"name": "query_database", "description": "Run SQL query on the product database"},
    {"name": "web_search", "description": "Search the internet for recent information"},
]
# The agent decides which tool to use based on the question
# "What is our refund policy?" → search_docs
# "How many orders last month?" → query_database
# "What is the latest Python version?" → web_search
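One way to wire tool descriptions to executable code is a registry that maps each tool name to a function, plus a dispatcher that runs a structured tool call. The sketch below rests on stated assumptions: the three tool functions return canned strings, the call format `{"name": ..., "arguments": {...}}` is an illustrative convention, and `choose_tool` uses keyword rules as a stand-in for the LLM's choice.

```python
# Hypothetical tool implementations: each returns a canned string.
def search_docs(query: str) -> str:
    return f"[docs] results for: {query}"

def query_database(query: str) -> str:
    return f"[db] rows for: {query}"

def web_search(query: str) -> str:
    return f"[web] results for: {query}"

# Registry mapping tool names to callables.
TOOL_REGISTRY = {
    "search_docs": search_docs,
    "query_database": query_database,
    "web_search": web_search,
}

def choose_tool(question: str) -> str:
    """Stand-in for the LLM's tool choice (assumption: keyword rules)."""
    q = question.lower()
    if "how many" in q or "orders" in q:
        return "query_database"
    if "latest" in q or "recent" in q:
        return "web_search"
    return "search_docs"

def run_tool_call(call: dict) -> str:
    """Execute a call shaped like {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**call["arguments"])

def answer(question: str) -> str:
    call = {"name": choose_tool(question), "arguments": {"query": question}}
    return run_tool_call(call)
```

Rejecting unknown tool names before dispatch matters in practice, because a model can emit a name that was never registered.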
Multi-Agent Architectures
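Complex tasks can be split across specialized agents: a Research Agent retrieves information, an Analysis Agent processes the data, and a Writing Agent generates the final response. An orchestration framework such as LangGraph manages the flow between agents, typically by modeling each agent as a node in a graph and passing shared state from one node to the next.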
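The hand-off between agents can be illustrated without any framework. The sketch below deliberately does not use LangGraph's API. It treats each agent as a plain function that reads a shared state dictionary and returns an updated copy. The agent bodies and the state keys (`task`, `findings`, `summary`, `response`) are hypothetical placeholders for LLM calls.

```python
def research_agent(state: dict) -> dict:
    """Gathers information for the task (canned result in this sketch)."""
    new_state = dict(state)
    new_state["findings"] = [f"fact about {state['task']}"]
    return new_state

def analysis_agent(state: dict) -> dict:
    """Processes what the research agent found."""
    new_state = dict(state)
    new_state["summary"] = f"{len(state['findings'])} finding(s) analyzed"
    return new_state

def writing_agent(state: dict) -> dict:
    """Produces the final response from the analysis."""
    new_state = dict(state)
    new_state["response"] = f"Report on {state['task']}: {state['summary']}"
    return new_state

# A linear flow; a graph framework would also allow branches and loops.
PIPELINE = [research_agent, analysis_agent, writing_agent]

def run_pipeline(task: str) -> dict:
    state = {"task": task}
    for agent in PIPELINE:
        state = agent(state)
    return state
```

Because each agent returns a new copy of the state instead of mutating it, every intermediate step stays inspectable, which helps when debugging a multi-agent flow.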
Memory Systems
- Short-term memory: Conversation history within a session
- Long-term memory: Persistent storage of user preferences and past interactions across sessions
- Working memory: Intermediate results during multi-step reasoning
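The three memory types differ mainly in lifetime, which a small container class can make concrete. This is a sketch under assumptions: the `AgentMemory` class and its method names are hypothetical, and the in-process dictionary used for long-term memory stands in for a database or vector store that would survive restarts.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    # Long-term: kept across sessions (assumption: a dict stands in for a database).
    long_term: dict = field(default_factory=dict)
    # Short-term: conversation turns within the current session.
    short_term: list = field(default_factory=list)
    # Working: intermediate results for the current multi-step task.
    working: dict = field(default_factory=dict)

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append((role, text))

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def note(self, key: str, value: object) -> None:
        self.working[key] = value

    def finish_task(self) -> None:
        """Working memory is discarded once the task is done."""
        self.working.clear()

    def end_session(self) -> None:
        """Short-term and working memory end with the session; long-term stays."""
        self.short_term.clear()
        self.working.clear()
```

A short usage example: after `memory.remember("language", "English")` and `memory.end_session()`, the preference is still in `memory.long_term` while the conversation history is gone.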