Module 11: AI Observability Engineering
LLM tracing, token monitoring, cost tracking, and production AI telemetry
3 hours. 2 hands-on labs. Free course module.
Learning Objectives
- Instrument RAG systems with OpenTelemetry
- Trace requests through the full RAG pipeline
- Monitor token usage and LLM costs
- Build AI-specific observability dashboards
Why This Matters
AI systems are expensive to run and hard to debug without observability. A single misconfigured query expansion can 10x your token costs. A model update can silently degrade quality. Observability catches these before users do.
Lesson Content
Running production AI without observability is like driving blindfolded. You need to see four things: how long each pipeline step takes, how many tokens each request consumes, how much each request costs, and whether quality is holding up over time.
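The four signals listed above could be gathered into one record per request. The following is a minimal sketch, not a standard schema: the `RequestTelemetry` class, its field names, and the sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RequestTelemetry:
    """Hypothetical per-request record: latency, tokens, cost, quality."""

    request_id: str
    step_latency_ms: dict = field(default_factory=dict)  # step name -> milliseconds
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    quality_score: Optional[float] = None  # e.g. a groundedness score, if one is computed

    @property
    def total_latency_ms(self) -> float:
        """Sum of all recorded step latencies."""
        return sum(self.step_latency_ms.values())


# Made-up values for illustration only.
record = RequestTelemetry(
    request_id="req-001",
    step_latency_ms={"embed": 5, "retrieve": 10, "assemble": 1, "generate": 2000},
    input_tokens=1200,
    output_tokens=350,
    cost_usd=0.00885,
    quality_score=0.92,
)

# One JSON object per request is easy to ship to a log pipeline.
print(json.dumps(asdict(record)))
print("total latency (ms):", record.total_latency_ms)
```

Keeping all four signals in one record makes it possible to correlate them later, for example to check whether the most expensive requests are also the slowest.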
LLM Tracing
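Trace every RAG request through each pipeline step. With illustrative timings: embed query (5 ms) → vector search (10 ms) → context assembly (1 ms) → LLM call (2000 ms), for a total of 2016 ms. The trace shows where the time goes and therefore where to optimize; here the LLM call accounts for about 99% of the latency.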
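To make per-step spans concrete, here is a minimal stdlib-only sketch. It is a stand-in for a real tracing SDK such as OpenTelemetry, not that SDK's API: the `Trace` class, its `span` method, and the step names are hypothetical, and the `time.sleep` calls only simulate work.

```python
import time
from contextlib import contextmanager


class Trace:
    """Hypothetical minimal tracer: records (step name, duration in ms) pairs."""

    def __init__(self) -> None:
        self.spans = []

    @contextmanager
    def span(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the step raises.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append((name, elapsed_ms))

    def slowest(self) -> str:
        """Return the name of the step that took the longest."""
        return max(self.spans, key=lambda s: s[1])[0]


def traced_rag(trace: Trace) -> None:
    # Each sleep stands in for real work; the durations are placeholders.
    with trace.span("embed_query"):
        time.sleep(0.001)
    with trace.span("vector_search"):
        time.sleep(0.002)
    with trace.span("context_assembly"):
        pass
    with trace.span("llm_call"):
        time.sleep(0.05)


trace = Trace()
traced_rag(trace)
for name, ms in trace.spans:
    print(f"{name}: {ms:.1f} ms")
print("slowest step:", trace.slowest())
```

A real tracing SDK adds what this sketch omits: trace and span IDs, parent/child relationships, attributes, and export to a backend.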
Token Monitoring
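# Track token usage and cost per request
import anthropic
from prometheus_client import Counter

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# Metric name below is an assumption; pick one that fits your naming scheme
prometheus_counter = Counter("rag_llm_cost_usd_total", "Cumulative LLM spend in USD")

def monitored_rag(question: str) -> dict:
    response = claude.messages.create(...)
    metrics = {
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        # Rates are USD per 1,000 tokens, hence the division by 1000
        "cost_usd": (response.usage.input_tokens * 0.003 + response.usage.output_tokens * 0.015) / 1000,
        "model": "claude-sonnet-4-6",
    }
    prometheus_counter.inc(metrics["cost_usd"])
    return {"answer": ..., "metrics": metrics}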
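The cost formula in the snippet above can be factored into a small helper and checked by hand. This is a sketch under stated assumptions: `PRICES_PER_1K` and `request_cost_usd` are hypothetical names, and the rates are simply the ones used above (0.003 and 0.015 USD per 1,000 tokens), which should be checked against your provider's current pricing.

```python
# Assumed rates in USD per 1,000 tokens, copied from the example above.
# Verify against your provider's current price list before relying on them.
PRICES_PER_1K = {
    "claude-sonnet-4-6": {"input": 0.003, "output": 0.015},
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD, given its token counts."""
    rates = PRICES_PER_1K[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1000


# Worked example: 1,200 input tokens and 350 output tokens.
# (1200 * 0.003 + 350 * 0.015) / 1000 = (3.6 + 5.25) / 1000 = 0.00885
cost = request_cost_usd("claude-sonnet-4-6", 1200, 350)
print(f"${cost:.5f}")
```

Keeping prices in a table keyed by model name means a model change only requires a new table entry, and an unknown model fails loudly with a `KeyError` instead of being silently priced at zero.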
Cost Monitoring
LLM API calls are often the largest expense in a RAG system. Track cost per request, per tenant, and per day, and set budgets with alerts. A runaway agent or a suddenly popular query can exhaust a monthly budget in hours.
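One way to track spend per tenant per day and flag budget overruns is sketched below. The `BudgetTracker` class, the 80% warning ratio, and the tenant names are hypothetical, and a production system would likely keep these totals in a shared store rather than in process memory.

```python
from collections import defaultdict
from datetime import date


class BudgetTracker:
    """Hypothetical in-memory tracker: accumulates spend per (tenant, day)."""

    def __init__(self, daily_budget_usd: float, warn_ratio: float = 0.8) -> None:
        self.daily_budget_usd = daily_budget_usd
        self.warn_ratio = warn_ratio
        self.spend = defaultdict(float)  # (tenant, day) -> USD spent so far

    def record(self, tenant: str, cost_usd: float, day: date) -> str:
        """Add one request's cost and return 'ok', 'warn', or 'over'."""
        key = (tenant, day)
        self.spend[key] += cost_usd
        total = self.spend[key]
        if total >= self.daily_budget_usd:
            return "over"
        if total >= self.warn_ratio * self.daily_budget_usd:
            return "warn"
        return "ok"


tracker = BudgetTracker(daily_budget_usd=10.0)
today = date(2025, 1, 15)
print(tracker.record("tenant-a", 5.0, today))  # 5.0 of 10.0 spent
print(tracker.record("tenant-a", 3.5, today))  # 8.5 spent, past the 80% mark
print(tracker.record("tenant-a", 2.0, today))  # 10.5 spent, over budget
print(tracker.record("tenant-b", 1.0, today))  # separate tenant, separate total
```

Returning a status from `record` lets the caller decide what to do at each level, such as sending an alert on "warn" and throttling or rejecting requests on "over".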
Key Terms
- LLM Tracing: Distributed tracing for AI pipeline steps (embed, retrieve, generate)
- Token Monitoring: Tracking input/output token counts and costs per request
- AI Telemetry: Metrics, traces, and logs specific to AI system performance
Hands-On Labs
Lab 1: Add Tracing to Your RAG Pipeline
Instrument with OpenTelemetry for full request tracing.
30 min - Intermediate
- Add OpenTelemetry SDK to your RAG service
- Create spans for embed, retrieve, generate steps
- Export to Jaeger for trace visualization
- Identify latency bottlenecks
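For the last step, a single trace can mislead, so it helps to compare steps across many requests. The sketch below computes a nearest-rank 95th percentile per step from a list of span records. The `p95` and `bottleneck` helpers and the record shape (`name`, `duration_ms`) are assumptions for illustration, not Jaeger's export format, and the sample data is made up.

```python
import math
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def bottleneck(spans):
    """Return (step name, p95 duration in ms) for the step with the highest p95."""
    by_step = defaultdict(list)
    for span in spans:
        by_step[span["name"]].append(span["duration_ms"])
    return max(((name, p95(d)) for name, d in by_step.items()), key=lambda x: x[1])


# Made-up span records from two requests.
spans = [
    {"name": "embed", "duration_ms": 5.0},
    {"name": "retrieve", "duration_ms": 10.0},
    {"name": "generate", "duration_ms": 2000.0},
    {"name": "embed", "duration_ms": 6.0},
    {"name": "retrieve", "duration_ms": 45.0},
    {"name": "generate", "duration_ms": 1800.0},
]
print(bottleneck(spans))
```

Percentiles are used instead of averages because a few slow requests can hide behind a healthy-looking mean.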
Lab 2: Build Cost and Quality Dashboards
Monitor token usage, costs, and quality metrics.
30 min - Intermediate
- Export token metrics to Prometheus
- Build Grafana dashboards for cost per request and per tenant
- Add quality score tracking over time
- Set up alerts for cost spikes and quality drops
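The comparison behind a cost-spike or quality-drop alert can be sketched as a check against a rolling baseline. In the lab this logic would more likely live in Prometheus or Grafana alert rules; the `RollingBaseline` class, the window size, and the 2.0 and 0.9 ratios below are hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean


class RollingBaseline:
    """Hypothetical helper: compares each new value with the mean of recent ones."""

    def __init__(self, window: int) -> None:
        self.values = deque(maxlen=window)

    def check(self, value: float, spike_ratio: float = 2.0, drop_ratio: float = 0.9) -> str:
        """Classify `value` as 'ok', 'spike', or 'drop', then add it to the window."""
        status = "ok"
        # Only judge once the window is full, so the baseline is meaningful.
        if len(self.values) == self.values.maxlen:
            baseline = mean(self.values)
            if value > spike_ratio * baseline:
                status = "spike"
            elif value < drop_ratio * baseline:
                status = "drop"
        self.values.append(value)
        return status


# Cost per request (USD): the fourth value is 5x the recent average.
cost_monitor = RollingBaseline(window=3)
for cost in (0.010, 0.011, 0.009, 0.050):
    print("cost", cost, cost_monitor.check(cost))

# Quality score (0 to 1): the fourth value falls well below the recent average.
quality_monitor = RollingBaseline(window=3)
for score in (0.90, 0.92, 0.88, 0.70):
    print("quality", score, quality_monitor.check(score))
```

Note that anomalous values still enter the window, so a sustained shift gradually becomes the new baseline; whether that is desirable depends on the metric.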
Key Takeaways
- Trace every RAG step: embed, retrieve, generate — know where time is spent
- Monitor token usage per request; LLM API costs are often your largest expense
- Track cost per tenant for multi-tenant systems
- Quality metrics (retrieval precision, groundedness) should be measured continuously
- Alert on cost spikes, latency degradation, and quality drops