
Module 11: AI Observability Engineering

LLM tracing, token monitoring, cost tracking, and production AI telemetry

3 hours. 2 hands-on labs. Free course module.

Learning Objectives

  • Instrument RAG systems with OpenTelemetry
  • Trace requests through the full RAG pipeline
  • Monitor token usage and LLM costs
  • Build AI-specific observability dashboards

Why This Matters

AI systems are expensive to run and hard to debug without observability. A single misconfigured query expansion can 10x your token costs. A model update can silently degrade quality. Observability catches these before users do.

AI Observability Pipeline
Diagram labels: RAG Request; OTel Traces (spans per step); Token Metrics (input/output/cost); Quality Scores (retrieval + gen); Grafana; Alerts.
Trace every request: embed latency → retrieval latency → LLM latency → total cost.
Alert on: cost spikes, latency degradation, quality drops, error rates.
Architecture diagram for Module 11: AI Observability Engineering.

Lesson Content

Production AI without observability is like driving blindfolded. You need to see: how long each pipeline step takes, how many tokens each request consumes, how much each request costs, and whether quality is holding up over time.

LLM Tracing

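Trace every RAG request through each pipeline step, for example: embed query (5ms) → vector search (10ms) → context assembly (1ms) → LLM call (2000ms), for a total of 2016ms. This tells you where time is spent and where to optimize. In this example the LLM call accounts for almost all of the latency, so shaving milliseconds off vector search would barely change what the user experiences.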
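The per-step timing described above can be sketched with the standard library alone. This is a minimal stand-in for OpenTelemetry spans, not the OpenTelemetry API itself: the `Trace` class, the `traced_rag` function, and the `embed`, `search`, and `generate` callables are all hypothetical names introduced for illustration.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects (step name, duration in ms) pairs for one request."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the step raises.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.spans.append((name, elapsed_ms))

    def total_ms(self):
        return sum(ms for _, ms in self.spans)

    def slowest(self):
        # Name of the step that took the longest.
        return max(self.spans, key=lambda s: s[1])[0]


def traced_rag(question, embed, search, generate):
    """Run one RAG request and time each step (callables are placeholders)."""
    trace = Trace()
    with trace.span("embed_query"):
        vector = embed(question)
    with trace.span("vector_search"):
        docs = search(vector)
    with trace.span("context_assembly"):
        context = "\n".join(docs)
    with trace.span("llm_call"):
        answer = generate(question, context)
    return answer, trace
```

Calling `trace.slowest()` after a request names the step to optimize first. In a real service the same structure would map onto one OpenTelemetry span per step, exported to a backend such as Jaeger.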

Token Monitoring

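# Track token usage and cost per request.
# Assumes `claude` is an Anthropic client and `prometheus_counter` is a
# Prometheus counter, both created elsewhere in the service.
INPUT_USD_PER_1K = 0.003   # example rate per 1,000 input tokens; check current pricing
OUTPUT_USD_PER_1K = 0.015  # example rate per 1,000 output tokens

def monitored_rag(question: str) -> dict:
    response = claude.messages.create(...)  # request arguments elided
    usage = response.usage
    metrics = {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cost_usd": (usage.input_tokens * INPUT_USD_PER_1K
                     + usage.output_tokens * OUTPUT_USD_PER_1K) / 1000,
        "model": "claude-sonnet-4-6",
    }
    prometheus_counter.inc(metrics["cost_usd"])  # running total of spend in USD
    return {"answer": ..., "metrics": metrics}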

Cost Monitoring

LLM API costs are often the largest expense in a RAG system. Track cost per request, per tenant, and per day, and set budgets with alerts on each. A runaway agent or a suddenly popular query can blow through your monthly budget in hours.
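One way to picture per-tenant, per-day budgets is an in-memory tracker like the sketch below. It assumes a single process and a fixed daily budget shared by all tenants. The `CostTracker` class, its 80% warning threshold, and the status strings are illustrative choices, not part of any library.

```python
from collections import defaultdict
from datetime import date


class CostTracker:
    """Accumulates spend per (tenant, day) and reports budget status."""

    def __init__(self, daily_budget_usd, alert_fraction=0.8):
        self.daily_budget_usd = daily_budget_usd
        self.alert_fraction = alert_fraction  # warn at this share of the budget
        self.spend = defaultdict(float)       # (tenant, day) -> USD spent

    def record(self, tenant, cost_usd, day=None):
        """Add one request's cost and return the tenant's status for that day."""
        day = day or date.today()
        self.spend[(tenant, day)] += cost_usd
        return self.status(tenant, day)

    def status(self, tenant, day):
        spent = self.spend[(tenant, day)]
        if spent >= self.daily_budget_usd:
            return "over_budget"
        if spent >= self.alert_fraction * self.daily_budget_usd:
            return "warning"
        return "ok"
```

A production setup would more likely keep these totals in Prometheus with a tenant label and let the alerting layer apply the thresholds, but the logic is the same: accumulate, compare against a budget, and warn before the limit rather than at it.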

Key Terms

LLM Tracing
Distributed tracing for AI pipeline steps (embed, retrieve, generate)
Token Monitoring
Tracking input/output token counts and costs per request
AI Telemetry
Metrics, traces, and logs specific to AI system performance

Hands-On Labs

  1. Add Tracing to Your RAG Pipeline

    Instrument with OpenTelemetry for full request tracing.

    30 min - Intermediate

    • Add OpenTelemetry SDK to your RAG service
    • Create spans for embed, retrieve, generate steps
    • Export to Jaeger for trace visualization
    • Identify latency bottlenecks

    View lab files on GitHub

  2. Build Cost and Quality Dashboards

    Monitor token usage, costs, and quality metrics.

    30 min - Intermediate

    • Export token metrics to Prometheus
    • Build Grafana dashboards for cost per request and per tenant
    • Add quality score tracking over time
    • Set up alerts for cost spikes and quality drops

    View lab files on GitHub
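The cost-spike alerts in the second lab would normally live in Grafana or Prometheus alert rules. The underlying check can be sketched in plain Python as a comparison against a rolling baseline. `SpikeDetector`, its window size, and the 3x factor are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


class SpikeDetector:
    """Flags a value that exceeds `factor` times the mean of recent values."""

    def __init__(self, window=24, factor=3.0):
        self.history = deque(maxlen=window)  # most recent observations
        self.factor = factor

    def observe(self, value):
        """Return True if `value` is a spike relative to the preceding window."""
        is_spike = bool(self.history) and value > self.factor * mean(self.history)
        # The value joins the baseline afterwards, so a sustained increase
        # eventually stops alerting once it becomes the new normal.
        self.history.append(value)
        return is_spike
```

Feeding it hourly cost totals, for example, would flag an hour that costs more than three times the recent average. The same detector works on latency or error counts.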

Key Takeaways

  • Trace every RAG step (embed, retrieve, generate) so you know where time is spent
  • Monitor token usage per request: LLM API costs are often your largest expense
  • Track cost per tenant for multi-tenant systems
  • Quality metrics (retrieval precision, groundedness) should be tracked continuously in production, not only checked at release time
  • Alert on cost spikes, latency degradation, and quality drops
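As a concrete instance of the retrieval precision metric mentioned above, precision@k is the share of the top k retrieved documents that are relevant. The sketch below assumes you have labeled relevance judgments for a set of evaluation queries. The function name and argument names are illustrative.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved document IDs that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Dividing by k (not len(top_k)) penalizes returning fewer than k results.
    return hits / k
```

Logging this score for a fixed set of evaluation queries on a schedule gives a time series that can be plotted and alerted on like any other metric. Groundedness needs a separate scoring method and is not covered by this sketch.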