Module 11 of 16

AI Observability Engineering

LLM tracing, token monitoring, cost tracking, and production AI telemetry

3 hours · 2 labs · Free


Learning objectives

  • Instrument RAG systems with OpenTelemetry
  • Trace requests through the full RAG pipeline
  • Monitor token usage and LLM costs
  • Build AI-specific observability dashboards

Figure: AI observability pipeline. Components: RAG request; OTel traces (spans per step); token metrics (input/output/cost); quality scores (retrieval + generation); Grafana; alerts.

Trace every request: embed latency → retrieval latency → LLM latency → total cost
Alert on: cost spikes, latency degradation, quality drops, error rates

Production AI without observability is like driving blindfolded. You need to see: how long each pipeline step takes, how many tokens each request consumes, how much each request costs, and whether quality is holding up over time.

LLM Tracing

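Trace every RAG request through each step, for example: embed query (5 ms) → vector search (10 ms) → context assembly (1 ms) → LLM call (2000 ms) → total (2016 ms). The trace shows where time is spent and therefore where to optimize; in this example the LLM call accounts for nearly all of the latency.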
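
As a minimal sketch of what step-level tracing captures, the following times each stage using only the Python standard library. The `span` helper, the `traced_rag` function, and the `time.sleep` calls standing in for the embedding, search, and LLM requests are all hypothetical; in a real service an OpenTelemetry tracer's `start_as_current_span` would play the role of `span`.

```python
import time
from contextlib import contextmanager

@contextmanager
def span(name: str, timings: dict):
    """Record the wall-clock duration of one pipeline step in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000.0

def traced_rag(question: str) -> dict:
    """Run a fake RAG pipeline and return the duration of every step."""
    timings: dict = {}
    with span("total", timings):
        with span("embed_query", timings):
            time.sleep(0.005)   # placeholder for the embedding call
        with span("vector_search", timings):
            time.sleep(0.010)   # placeholder for the vector store query
        with span("context_assembly", timings):
            time.sleep(0.001)   # placeholder for prompt construction
        with span("llm_call", timings):
            time.sleep(0.050)   # placeholder for the LLM request (shortened)
    return timings

timings = traced_rag("What is observability?")
steps = {name: ms for name, ms in timings.items() if name != "total"}
slowest = max(steps, key=steps.get)
print(f"slowest step: {slowest} ({steps[slowest]:.1f} ms of {timings['total']:.1f} ms)")
```

Nesting the step spans inside a `total` span mirrors the parent/child structure a tracing backend would display for a single request.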

Token Monitoring

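# Track token usage and estimated cost per request
import anthropic
from prometheus_client import Counter

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
prometheus_counter = Counter("llm_cost_usd", "Estimated LLM spend in USD")

def monitored_rag(question: str) -> dict:
    # Pass model, max_tokens, and messages (question plus retrieved context) here
    response = claude.messages.create(...)
    usage = response.usage
    metrics = {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        # Rates are USD per 1,000 tokens (0.003 input, 0.015 output); confirm current pricing
        "cost_usd": (usage.input_tokens * 0.003 + usage.output_tokens * 0.015) / 1000,
        "model": "claude-sonnet-4-6",
    }
    prometheus_counter.inc(metrics["cost_usd"])  # add this request's cost to the running total
    return {"answer": ..., "metrics": metrics}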
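
To make the cost formula concrete, here is a small worked example. The rates are the per-1,000-token figures used in the snippet above (0.003 USD input, 0.015 USD output); the token counts are made up for illustration, and `estimate_cost_usd` is a hypothetical helper name.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate_per_1k: float = 0.003,
                      output_rate_per_1k: float = 0.015) -> float:
    """Estimated request cost in USD, with rates quoted per 1,000 tokens."""
    return (input_tokens * input_rate_per_1k
            + output_tokens * output_rate_per_1k) / 1000

# 1,200 prompt tokens and 300 completion tokens:
# (1200 * 0.003 + 300 * 0.015) / 1000 = (3.6 + 4.5) / 1000 = 0.0081 USD
cost = estimate_cost_usd(1200, 300)
print(f"{cost:.4f} USD")  # prints 0.0081 USD
```

At these rates an output token costs five times as much as an input token, so the 300 output tokens (0.0045 USD) cost more than the 1,200 input tokens (0.0036 USD).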

Cost Monitoring

LLM API costs are often the largest variable expense in a RAG system. Track cost per request, per tenant, and per day, and set budgets with alerts: a runaway agent or a suddenly popular query can exhaust a monthly budget in hours.
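
One way to track spend per tenant per day with a budget check is sketched below. The `CostTracker` class, the tenant IDs, and the 1.00 USD daily budget are hypothetical; a production system would more likely keep these totals in a metrics backend than in process memory.

```python
from collections import defaultdict
from datetime import date

class CostTracker:
    """Accumulate estimated spend per (tenant, day) and flag budget overruns."""

    def __init__(self, daily_budget_usd: float):
        self.daily_budget_usd = daily_budget_usd
        self.spend = defaultdict(float)  # (tenant_id, day) -> USD spent

    def record(self, tenant_id: str, cost_usd: float, day: date) -> bool:
        """Add one request's cost; return True if the tenant is now over budget."""
        key = (tenant_id, day)
        self.spend[key] += cost_usd
        return self.spend[key] > self.daily_budget_usd

tracker = CostTracker(daily_budget_usd=1.00)
today = date(2025, 1, 15)  # fixed date so the example is reproducible

# Four requests of 0.30 USD each: the fourth pushes tenant-a past 1.00 USD
alerts = [tracker.record("tenant-a", 0.30, today) for _ in range(4)]
print(alerts)  # prints [False, False, False, True]
```

Keying the totals by tenant and day means one tenant's spike does not hide inside the global total, and each tenant's budget resets at the day boundary.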

Key terms

Vocabulary used in this module

LLM Tracing

Distributed tracing for AI pipeline steps (embed, retrieve, generate)

Token Monitoring

Tracking input/output token counts and costs per request

AI Telemetry

Metrics, traces, and logs specific to AI system performance

Labs

Hands-on labs

30 min · Intermediate

Add Tracing to Your RAG Pipeline

Instrument with OpenTelemetry for full request tracing.

  1. Add OpenTelemetry SDK to your RAG service
  2. Create spans for embed, retrieve, generate steps
  3. Export to Jaeger for trace visualization
  4. Identify latency bottlenecks
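
For step 4, once per-step durations can be read from exported spans, a bottleneck can be found by comparing a tail percentile per step. The sketch below assumes the durations have already been extracted into plain dictionaries; the sample numbers and the `p95_by_step` helper are hypothetical.

```python
from statistics import quantiles

# Hypothetical per-step durations in milliseconds, one dict per traced request
traces = [
    {"embed": 5, "retrieve": 10, "generate": 1900},
    {"embed": 6, "retrieve": 12, "generate": 2100},
    {"embed": 4, "retrieve": 45, "generate": 2000},
    {"embed": 5, "retrieve": 11, "generate": 3200},
    {"embed": 7, "retrieve": 9, "generate": 1800},
]

def p95_by_step(traces: list) -> dict:
    """Return the 95th-percentile duration for each step name."""
    by_step: dict = {}
    for trace in traces:
        for name, ms in trace.items():
            by_step.setdefault(name, []).append(ms)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile
    return {name: quantiles(values, n=20, method="inclusive")[-1]
            for name, values in by_step.items()}

p95 = p95_by_step(traces)
bottleneck = max(p95, key=p95.get)
print(f"bottleneck: {bottleneck} (p95 = {p95[bottleneck]:.0f} ms)")
```

A tail percentile is used rather than the mean because occasional slow requests, such as the 45 ms retrieval above, are easy to miss in an average.
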
30 min · Intermediate

Build Cost and Quality Dashboards

Monitor token usage, costs, and quality metrics.

  1. Export token metrics to Prometheus
  2. Build Grafana dashboards for cost per request and per tenant
  3. Add quality score tracking over time
  4. Set up alerts for cost spikes and quality drops
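
The alerting logic in step 4 can be sketched as a comparison against a rolling baseline. In the lab this would normally be expressed as an alert rule in the monitoring stack; the `BaselineAlert` class, the thresholds, and the sample series below are hypothetical and only illustrate the idea.

```python
from collections import deque
from statistics import mean
from typing import Optional

class BaselineAlert:
    """Compare each new value with the mean of the previous `window` values."""

    def __init__(self, window: int, spike_ratio: Optional[float] = None,
                 drop_ratio: Optional[float] = None):
        self.history = deque(maxlen=window)
        self.spike_ratio = spike_ratio  # alert if value > baseline * spike_ratio
        self.drop_ratio = drop_ratio    # alert if value < baseline * drop_ratio

    def observe(self, value: float) -> Optional[str]:
        alert = None
        if len(self.history) == self.history.maxlen:  # wait for a full window
            baseline = mean(self.history)
            if self.spike_ratio is not None and value > baseline * self.spike_ratio:
                alert = "spike"
            elif self.drop_ratio is not None and value < baseline * self.drop_ratio:
                alert = "drop"
        self.history.append(value)
        return alert

cost_alert = BaselineAlert(window=3, spike_ratio=2.0)    # cost above 2x baseline
quality_alert = BaselineAlert(window=3, drop_ratio=0.8)  # quality below 80% of baseline

hourly_cost_usd = [1.0, 1.1, 0.9, 1.0, 3.5]
groundedness = [0.90, 0.92, 0.91, 0.89, 0.60]
cost_flags = [cost_alert.observe(c) for c in hourly_cost_usd]
quality_flags = [quality_alert.observe(q) for q in groundedness]
print(cost_flags)     # prints [None, None, None, None, 'spike']
print(quality_flags)  # prints [None, None, None, None, 'drop']
```

Comparing against a recent baseline rather than a fixed number lets the same rule work for tenants with very different normal spend or quality levels.
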

Recap

Key takeaways

  • Trace every RAG step (embed, retrieve, generate) to know where time is spent
  • Monitor token usage per request; LLM costs are often your largest expense
  • Track cost per tenant for multi-tenant systems
  • Quality metrics (retrieval precision, groundedness) should be tracked continuously
  • Alert on cost spikes, latency degradation, and quality drops
