Module 11 of 16

AI Observability Engineering

LLM tracing, token monitoring, cost tracking, and production AI telemetry

3 hours · 2 labs · Free

Start here

Learning objectives

Instrument RAG systems with OpenTelemetry
Trace requests through the full RAG pipeline
Monitor token usage and LLM costs
Build AI-specific observability dashboards

Production AI without observability is like driving blindfolded. You need to see: how long each pipeline step takes, how many tokens each request consumes, how much each request costs, and whether quality is holding up over time.

LLM Tracing

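Trace every RAG request through each pipeline step, for example: embed query (5 ms) → vector search (10 ms) → context assembly (1 ms) → LLM call (2000 ms), for a total of 2016 ms. The breakdown shows where time is spent and therefore where to optimize: in this example the LLM call accounts for over 99% of the latency.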
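A minimal sketch of the idea, using only the Python standard library rather than OpenTelemetry. The names `Trace`, `span`, and `traced_rag` are illustrative assumptions, not part of any library, and the `time.sleep` calls stand in for the real embedding, search, and LLM calls.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects (step name, duration in ms) pairs for one request."""

    def __init__(self) -> None:
        self.spans: list[tuple[str, float]] = []

    @contextmanager
    def span(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the step raises.
            self.spans.append((name, (time.perf_counter() - start) * 1000))

    def slowest(self) -> str:
        """Name of the step with the longest recorded duration."""
        return max(self.spans, key=lambda s: s[1])[0]


def traced_rag(question: str) -> Trace:
    trace = Trace()
    with trace.span("embed_query"):
        time.sleep(0.005)   # stand-in for the embedding call
    with trace.span("vector_search"):
        time.sleep(0.010)   # stand-in for the vector store lookup
    with trace.span("context_assembly"):
        time.sleep(0.001)   # stand-in for prompt construction
    with trace.span("llm_call"):
        time.sleep(0.100)   # shortened stand-in for the LLM call
    return trace


trace = traced_rag("What is observability?")
for name, ms in trace.spans:
    print(f"{name:<18}{ms:8.1f} ms")
print("slowest step:", trace.slowest())
```

A real tracing SDK adds what this sketch omits: trace and span IDs, parent and child relationships, attributes, and export to a backend.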

Token Monitoring

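# Track token usage and cost per request
import anthropic
from prometheus_client import Counter

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
llm_cost_usd = Counter("llm_cost_usd", "Cumulative LLM spend in USD")

# USD per 1,000 tokens; example rates, check your provider's current pricing
INPUT_USD_PER_1K, OUTPUT_USD_PER_1K = 0.003, 0.015

def monitored_rag(question: str) -> dict:
    response = claude.messages.create(...)  # model, max_tokens, messages elided
    usage = response.usage
    metrics = {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cost_usd": (usage.input_tokens * INPUT_USD_PER_1K
                     + usage.output_tokens * OUTPUT_USD_PER_1K) / 1000,
        "model": "claude-sonnet-4-6",
    }
    llm_cost_usd.inc(metrics["cost_usd"])  # counter accumulates total spend
    return {"answer": ..., "metrics": metrics}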
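The cost arithmetic in the snippet above can be pulled out into a small pure function, which makes it easy to unit-test. This is a sketch under stated assumptions: `PRICES_USD_PER_1K` and `request_cost_usd` are hypothetical names, and the rates simply mirror the 0.003 and 0.015 figures used above, so verify them against your provider's current price list.

```python
# Assumed rates in USD per 1,000 tokens, mirroring the figures used above.
PRICES_USD_PER_1K = {
    "claude-sonnet-4-6": {"input": 0.003, "output": 0.015},
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request in USD."""
    price = PRICES_USD_PER_1K[model]  # KeyError on an unpriced model is deliberate
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


# Worked example: 1,200 input tokens and 350 output tokens.
# (1200 * 0.003 + 350 * 0.015) / 1000 = (3.6 + 5.25) / 1000 = 0.00885 USD
cost = request_cost_usd("claude-sonnet-4-6", 1200, 350)
print(f"${cost:.5f}")
```

Failing loudly on an unknown model is a deliberate choice here: silently pricing a new model at zero would hide spend from the dashboard.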

Cost Monitoring

LLM API costs are often the largest expense in a RAG system. Track cost per request, per tenant, and per day, and set budgets with alerts. A runaway agent or a suddenly popular query can burn through a monthly budget in hours.
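One way to picture per-tenant, per-day budgets is an in-memory tracker like the sketch below. `BudgetTracker`, its 80% warning ratio, and the tenant names are all hypothetical; a production system would more likely keep these totals in a metrics backend or database.

```python
from collections import defaultdict
from datetime import date


class BudgetTracker:
    """Accumulates spend per (tenant, day) and flags budget overruns."""

    def __init__(self, daily_budget_usd: float, warn_ratio: float = 0.8) -> None:
        self.daily_budget_usd = daily_budget_usd
        self.warn_ratio = warn_ratio
        self.spend: dict[tuple[str, date], float] = defaultdict(float)

    def record(self, tenant: str, day: date, cost_usd: float) -> str:
        """Add one request's cost and return 'ok', 'warn', or 'over'."""
        key = (tenant, day)
        self.spend[key] += cost_usd
        if self.spend[key] >= self.daily_budget_usd:
            return "over"
        if self.spend[key] >= self.warn_ratio * self.daily_budget_usd:
            return "warn"
        return "ok"


tracker = BudgetTracker(daily_budget_usd=10.0)
today = date(2025, 1, 15)
statuses = [
    tracker.record("tenant-a", today, 5.0),   # 5.0 of 10.0 spent
    tracker.record("tenant-a", today, 3.5),   # 8.5, past the 80% warning line
    tracker.record("tenant-a", today, 2.0),   # 10.5, over budget
    tracker.record("tenant-b", today, 1.0),   # separate tenant, separate total
]
print(statuses)
```

Keying the totals by tenant and day keeps one tenant's spike from masking, or being masked by, everyone else's traffic.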

Key terms

Vocabulary used in this module

LLM Tracing

Distributed tracing for AI pipeline steps (embed, retrieve, generate)

Token Monitoring

Tracking input/output token counts and costs per request

AI Telemetry

Metrics, traces, and logs specific to AI system performance

Labs

Hands-on labs

30 min · Intermediate

Add Tracing to Your RAG Pipeline

Instrument with OpenTelemetry for full request tracing.

Add OpenTelemetry SDK to your RAG service
Create spans for embed, retrieve, generate steps
Export to Jaeger for trace visualization
Identify latency bottlenecks
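For the last step, a rough sketch of how span durations might be summarized once they are collected. The `traces` data is made up for illustration and hard-coded here rather than pulled from Jaeger, and `step_summary` is a hypothetical helper.

```python
from statistics import median

# Hypothetical per-step durations in ms for four traced requests.
traces = [
    {"embed": 5, "retrieve": 10, "generate": 2000},
    {"embed": 6, "retrieve": 12, "generate": 1800},
    {"embed": 5, "retrieve": 95, "generate": 2100},
    {"embed": 4, "retrieve": 11, "generate": 1900},
]


def step_summary(traces: list[dict[str, float]]) -> dict[str, dict[str, float]]:
    """Return median and max duration per step, plus its share of total time."""
    total = sum(sum(t.values()) for t in traces)
    summary = {}
    for step in traces[0]:
        durations = [t[step] for t in traces]
        summary[step] = {
            "median_ms": median(durations),
            "max_ms": max(durations),
            "share": sum(durations) / total,
        }
    return summary


summary = step_summary(traces)
bottleneck = max(summary, key=lambda s: summary[s]["share"])
for step, stats in summary.items():
    print(f"{step:<9} median={stats['median_ms']:>7.1f} ms  "
          f"max={stats['max_ms']:>6} ms  share={stats['share']:.1%}")
print("bottleneck:", bottleneck)
```

In this made-up data the generate step dominates total time, but the gap between the median and the maximum for retrieve also points to an occasional slow search that an average alone would hide.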

30 min · Intermediate

Build Cost and Quality Dashboards

Monitor token usage, costs, and quality metrics.

Export token metrics to Prometheus
Build Grafana dashboards for cost per request and per tenant
Add quality score tracking over time
Set up alerts for cost spikes and quality drops
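The alerting logic in the last step can be sketched in plain Python before it is expressed as alert rules in your monitoring stack. The function names, the 2x spike factor, the 0.05 quality tolerance, and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean


def cost_spike(history: list[float], latest: float, factor: float = 2.0) -> bool:
    """True when the latest value exceeds `factor` times the historical mean."""
    return bool(history) and latest > factor * mean(history)


def quality_drop(history: list[float], latest: float, tolerance: float = 0.05) -> bool:
    """True when the latest score falls more than `tolerance` below the mean."""
    return bool(history) and latest < mean(history) - tolerance


# Hypothetical hourly cost (USD) and quality scores (0 to 1).
hourly_cost = [1.10, 0.95, 1.05, 1.00]
quality = [0.90, 0.92, 0.91, 0.89]

alerts = []
if cost_spike(hourly_cost, latest=3.40):
    alerts.append("cost_spike")
if quality_drop(quality, latest=0.80):
    alerts.append("quality_drop")
print(alerts)
```

Comparing against a recent baseline rather than a fixed number lets the same rule work for tenants with very different traffic levels; with no history, both checks stay silent instead of guessing.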

Recap

Key takeaways

Trace every RAG step: embed, retrieve, generate - know where time is spent
Monitor token usage per request - LLM costs are often your largest expense
Track cost per tenant for multi-tenant systems
Quality metrics (retrieval precision, groundedness) should be continuous
Alert on cost spikes, latency degradation, and quality drops
