Module 14: Data Incidents and Debugging

Debug wrong revenue, stale data, broken joins, and schema drift like an engineer.

110 minutes. 1 inline exercise. Free course module.

Learning Objectives

  • Classify common data incidents
  • Use tests and lineage during debugging
  • Write a useful data incident review

Why This Matters

Data incidents are production incidents. A wrong dashboard can be as damaging as a down API when leaders use it to make decisions.

Architecture diagram for Module 14: the incident workflow in five steps: Alert → Scope → Trace → Fix → Review. Each box is one idea you will practice in this module.

Lesson Content

The Mental Model

When a number is wrong, do not randomly edit SQL. Scope the issue, trace upstream, find the first layer where values go bad, fix it there, and write down why no test or alert caught it sooner.
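The scope-trace-fix loop can be sketched end to end. Below is a minimal, runnable illustration using Python's sqlite3 so the SQL executes anywhere; the layer names (stg_orders, fct_revenue), the data, and the bug are all invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table raw_orders (order_id int, amount real, status text);
insert into raw_orders values
  (1, 100.0, 'completed'),
  (2, 200.0, 'shipped'),
  (3,  50.0, 'cancelled');

-- Staging layer: a recent edit (the invented bug) dropped 'shipped'
-- from the status filter.
create view stg_orders as
  select * from raw_orders where status = 'completed';

-- Mart layer: sums staging, so it silently inherits the bug.
create view fct_revenue as
  select sum(amount) as revenue from stg_orders;
""")

# Walk the layers from upstream to downstream and record revenue at each one.
totals = {}
for layer, sql in [
    ("raw",     "select sum(amount) from raw_orders where status != 'cancelled'"),
    ("staging", "select sum(amount) from stg_orders"),
    ("mart",    "select revenue from fct_revenue"),
]:
    totals[layer] = con.execute(sql).fetchone()[0]
    print(layer, totals[layer])
# raw is 300.0 but staging is 100.0, so staging is the first bad layer.
```

The fix belongs in the staging filter, not in the dashboard that displays `fct_revenue`.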

Tiny Example

We will use a small ecommerce dataset throughout the course. Think of these as the only tables in your first warehouse:

Table           | Grain                            | Example columns
raw_orders      | one row per order event          | order_id, customer_id, amount, status, created_at
raw_order_items | one row per item inside an order | order_id, product_id, quantity, item_price
raw_customers   | one row per customer             | customer_id, email, country, created_at
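Because the dataset has two grains for the same orders, a useful debugging habit is to cross-check them: the header amount in raw_orders should equal the item total in raw_order_items. A sketch with invented data (the mismatch on order 2 is planted on purpose):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table raw_orders (order_id int, amount real);
create table raw_order_items (order_id int, product_id int, quantity int, item_price real);
insert into raw_orders values (1, 30.0), (2, 99.0);
insert into raw_order_items values
  (1, 10, 2, 10.0),  -- 2 x 10.00
  (1, 11, 1, 10.0),  -- 1 x 10.00
  (2, 12, 1, 50.0);  -- header says 99.00: mismatch
""")

# Orders where the header amount disagrees with the item-level total.
mismatches = con.execute("""
  select o.order_id, o.amount, sum(i.quantity * i.item_price) as items_total
  from raw_orders o
  join raw_order_items i using (order_id)
  group by o.order_id, o.amount
  having abs(o.amount - sum(i.quantity * i.item_price)) > 0.01
""").fetchall()
print(mismatches)  # only order 2 disagrees across the two grains
```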

Interactive Check

Question: Revenue drops 40% but order count is normal. What should you check first?

Answer:

Check payment/refund amount logic, currency/unit conversion, filters on successful orders, and recent changes in models feeding the revenue metric.
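A fast first query for this symptom is revenue broken down by status: if a filter recently excluded a status, or refunds started landing with the wrong sign, the breakdown makes it visible immediately. A minimal sketch with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table raw_orders (order_id int, amount real, status text);
insert into raw_orders values
  (1, 100.0, 'completed'), (2, 80.0, 'shipped'),
  (3, -20.0, 'refunded'),  (4, 60.0, 'completed');
""")

# Revenue per status: a missing status or a sign flip stands out here
# long before you start editing model SQL.
by_status = dict(con.execute(
    "select status, sum(amount) from raw_orders group by status"
).fetchall())
print(by_status)
```

If the revenue model only counts 'completed', this breakdown tells you exactly how much the 'shipped' and 'refunded' rows contribute to the gap.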

Inline Practice Lab

This lab is intentionally small. You can solve it by reading the table, writing the SQL/YAML mentally, or pasting the snippet into any SQL scratchpad later.

-- Example starter table
select
  order_id,
  customer_id,
  amount,
  status,
  created_at
from raw_orders;

The goal is not tooling setup. The goal is learning the production habit: state the grain, clean one thing, test one assumption, and explain the downstream impact.
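"Test one assumption" usually starts with the grain. For raw_orders the stated grain is one row per order event, so order_id should be unique; a duplicate silently doubles revenue downstream. A runnable sketch of that check, with a planted duplicate:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table raw_orders (order_id int, customer_id int, amount real);
insert into raw_orders values (1, 7, 10.0), (2, 8, 20.0), (2, 8, 20.0);
""")

# Grain test: any order_id appearing more than once breaks the
# "one row per order" assumption.
dupes = con.execute("""
  select order_id, count(*) as n
  from raw_orders
  group by order_id
  having count(*) > 1
""").fetchall()
print(dupes)  # order 2 violates the grain
```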

Self-Check Quiz

  1. What is the grain of the table you are building?
  2. Which downstream metric or dashboard would be wrong if this model broke?
  3. What test would catch the most likely beginner mistake here?

Real-World Use Cases

  • Reliable executive dashboards that do not disagree across teams
  • AI analytics agents that query governed metrics instead of guessing SQL
  • Auditable metric changes where owners can see downstream impact before merge

Production Notes

  • Maintain a data incident template: symptom, impact, first bad layer, detection gap, fix, prevention.
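The template above, sketched as a fill-in skeleton (the field prompts are this course's wording, not an industry standard):

```
Symptom:         What looked wrong, and where it was first noticed
Impact:          Which metrics, dashboards, and decisions were affected (blast radius)
First bad layer: The upstream-most model where values diverge from expectations
Detection gap:   Why no test or alert caught it before a human did
Fix:             The change that corrected the data, including any backfills
Prevention:      Tests, alerts, or process changes that catch this class of incident next time
```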

Common Mistakes

  • Fixing the dashboard instead of the model
  • Skipping incident review after numbers recover
  • Not notifying metric owners and consumers

Think Like an Engineer

  • Can you explain the grain of this model in one sentence?
  • What breaks downstream if this field becomes null tomorrow?
  • Where should this logic live so it is reused instead of copied?

Career Relevance

Analytics engineering is the bridge between SQL skill and production data ownership. Freshers who learn tests, lineage, metrics, and semantic modeling early stand out because they can reason about trust, not just queries.

Key Terms

Data incident
A reliability event where data is wrong, late, incomplete, or misleading.
Blast radius
The set of downstream users, models, or metrics affected by a change or failure.

Inline Exercises

  1. Debug a Wrong Metric

    Use a fake incident timeline to identify the most likely failing model.

    30-45 minutes - Intermediate

    • Read the symptoms
    • List affected metrics
    • Trace upstream models
    • Pick the first layer where values diverge
    • Write one test that would have caught it
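For the last step, one common shape of "the test that would have caught it" is a completeness check: assert that every status you expect still appears in the model, so a filter change fails a test instead of a dashboard. A sketch with invented data, where two expected statuses have vanished:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table stg_orders (order_id int, amount real, status text);
insert into stg_orders values (1, 100.0, 'completed'), (2, 50.0, 'completed');
""")

# Expected set of statuses for this model (an assumption for the example).
expected = {"completed", "shipped", "refunded"}
present = {row[0] for row in con.execute("select distinct status from stg_orders")}

# Statuses that disappeared; a non-empty set should fail the test run.
missing = expected - present
print(sorted(missing))
```

In a real pipeline this check would run after every model build, so the incident surfaces at build time rather than on the executive dashboard.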

    Inline lab: complete the exercise directly in the course page.

Key Takeaways

  • Data debugging needs scope, lineage, and tests
  • Incidents should produce prevention work
  • Wrong data is a reliability problem