Module 7: Testing and Data Quality

Use tests to catch broken assumptions before users lose trust.

110 minutes. 1 inline exercise. Free course module.

Learning Objectives

  • Use not_null, unique, relationships, and accepted_values tests
  • Write testable assumptions in model YAML
  • Connect data quality to user trust

Why This Matters

Data tests are executable assumptions. They do not prove data is perfect, but they catch the breakages you know would make the model unsafe.

Figure: Architecture diagram for Module 7: Testing and Data Quality. Follow the arrows; each box is one idea you will practice in this module.

Assumption (step 1) → Test (step 2) → Fail (step 3) → Fix (step 4) → Trust (step 5)

Production analytics engineering turns raw records into governed, trusted business meaning.

Lesson Content

The Mental Model

A test is a smoke alarm. It does not stop every fire, but it tells you when a known danger is happening.
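
One way to picture an executable assumption is as a query that returns the rows violating it, so an empty result means the assumption holds. The sketch below assumes that every row in raw_orders should carry a customer_id; that rule is an illustration, not something the course data guarantees.

```sql
-- Assumption (illustrative): every order event belongs to a customer.
-- The check passes when this query returns zero rows.
select
  order_id,
  customer_id,
  created_at
from raw_orders
where customer_id is null;
```

Any row that comes back is the smoke alarm going off: it names the exact records that broke the assumption.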

Tiny Example

We will use a small ecommerce dataset throughout the course. Think of these as the only tables in your first warehouse:

Table              Grain                                Example columns
raw_orders         one row per order event              order_id, customer_id, amount, status, created_at
raw_order_items    one row per item inside an order     order_id, product_id, quantity, item_price
raw_customers      one row per customer                 customer_id, email, country, created_at

Interactive Check

Question: Which tests should protect customer_id in dim_customers?

Answer: Use unique and not_null. A customer dimension needs exactly one row per customer, so customer_id must never repeat and must never be null.
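
As a rough sketch, the two tests together ask the question below of dim_customers. This is hand-written SQL for intuition, not the query any particular tool generates.

```sql
-- Returns every customer_id that breaks the grain of dim_customers:
-- either it appears on more than one row (unique fails)
-- or it is null (not_null fails).
select
  customer_id,
  count(*) as row_count
from dim_customers
group by customer_id
having count(*) > 1
    or customer_id is null;
```

Zero rows back means the dimension has one usable row per customer.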

Inline Practice Lab

This lab is intentionally small. You can solve it by reading the table, writing the SQL/YAML mentally, or pasting the snippet into any SQL scratchpad later.

-- Example starter table
select
  order_id,
  customer_id,
  amount,
  status,
  created_at
from raw_orders;

The goal is not tooling setup. The goal is learning the production habit: state the grain, clean one thing, test one assumption, and explain the downstream impact.
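
The habit can be sketched against the starter table. This is a minimal example under stated assumptions: raw_orders may hold several events per order, the event with the latest created_at is treated as the current state, and status text needs normalizing. The course may define its orders model differently.

```sql
-- Hypothetical orders model.
-- State the grain: one row per order_id.
with ranked_events as (
  select
    order_id,
    customer_id,
    amount,
    lower(trim(status)) as status,  -- clean one thing: normalize status text
    created_at,
    row_number() over (
      partition by order_id
      order by created_at desc      -- assumption: latest event wins
    ) as event_rank
  from raw_orders
)

select
  order_id,
  customer_id,
  amount,
  status,
  created_at
from ranked_events
where event_rank = 1;
```

The assumption to test here is the grain: order_id should be unique and not null in the result. The downstream impact is easy to state: if two rows survive for one order, any sum of amount counts that order twice.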

Self-Check Quiz

  1. What is the grain of the table you are building?
  2. Which downstream metric or dashboard would be wrong if this model broke?
  3. What test would catch the most likely beginner mistake here?

Real-World Use Cases

  • Reliable executive dashboards that do not disagree across teams
  • AI analytics agents that query governed metrics instead of guessing SQL
  • Auditable metric changes where owners can see downstream impact before merge

Production Notes

  • Every model should have at least one grain-protecting test. For facts, test the event key. For dimensions, test the entity key.

Common Mistakes

  • Testing only columns that already look clean
  • Adding hundreds of noisy tests nobody investigates
  • Having no clear policy for when a test should warn and when it should fail
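
A warn-versus-fail policy can be written down next to the tests themselves. The fragment below is a sketch in dbt-style YAML using dbt's severity test config; the choice of which test warns and the list of status values are hypothetical, not taken from the course data.

```yaml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          # Grain-protecting tests keep the default severity (error).
          - unique
          - not_null
      - name: order_status
        tests:
          - accepted_values:
              # Hypothetical status values, for illustration only.
              values: ['placed', 'shipped', 'completed', 'returned']
              config:
                # Example policy: a new status should alert, not block.
                severity: warn
```

Whatever the split, the point is that someone decided it on purpose and a reader can see the decision in the file.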

Think Like an Engineer

  • Can you explain the grain of this model in one sentence?
  • What breaks downstream if this field becomes null tomorrow?
  • Where should this logic live so it is reused instead of copied?

Career Relevance

Analytics engineering is the bridge between SQL skill and production data ownership. Freshers who learn tests, lineage, metrics, and semantic modeling early stand out because they can reason about trust, not just queries.

Key Terms

Data test
A check that validates an expected property of a dataset.
Relationship test
A test that checks whether foreign key values exist in a referenced table.
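
For intuition, a relationship test between the two models used in this module's exercise can be sketched as an anti-join. The query is hand-written for illustration; it assumes both fct_orders and dim_customers have a customer_id column.

```sql
-- Returns orders whose customer_id has no matching row in dim_customers.
-- Rows with a null customer_id are skipped here; catching those is the
-- job of a not_null test.
select
  o.order_id,
  o.customer_id
from fct_orders as o
left join dim_customers as c
  on o.customer_id = c.customer_id
where o.customer_id is not null
  and c.customer_id is null;
```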

Inline Exercises

  1. Add the First Tests

    Add basic dbt-style tests to fct_orders and dim_customers.

    30-45 minutes - Beginner to Intermediate

    • Mark customer_id in dim_customers as unique and not_null
    • Mark order_id in fct_orders as unique and not_null
    • Add accepted_values for order_status
    • Add a relationships test from fct_orders.customer_id to dim_customers.customer_id

    Inline lab: complete the exercise directly in the course page.
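
One possible shape for the finished file is sketched below in dbt-style YAML. The accepted status values are hypothetical placeholders; replace them with the statuses your data actually contains. The sketch uses the tests: key, and newer dbt versions also accept data_tests: for the same purpose.

```yaml
version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_status
        tests:
          - accepted_values:
              # Hypothetical values, for illustration only.
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```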

Key Takeaways

  • Tests turn assumptions into automated checks
  • Basic tests catch many expensive dashboard failures
  • Quality is an engineering workflow, not a cleanup sprint