Module 4: Staging Models
Clean source data gently: rename, cast, standardize, and expose a stable base layer.
100 minutes. 1 inline exercise. Free course module.
Learning Objectives
- Build staging models that stay close to the source
- Apply safe renaming and type casting
- Avoid burying business logic too early
Why This Matters
Staging models are the clean mirror of raw sources. They should make data easier to use without making heavy business decisions.
Lesson Content
The Mental Model
A staging model is like rewriting messy notes into clean handwriting. You are not changing the story yet; you are making it readable.
Tiny Example
We will use a small ecommerce dataset throughout the course. Think of these as the only tables in your first warehouse:
| Table | Grain | Example columns |
|---|---|---|
| raw_orders | one row per order event | order_id, customer_id, amount, status, created_at |
| raw_order_items | one row per item inside an order | order_id, product_id, quantity, item_price |
| raw_customers | one row per customer | customer_id, email, country, created_at |
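As a preview of where this dataset is headed, a staging model over raw_orders might look like the sketch below. The column names come from the table above; `stg_orders` follows the common staging naming convention, and the specific casts are assumptions about how the raw data arrives:

```sql
-- stg_orders: one row per order event, cleaned from raw_orders
select
    order_id,
    customer_id,
    amount,
    lower(status)                 as status,     -- standardize casing
    cast(created_at as timestamp) as created_at  -- enforce a real timestamp type
from raw_orders;
```

Note that the grain is unchanged: one row in, one row out.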
Interactive Check
Question: Should a staging model calculate lifetime customer value?
Answer: No. Lifetime value is business logic aggregated across many events and belongs in a later layer. Staging should focus on source cleanup: names, types, null handling, and basic standardization.
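To make the contrast concrete, here is the kind of row-level cleanup that does belong in staging, next to the aggregation that does not. This is a sketch using the raw_orders columns from the example dataset:

```sql
-- Belongs in staging: row-level cleanup, same grain as the source
select
    order_id,
    customer_id,
    coalesce(amount, 0) as amount,  -- basic null handling
    lower(trim(status)) as status   -- basic standardization
from raw_orders;

-- Does NOT belong in staging: aggregation across many events
-- select customer_id, sum(amount) as lifetime_value
-- from raw_orders
-- group by customer_id;
```

The first query keeps one row per order event; the second changes the grain to one row per customer, which is a downstream modeling decision.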
Inline Practice Lab
This lab is intentionally small. You can solve it by reading the table, writing the SQL/YAML mentally, or pasting the snippet into any SQL scratchpad later.
```sql
-- Example starter table
select
    order_id,
    customer_id,
    amount,
    status,
    created_at
from raw_orders;
```
The goal is not tooling setup. The goal is learning the production habit: state the grain, clean one thing, test one assumption, and explain the downstream impact.
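"Test one assumption" can be as small as a query that returns zero rows when the assumption holds. For example, asserting that order_id is unique (dbt expresses this as a `unique` test, but the underlying check is plain SQL):

```sql
-- Returns rows only when order_id is duplicated;
-- an empty result means the assumption holds.
select
    order_id,
    count(*) as n
from raw_orders
group by order_id
having count(*) > 1;
```

Writing the check this way also answers the quiz question below about catching the most likely beginner mistake: duplicate keys.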
Self-Check Quiz
- What is the grain of the table you are building?
- Which downstream metric or dashboard would be wrong if this model broke?
- What test would catch the most likely beginner mistake here?
Real-World Use Cases
- Reliable executive dashboards that do not disagree across teams
- AI analytics agents that query governed metrics instead of guessing SQL
- Auditable metric changes where owners can see downstream impact before merge
Production Notes
- Use one staging model per source table. It gives every raw table one official cleaned interface.
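In dbt, that one-to-one convention usually looks like a model file named after the source table, reading from a declared source rather than a hard-coded table name. The source and model names here are assumptions for illustration:

```sql
-- models/staging/stg_orders.sql
-- One staging model per source table: stg_orders is the official
-- cleaned interface for raw_orders.
select
    order_id,
    customer_id,
    amount,
    status,
    created_at
from {{ source('ecommerce', 'raw_orders') }}
```

Using `source()` instead of the raw table name lets dbt track lineage from the raw table through every downstream model.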
Common Mistakes
- Joining multiple sources in staging
- Adding metrics to staging models
- Leaving cryptic source column names unchanged
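For contrast, a sketch of an over-built "staging" model that commits the first two mistakes at once; the join and the metric shown here belong in later layers:

```sql
-- Anti-pattern: a "staging" model doing too much
select
    o.customer_id,
    c.country,                      -- mistake: joining a second source in staging
    sum(o.amount) as total_revenue  -- mistake: adding a metric in staging
from raw_orders o
join raw_customers c
    on c.customer_id = o.customer_id
group by o.customer_id, c.country;
```

Note that this model no longer has the grain of any single source table, which is the clearest sign it has left staging territory.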
Think Like an Engineer
- Can you explain the grain of this model in one sentence?
- What breaks downstream if this field becomes null tomorrow?
- Where should this logic live so it is reused instead of copied?
Career Relevance
Analytics engineering is the bridge between SQL skill and production data ownership. Freshers who learn tests, lineage, metrics, and semantic modeling early stand out because they can reason about trust, not just queries.
Key Terms
- Staging model: A dbt model that cleans and standardizes one raw source table.
- Source: An upstream table that dbt reads but does not create.
Inline Exercises
Fix stg_orders (Beginner, 30-45 minutes)
Turn a messy raw_orders table into a clean staging model.
- Rename id to order_id
- Cast created_at to a timestamp
- Standardize status values to lowercase
- Keep source-level fields only
- Write the model grain in one sentence
Inline lab: complete the exercise directly on the course page.
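One possible solution sketch, assuming the raw table exposes the key as `id` and `created_at` arrives as a string (check your own source before copying):

```sql
-- stg_orders: one row per order event (the model grain)
select
    id                            as order_id,   -- rename to the standard key name
    customer_id,
    amount,
    lower(status)                 as status,     -- standardize status values
    cast(created_at as timestamp) as created_at  -- cast from string to timestamp
from raw_orders;
```

A one-sentence grain statement for this model: "One row per order event from raw_orders."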
Key Takeaways
- Staging models are stable cleaned source interfaces
- Keep business logic out of staging unless it is source-specific cleanup
- Good staging makes every downstream model simpler