Module 13: Lineage with dbt Artifacts

Trace impact from source columns to models, metrics, dashboards, and AI answers.

120 minutes. 1 inline exercise. Free course module.

Learning Objectives

  • Explain table, column, metric, and operational lineage
  • Describe what the dbt manifest.json, run_results.json, and catalog.json artifacts contain
  • Use lineage to reason about blast radius

Why This Matters

Lineage is the map of how data flows. It helps you debug wrong numbers, assess change impact, and explain how a metric was produced.

Architecture diagram for Module 13: Lineage with dbt Artifacts. Follow the arrows; each box is one idea you will practice in this module: Source (step 1) → Model (step 2) → Column (step 3) → Metric (step 4) → Dashboard (step 5).

Lesson Content

The Mental Model

Lineage is like a family tree for data. If one parent changes, you can see which children may be affected.
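
The family-tree idea can be sketched as a graph walk. The example below is a minimal sketch, not a definitive implementation: it assumes a hand-written dictionary shaped like the child_map section of a dbt manifest.json, and every unique_id in it (the shop project, stg_orders, fct_orders, revenue_dashboard) is hypothetical.

```python
from collections import deque

# Hypothetical stand-in for the "child_map" section of manifest.json:
# each key is a resource unique_id, each value lists its direct children.
child_map = {
    "source.shop.raw.raw_orders": ["model.shop.stg_orders"],
    "model.shop.stg_orders": ["model.shop.fct_orders"],
    "model.shop.fct_orders": [
        "metric.shop.net_revenue",
        "exposure.shop.revenue_dashboard",
    ],
    "metric.shop.net_revenue": [],
    "exposure.shop.revenue_dashboard": [],
}


def downstream(child_map, start):
    """Return every resource reachable from start, in breadth-first order."""
    seen = []
    queue = deque(child_map.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited through another parent
        seen.append(node)
        queue.extend(child_map.get(node, []))
    return seen


affected = downstream(child_map, "source.shop.raw.raw_orders")
for unique_id in affected:
    print(unique_id)
```

With a real project you would load the dictionary from target/manifest.json instead of typing it by hand; the walk itself stays the same.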

Tiny Example

We will use a small ecommerce dataset throughout the course. Think of these as the only tables in your first warehouse:

Table            | Grain                             | Example columns
-----------------|-----------------------------------|--------------------------------------------------
raw_orders       | one row per order event           | order_id, customer_id, amount, status, created_at
raw_order_items  | one row per item inside an order  | order_id, product_id, quantity, item_price
raw_customers    | one row per customer              | customer_id, email, country, created_at

Interactive Check

Question: raw_orders.amount changes from dollars to cents. Which downstream objects might be impacted?

Answer:

Any staging model using amount, any fact table deriving revenue, any revenue metric, and all dashboards or AI tools consuming that metric.

Inline Practice Lab

This lab is intentionally small. You can solve it by reading the table, writing the SQL/YAML mentally, or pasting the snippet into any SQL scratchpad later.

-- Example starter table
select
  order_id,
  customer_id,
  amount,
  status,
  created_at
from raw_orders;

The goal is not tooling setup. The goal is learning the production habit: state the grain, clean one thing, test one assumption, and explain the downstream impact.
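
One way to practice "test one assumption" is a grain check. The sketch below uses Python's built-in sqlite3 module as a scratchpad. It assumes, purely for illustration, that an order event is identified by (order_id, status); the sample rows are made up, and one duplicate is planted on purpose.

```python
import sqlite3

# In-memory scratch database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table raw_orders ("
    "order_id integer, customer_id integer, amount real, "
    "status text, created_at text)"
)

# Made-up sample rows. The last row deliberately repeats an event.
rows = [
    (1, 10, 25.00, "placed", "2024-01-01"),
    (1, 10, 25.00, "shipped", "2024-01-02"),
    (2, 11, 40.00, "placed", "2024-01-02"),
    (2, 11, 40.00, "placed", "2024-01-02"),
]
conn.executemany("insert into raw_orders values (?, ?, ?, ?, ?)", rows)

# Assumed grain: one row per (order_id, status). Any group with more
# than one row violates that assumption.
duplicates = conn.execute(
    """
    select order_id, status, count(*) as n
    from raw_orders
    group by order_id, status
    having count(*) > 1
    """
).fetchall()

print(duplicates)
```

An empty result means the assumed grain holds. Any returned row names the key that breaks it, which is the first thing to explain before a downstream model builds on this table.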

Self-Check Quiz

  1. What is the grain of the table you are building?
  2. Which downstream metric or dashboard would be wrong if this model broke?
  3. What test would catch the most likely beginner mistake here?

Real-World Use Cases

  • Reliable executive dashboards that do not disagree across teams
  • AI analytics agents that query governed metrics instead of guessing SQL
  • Auditable metric changes where owners can see downstream impact before merge

Production Notes

  • Use lineage during code review. Ask "what downstream object changes if this column changes meaning?" before merge.

Common Mistakes

  • Treating lineage as only a pretty graph instead of a tool for impact analysis
  • Ignoring dashboards and metrics as lineage endpoints
  • Not capturing run status and freshness alongside structural lineage
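
The last mistake can be avoided by joining run status onto the graph. This is a minimal sketch: it assumes two hand-written dictionaries shaped like the results list of run_results.json and the parent_map section of manifest.json. All unique_ids and statuses here are hypothetical.

```python
# Hypothetical excerpt shaped like run_results.json.
run_results = {
    "results": [
        {"unique_id": "model.shop.stg_orders", "status": "success"},
        {"unique_id": "model.shop.fct_orders", "status": "error"},
    ]
}

# Hypothetical excerpt shaped like the "parent_map" section of manifest.json:
# each key is a resource, each value lists its direct parents.
parent_map = {
    "model.shop.stg_orders": ["source.shop.raw.raw_orders"],
    "model.shop.fct_orders": ["model.shop.stg_orders"],
    "exposure.shop.revenue_dashboard": ["model.shop.fct_orders"],
}

status_by_id = {r["unique_id"]: r["status"] for r in run_results["results"]}


def ancestors(parent_map, node):
    """Return every upstream resource of node (depth-first)."""
    found = []
    stack = list(parent_map.get(node, []))
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.append(current)
        stack.extend(parent_map.get(current, []))
    return found


def failed_upstream(parent_map, status_by_id, node):
    """List upstream resources whose last recorded status was a failure."""
    bad = {"error", "fail"}
    return sorted(
        a for a in ancestors(parent_map, node) if status_by_id.get(a) in bad
    )


failed = failed_upstream(
    parent_map, status_by_id, "exposure.shop.revenue_dashboard"
)
print(failed)
```

The structural graph alone says the dashboard depends on fct_orders. The added status says that dependency failed in the last run, so the dashboard may be stale or wrong right now.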

Think Like an Engineer

  • Can you explain the grain of this model in one sentence?
  • What breaks downstream if this field becomes null tomorrow?
  • Where should this logic live so it is reused instead of copied?

Career Relevance

Analytics engineering is the bridge between SQL skill and production data ownership. Freshers who learn tests, lineage, metrics, and semantic modeling early stand out because they can reason about trust, not just queries.

Key Terms

Lineage
Metadata describing how data flows from upstream inputs to downstream outputs.
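Manifest
A dbt artifact (manifest.json) containing the project's dependency graph and resource metadata.
Run results
A dbt artifact (run_results.json) recording the status and timing of each resource executed in the most recent invocation.
Catalog
A dbt artifact (catalog.json), produced by dbt docs generate, listing the columns and data types the warehouse reports for each relation.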

Inline Exercises

  1. Trace the Blast Radius

    Follow one changed source column through models, metrics, and consumers.

    30-45 minutes - Intermediate

    • Start at raw_orders.amount
    • Map it to stg_orders.order_amount
    • Map it to fct_orders.gross_revenue
    • Map it to net_revenue
    • List impacted dashboards and owners

    Inline lab: complete the exercise directly in the course page.
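
The exercise steps can be written down as a small column-level map. This is a sketch under stated assumptions: the edges are typed by hand from the steps above rather than read from a dbt artifact, and the dashboard name and owner are hypothetical placeholders to replace with your own.

```python
# Column-level edges copied from the exercise steps (hand-written).
column_edges = {
    "raw_orders.amount": ["stg_orders.order_amount"],
    "stg_orders.order_amount": ["fct_orders.gross_revenue"],
    "fct_orders.gross_revenue": ["metric:net_revenue"],
}

# Hypothetical consumers: (dashboard name, owning team) per metric.
consumers = {
    "metric:net_revenue": [("Revenue dashboard", "finance-analytics")],
}


def trace(edges, start):
    """Return start plus everything derived from it, level by level."""
    path = [start]
    frontier = [start]
    while frontier:
        next_frontier = []
        for column in frontier:
            for child in edges.get(column, []):
                if child not in path:
                    path.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return path


path = trace(column_edges, "raw_orders.amount")
impacted = [c for node in path for c in consumers.get(node, [])]

print(" -> ".join(path))
for dashboard, owner in impacted:
    print(f"{dashboard} (owner: {owner})")
```

Keeping the map as plain data makes it easy to review: a changed column is one lookup away from the list of people to notify.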

Key Takeaways

  • Lineage makes data changes safer
  • dbt artifacts already contain useful dependency metadata
  • Column and metric lineage are more useful than table lineage alone