Module 2: Tables, Grain, and Why Dashboards Lie
Learn the most important beginner concept: one row per what?
90 minutes. 1 inline exercise. Free course module.
Learning Objectives
- Define table grain accurately
- Spot double-counting bugs before they reach dashboards
- Understand facts, dimensions, and event tables
Why This Matters
Grain means what one row represents. Most bad metrics come from joining tables with different grains and then aggregating without noticing the duplication.
Lesson Content
The Mental Model
Before writing any SQL, ask: one row per what? If you cannot answer, you are not ready to aggregate.
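One way to answer "one row per what?" from the data itself is to compare the row count with the count of distinct candidate keys. The sketch below uses a few made-up rows standing in for raw_order_items and assumes that order_id plus product_id identifies an item within an order. The inline `union all` rows and the scalar subqueries without a `from` clause work in engines such as PostgreSQL, SQLite, and DuckDB; other warehouses may need small syntax changes.

```sql
-- Hypothetical sample rows standing in for raw_order_items.
with order_items as (
    select 1 as order_id, 'A' as product_id, 2 as quantity union all
    select 1, 'B', 1 union all
    select 2, 'A', 1
),
-- Candidate grain: one row per (order_id, product_id) pair.
pairs as (
    select distinct order_id, product_id
    from order_items
)
select
    (select count(*) from order_items) as row_count,
    (select count(distinct order_id) from order_items) as distinct_orders,
    (select count(*) from pairs) as distinct_order_product_pairs;
-- row_count = 3, distinct_orders = 2, distinct_order_product_pairs = 3
```

Here row_count does not match distinct_orders, so the table is not one row per order. It does match the number of distinct pairs, which is consistent with one row per item inside an order.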
Tiny Example
We will use a small ecommerce dataset throughout the course. Think of these as the only tables in your first warehouse:
| Table | Grain | Example columns |
|---|---|---|
| raw_orders | one row per order event | order_id, customer_id, amount, status, created_at |
| raw_order_items | one row per item inside an order | order_id, product_id, quantity, item_price |
| raw_customers | one row per customer | customer_id, email, country, created_at |
Interactive Check
Question: You join raw_orders to raw_order_items and then sum the order amount. Why might revenue come out too high?
Answer: Each order can have many items. After the join, the order amount repeats once per item row, so summing it counts the same order's amount multiple times.
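The duplication is easy to see on a tiny example. The sketch below uses made-up rows (two orders, three items) rather than the real course tables, and compares revenue summed at the order grain with revenue summed after the one-to-many join.

```sql
-- Hypothetical sample rows: order 1 has two items, order 2 has one.
with orders as (
    select 1 as order_id, 100 as amount union all
    select 2, 50
),
order_items as (
    select 1 as order_id, 'A' as product_id, 60 as item_price union all
    select 1, 'B', 40 union all
    select 2, 'A', 50
)
select
    -- Summed at the order grain: each order counted once.
    (select sum(amount) from orders) as true_revenue,
    -- Summed after the join: order 1's amount appears on two item rows.
    (select sum(o.amount)
     from orders o
     join order_items i on i.order_id = o.order_id) as revenue_after_join;
-- true_revenue = 150, revenue_after_join = 250
```

The extra 100 comes entirely from order 1 being repeated once per item row. No row contains a wrong value, which is why this bug is hard to spot by inspecting the data.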
Inline Practice Lab
This lab is intentionally small. You can solve it by reading the table, writing the SQL/YAML mentally, or pasting the snippet into any SQL scratchpad later.
-- Example starter table
select
    order_id,
    customer_id,
    amount,
    status,
    created_at
from raw_orders;
The goal is not tooling setup. The goal is learning the production habit: state the grain, clean one thing, test one assumption, and explain the downstream impact.
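"Test one assumption" can be as small as a query that returns rows only when the assumed grain is violated. The sketch below assumes the grain you want is one row per order_id. The sample rows and status values are invented for illustration, with order 2 deliberately appearing twice.

```sql
-- Hypothetical sample rows standing in for raw_orders.
with raw_orders as (
    select 1 as order_id, 10 as customer_id, 100 as amount, 'placed' as status union all
    select 2, 11, 50, 'placed' union all
    select 2, 11, 50, 'shipped'
)
-- Grain test: an empty result is consistent with "one row per order".
select
    order_id,
    count(*) as row_count
from raw_orders
group by order_id
having count(*) > 1;
-- returns one row: order_id = 2, row_count = 2
```

If the real table holds one row per order event and an order can produce several events, a test like this would fail. That failure would be a signal to deduplicate or restate the grain before anything downstream sums amount.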
Self-Check Quiz
- What is the grain of the table you are building?
- Which downstream metric or dashboard would be wrong if this model broke?
- What test would catch the most likely beginner mistake here?
Real-World Use Cases
- Reliable executive dashboards that do not disagree across teams
- AI analytics agents that query governed metrics instead of guessing SQL
- Auditable metric changes where owners can see downstream impact before merge
Production Notes
- Add model descriptions that start with grain: "One row per..." This prevents many review mistakes.
Common Mistakes
- Summing order-level values after item-level joins
- Assuming unique IDs without testing them
- Mixing event time and reporting time without naming the difference
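For the first mistake in this list, one common fix is to aggregate the item-level table up to the order grain before joining, so both sides of the join have one row per order. This is a minimal sketch on made-up rows. The CTE name items_per_order and the column total_quantity are illustrative, not part of the course dataset.

```sql
-- Hypothetical sample rows.
with orders as (
    select 1 as order_id, 100 as amount union all
    select 2, 50
),
order_items as (
    select 1 as order_id, 'A' as product_id, 2 as quantity union all
    select 1, 'B', 1 union all
    select 2, 'A', 1
),
-- Collapse the item table to one row per order before joining.
items_per_order as (
    select
        order_id,
        sum(quantity) as total_quantity
    from order_items
    group by order_id
)
select
    sum(o.amount) as revenue,
    sum(p.total_quantity) as units_sold
from orders o
join items_per_order p on p.order_id = o.order_id;
-- revenue = 150, units_sold = 4
```

Because the join is now one-to-one, the order amount is counted once per order. An inner join drops orders that have no items, so a left join may be the better choice if such orders can exist.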
Think Like an Engineer
- Can you explain the grain of this model in one sentence?
- What breaks downstream if this field becomes null tomorrow?
- Where should this logic live so it is reused instead of copied?
Career Relevance
Analytics engineering is the bridge between SQL skill and production data ownership. Freshers who learn tests, lineage, metrics, and semantic modeling early stand out because they can reason about trust, not just queries.
Key Terms
- Grain: The real-world entity or event represented by one row.
- Fact table: A table containing measurable business events such as orders or payments.
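Not every join changes the grain. Joining a fact table to a table with one row per customer is many-to-one from the fact side, so the fact rows are not duplicated. The sketch below uses made-up rows and assumes customer_id is unique in the customer table, which is itself an assumption worth testing.

```sql
-- Hypothetical sample rows: three orders, two customers.
with orders as (
    select 1 as order_id, 10 as customer_id, 100 as amount union all
    select 2, 10, 50 union all
    select 3, 11, 70
),
customers as (
    select 10 as customer_id, 'IN' as country union all
    select 11, 'US'
)
select
    c.country,
    count(*) as order_count,
    sum(o.amount) as revenue
from orders o
join customers c on c.customer_id = o.customer_id
group by c.country
order by c.country;
-- IN: order_count = 2, revenue = 150
-- US: order_count = 1, revenue = 70
```

Total revenue across both rows is 220, the same as summing the three orders directly, because each order matched exactly one customer row.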
Inline Exercises
Find the Grain
Identify the grain of five sample tables and decide whether each can be safely joined before aggregation.
30-45 minutes - Beginner
- Label raw_orders as one row per order
- Label raw_order_items as one row per order item
- Label raw_customers as one row per customer
- Explain why the join from raw_orders to raw_order_items is one-to-many
- Write the safe aggregation rule
Inline lab: complete the exercise directly in the course page.
Key Takeaways
- Always state grain before aggregating
- Aggregating after a one-to-many join is the main source of dashboard lies
- Facts and dimensions are useful because they make grain explicit