You added an index. The query is still slow. Sound familiar? Most developers treat indexes like magic — add one and hope for the best. But indexes are data structures with specific rules, and violating those rules means your “indexed” query is still doing a full table scan.
This guide covers what your senior dev never explained: how indexes actually work internally, why column order in composite indexes matters more than you think, and how to read EXPLAIN ANALYZE output like a database engineer.
How Databases Find Your Data
When you run a query, the database has two fundamental strategies:
- Sequential Scan: Read every single row in the table and check if it matches your WHERE clause. Simple, but devastating on large tables.
- Index Scan: Use a pre-built data structure to jump directly to matching rows. Like using a book index instead of reading every page.
Here is the difference in practice:
-- Table: orders (10 million rows)
-- No index on customer_id
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Seq Scan on orders (cost=0.00..185432.00 rows=52 width=96)
-- Filter: (customer_id = 42)
-- Rows Removed by Filter: 9999948
-- Planning Time: 0.085 ms
-- Execution Time: 1247.531 ms ← Over 1 second!
-- Now with an index:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Index Scan using idx_orders_customer_id on orders
-- Index Cond: (customer_id = 42)
-- Planning Time: 0.092 ms
-- Execution Time: 0.128 ms ← 10,000x faster!
The sequential scan examined 10 million rows to find 52 matches. The index scan went straight to those 52 rows. That is the power of proper indexing.
B-Tree Indexes: The Workhorse
B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, and virtually every relational database. Understanding its structure explains most indexing behavior.
A B-tree is a sorted, self-balancing tree structure:
[50, 100] ← Root node
/ | \
[20,35] [60,80] [120,150] ← Internal nodes
/ | \ / | \ / | \
... ... ... ... ... ... ... ← Leaf nodes (actual row pointers)
- Root and internal nodes act as signposts, directing the search left or right
- Leaf nodes contain the indexed values and pointers (ctid) to the actual table rows
- All leaf nodes are linked in a doubly-linked list for efficient range scans
- Tree depth is typically 3-4 levels even for millions of rows
This means finding any single value requires only 3-4 disk reads, regardless of table size. A table with 100 rows and a table with 100 million rows both need roughly the same number of index lookups.
What B-Trees Are Great At
-- Equality: direct lookup
WHERE email = 'user@example.com'
-- Range: walk the linked leaf nodes
WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'
-- Sorting: leaf nodes are already sorted
ORDER BY created_at DESC LIMIT 20
-- Prefix matching
WHERE name LIKE 'John%' -- Uses index
WHERE name LIKE '%John' -- Cannot use index (no prefix)
Hash Indexes: The Specialist
Hash indexes use a hash function to map values to buckets. They are faster than B-trees for exact equality lookups but useless for everything else.
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- Uses the hash index (equality only):
WHERE email = 'user@example.com' -- Yes
-- Cannot use hash index:
WHERE email LIKE 'user%' -- No (pattern)
WHERE email > 'a' AND email < 'm' -- No (range)
ORDER BY email -- No (sorting)
When to use hash indexes: Only when you exclusively do exact-match lookups on high-cardinality columns (like UUIDs or email addresses) and never need range queries or sorting. In practice, B-tree is almost always the better choice because the performance difference is marginal and B-trees are far more versatile.
Composite Indexes: Column Order Is Everything
A composite index indexes multiple columns together. The column order determines which queries can use the index. This is the leftmost prefix rule — the most misunderstood concept in database indexing.
CREATE INDEX idx_orders_composite ON orders(customer_id, status, created_at);
This single index can satisfy these queries:
-- Uses index (all three columns, left to right):
WHERE customer_id = 42 AND status = 'shipped' AND created_at > '2026-01-01'
-- Uses index (first two columns):
WHERE customer_id = 42 AND status = 'shipped'
-- Uses index (first column only):
WHERE customer_id = 42
-- CANNOT use index (skips first column):
WHERE status = 'shipped' -- Skips customer_id!
-- CANNOT use index (skips middle column):
WHERE customer_id = 42 AND created_at > '2026-01-01' -- Only uses customer_id part
Think of it like a phone book sorted by last name, then first name, then city. You can look up everyone named “Smith” (last name). You can look up “Smith, John” (last + first). But you cannot efficiently look up everyone named “John” without a last name — the book is not sorted that way.
Ordering Strategy
Put columns in this order:
- Equality conditions first (WHERE status = 'active')
- Range conditions last (WHERE created_at > '2026-01-01')
- High-selectivity columns before low-selectivity (customer_id before status)
Covering Indexes: Skip the Table Entirely
Normally, an index scan finds the matching row pointers, then fetches the actual rows from the table (a “heap fetch”). A covering index includes all the columns your query needs, eliminating the heap fetch entirely.
-- Query that needs customer_id, status, and total_amount:
SELECT customer_id, status, total_amount
FROM orders
WHERE customer_id = 42 AND status = 'shipped';
-- Regular index: finds rows, then fetches total_amount from table
CREATE INDEX idx_orders_cust_status ON orders(customer_id, status);
-- Covering index: includes total_amount, no table fetch needed
CREATE INDEX idx_orders_covering ON orders(customer_id, status)
INCLUDE (total_amount);
In EXPLAIN output, a covering index shows “Index Only Scan” instead of “Index Scan” — this is significantly faster because it avoids random I/O to the heap.
EXPLAIN ANALYZE SELECT customer_id, status, total_amount
FROM orders WHERE customer_id = 42 AND status = 'shipped';
-- Index Only Scan using idx_orders_covering on orders
-- Index Cond: ((customer_id = 42) AND (status = 'shipped'))
-- Heap Fetches: 0 ← Zero table access!
-- Execution Time: 0.045 ms
Partial Indexes: Index Only What Matters
Why index 10 million rows when your query only ever looks at 50,000 of them? A partial index includes a WHERE clause that limits which rows are indexed.
-- Only 2% of orders are 'pending', but you query them constantly
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- This index is 50x smaller than a full index
-- Faster to scan, faster to maintain, less disk space
-- Uses the partial index:
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
-- Cannot use it:
SELECT * FROM orders WHERE status = 'shipped' ORDER BY created_at;
Partial indexes are perfect for: active/inactive flags, unprocessed queues, soft-deleted records, and any column where you only query a small subset of values.
Index Bloat and Maintenance
PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means UPDATE and DELETE operations leave dead tuples in indexes. Over time, indexes bloat — they grow larger without holding more useful data.
-- Check index bloat using pg_stat_user_indexes:
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS times_used,
idx_tup_read AS rows_read
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;
If an index is large but idx_scan is zero, you are paying storage and write overhead for an index nobody uses. Drop it.
-- Rebuild a bloated index (locks the table briefly):
REINDEX INDEX idx_orders_customer_id;
-- Non-blocking rebuild (PostgreSQL 12+):
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
Common Anti-Patterns
1. Functions on Indexed Columns
-- This CANNOT use an index on created_at:
WHERE YEAR(created_at) = 2026
WHERE LOWER(email) = 'user@example.com'
WHERE amount * 1.1 > 100
-- Fix: rewrite to keep the column bare
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'
-- Or create an expression index:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
2. Indexing Low-Cardinality Columns
-- Bad: boolean column with only 2 possible values
CREATE INDEX idx_users_active ON users(is_active);
-- The index doesn't help much: 50% of rows match each value
-- The planner will choose a sequential scan anyway
-- Better: partial index if you only query one side
CREATE INDEX idx_users_active ON users(created_at) WHERE is_active = true;
3. Over-Indexing
Every index slows down INSERT, UPDATE, and DELETE operations because the database must update every affected index. A table with 15 indexes means every single write operation triggers 15 index updates. Only create indexes that serve actual query patterns.
4. Implicit Type Casting
-- Column user_id is INTEGER, but you pass a string:
WHERE user_id = '42'
-- PostgreSQL may cast and still use the index, but MySQL often will not.
-- Always match types exactly.
EXPLAIN ANALYZE: Reading the Execution Plan
Here is a real optimization walkthrough. Suppose you have this slow query:
EXPLAIN ANALYZE
SELECT o.id, o.total_amount, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
AND o.created_at > '2026-01-01'
ORDER BY o.created_at DESC
LIMIT 20;
-- BEFORE optimization:
-- Sort (cost=45123.45..45123.50 rows=20)
-- Sort Key: o.created_at DESC
-- Sort Method: top-N heapsort Memory: 27kB
-- -> Hash Join (cost=1234.56..45000.00 rows=48576)
-- Hash Cond: (o.customer_id = c.id)
-- -> Seq Scan on orders o (cost=0.00..43210.00 rows=48576)
-- Filter: ((status = 'pending') AND (created_at > '2026-01-01'))
-- Rows Removed by Filter: 9951424
-- -> Hash (cost=1000.00..1000.00 rows=50000)
-- -> Seq Scan on customers c
-- Planning Time: 0.234 ms
-- Execution Time: 892.456 ms
The bottleneck is the Seq Scan on orders — examining 10 million rows. Let us fix it:
-- Create a targeted composite index:
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC)
INCLUDE (total_amount, customer_id);
-- AFTER optimization:
-- Limit (cost=0.56..45.23 rows=20)
-- -> Nested Loop (cost=0.56..109.45 rows=48576)
-- -> Index Only Scan Backward using idx_orders_status_created on orders o
-- Index Cond: ((status = 'pending') AND (created_at > '2026-01-01'))
-- Heap Fetches: 0
-- -> Index Scan using customers_pkey on customers c
-- Index Cond: (id = o.customer_id)
-- Planning Time: 0.187 ms
-- Execution Time: 0.342 ms
From 892ms to 0.3ms — a 2,600x improvement. The key changes: the composite index matches the WHERE + ORDER BY, and the INCLUDE clause makes it a covering index (Index Only Scan, zero heap fetches).
Index Type Cheat Sheet
| Index Type | Best For | Supports Range | Supports Sort | Size |
|---|---|---|---|---|
| B-tree (default) | Almost everything | Yes | Yes | Medium |
| Hash | Equality-only lookups | No | No | Small |
| GIN | Full-text search, arrays, JSONB | No | No | Large |
| GiST | Geospatial, range types, nearest-neighbor | Yes | No | Medium |
| BRIN | Large tables with natural ordering (timestamps) | Yes | No | Tiny |
| Partial | Queries targeting a small subset of rows | Depends on base type | Depends on base type | Tiny |
| Covering | Avoiding heap fetches for known queries | Yes (B-tree) | Yes (B-tree) | Larger |
Key Takeaways
- Always check EXPLAIN ANALYZE before and after adding indexes — do not guess
- Column order in composite indexes matters — equality columns first, range columns last
- Covering indexes eliminate heap fetches — use INCLUDE for frequently selected columns
- Partial indexes save space and speed — index only the rows you actually query
- Audit unused indexes regularly — every index has a write cost
- Never apply functions to indexed columns in WHERE clauses — use expression indexes instead
- B-tree is the right choice 95% of the time — only reach for specialized types when you have a specific need
The difference between a junior and senior database engineer is not knowing that indexes exist — it is knowing which index to create, in what order, with which columns included. Master these fundamentals and you will never fear a slow query again.