You added an index. The query is still slow. Sound familiar? Most developers treat indexes like magic — add one and hope for the best. But indexes are data structures with specific rules, and violating those rules means your “indexed” query is still doing a full table scan.

This guide covers what your senior dev never explained: how indexes actually work internally, why column order in composite indexes matters more than you think, and how to read EXPLAIN ANALYZE output like a database engineer.

How Databases Find Your Data

When you run a query, the database has two fundamental strategies:

  • Sequential Scan: Read every single row in the table and check if it matches your WHERE clause. Simple, but devastating on large tables.
  • Index Scan: Use a pre-built data structure to jump directly to matching rows. Like using a book index instead of reading every page.

Here is the difference in practice:

-- Table: orders (10 million rows)
-- No index on customer_id

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Seq Scan on orders  (cost=0.00..185432.00 rows=52 width=96)
--   Filter: (customer_id = 42)
--   Rows Removed by Filter: 9999948
--   Planning Time: 0.085 ms
--   Execution Time: 1247.531 ms   ← Over 1 second!

-- Now with an index:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Index Scan using idx_orders_customer_id on orders
--   Index Cond: (customer_id = 42)
--   Planning Time: 0.092 ms
--   Execution Time: 0.128 ms      ← 10,000x faster!

The sequential scan examined 10 million rows to find 52 matches. The index scan went straight to those 52 rows. That is the power of proper indexing.

B-Tree Indexes: The Workhorse

B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, and virtually every relational database. Understanding its structure explains most indexing behavior.

A B-tree is a sorted, self-balancing tree structure:

          [50, 100]              ← Root node
         /    |     \
   [20,35] [60,80] [120,150]    ← Internal nodes
   /  |  \   / | \   /  |  \
  ... ... ... ... ... ... ...   ← Leaf nodes (actual row pointers)
  • Root and internal nodes act as signposts, directing the search left or right
  • Leaf nodes contain the indexed values and pointers (ctid) to the actual table rows
  • All leaf nodes are linked in a doubly-linked list for efficient range scans
  • Tree depth is typically 3-4 levels even for millions of rows

This means finding any single value requires only 3-4 disk reads, regardless of table size. A table with 100 rows and a table with 100 million rows both need roughly the same number of index lookups.

What B-Trees Are Great At

-- Equality: direct lookup
WHERE email = 'user@example.com'

-- Range: walk the linked leaf nodes
WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'

-- Sorting: leaf nodes are already sorted
ORDER BY created_at DESC LIMIT 20

-- Prefix matching
WHERE name LIKE 'John%'    -- Uses index
WHERE name LIKE '%John'    -- Cannot use index (no prefix)

Hash Indexes: The Specialist

Hash indexes use a hash function to map values to buckets. They are faster than B-trees for exact equality lookups but useless for everything else.

CREATE INDEX idx_users_email_hash ON users USING hash(email);

-- Uses the hash index (equality only):
WHERE email = 'user@example.com'     -- Yes

-- Cannot use hash index:
WHERE email LIKE 'user%'             -- No (pattern)
WHERE email > 'a' AND email < 'm'   -- No (range)
ORDER BY email                       -- No (sorting)

When to use hash indexes: Only when you exclusively do exact-match lookups on high-cardinality columns (like UUIDs or email addresses) and never need range queries or sorting. In practice, B-tree is almost always the better choice because the performance difference is marginal and B-trees are far more versatile.

Composite Indexes: Column Order Is Everything

A composite index indexes multiple columns together. The column order determines which queries can use the index. This is the leftmost prefix rule — the most misunderstood concept in database indexing.

CREATE INDEX idx_orders_composite ON orders(customer_id, status, created_at);

This single index can satisfy these queries:

-- Uses index (all three columns, left to right):
WHERE customer_id = 42 AND status = 'shipped' AND created_at > '2026-01-01'

-- Uses index (first two columns):
WHERE customer_id = 42 AND status = 'shipped'

-- Uses index (first column only):
WHERE customer_id = 42

-- CANNOT use index (skips first column):
WHERE status = 'shipped'                        -- Skips customer_id!

-- CANNOT use index (skips middle column):
WHERE customer_id = 42 AND created_at > '2026-01-01'  -- Only uses customer_id part

Think of it like a phone book sorted by last name, then first name, then city. You can look up everyone named “Smith” (last name). You can look up “Smith, John” (last + first). But you cannot efficiently look up everyone named “John” without a last name — the book is not sorted that way.

Ordering Strategy

Put columns in this order:

  1. Equality conditions first (WHERE status = 'active')
  2. Range conditions last (WHERE created_at > '2026-01-01')
  3. High-selectivity columns before low-selectivity (customer_id before status)

Covering Indexes: Skip the Table Entirely

Normally, an index scan finds the matching row pointers, then fetches the actual rows from the table (a “heap fetch”). A covering index includes all the columns your query needs, eliminating the heap fetch entirely.

-- Query that needs customer_id, status, and total_amount:
SELECT customer_id, status, total_amount
FROM orders
WHERE customer_id = 42 AND status = 'shipped';

-- Regular index: finds rows, then fetches total_amount from table
CREATE INDEX idx_orders_cust_status ON orders(customer_id, status);

-- Covering index: includes total_amount, no table fetch needed
CREATE INDEX idx_orders_covering ON orders(customer_id, status)
  INCLUDE (total_amount);

In EXPLAIN output, a covering index shows “Index Only Scan” instead of “Index Scan” — this is significantly faster because it avoids random I/O to the heap.

EXPLAIN ANALYZE SELECT customer_id, status, total_amount
FROM orders WHERE customer_id = 42 AND status = 'shipped';

-- Index Only Scan using idx_orders_covering on orders
--   Index Cond: ((customer_id = 42) AND (status = 'shipped'))
--   Heap Fetches: 0            ← Zero table access!
--   Execution Time: 0.045 ms

Partial Indexes: Index Only What Matters

Why index 10 million rows when your query only ever looks at 50,000 of them? A partial index includes a WHERE clause that limits which rows are indexed.

-- Only 2% of orders are 'pending', but you query them constantly
CREATE INDEX idx_orders_pending ON orders(created_at)
  WHERE status = 'pending';

-- This index is 50x smaller than a full index
-- Faster to scan, faster to maintain, less disk space

-- Uses the partial index:
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;

-- Cannot use it:
SELECT * FROM orders WHERE status = 'shipped' ORDER BY created_at;

Partial indexes are perfect for: active/inactive flags, unprocessed queues, soft-deleted records, and any column where you only query a small subset of values.

Index Bloat and Maintenance

PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means UPDATE and DELETE operations leave dead tuples in indexes. Over time, indexes bloat — they grow larger without holding more useful data.

-- Check index bloat using pg_stat_user_indexes:
SELECT
  schemaname || '.' || relname AS table,
  indexrelname AS index,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS times_used,
  idx_tup_read AS rows_read
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

If an index is large but idx_scan is zero, you are paying storage and write overhead for an index nobody uses. Drop it.

-- Rebuild a bloated index (locks the table briefly):
REINDEX INDEX idx_orders_customer_id;

-- Non-blocking rebuild (PostgreSQL 12+):
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

Common Anti-Patterns

1. Functions on Indexed Columns

-- This CANNOT use an index on created_at:
WHERE YEAR(created_at) = 2026
WHERE LOWER(email) = 'user@example.com'
WHERE amount * 1.1 > 100

-- Fix: rewrite to keep the column bare
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'

-- Or create an expression index:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

2. Indexing Low-Cardinality Columns

-- Bad: boolean column with only 2 possible values
CREATE INDEX idx_users_active ON users(is_active);
-- The index doesn't help much: 50% of rows match each value
-- The planner will choose a sequential scan anyway

-- Better: partial index if you only query one side
CREATE INDEX idx_users_active ON users(created_at) WHERE is_active = true;

3. Over-Indexing

Every index slows down INSERT, UPDATE, and DELETE operations because the database must update every affected index. A table with 15 indexes means every single write operation triggers 15 index updates. Only create indexes that serve actual query patterns.

4. Implicit Type Casting

-- Column user_id is INTEGER, but you pass a string:
WHERE user_id = '42'
-- PostgreSQL may cast and still use the index, but MySQL often will not.
-- Always match types exactly.

EXPLAIN ANALYZE: Reading the Execution Plan

Here is a real optimization walkthrough. Suppose you have this slow query:

EXPLAIN ANALYZE
SELECT o.id, o.total_amount, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > '2026-01-01'
ORDER BY o.created_at DESC
LIMIT 20;

-- BEFORE optimization:
-- Sort  (cost=45123.45..45123.50 rows=20)
--   Sort Key: o.created_at DESC
--   Sort Method: top-N heapsort  Memory: 27kB
--   -> Hash Join  (cost=1234.56..45000.00 rows=48576)
--        Hash Cond: (o.customer_id = c.id)
--        -> Seq Scan on orders o  (cost=0.00..43210.00 rows=48576)
--              Filter: ((status = 'pending') AND (created_at > '2026-01-01'))
--              Rows Removed by Filter: 9951424
--        -> Hash  (cost=1000.00..1000.00 rows=50000)
--              -> Seq Scan on customers c
-- Planning Time: 0.234 ms
-- Execution Time: 892.456 ms

The bottleneck is the Seq Scan on orders — examining 10 million rows. Let us fix it:

-- Create a targeted composite index:
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC)
  INCLUDE (total_amount, customer_id);

-- AFTER optimization:
-- Limit  (cost=0.56..45.23 rows=20)
--   -> Nested Loop  (cost=0.56..109.45 rows=48576)
--        -> Index Only Scan Backward using idx_orders_status_created on orders o
--              Index Cond: ((status = 'pending') AND (created_at > '2026-01-01'))
--              Heap Fetches: 0
--        -> Index Scan using customers_pkey on customers c
--              Index Cond: (id = o.customer_id)
-- Planning Time: 0.187 ms
-- Execution Time: 0.342 ms

From 892ms to 0.3ms — a 2,600x improvement. The key changes: the composite index matches the WHERE + ORDER BY, and the INCLUDE clause makes it a covering index (Index Only Scan, zero heap fetches).

Index Type Cheat Sheet

Index Type Best For Supports Range Supports Sort Size
B-tree (default) Almost everything Yes Yes Medium
Hash Equality-only lookups No No Small
GIN Full-text search, arrays, JSONB No No Large
GiST Geospatial, range types, nearest-neighbor Yes No Medium
BRIN Large tables with natural ordering (timestamps) Yes No Tiny
Partial Queries targeting a small subset of rows Depends on base type Depends on base type Tiny
Covering Avoiding heap fetches for known queries Yes (B-tree) Yes (B-tree) Larger

Key Takeaways

  • Always check EXPLAIN ANALYZE before and after adding indexes — do not guess
  • Column order in composite indexes matters — equality columns first, range columns last
  • Covering indexes eliminate heap fetches — use INCLUDE for frequently selected columns
  • Partial indexes save space and speed — index only the rows you actually query
  • Audit unused indexes regularly — every index has a write cost
  • Never apply functions to indexed columns in WHERE clauses — use expression indexes instead
  • B-tree is the right choice 95% of the time — only reach for specialized types when you have a specific need

The difference between a junior and senior database engineer is not knowing that indexes exist — it is knowing which index to create, in what order, with which columns included. Master these fundamentals and you will never fear a slow query again.