Module 4 of 12

Distributed Data Management

How modern systems split, replicate, and reconcile data across many machines - replication, sharding, quorums, consistency models, and the distributed databases that implement them.

5 hours3 labsFree

Watch as Slides Course overview Lab code

Start here

Learning objectives

Pick between hash and range partitioning based on access patterns
Design replication strategies (single-leader, multi-leader, leaderless) and their failover behaviour
Apply quorum math (W + R > N) to choose consistency levels
Read a Cassandra / DynamoDB / PostgreSQL replication topology and predict its failure modes
Avoid the classic distributed-data anti-patterns: hot partitions, replication lag, write conflicts

Before

Single Postgres primary; vertical scaling until the box maxes out
Hot keys cause unexplained latency; team scales the cluster, no improvement
Multi-region by accident; cross-region writes drag p99 beyond the SLO
Stale reads from replicas confuse users; no consistency model documented

After

Sharded data layer chosen for the access pattern; horizontal scale as a normal mode
Per-shard QPS dashboards; hot keys detected before users are affected
Multi-region with LOCAL_QUORUM (Cassandra) or sharded ownership (CockroachDB)
Consistency level per query; engineers make the choice with eyes open

One machine cannot hold all your data and one machine cannot survive forever. Distributed data management is the discipline of splitting state across many machines (sharding) and keeping multiple copies of each piece (replication) so that the system stays available, durable, and fast enough - while exposing a coherent enough story to applications that they can be written without thinking about every node.

Replication Strategies

Three classic models, each a different trade-off:

Single-leader (PostgreSQL streaming, MySQL, MongoDB primary): all writes go through one leader. Followers replicate the leader's log. Simple to reason about, but the leader is a bottleneck and a SPOF (mitigated by automatic failover).
Multi-leader (multi-region MySQL with bidirectional replication, CRDT-backed systems): writes accepted at multiple nodes; conflicts resolved by merge rules. Harder; useful for geo-distributed systems where local writes matter.
Leaderless (Cassandra, DynamoDB, Riak): writes are sent to multiple replicas in parallel; reads are reconciled via quorum. No single “leader” per partition.

Synchronous vs asynchronous replication is orthogonal:

Synchronous: write is acknowledged after at least one replica confirms. Higher latency, no data loss on leader crash.
Asynchronous: write is acknowledged on leader commit; replicas catch up later. Lower latency; potential data loss window.
Semi-synchronous: hybrid; ack after any one replica confirms.

Sharding Strategies

Hash partitioning: partition = hash(key) % N. Excellent load distribution; range queries are expensive (every shard touched). Used by Cassandra, DynamoDB, Redis Cluster.

Range partitioning: each shard owns a contiguous key range. Range queries are efficient; hot spots are easy to create (a key prefix that is heavily written becomes a single-shard bottleneck). Used by HBase, BigTable, CockroachDB, MongoDB sharded clusters.

Consistent hashing: solves the “adding a node forces all keys to move” problem of naive hash partitioning. Each node is hashed onto a ring; each key belongs to the next node clockwise. Adding a node only moves 1/N of keys. Combined with virtual nodes (256 vnodes per physical node in Cassandra) for smoother rebalancing.

The Distributed Systems Algorithms guide goes deeper on consistent hashing math.

Quorum Math

With N replicas, write quorum W, and read quorum R, you achieve strong consistency when W + R > N. The intuition: any read overlaps any write by at least one node, so the latest write is visible.

N=3, W=2, R=2: tolerates 1 failure for both reads and writes; strong consistency. Standard Cassandra production.
N=3, W=3, R=1: every write hits all replicas; reads are fast and consistent; one failure blocks writes. Read-heavy workloads.
N=3, W=1, R=1: maximum availability and lowest latency; eventual consistency only. DynamoDB default.
N=5, W=3, R=3: tolerates 2 simultaneous failures; strong consistency. Mission-critical Cassandra clusters.

Cassandra exposes this as consistency levels (ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL). For multi-region deployments LOCAL_QUORUM is critical - it requires a quorum within the local datacentre but does not wait for cross-region acks.

Consistency Models in Practice

The same hierarchy you saw in Module 1 (Linearizable → Sequential → Causal → Read-your-writes → Eventual) shows up in every real database, often as tunable guarantees:

Cassandra: per-query consistency level.
MongoDB: readConcern (local, majority, linearizable) + writeConcern.
DynamoDB: per-call ConsistentRead flag.
Spanner / CockroachDB: linearizable by default, with explicit follower-read modes.

Distributed Database Choices

Postgres / MySQL: single-leader replication; horizontal scale via read replicas or external sharding (Citus, Vitess). Strong consistency on the leader; read lag on replicas.
Cassandra: leaderless, hash-partitioned, AP-leaning. High write throughput, tunable consistency. Great for write-heavy time-series; less great for low-latency reads of small data.
DynamoDB: managed, hash-partitioned, AP-leaning. Eventual or strong consistency per call. Excellent if you stay within its access patterns; expensive if you fight them.
CockroachDB: distributed SQL with per-range Raft consensus. Linearizable by default; horizontal scale; SQL surface. The heaviest of the modern options.
Spanner / TiDB: globally-distributed SQL with linearizable transactions backed by TrueTime / TSO. Premium pricing for premium consistency.
Redis Cluster: in-memory, hash-slot partitioned, asynchronous replication. Great cache or session store; not your primary database.

Common Pitfalls

Read-your-writes anomaly: writing to leader, reading from a replica, getting your own old value. Fix: stickify reads to leader for a short window after a write.
Hot partition: a partition key with skewed traffic (one tenant, one celebrity user) overloads one node. Fix: salted keys, increased shard count, multi-tenant isolation.
Cross-shard transactions: expensive (2PC, distributed locking). Design data models to keep related data in the same shard wherever possible.
Replication lag spikes: replicas fall behind during bulk writes; reads served from lagging replicas return stale data. Monitor lag in seconds or bytes; fail reads to leader past a threshold.

Quorum Write Flow

Self-Check Quiz

N=5 Cassandra cluster. You write with CL=QUORUM, then immediately read with CL=ONE. Can you see your write? (Answer: not guaranteed. CL=ONE returns the first replica response, which may be a stale one. For strong consistency, you need W+R>N - e.g. CL=QUORUM on both.)
Your single celebrity user generates 90% of writes for one shard. What is happening, and what is the fix? (Answer: hot partition. Fix: split the key with salting (user_id:0..N) and aggregate at read time, or scale out and use a different partitioning key.)
Why does range-partitioned MongoDB sometimes accidentally create hot shards while hash-partitioned Cassandra rarely does? (Answer: range partitioning concentrates time-prefix or sequential keys on one shard; hash partitioning randomises. Choose the partition strategy based on access pattern.)
You see replication lag of 30 seconds on a Postgres read replica. What three actions matter most? (Answer: alert if it exceeds threshold; route latency-sensitive reads to leader; investigate root cause - bulk writes, slow disk, or vacuum.)

For the algorithm-level treatment of consistent hashing, vector clocks, CRDTs, and quorum proofs, read the Distributed Systems Algorithms guide. For Kubernetes-specific operational patterns the Kubernetes cheatsheet is the fast reference.

Real world

Where this shows up

Cassandra at Netflix replicates user data across multiple regions with LOCAL_QUORUM for low-latency reads.
DynamoDB Global Tables provide multi-region active-active with last-writer-wins by default.
CockroachDB uses per-range Raft groups to provide globally-consistent SQL with horizontal scale.
Discord migrated from MongoDB to Cassandra (then to ScyllaDB) for their messages workload due to Cassandra's leaderless write throughput.

Production notes

Keep these close

Replication lag is a first-class metric to alert on. Past a threshold, fail reads back to the leader rather than serve stale data.
Hot partitions are the #1 distributed-data scalability bug. Detect via per-shard QPS dashboards; mitigate via key salting or tenant-aware shard routing.
Cross-shard transactions are expensive (2PC, distributed locking). Design data models to keep related data co-located in the same shard.

Common mistakes

What usually breaks

Choosing eventual consistency without thinking through the read-your-writes anomaly.
Single-region Cassandra with consistency level QUORUM - works fine until you go multi-region and discover the cross-region quorum cost.
Treating replicas as a read-scale solution when replication lag means stale reads.

Security risks

Threats to watch

Multi-tenant sharded systems leak data when the application forgets to scope queries by tenant_id; design tenant scoping into the storage layer, not the app.
Read replicas with different RBAC than the primary expose data the security team thought was protected.
Backups of distributed databases multiply the data-leak surface; encrypt at rest with KMS, audit access.
Cross-region replication over public internet must use TLS; a leaked snapshot in transit is a leaked database.

Tradeoffs

Design choices you should be able to defend

Single-leader (Postgres, MySQL)

Pros

Strong consistency on the leader
Simple to reason about
Atomic transactions

Cons

Leader is a write bottleneck
Failover is the SPOF reconciliation problem
Read replicas have lag

Leaderless (Cassandra, DynamoDB)

Pros

No single bottleneck
High write throughput
Tunable consistency

Cons

Complex consistency story
No cross-key transactions
Operational complexity

Multi-leader (CRDT-backed, multi-region MySQL)

Pros

Local writes in every region
No coordination latency

Cons

Conflict resolution required
Hard to reason about

Alternatives

Other production approaches

Sharded Postgres (Citus, Vitess)

Postgres with horizontal sharding; keep SQL surface, add scale.

Apache Cassandra / ScyllaDB

Leaderless, hash-partitioned, AP-leaning; best for high write throughput.

CockroachDB / TiDB / YugabyteDB

Distributed SQL with per-range consensus; SQL surface + horizontal scale.

Spanner / AlloyDB

Globally-consistent SQL; premium pricing for premium consistency.

DynamoDB

Managed, hash-partitioned; great if you stay within its access patterns, painful if you fight them.

Think like an engineer

Questions to answer before shipping

Choose partition keys based on the access pattern, not the storage convenience. Keys that are easy to write but hard to query become technical debt.
For multi-tenant systems, design for the worst tenant. The 90th-percentile customer is your bottleneck before it's your average.
Consistency level is per query, not per database. Force engineers to make the choice deliberately for each operation.

Key terms

Vocabulary used in this module

Sharding

Splitting data across multiple machines by partitioning the key space.

Quorum

Minimum number of nodes that must respond for an operation to be considered successful.

Replication lag

How far behind the leader a replica is; the staleness window for reads from that replica.

Hot partition

A partition with disproportionately high traffic, overloading one node.

Eventual consistency

Replicas converge to the same state if writes stop; the weakest useful guarantee.

Labs

Hands-on labs

60 minutesIntermediate

Lab 4.1 - Postgres Streaming Replication + Failover

Set up Postgres primary + replica, force failover, observe data loss window with sync vs async replication.

Spin up primary + replica with docker-compose
Run async replication; cause primary crash mid-write; measure data loss
Switch to synchronous_commit=on; rerun; verify zero data loss

View lab on GitHub

90 minutesIntermediate

Lab 4.2 - Cassandra Quorum Behaviour

Run a 3-node Cassandra cluster, vary consistency levels, observe behaviour during node failure.

Spin up Cassandra cluster (3 nodes)
Write with CL=QUORUM; verify reads see latest
Kill one node; verify QUORUM still works
Kill two nodes; verify QUORUM fails; ONE still works (eventual)

View lab on GitHub

45 minutesIntermediate

Lab 4.3 - Hot Partition Reproduction

Cause and mitigate a hot partition in Redis Cluster.

Send 90% of traffic to one key
Observe per-node QPS; identify the hot node
Apply key salting (key:0..key:9); redistribute
Confirm load balances

View lab on GitHub

Recap

Key takeaways

Replication is for durability and availability; sharding is for scale - you need both at scale
Quorum math (W + R > N) is the rule for strong consistency in leaderless systems
Hot partitions are the #1 distributed-data scalability bug; design key spaces deliberately
Cross-shard transactions are expensive; data models should make them rare
Replication lag is a first-class metric to alert on, not an implementation detail

Related resources

Distributed Data Management

Learning objectives

Replication Strategies

Sharding Strategies

Quorum Math

Consistency Models in Practice

Distributed Database Choices

Common Pitfalls

Quorum Write Flow

Self-Check Quiz

Where this shows up

Keep these close

What usually breaks

Threats to watch

Design choices you should be able to defend

Single-leader (Postgres, MySQL)

Leaderless (Cassandra, DynamoDB)

Multi-leader (CRDT-backed, multi-region MySQL)

Other production approaches

Sharded Postgres (Citus, Vitess)

Apache Cassandra / ScyllaDB

CockroachDB / TiDB / YugabyteDB

Spanner / AlloyDB

DynamoDB

Questions to answer before shipping

Vocabulary used in this module

Sharding

Quorum

Replication lag

Hot partition

Eventual consistency

Hands-on labs

Lab 4.1 - Postgres Streaming Replication + Failover

Lab 4.2 - Cassandra Quorum Behaviour

Lab 4.3 - Hot Partition Reproduction

Key takeaways

Keep learning across CodersSecret

Related guides

Cheatsheets

Interactive labs

Glossary terms