Multi-leader and leaderless (Dynamo-style) replication. — Cracked Java
// High-Level Design (HLD / Distributed Systems) · Replication & Partitioning (Sharding)
SeniorSystem Design

Multi-leader and leaderless (Dynamo-style) replication.

Multi-leader and leaderless (Dynamo-style) replication

When single-leader replication's single write node becomes the bottleneck — or when you need write availability across regions, or writes to continue during a partition — you move to one of two models that accept multiple write entry points.

Multi-leader replication

Multiple nodes accept writes and replicate to each other. The classic use case is multi-datacenter: each region has its own leader, so writes are fast and local, and a region can keep accepting writes even if the link between regions drops.

Multi-leader: each region has a leader; leaders replicate to each other asynchronously

The cost is write conflicts. The same record can be edited concurrently in two regions. Resolution strategies:

  • Last-write-wins (LWW) — keep the write with the highest timestamp. Simple, but silently discards the loser; risks clock-skew anomalies.
  • Conflict-free replicated data types (CRDTs) — data structures (counters, sets) that merge deterministically without losing updates.
  • Application-defined merge — surface the conflict and let app logic (or the user) resolve it, as in Git or collaborative editors.

Multi-leader is powerful but operationally tricky; avoid it unless you genuinely need multi-region writes or offline clients.

Leaderless replication (Dynamo-style)

No node is special. The client (or a coordinator) writes to all N replicas and reads from several, using quorums (W + R > N) to guarantee freshness. This is the model of DynamoDB, Cassandra, and Riak — the lineage of Amazon's Dynamo paper.

Because writes can land on different subsets of replicas (especially during failures), the store needs active anti-entropy mechanisms to keep replicas converging:

  • Read repair. On a read, the coordinator compares the values returned by the R replicas; if some are stale, it writes the freshest value back to the laggards during the read. Fixes frequently-read keys for free.
  • Anti-entropy / background repair. A periodic background process compares replicas and reconciles differences for keys that are rarely or never read (read repair alone would never touch them). Implemented efficiently with Merkle trees so only divergent ranges are exchanged.
  • Hinted handoff. If a target replica is down at write time, a healthy node stores a "hint" and forwards the write once the target recovers — keeping writes available during transient failures ("sloppy quorum").

Comparison

Multi-leaderLeaderless
Write entry pointsA few leadersAny replica
Primary use caseMulti-region / offline writesHigh availability, horizontal scale
Conflict handlingLWW / CRDT / app mergeVersioning + read repair + anti-entropy
ConsistencyEventual (async between leaders)Tunable via W/R quorums
ExamplesCouchDB, multi-region MySQL, BDRDynamoDB, Cassandra, Riak

Mark your status