Nauman Munir

Leaderless Replication: Quorums, Chaos, and Cassandra

No leader. No followers. Any node can accept any write. Amazon's Dynamo pioneered this radical approach, and it powers some of the world's largest systems. But without a leader, how do you know what the 'true' value is?

11 min read
Tags: Leaderless, Quorum, Dynamo, Cassandra, Distributed Systems, Replication, System Design, CAP Theorem


The Hook

No leader. No followers. Any node can accept any write. This sounds like chaos—and it is. But Amazon's Dynamo pioneered this approach, and it powers some of the world's largest systems: Cassandra, Riak, Voldemort. The secret sauce? Quorums: if you write to 3 of 5 nodes and read from 3 of 5 nodes, at least one node you read from must have the latest value... right? Well, not always. Let's dive into the beautiful mess of leaderless replication.


Learning Objectives

By the end of this article, you will be able to:

  • Explain how leaderless replication differs from leader-based approaches
  • Apply the quorum formula (w + r > n) to guarantee read freshness
  • Analyze edge cases where quorums fail to provide consistency
  • Evaluate trade-offs between availability and consistency in leaderless systems

The Big Picture: The Radical Alternative

Leader-based replication has a single point of write coordination. What if we removed that?


Diagram Explanation: In leader-based systems, all writes flow through one designated node. In leaderless systems, clients can write to any node—there's no special "leader" role. All nodes are peers.

Who Uses This?

| System | Type | Notes |
|--------|------|-------|
| Amazon Dynamo | Original paper (2007) | Internal to Amazon |
| Cassandra | Wide-column store | Most popular Dynamo-style DB |
| Riak | Key-value store | Strong CRDT support |
| Voldemort | LinkedIn's KV store | Based on Dynamo paper |


Writing Without a Leader

The client (or a coordinator node acting on its behalf) sends the write to all replicas in parallel:


Diagram Explanation: The client writes to 3 nodes simultaneously. Node 3 is down, but 2/3 succeeded. With w=2 required, the write is confirmed. Node 3 will catch up later.
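
A minimal sketch of that write path in Python, assuming a hypothetical send_write(node, key, value, version) call that returns True when a replica acknowledges:

from concurrent.futures import ThreadPoolExecutor

def quorum_write(nodes, key, value, version, w, send_write):
    """Send the write to every replica in parallel; succeed once at least w acknowledge."""
    acks = 0
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(send_write, node, key, value, version) for node in nodes]
        for future in futures:
            try:
                if future.result(timeout=1.0):   # wait briefly for each replica's ack
                    acks += 1
            except Exception:
                pass                             # a down or slow replica simply doesn't count
    # Note: leaving the 'with' block waits for stragglers; a real client would cancel them.
    return acks >= w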

The Quorum Parameters

| Parameter | Meaning |
|-----------|---------|
| n | Total number of replicas |
| w | Write quorum (minimum nodes that must confirm a write) |
| r | Read quorum (minimum nodes to query on a read) |


Reading Without a Leader

Reads also go to multiple nodes:


Diagram Explanation: The client reads from all 3 nodes. Two have the latest value (42), one is stale (40). Using version numbers, the client returns the newest value.
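
The read path mirrors the write path. A sketch, assuming a hypothetical send_read(node, key) that returns a (version, value) pair or raises if the replica is unreachable:

def quorum_read(nodes, key, r, send_read):
    """Query the replicas, require at least r responses, return the newest value."""
    responses = []
    for node in nodes:          # sequential for clarity; real clients fan out in parallel
        try:
            version, value = send_read(node, key)
            responses.append((version, value, node))
        except Exception:
            continue            # unreachable or failed replica
    if len(responses) < r:
        raise RuntimeError("read quorum not met")
    # Highest version wins; stale replicas are detected by comparing version numbers.
    version, value, _node = max(responses, key=lambda resp: resp[0])
    return version, value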


The Magic Formula: w + r > n

The key insight: if w + r > n, at least one node in the read quorum must have the latest write.


Diagram Explanation: With n=5, w=3, r=3, the quorums overlap by at least 1 node. Node C is in both quorums—guaranteeing the read sees the latest write.

Common Quorum Configurations

| Config | Trade-off | Use Case |
|--------|-----------|----------|
| w=n, r=1 | Slow writes, fast reads | Read-heavy workloads |
| w=1, r=n | Fast writes, slow reads | Write-heavy workloads |
| w=r=(n+1)/2 | Balanced | General purpose |
| w=1, r=1 | ⚠️ No consistency! | Only for caches |
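
The overlap claim is just the pigeonhole principle, and for small clusters you can verify it exhaustively. A throwaway check in plain Python:

from itertools import combinations

def quorums_always_overlap(n, w, r):
    """Brute-force check: does every write set of size w intersect every read set of size r?"""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))

# w + r > n  -> overlap guaranteed
print(quorums_always_overlap(5, 3, 3))   # True  (3 + 3 > 5)
print(quorums_always_overlap(3, 2, 2))   # True  (2 + 2 > 3)
# w + r <= n -> a read can miss the write entirely
print(quorums_always_overlap(3, 1, 1))   # False (1 + 1 <= 3)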


How Stale Nodes Catch Up

When a node misses writes (network partition, crash), it needs to recover. Two mechanisms:

1. Read Repair


Diagram Explanation: During a read, the client notices Node 3 has stale data. It writes the correct value back—"repairing" the replica. This is opportunistic and only works for frequently-read data.
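
A sketch of the repair step, assuming the responses have already been collected as in the quorum read above and reusing the hypothetical send_write call:

def read_repair(responses, key, send_write):
    """Given (version, value, node) responses from a quorum read, push the newest
    value back to any replica that answered with an older version."""
    newest_version, newest_value, _ = max(responses, key=lambda resp: resp[0])
    for version, _value, node in responses:
        if version < newest_version:
            send_write(node, key, newest_value, newest_version)   # opportunistic repair
    return newest_value

# Example: node3 answered with stale value 40 (version 1) and gets repaired to 42.
responses = [(2, 42, "node1"), (2, 42, "node2"), (1, 40, "node3")]
print(read_repair(responses, "counter",
                  lambda node, key, value, version: print("repair", node, "->", value)))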

2. Anti-Entropy Background Process


Diagram Explanation: A background process continuously compares data between nodes using Merkle trees (hash trees) to efficiently find differences. Unlike read repair, this catches data that's rarely read.
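
The sketch below is a single-level stand-in for a Merkle tree: each replica hashes fixed key ranges of its local store, and only ranges whose digests differ get copied. Real anti-entropy (in Cassandra, for example) builds a full tree of hashes so mismatches can be narrowed down recursively; here the stores are plain dicts with string keys, purely for illustration.

import hashlib

def bucket_of(key, num_ranges):
    """Stable bucket assignment so both replicas partition keys the same way."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_ranges

def range_hashes(store, num_ranges=16):
    """One digest per key range (a single level rather than a full Merkle tree)."""
    buckets = [[] for _ in range(num_ranges)]
    for key in sorted(store):
        buckets[bucket_of(key, num_ranges)].append((key, store[key]))
    return [hashlib.sha256(repr(bucket).encode()).hexdigest() for bucket in buckets]

def anti_entropy(stale_store, fresh_store, num_ranges=16):
    """Compare per-range digests and copy only the ranges whose digests differ."""
    stale_digests = range_hashes(stale_store, num_ranges)
    fresh_digests = range_hashes(fresh_store, num_ranges)
    for i in range(num_ranges):
        if stale_digests[i] != fresh_digests[i]:
            for key, value in fresh_store.items():
                if bucket_of(key, num_ranges) == i:
                    stale_store[key] = value   # real replicas exchange diffs in both directions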


When Quorums Fail

⚠️ The quorum formula doesn't guarantee strong consistency!

Even with w + r > n, edge cases break the guarantee:

Edge Case 1: Sloppy Quorum


Diagram Explanation: With sloppy quorums, writes go to ANY available nodes during failures. Node 4 temporarily holds the data. When Node 1 comes back, reads might query {1, 2, 3}—missing the value on Node 4!

Edge Case 2: Concurrent Writes


Diagram Explanation: Two clients write concurrently. Different nodes end up with different values. Without conflict resolution, such as last-write-wins (LWW) or version vectors, reads return inconsistent results.

Edge Case 3: Write and Read Overlap... But Wrong


Diagram Explanation: The quorum math assumes writes complete before reads start. If a read races with a write, the read might miss the new value even though w + r > n.


Sloppy Quorums and Hinted Handoff

Sloppy quorums sacrifice consistency for availability:


Diagram Explanation: When home nodes are unreachable, writes go to any available node as "hints." When home nodes recover, hints are handed off. This increases write availability but breaks read quorum guarantees.
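
A rough sketch of the idea, again using the hypothetical send_write from earlier: writes that cannot reach a home replica are parked on a fallback node together with a hint naming the home node they belong to, and replayed when that node recovers.

from collections import defaultdict

hints = defaultdict(list)   # substitute node -> hinted writes it holds for unreachable homes

def sloppy_write(home_nodes, fallback_nodes, key, value, version, w, send_write):
    """Try the home replicas first; park writes for unreachable homes on fallbacks as hints."""
    acks, unreachable = 0, []
    for node in home_nodes:
        try:
            if send_write(node, key, value, version):
                acks += 1
            else:
                unreachable.append(node)
        except Exception:
            unreachable.append(node)
    # Each fallback stores the value plus a hint naming the home node it stands in for.
    for substitute, home in zip(fallback_nodes, unreachable):
        if acks >= w:
            break
        hints[substitute].append((home, key, value, version))
        acks += 1
    return acks >= w

def handoff(substitute, send_write):
    """When home nodes recover, the substitute replays its hinted writes to them."""
    for home, key, value, version in hints.pop(substitute, []):
        send_write(home, key, value, version)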

| Mode | Guarantee | Availability |
|------|-----------|--------------|
| Strict quorum | w + r > n guarantees overlap | ❌ Fails if too many nodes are down |
| Sloppy quorum | ⚠️ No overlap guarantee | ✅ Writes succeed even during partitions |


Detecting Concurrent Writes

Without a leader, how do you know which write "wins"?

Version Vectors

Each replica maintains its own counter for the key; together these counters form the version vector stored alongside the value. When you read, you get the value (or conflicting values) along with its vector:

# Riak-style version vector
{
    "value": "hello",
    "vector_clock": {
        "node_A": 3,
        "node_B": 2,
        "node_C": 1
    }
}

Conflict Detection


Diagram Explanation: If one version's clock dominates another in all positions, it's newer. If neither dominates (concurrent writes), you have a conflict that requires resolution.
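
A sketch of that dominance check, treating version vectors as plain dicts like the Riak-style example above:

def compare(vc_a, vc_b):
    """Return 'a_newer', 'b_newer', 'equal', or 'concurrent' for two version vectors."""
    nodes = set(vc_a) | set(vc_b)
    a_ahead = any(vc_a.get(n, 0) > vc_b.get(n, 0) for n in nodes)
    b_ahead = any(vc_b.get(n, 0) > vc_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"     # neither dominates: a genuine conflict to resolve
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"

print(compare({"node_A": 3, "node_B": 2}, {"node_A": 2, "node_B": 2}))  # a_newer
print(compare({"node_A": 3, "node_B": 1}, {"node_A": 2, "node_B": 2}))  # concurrent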


Real-World Analogy: The Committee Vote

Think of quorum replication like a voting committee:

| Concept | Committee Analogy |
|---------|-------------------|
| n replicas | Committee of 5 members |
| w (write quorum) | Need 3 votes to pass a motion |
| r (read quorum) | Need 3 members to confirm current policy |
| w + r > n | At least 1 member in both groups |
| Sloppy quorum | Bring in temporary substitutes during absences |
| Conflict | Two subgroups passed contradictory motions |

If 3 members vote to pass Motion A, and later you ask 3 members about current policy, at least one was in the original vote—they can confirm the decision.

But if two separate groups of 3 each pass conflicting motions? Chaos. That's a concurrent write conflict.


Practical Example: Cassandra Configuration

-- Replication factor is configured per keyspace, not in cassandra.yaml (n = 3):
CREATE KEYSPACE example   -- example keyspace name
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
 
-- QUORUM = floor(n/2) + 1 = 2
-- QUORUM (or LOCAL_QUORUM) writes (w=2) + QUORUM reads (r=2): 2 + 2 = 4 > 3 ✅
 
-- In cqlsh, the consistency level is set for the session:
CONSISTENCY QUORUM;
 
-- Write with quorum
INSERT INTO users (id, name) VALUES (1, 'Alice');
 
-- Read with quorum
SELECT * FROM users WHERE id = 1;

Cassandra Consistency Levels

| Level | Meaning | Use Case |
|-------|---------|----------|
| ONE | One replica responds | Lowest latency, eventual consistency |
| QUORUM | Majority of replicas respond | Balance of speed and consistency |
| ALL | All replicas respond | Strongest consistency, lowest availability |
| LOCAL_QUORUM | Majority in the local datacenter | Multi-DC deployments |
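
From application code, the level is usually chosen per statement or as a session default. A sketch using the DataStax Python driver (cassandra-driver); the keyspace name and the users table are placeholders matching the earlier example:

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("example")   # placeholder keyspace from the CQL example

# Write with w = QUORUM (2 of 3 replicas must acknowledge, given n = 3)
insert = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (1, "Alice"))

# Read with r = QUORUM (2 of 3 replicas are queried; the newest value wins)
select = SimpleStatement(
    "SELECT * FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(select, (1,)).one()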


Key Takeaways

  1. Leaderless means any node can accept writes: No single point of failure, but also no single point of coordination. Clients send writes to multiple nodes in parallel.

  2. w + r > n guarantees overlap, not consistency: The formula ensures at least one read node has the latest write—but concurrent writes, sloppy quorums, and timing issues can still cause inconsistencies.

  3. Two catch-up mechanisms: Read repair (opportunistic, during reads) and anti-entropy (background, proactive). Both are needed for reliable convergence.

  4. Sloppy quorums trade consistency for availability: During network partitions, writes succeed to "wrong" nodes and are handed off later. Good for availability, bad for strong consistency.

  5. Version vectors detect conflicts: When writes are truly concurrent, version vectors identify conflicts. The application must resolve them (merge, LWW, or ask the user).


Common Pitfalls

| ❌ Misconception | ✅ Reality |
|-----------------|-----------|
| "Quorum = strong consistency" | Quorums guarantee overlap, not linearizability |
| "w + r > n is enough" | Edge cases (concurrent writes, sloppy quorums, timing) break the guarantee |
| "Leaderless is simpler" | It pushes complexity to the client (conflict resolution, version handling) |
| "Any node can serve any read" | Without r > n - w, you might read stale data |
| "Cassandra is always eventually consistent" | With QUORUM writes and reads, it provides stronger guarantees |
| "Hinted handoff fixes everything" | Hints can be lost; anti-entropy is still needed |


What's Next?

We've explored three replication approaches:

  1. Leader-based: Simple, one coordinator
  2. Multi-leader: Multiple coordinators, conflict resolution
  3. Leaderless: No coordinator, quorum-based

But replication is just one piece of distributed data. What about partitioning—splitting data across machines? How do you ensure transactions work across partitions? How do you handle failures without losing data?

In the next chapter, we'll explore Partitioning (Sharding)—how to split data across nodes while maintaining balanced load, efficient queries, and fault tolerance.