Leaders and Followers: The Art of Database Replication
The Hook
Your primary database just crashed. Users are screaming on Twitter. Revenue is bleeding at $10,000 per minute. But wait—within 30 seconds, a backup seamlessly takes over, and nobody even notices. That's the magic of replication. It's not just about making copies; it's about orchestrating a dance between a leader and its followers that keeps your data alive, available, and fast. Get it wrong, and you'll face the nightmare scenarios that have taken down GitHub, Amazon, and countless startups.
Learning Objectives
By the end of this article, you will be able to:
- Understand why leader-based replication is the dominant pattern in distributed databases
- Compare synchronous vs asynchronous replication and their trade-offs
- Analyze failover scenarios and the dangerous edge cases (split brain, data loss)
- Apply best practices for setting up new followers and handling node outages
The Big Picture: Why Replicate at All?
Imagine you're Netflix, serving 200 million users across the globe. A user in Tokyo shouldn't wait for a database query to travel to Virginia and back. That's where replication comes in—keeping copies of the same data on multiple machines for three critical reasons:
| Goal | Problem Solved | Example |
|------|----------------|---------|
| Reduce Latency | Data geographically close to users | Netflix CDN nodes worldwide |
| Increase Availability | System works even when parts fail | Amazon staying up during AWS outages |
| Scale Read Throughput | More machines to serve read queries | LinkedIn serving millions of profile views |
Here's the catch: if your data never changed, replication would be trivial—just copy once and done. All the complexity comes from handling changes to replicated data.
Leader-Based Replication: The Command Structure
The most common solution is leader-based replication (also called master-slave or primary-replica). Think of it as a military command structure:
Diagram Explanation: The leader (gold crown) accepts ALL write operations and propagates changes to followers via a replication log. Clients can read from any node, but writes must go through the leader. This single point of write coordination prevents conflicts.
The Three Rules of Leader-Based Replication:
- One leader is designated to accept all writes
- Followers receive changes via replication log/change stream
- Reads can go anywhere—leader or any follower
This pattern powers PostgreSQL, MySQL, MongoDB, SQL Server, and even message brokers like Kafka and RabbitMQ.
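To make the command structure concrete, here's a minimal sketch in plain Python (a toy model, not any real database's internals): the leader records every write in a replication log, and followers replay that log in order. Later sketches in this article reuse these two toy classes.

```python
class Leader:
    def __init__(self):
        self.data = {}
        self.log = []                    # replication log: every change, in order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))    # followers will replay this entry

class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0                 # how far into the leader's log we've applied

    def pull(self, leader):
        # Followers never accept writes directly; they replay the leader's log.
        for key, value in leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(leader.log)

leader, replica = Leader(), Follower()
leader.write("user:1", "Alice")          # all writes go through the leader
replica.pull(leader)                     # reads can be served by any replica
print(replica.data["user:1"])            # 'Alice', once the follower has caught up
```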
Synchronous vs Asynchronous: The Great Trade-off
Here's where things get interesting. When the leader writes data, how long should it wait before confirming success to the client?
Diagram Explanation: The leader waits for Follower 1 (synchronous) to acknowledge before confirming to the client, but doesn't wait for Follower 2 (asynchronous). This creates different durability guarantees—Follower 1 is always up-to-date, while Follower 2 might lag behind.
The Trade-off Matrix
| Aspect | Synchronous | Asynchronous |
|--------|-------------|--------------|
| Durability | ✅ Guaranteed on 2+ nodes | ⚠️ May lose recent writes |
| Latency | ❌ Slower (wait for ACK) | ✅ Faster |
| Availability | ❌ Blocked if sync follower down | ✅ Leader continues regardless |
| Use Case | Financial transactions | Social media feeds |
Semi-Synchronous: The Practical Middle Ground
In practice, you can't make ALL followers synchronous: one slow or failed node would block every write. The solution? Semi-synchronous replication:
- One follower is synchronous (guaranteed backup)
- Other followers are asynchronous (eventual consistency)
- If sync follower fails, an async one is promoted to sync
This gives you the best of both worlds: at least two nodes always have the latest data, but the system doesn't grind to a halt if one replica is slow.
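As a rough sketch of the commit path (toy code, not any particular database's implementation): the leader sends the write to every follower, but only blocks on the one synchronous acknowledgment.

```python
from concurrent.futures import ThreadPoolExecutor

class Replica:
    def __init__(self, name):
        self.name, self.log = name, []

    def replicate(self, write):
        self.log.append(write)              # in reality: network round-trip + fsync

pool = ThreadPoolExecutor(max_workers=8)

def commit(write, sync_follower, async_followers):
    """Confirm to the client only after the one synchronous follower has the write."""
    sync_ack = pool.submit(sync_follower.replicate, write)
    for follower in async_followers:
        pool.submit(follower.replicate, write)   # fire and forget: no waiting here

    sync_ack.result(timeout=5)     # block until the synchronous copy is durable
    return "committed"             # at least two nodes now hold the latest data

sync = Replica("sync-follower")
others = [Replica("async-1"), Replica("async-2")]
print(commit(("user:1", "Alice"), sync, others))
```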
Setting Up New Followers: The Snapshot Dance
Adding a new replica isn't as simple as copying files. With a live database receiving writes, a naive file copy would capture inconsistent data—like photographing someone mid-blink.
Diagram Explanation: The four-step process for adding a new replica without downtime. The key insight is the snapshot's position marker (PostgreSQL's "log sequence number", MySQL's "binlog coordinates") that tells the follower exactly where to start catching up.
The Position Marker: Your Database Bookmark
Every snapshot needs a position marker—a precise point in the replication log. Without it, the new follower wouldn't know where to start catching up:
| Database | Position Marker Name |
|----------|----------------------|
| PostgreSQL | Log Sequence Number (LSN) |
| MySQL | Binlog Coordinates |
| MongoDB | OpTime |
| Kafka | Offset |
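Reusing the toy Leader/Follower classes from the first sketch, the bootstrap sequence looks roughly like this (a simplification: real databases take a consistent snapshot without locking the whole dataset):

```python
def bootstrap_follower(leader, new_follower):
    # 1. Take a consistent snapshot and note its position in the replication log
    #    (the LSN / binlog coordinates / offset from the table above).
    snapshot, position = dict(leader.data), len(leader.log)

    # 2. Copy the snapshot to the new follower.
    new_follower.data = snapshot
    new_follower.applied = position      # the bookmark: where catch-up starts

    # 3. Request every change the leader processed after the snapshot position...
    backlog = leader.log[new_follower.applied:]

    # 4. ...apply it, then keep tailing the log like any other follower.
    for key, value in backlog:
        new_follower.data[key] = value
    new_follower.applied = len(leader.log)
```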
Handling Node Outages
Nodes will fail. It's not a question of if, but when. The goal is to minimize impact.
Follower Failure: Catch-up Recovery
When a follower comes back online, recovery is straightforward:
- Check local log for last processed transaction
- Connect to leader
- Request all changes since that point
- Apply changes and resume normal operation
This is idempotent—you can replay the same changes safely.
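This is the same catch-up machinery used when bootstrapping a new follower. Continuing the toy model, the recovering follower simply resumes from its last applied position; because replaying a change it already has is harmless, it can even rewind a little to be safe:

```python
def recover(follower, leader):
    # The follower's own log tells it the last position it durably applied.
    # Rewinding a few entries is safe: re-applying the same change is a no-op.
    start = max(0, follower.applied - 10)

    for key, value in leader.log[start:]:
        follower.data[key] = value       # idempotent replay

    follower.applied = len(leader.log)   # caught up; resume normal streaming
```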
Leader Failure: The Dreaded Failover
Leader failure is where things get dangerous. Failover involves:
- Detecting the leader is dead (usually via timeout)
- Electing a new leader (the most up-to-date replica)
- Reconfiguring clients and followers to use the new leader
Diagram Explanation: Failover sequence showing detection, election, and reconfiguration. The follower with the most up-to-date data becomes the new leader. When the old leader returns, it must accept its demotion to follower status.
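A stripped-down sketch of that controller logic, assuming each replica exposes the log position it has applied (real systems layer consensus and fencing on top of this, which the toy version ignores):

```python
import time

def maybe_failover(followers, last_heartbeat, timeout=30.0):
    # 1. Detection: no heartbeat from the leader within the timeout window.
    if time.time() - last_heartbeat < timeout:
        return None                          # leader still looks healthy

    # 2. Election: promote the most up-to-date follower (highest log position),
    #    which minimizes how many writes are lost.
    new_leader = max(followers, key=lambda f: f.applied)

    # 3. Reconfiguration: everyone else follows the new leader from now on;
    #    if the old leader comes back, it must rejoin as a follower too.
    for follower in followers:
        if follower is not new_leader:
            follower.leader = new_leader     # toy stand-in for client/replica rerouting
    return new_leader
```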
Failover Nightmares: What Can Go Wrong
Failover is "fraught with things that can go wrong." Here are the horror stories:
1. The Lost Writes Problem
With asynchronous replication, the new leader might not have all writes from the old leader:
Old Leader: Processed writes A, B, C, D, E
New Leader: Only received A, B, C
Result: Writes D and E are LOST FOREVER
The common solution? Discard the old leader's unreplicated writes. Yes, this violates durability guarantees, but it's often the only practical option.
2. The GitHub Incident: When Lost Writes Break Everything
In one infamous incident at GitHub, an out-of-date MySQL follower was promoted to leader. The problem? Auto-incrementing primary keys.
Old Leader: Assigned IDs 1, 2, 3, 4, 5
New Leader: Last synced at ID 3
New Leader: Assigns new IDs 4, 5, 6...
Result: IDs 4 and 5 now reference DIFFERENT data!
These IDs were also stored in Redis. The mismatch caused private user data to be disclosed to the wrong users. A catastrophic security breach from a simple failover.
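A toy reconstruction of that failure mode (hypothetical IDs and names, not GitHub's actual data): the stale new leader re-issues primary keys the old leader already handed out, while the cache still holds the old mapping.

```python
# The old leader assigned IDs 1..5 before crashing; the promoted follower
# had only replicated up to ID 3. An external cache saw IDs 4 and 5.
redis_cache = {4: "dave", 5: "erin"}

new_leader_rows = {1: "alice", 2: "bob", 3: "carol"}   # writes 4 and 5 never arrived
next_id = max(new_leader_rows) + 1                     # auto-increment resumes at 4

for name in ["frank", "grace"]:                        # fresh writes after failover
    new_leader_rows[next_id] = name
    next_id += 1

# ID 4 now means "frank" in the database but "dave" in the cache.
for user_id, cached in redis_cache.items():
    print(user_id, "db:", new_leader_rows[user_id], "cache:", cached)
```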
3. Split Brain: Two Leaders, Total Chaos
What if both nodes think they're the leader?
Diagram Explanation: Split brain occurs when network partitions make nodes unable to communicate, and both assume leadership. Both accept conflicting writes, leading to data corruption or loss when the partition heals.
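In miniature (a deliberately simplified sketch): during the partition each side keeps accepting writes to the same key, and once the network heals there is no single history that contains both.

```python
# Network partition: nodes A and B can each reach some clients,
# but not each other -- and each believes it is the leader.
node_a = {"balance:42": 100}
node_b = {"balance:42": 100}

node_a["balance:42"] -= 30        # client 1's withdrawal lands on node A
node_b["balance:42"] -= 70        # client 2's withdrawal lands on node B

# When the partition heals the replicas disagree, and neither value reflects
# both writes; without extra safeguards, one side's writes get thrown away.
print(node_a["balance:42"], node_b["balance:42"])     # 70 vs 30
```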
4. The Timeout Dilemma
| Timeout Setting | Risk |
|-----------------|------|
| Too long | Slow recovery when leader actually fails |
| Too short | False positives during load spikes or network glitches |
A temporary load spike can cause response times to exceed the timeout, triggering an unnecessary failover—which adds more load to an already struggling system. Cascading failure.
Implementation: Replication Logs
How does the leader actually send changes to followers?
Statement-Based Replication (The Naive Approach)
Just forward every SQL statement: INSERT INTO users VALUES (...).
Problems:
- NOW() returns different times on each replica
- RAND() gives different values
- Auto-increment depends on execution order
- Triggers may have side effects
Verdict: Simple but broken. MySQL used this before version 5.1, now defaults to row-based.
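A small Python analogy of the nondeterminism problem (function calls standing in for SQL): statement-based shipping re-evaluates the expression on every replica, while row-based shipping sends the value the leader already computed.

```python
import random

# Statement-based: each replica re-executes the statement, so a call like
# RAND() or NOW() is evaluated independently on every node.
def apply_statement(replica):
    replica["discount"] = random.random()      # stand-in for SQL RAND()

# Row-based: the leader evaluates once and ships the resulting value.
def apply_row_change(replica, value):
    replica["discount"] = value

leader, follower = {}, {}

apply_statement(leader)
apply_statement(follower)
print(leader["discount"] == follower["discount"])     # almost certainly False

value = random.random()                # leader computes the value once
apply_row_change(leader, value)
apply_row_change(follower, value)      # follower receives the exact same value
print(leader["discount"] == follower["discount"])     # True
```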
Write-Ahead Log (WAL) Shipping
Send the actual low-level storage changes (which bytes changed on disk).
Used by: PostgreSQL, Oracle
Trade-off: Very efficient, but tightly coupled to the storage engine's on-disk format. If the leader and followers must run the same version, upgrades require downtime.
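Conceptually (a rough sketch, not PostgreSQL's actual record layout), a WAL record names a page, an offset, and the bytes to write there, which is exactly why it's efficient and exactly why both sides must share the same on-disk format:

```python
from dataclasses import dataclass

@dataclass
class WalRecord:
    lsn: int           # log sequence number: this record's position in the WAL
    page_no: int       # which fixed-size page was modified
    offset: int        # byte offset within that page
    payload: bytes     # the new bytes to write at that offset

def apply(pages, rec):
    # The follower applies the exact byte-level change, so its files end up
    # bit-for-bit identical to the leader's -- and tied to the same format.
    page = pages[rec.page_no]
    page[rec.offset:rec.offset + len(rec.payload)] = rec.payload

pages = {0: bytearray(8192)}           # one empty 8 KB page
apply(pages, WalRecord(lsn=1, page_no=0, offset=128, payload=b"hello"))
print(pages[0][128:133])               # b'hello'
```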
Real-World Analogy: The Newspaper Printing Press
Think of leader-based replication like a newspaper with one central editor (leader) and multiple printing presses (followers):
- The editor receives all articles, fact-checks them, and creates the master copy
- Printing presses receive the finalized pages and produce copies for distribution
- Readers can get papers from any printing press (read from any replica)
- Only the editor can change the content (writes go to leader)
When the editor gets sick (leader fails):
- The most experienced journalist is promoted to editor (failover)
- Some late-breaking stories might be lost (unreplicated writes)
- If two journalists both claim to be editor, chaos ensues (split brain)
Key Takeaways
- Leader-based replication is the dominant pattern: One node accepts writes, others follow. It's simple, battle-tested, and used by PostgreSQL, MySQL, MongoDB, Kafka, and more.
- Sync vs Async is a fundamental trade-off: Synchronous guarantees durability but kills availability. Asynchronous is fast but can lose data. Semi-synchronous is the practical compromise.
- Failover is dangerous, not automatic: Split brain, lost writes, and cascading failures are real risks. Many ops teams prefer manual failover despite automation being available.
- Position markers are critical: Snapshots need exact positions in the replication log (LSN, binlog coordinates) to enable catch-up without missing or duplicating data.
- The GitHub incident is a warning: Auto-incrementing IDs + failover + external systems (Redis) = potential security disaster. Replication bugs don't stay contained.
Common Pitfalls
| ❌ Misconception | ✅ Reality |
|-----------------|-----------|
| "Replication makes my data safe" | Only if synchronous; async replication can lose confirmed writes |
| "Failover is automatic and safe" | It's one of the hardest problems in distributed systems; split brain can corrupt data |
| "More replicas = more durable" | Only if at least one is synchronous; 10 async replicas all lag behind |
| "Replication lag is milliseconds" | Cross-region can be seconds; cross-continent can be minutes |
| "Read replicas always have current data" | Never assume this—design for eventual consistency |
| "Statement-based replication is fine" | Nondeterministic functions (NOW, RAND) break everything |
What's Next?
We've covered the foundation—leader-based replication. But what happens when you need multiple leaders across data centers? What about leaderless systems like Cassandra and Dynamo?
In the next section, we'll explore Problems with Replication Lag—the consistency anomalies that arise when followers fall behind, and the guarantees (read-your-writes, monotonic reads) that help tame the chaos.