Leaders and Followers: The Art of Database Replication
The Hook
Your primary database just crashed. Users are screaming on Twitter. Revenue is bleeding at $10,000 per minute. But wait—within 30 seconds, a backup seamlessly takes over, and nobody even notices. That's the magic of replication. It's not just about making copies; it's about orchestrating a dance between a leader and its followers that keeps your data alive, available, and fast. Get it wrong, and you'll face the nightmare scenarios that have taken down GitHub, Amazon, and countless startups.
Learning Objectives
By the end of this article, you will be able to:
- Understand why leader-based replication is the dominant pattern in distributed databases
- Compare synchronous vs asynchronous replication and their trade-offs
- Analyze failover scenarios and the dangerous edge cases (split brain, data loss)
- Apply best practices for setting up new followers and handling node outages
The Big Picture: Why Replicate at All?
Imagine you're Netflix, serving 200 million users across the globe. A user in Tokyo shouldn't wait for a database query to travel to Virginia and back. That's where replication comes in—keeping copies of the same data on multiple machines for three critical reasons:
| Goal | Problem Solved | Example |
|------|----------------|---------|
| Reduce Latency | Data geographically close to users | Netflix CDN nodes worldwide |
| Increase Availability | System works even when parts fail | Amazon staying up during AWS outages |
| Scale Read Throughput | More machines to serve read queries | LinkedIn serving millions of profile views |
Here's the catch: if your data never changed, replication would be trivial—just copy once and done. All the complexity comes from handling changes to replicated data.
Leader-Based Replication: The Command Structure
The most common solution is leader-based replication (also called master-slave or primary-replica). Think of it as a military command structure:
Diagram Explanation: The leader (gold crown) accepts ALL write operations and propagates changes to followers via a replication log. Clients can read from any node, but writes must go through the leader. This single point of write coordination prevents conflicts.
The Three Rules of Leader-Based Replication:
- One leader is designated to accept all writes
- Followers receive changes via replication log/change stream
- Reads can go anywhere—leader or any follower
This pattern powers PostgreSQL, MySQL, MongoDB, SQL Server, and even message brokers like Kafka and RabbitMQ.
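To make the command structure concrete, here's a minimal sketch in plain Python (a toy model, not any real database's internals): the leader records every write in a replication log, and followers replay that log in order. Later sketches in this article reuse these two toy classes.

```python
class Leader:
    def __init__(self):
        self.data = {}
        self.log = []                    # replication log: every change, in order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))    # followers will replay this entry

class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0                 # how far into the leader's log we've applied

    def pull(self, leader):
        # Followers never accept writes directly; they replay the leader's log.
        for key, value in leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(leader.log)

leader, replica = Leader(), Follower()
leader.write("user:1", "Alice")          # all writes go through the leader
replica.pull(leader)                     # reads can be served by any replica
print(replica.data["user:1"])            # 'Alice', once the follower has caught up
```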
Synchronous vs Asynchronous: The Great Trade-off
Here's where things get interesting. When the leader writes data, how long should it wait before confirming success to the client?
Diagram Explanation: The leader waits for Follower 1 (synchronous) to acknowledge before confirming to the client, but doesn't wait for Follower 2 (asynchronous). This creates different durability guarantees—Follower 1 is always up-to-date, while Follower 2 might lag behind.
The Trade-off Matrix
| Aspect | Synchronous | Asynchronous |
|--------|-------------|--------------|
| Durability | ✅ Guaranteed on 2+ nodes | ⚠️ May lose recent writes |
| Latency | ❌ Slower (wait for ACK) | ✅ Faster |
| Availability | ❌ Blocked if sync follower down | ✅ Leader continues regardless |
| Use Case | Financial transactions | Social media feeds |
Semi-Synchronous: The Practical Middle Ground
In practice, you can't make ALL followers synchronous: one slow or failed node would block every write. The solution? Semi-synchronous replication:
- One follower is synchronous (guaranteed backup)
- Other followers are asynchronous (eventual consistency)
- If sync follower fails, an async one is promoted to sync
This gives you the best of both worlds: at least two nodes always have the latest data, but the system doesn't grind to a halt if one replica is slow.
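As a rough sketch of the commit path (toy code, not any particular database's implementation): the leader sends the write to every follower, but only blocks on the one synchronous acknowledgment.

```python
from concurrent.futures import ThreadPoolExecutor

class Replica:
    def __init__(self, name):
        self.name, self.log = name, []

    def replicate(self, write):
        self.log.append(write)              # in reality: network round-trip + fsync

pool = ThreadPoolExecutor(max_workers=8)

def commit(write, sync_follower, async_followers):
    """Confirm to the client only after the one synchronous follower has the write."""
    sync_ack = pool.submit(sync_follower.replicate, write)
    for follower in async_followers:
        pool.submit(follower.replicate, write)   # fire and forget: no waiting here

    sync_ack.result(timeout=5)     # block until the synchronous copy is durable
    return "committed"             # at least two nodes now hold the latest data

sync = Replica("sync-follower")
others = [Replica("async-1"), Replica("async-2")]
print(commit(("user:1", "Alice"), sync, others))
```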
Setting Up New Followers: The Snapshot Dance
Adding a new replica isn't as simple as copying files. With a live database receiving writes, a naive file copy would capture inconsistent data—like photographing someone mid-blink.
Diagram Explanation: The four-step process for adding a new replica without downtime. The key insight is the snapshot's position marker (PostgreSQL's "log sequence number", MySQL's "binlog coordinates") that tells the follower exactly where to start catching up.
The Position Marker: Your Database Bookmark
Every snapshot needs a position marker—a precise point in the replication log. Without it, the new follower wouldn't know where to start catching up:
| Database | Position Marker Name |
|----------|----------------------|
| PostgreSQL | Log Sequence Number (LSN) |
| MySQL | Binlog Coordinates |
| MongoDB | OpTime |
| Kafka | Offset |
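Reusing the toy Leader/Follower classes from the first sketch, the bootstrap sequence looks roughly like this (a simplification: real databases take a consistent snapshot without locking the whole dataset):

```python
def bootstrap_follower(leader, new_follower):
    # 1. Take a consistent snapshot and note its position in the replication log
    #    (the LSN / binlog coordinates / offset from the table above).
    snapshot, position = dict(leader.data), len(leader.log)

    # 2. Copy the snapshot to the new follower.
    new_follower.data = snapshot
    new_follower.applied = position      # the bookmark: where catch-up starts

    # 3. Request every change the leader processed after the snapshot position...
    backlog = leader.log[new_follower.applied:]

    # 4. ...apply it, then keep tailing the log like any other follower.
    for key, value in backlog:
        new_follower.data[key] = value
    new_follower.applied = len(leader.log)
```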
Handling Node Outages
Nodes will fail. It's not a question of if, but when. The goal is to minimize impact.
Follower Failure: Catch-up Recovery
When a follower comes back online, recovery is straightforward:
- Check local log for last processed transaction
- Connect to leader
- Request all changes since that point
- Apply changes and resume normal operation
This is idempotent—you can replay the same changes safely.
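This is the same catch-up machinery used when bootstrapping a new follower. Continuing the toy model, the recovering follower simply resumes from its last applied position; because replaying a change it already has is harmless, it can even rewind a little to be safe:

```python
def recover(follower, leader):
    # The follower's own log tells it the last position it durably applied.
    # Rewinding a few entries is safe: re-applying the same change is a no-op.
    start = max(0, follower.applied - 10)

    for key, value in leader.log[start:]:
        follower.data[key] = value       # idempotent replay

    follower.applied = len(leader.log)   # caught up; resume normal streaming
```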
Leader Failure: The Dreaded Failover
Leader failure is where things get dangerous. Failover involves:
- Detecting the leader is dead (usually via timeout)
- Electing a new leader (the most up-to-date replica)
- Reconfiguring clients and followers to use the new leader
Diagram Explanation: Failover sequence showing detection, election, and reconfiguration. The follower with the most up-to-date data becomes the new leader. When the old leader returns, it must accept its demotion to follower status.
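A stripped-down sketch of that controller logic, assuming each replica exposes the log position it has applied (real systems layer consensus and fencing on top of this, which the toy version ignores):

```python
import time

def maybe_failover(followers, last_heartbeat, timeout=30.0):
    # 1. Detection: no heartbeat from the leader within the timeout window.
    if time.time() - last_heartbeat < timeout:
        return None                          # leader still looks healthy

    # 2. Election: promote the most up-to-date follower (highest log position),
    #    which minimizes how many writes are lost.
    new_leader = max(followers, key=lambda f: f.applied)

    # 3. Reconfiguration: everyone else follows the new leader from now on;
    #    if the old leader comes back, it must rejoin as a follower too.
    for follower in followers:
        if follower is not new_leader:
            follower.leader = new_leader     # toy stand-in for client/replica rerouting
    return new_leader
```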
Failover Nightmares: What Can Go Wrong
Failover is "fraught with things that can go wrong." Here are the horror stories:
1. The Lost Writes Problem
With asynchronous replication, the new leader might not have all writes from the old leader:
Old Leader: Processed writes A, B, C, D, E
New Leader: Only received A, B, C
Result: Writes D and E are LOST FOREVER
The common solution? Discard the old leader's unreplicated writes. Yes, this violates durability guarantees, but it's often the only practical option.
2. The GitHub Incident: When Lost Writes Break Everything
In one infamous incident at GitHub, an out-of-date MySQL follower was promoted to leader. The problem? Auto-incrementing primary keys.
Old Leader: Assigned IDs 1, 2, 3, 4, 5
New Leader: Last synced at ID 3
New Leader: Assigns new IDs 4, 5, 6...
Result: IDs 4 and 5 now reference DIFFERENT data!
These IDs were also stored in Redis. The mismatch caused private user data to be disclosed to the wrong users. A catastrophic security breach from a simple failover.
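A toy reconstruction of that failure mode (hypothetical IDs and names, not GitHub's actual data): the stale new leader re-issues primary keys the old leader already handed out, while the cache still holds the old mapping.

```python
# The old leader assigned IDs 1..5 before crashing; the promoted follower
# had only replicated up to ID 3. An external cache saw IDs 4 and 5.
redis_cache = {4: "dave", 5: "erin"}

new_leader_rows = {1: "alice", 2: "bob", 3: "carol"}   # writes 4 and 5 never arrived
next_id = max(new_leader_rows) + 1                     # auto-increment resumes at 4

for name in ["frank", "grace"]:                        # fresh writes after failover
    new_leader_rows[next_id] = name
    next_id += 1

# ID 4 now means "frank" in the database but "dave" in the cache.
for user_id, cached in redis_cache.items():
    print(user_id, "db:", new_leader_rows[user_id], "cache:", cached)
```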
3. Split Brain: Two Leaders, Total Chaos
What if both nodes think they're the leader?
Diagram Explanation: Split brain occurs when network partitions make nodes unable to communicate, and both assume leadership. Both accept conflicting writes, leading to data corruption or loss when the partition heals.
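In miniature (a deliberately simplified sketch): during the partition each side keeps accepting writes to the same key, and once the network heals there is no single history that contains both.

```python
# Network partition: nodes A and B can each reach some clients,
# but not each other -- and each believes it is the leader.
node_a = {"balance:42": 100}
node_b = {"balance:42": 100}

node_a["balance:42"] -= 30        # client 1's withdrawal lands on node A
node_b["balance:42"] -= 70        # client 2's withdrawal lands on node B

# When the partition heals the replicas disagree, and neither value reflects
# both writes; without extra safeguards, one side's writes get thrown away.
print(node_a["balance:42"], node_b["balance:42"])     # 70 vs 30
```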
4. The Timeout Dilemma
| Timeout Setting | Risk |
|-----------------|------|
| Too long | Slow recovery when leader actually fails |
| Too short | False positives during load spikes or network glitches |
A temporary load spike can cause response times to exceed the timeout, triggering an unnecessary failover—which adds more load to an already struggling system. Cascading failure.
Implementation: Replication Logs
How does the leader actually send changes to followers?
Statement-Based Replication (The Naive Approach)
Just forward every SQL statement: INSERT INTO users VALUES (...).
Problems:
- NOW() returns different times on each replica
- RAND() gives different values
- Auto-increment depends on execution order
- Triggers may have side effects
Verdict: Simple but broken. MySQL used this before version 5.1, now defaults to row-based.
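A small Python analogy of the nondeterminism problem (function calls standing in for SQL): statement-based shipping re-evaluates the expression on every replica, while row-based shipping sends the value the leader already computed.

```python
import random

# Statement-based: each replica re-executes the statement, so a call like
# RAND() or NOW() is evaluated independently on every node.
def apply_statement(replica):
    replica["discount"] = random.random()      # stand-in for SQL RAND()

# Row-based: the leader evaluates once and ships the resulting value.
def apply_row_change(replica, value):
    replica["discount"] = value

leader, follower = {}, {}

apply_statement(leader)
apply_statement(follower)
print(leader["discount"] == follower["discount"])     # almost certainly False

value = random.random()                # leader computes the value once
apply_row_change(leader, value)
apply_row_change(follower, value)      # follower receives the exact same value
print(leader["discount"] == follower["discount"])     # True
```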
Write-Ahead Log (WAL) Shipping
Send the actual low-level storage changes (which bytes changed on disk).
Used by: PostgreSQL, Oracle
Trade-off: Very efficient, but tightly coupled to the storage engine's on-disk format. If the leader and followers must run the same version, upgrades require downtime.
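Conceptually (a rough sketch, not PostgreSQL's actual record layout), a WAL record names a page, an offset, and the bytes to write there, which is exactly why it's efficient and exactly why both sides must share the same on-disk format:

```python
from dataclasses import dataclass

@dataclass
class WalRecord:
    lsn: int           # log sequence number: this record's position in the WAL
    page_no: int       # which fixed-size page was modified
    offset: int        # byte offset within that page
    payload: bytes     # the new bytes to write at that offset

def apply(pages, rec):
    # The follower applies the exact byte-level change, so its files end up
    # bit-for-bit identical to the leader's -- and tied to the same format.
    page = pages[rec.page_no]
    page[rec.offset:rec.offset + len(rec.payload)] = rec.payload

pages = {0: bytearray(8192)}           # one empty 8 KB page
apply(pages, WalRecord(lsn=1, page_no=0, offset=128, payload=b"hello"))
print(pages[0][128:133])               # b'hello'
```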
Real-World Analogy: The Newspaper Printing Press
Think of leader-based replication like a newspaper with one central editor (leader) and multiple printing presses (followers):
- The editor receives all articles, fact-checks them, and creates the master copy
- Printing presses receive the finalized pages and produce copies for distribution
- Readers can get papers from any printing press (read from any replica)
- Only the editor can change the content (writes go to leader)
When the editor gets sick (leader fails):
- The most experienced journalist is promoted to editor (failover)
- Some late-breaking stories might be lost (unreplicated writes)
- If two journalists both claim to be editor, chaos ensues (split brain)
Key Takeaways
- Leader-based replication is the dominant pattern: One node accepts writes, others follow. It's simple, battle-tested, and used by PostgreSQL, MySQL, MongoDB, Kafka, and more.
- Sync vs Async is a fundamental trade-off: Synchronous guarantees durability but kills availability. Asynchronous is fast but can lose data. Semi-synchronous is the practical compromise.
- Failover is dangerous, not automatic: Split brain, lost writes, and cascading failures are real risks. Many ops teams prefer manual failover despite automation being available.
- Position markers are critical: Snapshots need exact positions in the replication log (LSN, binlog coordinates) to enable catch-up without missing or duplicating data.
- The GitHub incident is a warning: Auto-incrementing IDs + failover + external systems (Redis) = potential security disaster. Replication bugs don't stay contained.
Common Pitfalls
| ❌ Misconception | ✅ Reality |
|-----------------|-----------|
| "Replication makes my data safe" | Only if synchronous; async replication can lose confirmed writes |
| "Failover is automatic and safe" | It's one of the hardest problems in distributed systems; split brain can corrupt data |
| "More replicas = more durable" | Only if at least one is synchronous; 10 async replicas all lag behind |
| "Replication lag is milliseconds" | Cross-region can be seconds; cross-continent can be minutes |
| "Read replicas always have current data" | Never assume this—design for eventual consistency |
| "Statement-based replication is fine" | Nondeterministic functions (NOW, RAND) break everything |
What's Next?
We've covered the foundation—leader-based replication. But what happens when you need multiple leaders across data centers? What about leaderless systems like Cassandra and Dynamo?
In the next section, we'll explore Problems with Replication Lag—the consistency anomalies that arise when followers fall behind, and the guarantees (read-your-writes, monotonic reads) that help tame the chaos.