Database Replication & Sharding
How databases scale beyond one machine. Read replicas, leader/follower, sharding, and the trade-offs.
When One Database Isn't Enough
Most apps run fine on a single database for a long time. Then one of three things happens:
- Read load grows — the database can't keep up with queries
- Write load grows — the database can't keep up with INSERTs/UPDATEs
- Data size grows — the database can't fit on one machine, or queries on huge tables get slow
These three problems have different solutions. Confusing them is one of the most expensive mistakes in backend engineering. Knowing which problem you actually have is half the battle.
Vertical Scaling — The Boring First Answer
Before you do anything fancy, give your database a bigger machine.
Modern cloud providers offer database instances with hundreds of GB of RAM, dozens of CPU cores, and tens of terabytes of fast SSD storage. PostgreSQL on a beefy machine can handle far more load than most engineers assume — millions of rows queried per second, tens of thousands of concurrent connections (with a pooler).
Companies routinely run multi-billion-dollar businesses on a single primary PostgreSQL or MySQL instance. Stack Overflow famously serves much of its traffic from one SQL Server. Figma, Notion, and Linear all run primarily on a small number of large Postgres databases.
The case for vertical scaling first:
• It's a config change, not an architecture change
• You don't change your code at all
• Operations stay simple — one database, one place to look
• You can scale up in minutes; you can't unsplit shards in minutes
When vertical scaling stops working:
• You hit the largest available instance — typically when you exceed a few TB of hot data
• Single-machine bottlenecks (network bandwidth, single-threaded operations) cap throughput
• Cost grows superlinearly — the biggest instances cost dramatically more than two medium ones
Run vertical scaling as long as you reasonably can. The complexity savings are enormous.
Read Replicas — Scaling Reads
When reads outpace what one database can serve, the simplest answer is replication: copy the database to additional read-only replicas.
                ┌──────────────┐
                │   Primary    │  ← all writes go here
                │  (writable)  │
                └──────┬───────┘
                       │
             streaming replication
                       │
     ┌─────────────────┼─────────────────┐
     ▼                 ▼                 ▼
┌─────────┐       ┌─────────┐       ┌─────────┐
│Replica 1│       │Replica 2│       │Replica 3│
│read-only│       │read-only│       │read-only│
└─────────┘       └─────────┘       └─────────┘
     ▲                 ▲                 ▲
     └──────── reads distributed ────────┘
How it works:
1. Every write to the primary is appended to a replication log (PostgreSQL's WAL, MySQL's binlog).
2. Each replica continuously streams that log and replays it locally.
3. Replicas accept read queries — selects, joins, reports.
4. Writes always go to the primary.
The big win: read capacity scales horizontally. 5 replicas = roughly 5x read throughput.
The catch: replication lag. A write to the primary takes some time to appear on replicas. Usually milliseconds, occasionally seconds, rarely longer if the network hiccups.
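You can observe the lag directly. Below is a quick sketch using node-postgres against a replica; pg_last_xact_replay_timestamp() is a real PostgreSQL function (it reports when the replica last replayed a transaction), while the connection setup is illustrative.
// Sketch: measuring replication lag on a PostgreSQL replica.
// Assumes a node-postgres client and a REPLICA_URL environment variable.
import pg from "pg";

const replica = new pg.Client({ connectionString: process.env.REPLICA_URL });
await replica.connect();

// now() minus the last replayed commit time = how far behind we are
const { rows } = await replica.query(
  "SELECT now() - pg_last_xact_replay_timestamp() AS lag"
);
console.log("replication lag:", rows[0].lag);

await replica.end();
On the primary side, the pg_stat_replication view shows the same picture per replica.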
This means your application code now has to make a decision: route this read to the primary (latest data, slower) or to a replica (possibly stale, faster)?
// Common pattern: read-your-writes via the primary
async function updateProfile(userId, data) {
  await primaryDb.update(userId, data);
  // Don't immediately query the replica — it might not have caught up
  // Either return the updated data from the primary, or wait, or use a session marker
  return await primaryDb.findUser(userId);
}

// Reads that don't need the latest data → replica
async function getDashboard(userId) {
  return await replicaDb.findRecentActivity(userId);
}
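One way to implement the "session marker" mentioned in the comment above: after a user writes, pin that user's reads to the primary for a short window. A minimal in-memory sketch, assuming the primaryDb/replicaDb handles from before and a fixed lag budget:
// Sketch: pin reads to the primary briefly after a write.
// The 2-second window is an assumption about typical replica lag.
const lastWriteAt = new Map(); // userId → timestamp of their last write
const LAG_WINDOW_MS = 2000;

function dbFor(userId) {
  const t = lastWriteAt.get(userId);
  const wroteRecently = t !== undefined && Date.now() - t < LAG_WINDOW_MS;
  return wroteRecently ? primaryDb : replicaDb;
}

async function recordWrite(userId, write) {
  const result = await write(primaryDb); // writes always hit the primary
  lastWriteAt.set(userId, Date.now());   // remember when, for read routing
  return result;
}

// Usage:
// await recordWrite(userId, (db) => db.update(userId, data));
// const activity = await dbFor(userId).findRecentActivity(userId);
Production systems usually keep the marker in the user's session or compare replication positions (e.g. WAL LSNs) instead of a fixed window, but the shape is the same.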
When it's right: read-heavy workloads (most apps), reporting and analytics, geographic distribution (replica per region for low-latency reads).
What it doesn't fix: write-heavy workloads. If your bottleneck is INSERTs, replicas don't help — every write still goes to the same primary.
Failover & High Availability
Once you have replicas, you have the building blocks for high availability. If the primary dies, promote a replica to become the new primary.
Primary (failing!)          Primary (failing!)          New Primary
     │                                                       │
 ┌───┴───┐         ─────►                     ─────►         │
 ▼       ▼         detect                     promote        ▼
Replica  Replica   failure  Replica   Replica replica    Replica
                            (replication stops)      (old primary gone)
Two flavors:
Manual failover — humans decide when to promote. Slower (minutes), but safe. Used at smaller scale.
Automatic failover — software detects the failure and promotes a replica. Faster (seconds) but risky: if the detection is wrong, you can have a "split brain" where two databases think they're primary.
Tools that handle this:
• Patroni for PostgreSQL
• Orchestrator for MySQL
• AWS RDS / Aurora — managed automatic failover
• Cloud SQL, Cloud Spanner, etc.
Replication mode matters too:
• Asynchronous — primary commits without waiting for replicas. Fast, but on failover you might lose the last few writes.
• Synchronous — primary waits for replica acknowledgment. Slower writes, no data loss on failover.
• Semi-synchronous — wait for at least one replica's ack. Compromise.
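In PostgreSQL this choice can even be made per transaction through the synchronous_commit setting (a real setting; remote_apply waits for synchronous replicas to apply the commit, off doesn't wait at all). A sketch, assuming a node-postgres client connected to the primary and a configured synchronous standby:
// Sketch: per-transaction durability in PostgreSQL.
// Requires synchronous_standby_names to be configured on the primary.
await primaryDb.query("BEGIN");
// Critical write: don't acknowledge until a sync replica has applied it
await primaryDb.query("SET LOCAL synchronous_commit = remote_apply");
await primaryDb.query(
  "INSERT INTO payments (user_id, amount) VALUES ($1, $2)",
  [userId, amount] // hypothetical values for illustration
);
await primaryDb.query("COMMIT");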
For most apps, async + automated failover gives the right trade-off: fast writes during normal operation, very rare data loss on failure.
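From the application's side, a failover mostly looks like a few seconds of connection errors on one stable endpoint (the DNS name or proxy that Patroni, RDS, etc. repoint at the new primary). A hedged sketch of the client-side pattern; the attempt counts and delays are illustrative:
// Sketch: retry with exponential backoff while a failover completes.
// Only safe for idempotent writes (or writes carrying a unique request id).
async function withFailoverRetry(fn, { attempts = 5, baseDelayMs = 200 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;
      // Back off: 200ms, 400ms, 800ms, ... while the replica is promoted
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}

// Usage: the endpoint is re-pointed underneath us; we just keep trying.
// await withFailoverRetry(() => primaryDb.update(userId, data));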
Sharding — Scaling Writes & Storage
Replicas don't scale writes. When even your primary can't keep up — or the data no longer fits on one machine — you need sharding.
Sharding splits your data ACROSS multiple databases. Each shard owns a subset of the data; together they make up the whole.
        ┌─────────────────────────────────────┐
        │              All Users              │
        └──────────────────┬──────────────────┘
                           │
                partitioned by user_id
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
          ┌────────┐   ┌────────┐   ┌────────┐
          │Shard 1 │   │Shard 2 │   │Shard 3 │
          │users   │   │users   │   │users   │
          │A-H     │   │I-P     │   │Q-Z     │
          └────────┘   └────────┘   └────────┘
The shard key is the field used to decide which shard a row goes to. Pick a key that:
• Distributes data evenly (avoid hotspots)
• Aligns with your query patterns (related data should land on the same shard)
• Doesn't change (changing a shard key means moving the row)
Common shard keys: user_id (per-user data stays together), tenant_id (per-customer data stays together), region.
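Once the key is chosen, routing is mechanical. A sketch of hash-based routing on user_id; the shard handles are assumptions, and real systems typically use consistent hashing or a directory service so that adding shards doesn't remap every key:
// Sketch: hash-based shard routing. shard0Db..shard3Db are assumed
// connections, one per shard.
import { createHash } from "node:crypto";

const shards = [shard0Db, shard1Db, shard2Db, shard3Db];

function shardFor(userId) {
  // A stable hash means the same user always lands on the same shard
  const digest = createHash("sha256").update(String(userId)).digest();
  return shards[digest.readUInt32BE(0) % shards.length];
}

async function findUser(userId) {
  return shardFor(userId).findUser(userId); // single-shard query: cheap
}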
What gets harder with sharding:
1. Cross-shard queries are painful. "Top 10 most active users globally" now requires querying every shard and merging results (see the sketch after this list).
2. Cross-shard transactions are nearly impossible. A transaction spanning two shards needs distributed protocols (2PC, sagas — see Module 31).
3. Joins across shards don't work. You either denormalize, fan out and join in application code, or accept that some queries can't be answered.
4. Resharding is brutal. Going from 4 shards to 8 shards means moving half your data. Some systems (Vitess, Citus) automate this; doing it manually is multi-month work.
5. Schema migrations multiply. Every ALTER TABLE has to run on every shard, with all the timing risks that implies.
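To make point 1 concrete, here is what "top 10 most active users globally" turns into: a scatter-gather in application code. A sketch reusing the shards array from above; the per-shard topActiveUsers query is assumed:
// Sketch: scatter-gather. Each shard returns its local top N; the global
// top N must be among them, so we merge and re-sort in the application.
async function topActiveUsers(limit = 10) {
  const perShard = await Promise.all(
    shards.map((db) => db.topActiveUsers(limit)) // fan out to every shard
  );
  return perShard
    .flat()
    .sort((a, b) => b.activityCount - a.activityCount)
    .slice(0, limit);
}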
Modern alternatives that hide sharding from you:
• Distributed SQL databases — CockroachDB, Spanner, TiDB, Vitess. They look like one database; under the hood they shard automatically.
• Document databases with built-in sharding — MongoDB, DynamoDB. Pick the shard key and it just works.
• Multi-tenant strategies — for SaaS, sharding by tenant is often natural and clean (see Module 38).
The progression most teams follow:
1. Single database (probably forever, honestly)
2. Read replicas (scale reads + HA)
3. Vertical scaling to the largest instance (scale writes)
4. Logical partitioning (separate hot tables to their own database)
5. True sharding (only when nothing else works)
Most startups never need step 4 or 5. Companies that reach step 5 (Pinterest, Slack, Stripe) have engineering teams of hundreds dedicated to data infrastructure. Don't shard prematurely — it's an order of magnitude more operational complexity.