The Foundation
CAP, ACID vs BASE, latency numbers, back-of-envelope estimation, single points of failure — the vocabulary every system designer thinks in.
What system design actually means
Writing code that works on your laptop is engineering. Designing a system that stays up for a hundred million people, survives entire datacentres going dark, and answers in milliseconds from anywhere on earth — that is something else. System design is the deliberate part. The choices you make before the first line of code, that decide whether your architecture will buckle at 10k requests per second or whether it will quietly absorb a hundred times that.
The distinction between low-level design (LLD) and high-level design (HLD) is worth getting straight up front, because interviews and job descriptions blur them.
Low-Level Design High-Level Design
──────────────── ─────────────────
Classes, functions Services, data flow
"Design a parking lot in code" "Design Uber"
UML, design patterns Architecture diagrams
Line-by-line Boxes and arrows
Output: code structure Output: tradeoff decisions
This series is entirely about HLD. We are not picking variable names. We are deciding whether to shard a database, whether to put a message queue between two services, whether eventual consistency is acceptable for this particular feature. The output of HLD is rarely code — it is a diagram and a set of justified choices.
Here is the framing that helps most. A village of a hundred people needs one well. A city of ten million needs treatment plants, reservoirs, pipes, pumping stations, and a maintenance crew. You cannot just make the well bigger. The architecture is fundamentally different. Software is identical — the same app that serves your ten friends will not serve your ten million users with more RAM. It needs a different shape.
Every design decision in this entire series is, underneath, a balance between four properties: scalability (handle more load by adding resources), reliability (keep working when components fail), performance (respond fast under load), and maintainability (change without breaking). You can never maximise all four. A senior engineer does not find the perfect solution — they pick the best tradeoff for the context and can explain why.
CAP theorem — what you actually have to give up
CAP is the most-quoted and most-misunderstood theorem in distributed systems. The statement is short: in any distributed data store, when a network partition happens, you must choose between consistency and availability. You cannot have both.
The three letters mean specific things:
- Consistency. Every read returns the most recent write, or an error. "Consistency" here is NOT the database C in ACID. It is linearizability — all clients see the same data at the same time, as if there were one global copy.
- Availability. Every request gets a non-error response. The system answers, even if the answer is slightly stale.
- Partition tolerance. The system keeps working when network links between nodes fail.
The theorem is often stated as "pick two of three." That is misleading. In any real distributed system running across more than one machine, partitions WILL happen — switches fail, cables get unplugged, packets get dropped. P is not optional. The real choice is what to do during a partition: keep answering with possibly-stale data (AP) or refuse to answer until the partition heals (CP).
Network partition!
┌─────────┐ ✗ ┌─────────┐
│ Node A │ ────── │ Node B │
│ user=42 │ │ user=99 │
└─────────┘ └─────────┘
│ │
read returns 42 read returns 99
write succeeds write succeeds
↳ AP system: both nodes keep serving, data diverges, reconcile later
↳ CP system: one node refuses writes (or all writes) until partition heals
Map that onto real systems and the choices become concrete:
- CP systems prioritise consistency. They will refuse to serve some requests during a partition rather than serve stale data. Examples: most relational databases in their default config, etcd, ZooKeeper, HBase, the metadata layer of MongoDB. Use these when correctness matters more than uptime — banking ledgers, inventory counts where overselling is unacceptable, leader election.
- AP systems prioritise availability. They keep answering during partitions and reconcile later, which means clients can briefly see stale or conflicting data. Examples: Cassandra, DynamoDB in its default mode, Riak, CouchDB. Use these when uptime matters more than instant correctness — social feeds, product reviews, shopping carts, telemetry.
The practical lesson: CAP is not a property of "a database." It is a property of how a database is configured and used. DynamoDB can be tuned toward C (strongly consistent reads) or toward A (eventually consistent reads) per query. MongoDB lets you set read and write concerns per operation. The right question is never "is this database CP or AP?" — it is "for THIS query, which guarantee do I need?"
A practical extension is PACELC: when there is a Partition, choose A or C; Else (normal operation), choose between Latency and Consistency. Even with no partition, strong consistency costs latency (cross-region quorums take time). DynamoDB is PA/EL — favours availability during partition, low latency otherwise. A traditional RDBMS is PC/EC — favours consistency always, accepts the latency cost. This framing is more useful than CAP alone.
ACID vs BASE — two different worlds
ACID and BASE describe two ways a data store can guarantee — or deliberately not guarantee — what happens to your data. They evolved for different problems and they cost different things.
ACID is the classical RDBMS contract:
- Atomicity — a transaction is all-or-nothing. Either every write in the transaction succeeds, or none of them do. No partial failures visible to anyone else.
- Consistency — every transaction moves the database from one valid state to another. Constraints, foreign keys, and triggers hold. (Different from CAP's C.)
- Isolation — concurrent transactions don't trip over each other. The result is as if they had run one after another. (Most databases offer several isolation levels with different tradeoffs.)
- Durability — once a transaction commits, it survives crashes, power loss, and reboots. The write is on disk, fsync'd, replicated.
The canonical example is a bank transfer. Debit account A, credit account B. If the debit succeeds and the credit fails, money has vanished. ACID makes that impossible — either both happen or neither does. This is why every banking, ledger, and inventory system you have ever used runs on an ACID database, regardless of what marketing says.
BASE is what you get when you trade ACID guarantees for scale and availability:
- Basically Available — the system responds, even during partial failures. It may serve stale data.
- Soft state — data can change without an external write, because replicas converge in the background.
- Eventual consistency — given enough time and no new writes, all replicas will converge to the same value. Eventually. The window can be milliseconds or, in pathological cases, minutes.
BASE was coined when Amazon and Google realised that for many workloads — product catalogues, social feeds, recommendations — strict consistency was the wrong tradeoff. A user briefly seeing yesterday's review count is fine. A user being unable to load the page because one node's replica is lagging is not fine.
ACID BASE
───── ─────
PostgreSQL Cassandra
MySQL (InnoDB) DynamoDB (eventual mode)
Oracle, SQL Server Riak
"correctness or nothing" "answer first, reconcile later"
The honest framing: ACID and BASE are not competitors. They are different toolboxes for different problems. A real production system uses both — the user-profile service might be on Postgres (ACID), the activity feed on DynamoDB (BASE), the search index on Elasticsearch (its own model). The skill is matching the guarantee to the workload, not picking a side.
Watch out for one trap: "NoSQL" does NOT mean "BASE." MongoDB has ACID transactions since 4.0. DynamoDB has ACID transactions across items. The labels have blurred. Read the actual consistency model of the database you are evaluating, not the marketing bucket.
Latency numbers every designer should memorise
Jeff Dean's famous list of "numbers every programmer should know" is the single most useful mental model for back-of-envelope reasoning about performance. Most architectural mistakes are about not respecting how slow some operations are relative to others.
The approximate numbers, rounded for memorability:
Operation Time Relative
───────────────────────────────── ────────── ────────────
L1 cache reference 0.5 ns 1×
Branch mispredict 5 ns 10×
L2 cache reference 7 ns 14×
Mutex lock/unlock 25 ns 50×
Main memory reference (RAM) 100 ns 200×
Compress 1 KB with zippy 3,000 ns 6,000×
Send 1 KB over 1 Gbps network 10,000 ns 20,000×
Read 1 MB sequentially from RAM 250,000 ns 500,000×
Round trip within same datacenter 500,000 ns 1,000,000×
Read 1 MB sequentially from SSD 1,000,000 ns 2,000,000×
Disk seek (HDD) 10,000,000 ns 20,000,000×
Read 1 MB sequentially from HDD 20,000,000 ns 40,000,000×
Round trip CA → Netherlands → CA 150,000,000 ns 300,000,000×
A few orders-of-magnitude facts worth burning into memory:
- RAM is roughly 100× faster than SSD, which is roughly 10× faster than spinning disk.
- A network round trip in the same datacentre is roughly 100× slower than reading from local RAM.
- A cross-continent round trip is another 300× slower again.
- Disk seeks on spinning disks are catastrophically slow — never seek if you can scan.
What this teaches you in practice: a query that hits an in-memory index in your app server is ~100 ns. The same query, if it forces a database call, is ~500 µs minimum (the round trip). If the database itself has to read from disk, add another millisecond. If you make 50 such queries to render a single page, you have spent 25 ms before the database even started thinking — and your user is in another country, so add 150 ms for the response to come back.
This is why caching is not a nice-to-have. It is why batching exists. It is why N+1 queries are a problem. It is why CDNs put content close to users. Every architectural pattern in this series is a response to one of these latency numbers being inconveniently large.
Back-of-envelope estimation
Senior engineers can sketch the rough shape of a system in five minutes. The skill is not magic — it is a small set of numbers you have memorised, applied with confident arithmetic. Interviewers ask "design Twitter" partly to see whether you can produce a plausible storage and QPS estimate without seizing up.
The constants to keep in your head:
- 1 day ≈ 86,400 seconds ≈ 10⁵ seconds (round up — it makes the arithmetic clean).
- 1 month ≈ 2.6 million seconds ≈ 2.5 × 10⁶ seconds.
- 1 year ≈ 3 × 10⁷ seconds.
- A modern commodity server: ~100,000 QPS for in-memory work, ~10,000 QPS for normal CRUD, ~1,000 QPS for heavy queries. (Rough — varies wildly with workload.)
- Memory: ~64 GB on a typical app server, ~256 GB on a beefy database server.
- Storage: SSDs are now ~1-4 TB at commodity prices; cloud object storage is effectively unlimited at ~$0.02/GB/month.
The canonical interview drill: estimate the storage and bandwidth for Twitter.
Assumption Number
──────────── ──────────────
Daily active 200 million
Tweets/user 2 average
Tweets/day 400 million ≈ 4 × 10⁸
Tweets/sec ~5,000 average, peak maybe 10×
Bytes/tweet 300 (text + metadata)
Daily storage 4 × 10⁸ × 300 = 1.2 × 10¹¹ B = 120 GB/day
Yearly ~44 TB tweets/year (text only)
Reads/sec Reads >> writes. ~200k reads/sec is plausible.
Once you have these numbers, real architectural choices fall out of them. 5k tweets per second is easily absorbed by a single sharded relational database — the write rate is not the bottleneck. The interesting load is the read fan-out for newsfeeds. 200 million users reading their timelines means you cannot compute feeds on demand for celebrities with 100M followers — you need a hybrid fan-out strategy (we cover this in Module 8).
The procedure is always the same:
1. State assumptions out loud. "Let's say 100 million daily active users." The interviewer can correct you and you can adjust. 2. Round generously. 86,400 becomes 10⁵. 365 becomes 400. You are estimating, not auditing. 3. Compute reads and writes separately. They almost always have very different ratios. 4. Compute storage growth per year. This decides whether you need sharding from day one. 5. Identify the bottleneck. Storage, write QPS, read QPS, bandwidth, or fan-out — usually one dominates.
Do not skip step 1. The single most common mistake is plunging into arithmetic without stating what you are assuming. The numbers do not matter if the assumptions are wrong.
Single points of failure
A single point of failure (SPOF) is any component whose failure brings the whole system down. Designing for reliability is, in large part, the systematic removal of SPOFs. The pattern is mechanical: identify what could die, ask what happens if it does, and replicate or fail-over what cannot tolerate the answer.
A naive architecture is almost entirely SPOFs:
Client ──► Load Balancer ──► App Server ──► Database ──► Disk
↑ SPOF ↑ SPOF ↑ SPOF ↑ SPOF
Kill any one of those boxes and the system is gone. The progression of fixes runs roughly:
- Replicate the app server. Two or more behind the load balancer. Now an app server can die without taking the site down. The load balancer routes traffic to whichever is healthy.
- Replicate the load balancer. Use DNS-level health checks or run two load balancers in active-passive (or active-active) mode behind a virtual IP. Cloud providers offer managed load balancers that are themselves replicated across availability zones.
- Replicate the database. Primary plus one or more replicas. The replicas serve reads. If the primary dies, a standby is promoted. Tools like PostgreSQL streaming replication, MySQL group replication, or managed offerings like RDS Multi-AZ handle the mechanics. Caveat: automatic failover is hard. Split-brain (two nodes both believing they are primary) is the classic disaster — covered properly in Module 7 on distributed systems.
- Replicate the disk. RAID, replicated storage (Ceph), or just multiple replicas across machines, which subsumes RAID. Cloud block storage (EBS, GCP Persistent Disk) is already replicated under the hood.
- Replicate the datacentre. Multi-AZ for synchronous replication within a region. Multi-region for disaster recovery — the entire region (multiple AZs) goes down, and you fail over to another continent. Multi-region is expensive and complicated; most systems do not need it.
A useful drill: pick any architecture diagram you have drawn and ask, for every single box and every single line, "what happens if this dies?" If the answer is "the site is down," you have found a SPOF. The line items, the network links, the DNS provider, the certificate authority — all are potential SPOFs.
Two SPOFs people consistently forget:
- Configuration / secrets stores. If your app cannot start without HashiCorp Vault or AWS Secrets Manager, those services are SPOFs. Cache secrets in memory at startup; don't fetch them on every request.
- Deployments. If your build pipeline goes down, you cannot ship a fix during an incident. Have a manual emergency path — being able to SSH to a server and edit a file is a deeply unfashionable but occasionally life-saving capability.
Reliability targets are usually expressed in nines. 99.9% ("three nines") is roughly 8.7 hours of downtime per year. 99.99% is 52 minutes. 99.999% is 5 minutes. Each extra nine costs roughly an order of magnitude more in engineering and infrastructure. Match the target to what the business actually needs — five nines is for emergency dispatch, not for a hobby SaaS.
What this gives you for the rest of the series
This module is the vocabulary. Every other module references these ideas, often without re-explaining them.
A quick gist of what carries forward:
- When we say a system "prefers A over C," we mean it sits on the AP side of CAP, like Cassandra or DynamoDB's default reads.
- When we estimate storage for a design, we use the back-of-envelope drill from this module — daily users, average payload, write rate, read rate.
- When we add a replica, a load balancer, or a queue, we are removing a SPOF — and the choice of whether the replica is sync or async is implicitly a CAP/PACELC choice.
- Every choice between a relational database and a NoSQL one (the next module's whole topic) is partly an ACID-vs-BASE choice.
- Every architectural decision is, at some level, a latency-numbers decision. Crossing the network is 1000× slower than touching RAM. That single fact drives caching, batching, denormalisation, and CDN placement.
The next module looks at data storage in detail — relational, NoSQL, indexing, sharding, replication, object storage, and the newer NewSQL hybrids. Everything we cover after that is patterns and components that live on top of a storage layer you chose deliberately.
⁂ Back to all modules