High-Level System Design / Module 1 — The Foundation

The Foundation

CAP, ACID vs BASE, latency numbers, back-of-envelope estimation, single points of failure — the vocabulary every system designer thinks in.


What system design actually means

Writing code that works on your laptop is engineering. Designing a system that stays up for a hundred million people, survives entire datacentres going dark, and answers in milliseconds from anywhere on earth is something else. System design is the deliberate part: the choices you make before the first line of code, which decide whether your architecture buckles at 10k requests per second or quietly absorbs a hundred times that.

The distinction between low-level design (LLD) and high-level design (HLD) is worth getting straight up front, because interviews and job descriptions blur them.

   Low-Level Design                 High-Level Design
   ────────────────                 ─────────────────
   Classes, functions               Services, data flow
   "Design a parking lot in code"   "Design Uber"
   UML, design patterns             Architecture diagrams
   Line-by-line                     Boxes and arrows
   Output: code structure           Output: tradeoff decisions

This series is entirely about HLD. We are not picking variable names. We are deciding whether to shard a database, whether to put a message queue between two services, whether eventual consistency is acceptable for this particular feature. The output of HLD is rarely code — it is a diagram and a set of justified choices.

Here is the framing that helps most. A village of a hundred people needs one well. A city of ten million needs treatment plants, reservoirs, pipes, pumping stations, and a maintenance crew. You cannot just make the well bigger. The architecture is fundamentally different. Software works the same way: the app that serves your ten friends will not serve ten million users just because you add more RAM. It needs a different shape.

Every design decision in this entire series is, underneath, a balance between four properties: scalability (handle more load by adding resources), reliability (keep working when components fail), performance (respond fast under load), and maintainability (change without breaking). You can never maximise all four. A senior engineer does not find the perfect solution — they pick the best tradeoff for the context and can explain why.

CAP theorem — what you actually have to give up

CAP is the most-quoted and most-misunderstood theorem in distributed systems. The statement is short: in any distributed data store, when a network partition happens, you must choose between consistency and availability. You cannot have both.

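The three letters mean specific things:

Consistency (C): every read sees the most recent successful write, as if there were a single copy of the data.
Availability (A): every request that reaches a working node gets a non-error response.
Partition tolerance (P): the system keeps operating even when messages between nodes are dropped or delayed.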

The theorem is often stated as "pick two of three." That is misleading. In any real distributed system running across more than one machine, partitions WILL happen — switches fail, cables get unplugged, packets get dropped. P is not optional. The real choice is what to do during a partition: keep answering with possibly-stale data (AP) or refuse to answer until the partition heals (CP).

   Network partition!
   ┌─────────┐    ✗    ┌─────────┐
   │ Node A  │ ──────  │ Node B  │
   │ user=42 │         │ user=99 │
   └─────────┘         └─────────┘
        │                   │
   read returns 42     read returns 99
   write succeeds      write succeeds
   ↳ AP system: both nodes keep serving, data diverges, reconcile later
   ↳ CP system: one side refuses writes (or every node does) until partition heals
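
The diagram above can be turned into a small simulation. This is a toy sketch, not how any particular database implements partition handling: the `Node` class, the `mode` flag, and the `run` helper are all hypothetical names made up for illustration. With only two nodes, neither side holds a majority, so in CP mode both refuse.

```python
class Node:
    """One replica holding a single value (hypothetical toy model)."""

    def __init__(self, name, value, mode):
        self.name = name
        self.value = value
        self.mode = mode            # "AP" or "CP"
        self.partitioned = False    # True when this node cannot reach its peer

    def _check_available(self):
        # A CP node refuses to answer when it cannot coordinate with its peer.
        if self.partitioned and self.mode == "CP":
            raise RuntimeError(f"{self.name}: unavailable during partition")

    def write(self, value):
        self._check_available()
        # An AP node accepts the write locally and leaves reconciliation for later.
        self.value = value

    def read(self):
        self._check_available()
        return self.value


def run(mode):
    """Partition two nodes, write a different value to each, report what a read sees."""
    a = Node("A", 1, mode)
    b = Node("B", 1, mode)
    a.partitioned = b.partitioned = True
    results = []
    for node, new_value in ((a, 42), (b, 99)):
        try:
            node.write(new_value)
            results.append(node.read())
        except RuntimeError:
            results.append("refused")
    return results


print(run("AP"))  # [42, 99]  both answer, data has diverged
print(run("CP"))  # ['refused', 'refused']  consistent, but unavailable
```

The AP run reproduces the divergence in the diagram (42 on one side, 99 on the other). The CP run never diverges, at the price of answering nothing until the partition heals.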

Map that onto real systems and the choices become concrete:

The practical lesson: CAP is not a property of "a database." It is a property of how a database is configured and used. DynamoDB can be tuned toward C (strongly consistent reads) or toward A (eventually consistent reads) per query. MongoDB lets you set read and write concerns per operation. The right question is never "is this database CP or AP?" — it is "for THIS query, which guarantee do I need?"

A practical extension is PACELC: when there is a Partition, choose A or C; Else (normal operation), choose between Latency and Consistency. Even with no partition, strong consistency costs latency (cross-region quorums take time). DynamoDB is PA/EL — favours availability during partition, low latency otherwise. A traditional RDBMS is PC/EC — favours consistency always, accepts the latency cost. This framing is more useful than CAP alone.
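
The "Else" half of PACELC can be made concrete with a little arithmetic. The sketch below assumes a client that queries all replicas in parallel and waits for a fixed number of replies; the three round-trip times are invented for illustration, and `read_latency_ms` is a hypothetical helper, not a real client API.

```python
def read_latency_ms(replica_latencies_ms, replies_needed):
    """Time until `replies_needed` replicas have answered, when all are queried in parallel."""
    if not 1 <= replies_needed <= len(replica_latencies_ms):
        raise ValueError("replies_needed must be between 1 and the replica count")
    # The read completes when the k-th fastest replica has replied.
    return sorted(replica_latencies_ms)[replies_needed - 1]


# Assumed round-trip times (ms): one local replica, two in other regions.
latencies = [1, 70, 150]

eventual = read_latency_ms(latencies, 1)  # any single replica is good enough
quorum = read_latency_ms(latencies, 2)    # majority of three replicas

print(eventual, quorum)  # 1 70
```

Even with a healthy network, the majority read waits for a cross-region replica, which is the latency cost of consistency that PACELC names.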

ACID vs BASE — two different worlds

ACID and BASE describe two ways a data store can guarantee — or deliberately not guarantee — what happens to your data. They evolved for different problems and they cost different things.

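ACID is the classical RDBMS contract:

Atomicity: a transaction happens completely or not at all.
Consistency: a transaction moves the database from one valid state to another, never violating its constraints.
Isolation: concurrent transactions do not see each other's partial results.
Durability: once a transaction commits, it survives crashes and restarts.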

The canonical example is a bank transfer. Debit account A, credit account B. If the debit succeeds and the credit fails, money has vanished. ACID makes that impossible: either both happen or neither does. This is why banking, ledger, and inventory systems almost always keep their system of record in an ACID database, regardless of what marketing says.
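
The transfer can be sketched with Python's built-in `sqlite3` module. The `accounts` table, its `CHECK` constraint, and the `transfer` and `balances` helpers are assumptions made for this example; the point is only that a failed credit rolls the debit back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            if cur.rowcount != 1:
                # The credit touched no row: abort so the debit is undone too.
                raise ValueError(f"unknown account {dst!r}")
        return True
    except (sqlite3.Error, ValueError):
        return False


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "Z", 30), balances(conn))   # False {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```

The second call debits A, finds no account Z to credit, and rolls back, so no money vanishes. The third call fails the balance constraint on the debit itself and changes nothing.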

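BASE is what you get when you trade ACID guarantees for scale and availability:

Basically Available: the system answers requests even when some nodes are down or lagging.
Soft state: stored state may change over time without new input, as replicas catch up.
Eventual consistency: if writes stop, all replicas converge to the same value.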

BASE took hold when large web companies such as Amazon and Google found that for many workloads (product catalogues, social feeds, recommendations) strict consistency was the wrong tradeoff. A user briefly seeing yesterday's review count is fine. A user being unable to load the page because one node's replica is lagging is not fine.
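
The stale review count can be shown with two toy replicas. This sketch assumes last-write-wins reconciliation keyed on a version number, which is only one of several possible strategies and is chosen here for brevity; `Replica`, `sync`, and the version numbers are hypothetical.

```python
class Replica:
    """Toy replica that stores (version, value) per key."""

    def __init__(self):
        self.data = {}

    def put(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Anti-entropy pass: each replica adopts the newer version of every key."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            if key in src.data:
                version, value = src.data[key]
                dst.put(key, value, version)


primary, lagging = Replica(), Replica()
primary.put("review_count", 17, version=1)
sync(primary, lagging)

primary.put("review_count", 18, version=2)  # not yet replicated
stale = lagging.get("review_count")         # the lagging replica still answers

sync(primary, lagging)
fresh = lagging.get("review_count")         # replicas have converged

print(stale, fresh)  # 17 18
```

The lagging replica stays available and briefly serves 17; after the next sync both replicas agree on 18.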

   ACID                       BASE
   ─────                      ─────
   PostgreSQL                 Cassandra
   MySQL (InnoDB)             DynamoDB (eventual mode)
   Oracle, SQL Server         Riak
   "correctness or nothing"   "answer first, reconcile later"

The honest framing: ACID and BASE are not competitors. They are different toolboxes for different problems. A real production system uses both — the user-profile service might be on Postgres (ACID), the activity feed on DynamoDB (BASE), the search index on Elasticsearch (its own model). The skill is matching the guarantee to the workload, not picking a side.

Watch out for one trap: "NoSQL" does NOT mean "BASE." MongoDB has supported multi-document ACID transactions since 4.0. DynamoDB supports ACID transactions across multiple items. The labels have blurred. Read the actual consistency model of the database you are evaluating, not the marketing bucket.

Latency numbers every designer should memorise

Jeff Dean's famous list of "numbers every programmer should know" is the single most useful mental model for back-of-envelope reasoning about performance. Most architectural mistakes are about not respecting how slow some operations are relative to others.

The approximate numbers, rounded for memorability:

   Operation                          Time            Relative
   ─────────────────────────────────  ──────────      ────────────
   L1 cache reference                 0.5 ns          1×
   Branch mispredict                  5 ns            10×
   L2 cache reference                 7 ns            14×
   Mutex lock/unlock                  25 ns           50×
   Main memory reference (RAM)        100 ns          200×
   Compress 1 KB with zippy           3,000 ns        6,000×
   Send 1 KB over 1 Gbps network      10,000 ns       20,000×
   Read 1 MB sequentially from RAM    250,000 ns      500,000×
   Round trip within same datacenter  500,000 ns      1,000,000×
   Read 1 MB sequentially from SSD    1,000,000 ns    2,000,000×
   Disk seek (HDD)                    10,000,000 ns   20,000,000×
   Read 1 MB sequentially from HDD    20,000,000 ns   40,000,000×
   Round trip CA → Netherlands → CA   150,000,000 ns  300,000,000×

A few orders-of-magnitude facts worth burning into memory:

What this teaches you in practice: a lookup that hits an in-memory index in your app server costs ~100 ns. The same lookup, if it forces a database call, costs ~500 µs at minimum (the datacentre round trip). If the database itself has to read from SSD, add roughly another millisecond; a seek on a spinning disk adds ~10 ms. If you make 50 such queries sequentially to render a single page, you have spent 25 ms on round trips alone, before the database has done any work. If your user is on another continent, add ~150 ms for the response to reach them.
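
The same arithmetic as a short script, using figures from the latency table above. It assumes the queries run one after another rather than in parallel; `page_latency_ms` is a hypothetical helper for this illustration.

```python
NS_PER_MS = 1_000_000

# Figures from the latency table, in nanoseconds.
RAM_REFERENCE_NS = 100
DATACENTER_ROUND_TRIP_NS = 500_000
SSD_READ_1MB_NS = 1_000_000
INTERCONTINENTAL_ROUND_TRIP_NS = 150_000_000


def page_latency_ms(queries, per_query_ns, user_round_trip_ns=0):
    """Total time for `queries` sequential lookups plus one trip to the user, in ms."""
    return (queries * per_query_ns + user_round_trip_ns) / NS_PER_MS


in_memory = page_latency_ms(50, RAM_REFERENCE_NS)
db_calls = page_latency_ms(50, DATACENTER_ROUND_TRIP_NS)
db_from_ssd = page_latency_ms(50, DATACENTER_ROUND_TRIP_NS + SSD_READ_1MB_NS)
remote_user = page_latency_ms(
    50, DATACENTER_ROUND_TRIP_NS, INTERCONTINENTAL_ROUND_TRIP_NS
)

print(in_memory)    # 0.005
print(db_calls)     # 25.0
print(db_from_ssd)  # 75.0
print(remote_user)  # 175.0
```

Fifty in-memory lookups cost five microseconds; fifty sequential database round trips cost 25 ms, a factor of 5,000 for the same logical work.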

This is why caching is not a nice-to-have. It is why batching exists. It is why N+1 queries are a problem. It is why CDNs put content close to users. Every architectural pattern in this series is a response to one of these latency numbers being inconveniently large.

Back-of-envelope estimation

Senior engineers can sketch the rough shape of a system in five minutes. The skill is not magic — it is a small set of numbers you have memorised, applied with confident arithmetic. Interviewers ask "design Twitter" partly to see whether you can produce a plausible storage and QPS estimate without seizing up.

The constants to keep in your head:

The canonical interview drill: estimate the storage and bandwidth for Twitter.

   Assumption     Number
   ────────────   ──────────────
   Daily active   200 million
   Tweets/user    2 average
   Tweets/day     400 million ≈ 4 × 10⁸
   Tweets/sec     ~5,000 average, peak maybe 10×
   Bytes/tweet    300 (text + metadata)
   Daily storage  4 × 10⁸ × 300 = 1.2 × 10¹¹ B = 120 GB/day
   Yearly         ~44 TB tweets/year (text only)
   Reads/sec      Reads >> writes. ~200k reads/sec is plausible.
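
The arithmetic in the table can be reproduced in a few lines, which makes it easy to change an assumption and see what moves. The `estimate` function and its parameter names are made up for this sketch; the inputs are the table's assumptions.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_active_users, tweets_per_user, bytes_per_tweet, peak_factor=10):
    """Back-of-envelope write rate and storage growth from three assumptions."""
    tweets_per_day = daily_active_users * tweets_per_user
    avg_writes_per_sec = tweets_per_day / SECONDS_PER_DAY
    bytes_per_day = tweets_per_day * bytes_per_tweet
    return {
        "tweets_per_day": tweets_per_day,
        "avg_writes_per_sec": round(avg_writes_per_sec),
        "peak_writes_per_sec": round(avg_writes_per_sec * peak_factor),
        "gb_per_day": bytes_per_day / 1e9,
        "tb_per_year": bytes_per_day * 365 / 1e12,
    }


result = estimate(200_000_000, 2, 300)
for name, value in result.items():
    print(f"{name}: {value:,}")
```

The exact average is about 4,630 writes per second, which the table rounds to ~5,000; storage comes out at 120 GB per day and 43.8 TB per year, matching the table's ~44 TB.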

Once you have these numbers, real architectural choices fall out of them. 5k tweets per second is comfortably within reach of a modestly sharded relational database, so the write rate is not the bottleneck. The interesting load is the read fan-out for newsfeeds. With 200 million users reading their timelines, recomputing every feed on each read is too expensive, and pushing each tweet from a celebrity with 100M followers into every follower's precomputed feed is too expensive as well. You need a hybrid fan-out strategy (we cover this in Module 8).

The procedure is always the same:

1. State assumptions out loud. "Let's say 100 million daily active users." The interviewer can correct you and you can adjust.
2. Round generously. 86,400 becomes 10⁵. 365 becomes 400. You are estimating, not auditing.
3. Compute reads and writes separately. They almost always have very different ratios.
4. Compute storage growth per year. This decides whether you need sharding from day one.
5. Identify the bottleneck. Storage, write QPS, read QPS, bandwidth, or fan-out: usually one dominates.

Do not skip step 1. The single most common mistake is plunging into arithmetic without stating what you are assuming. The numbers do not matter if the assumptions are wrong.

Single points of failure

A single point of failure (SPOF) is any component whose failure brings the whole system down. Designing for reliability is, in large part, the systematic removal of SPOFs. The pattern is mechanical: identify what could die, ask what happens if it does, and add replication or failover wherever the answer is unacceptable.

A naive architecture is almost entirely SPOFs:

   Client ──► Load Balancer ──► App Server ──► Database ──► Disk
               ↑ SPOF             ↑ SPOF        ↑ SPOF       ↑ SPOF

Kill any one of those boxes and the system is gone. The progression of fixes runs roughly:

A useful drill: pick any architecture diagram you have drawn and ask, for every single box and every single line, "what happens if this dies?" If the answer is "the site is down," you have found a SPOF. The network links, the DNS provider, the certificate authority: all are potential SPOFs.
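
The drill can be automated for simple diagrams. The sketch below models an architecture as a directed graph, removes one component at a time, and checks whether a request can still get from the client to a virtual "served" node. The node names, the `served` goal, and both example graphs are hypothetical, and the model covers boxes only, not the links between them.

```python
from collections import deque


def reachable(graph, start, goal, removed):
    """Breadth-first search from start to goal, pretending `removed` is dead."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def find_spofs(graph, start, goal):
    """Return every component whose removal cuts start off from goal."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    candidates = nodes - {start, goal}
    return sorted(n for n in candidates if not reachable(graph, start, goal, n))


naive = {
    "client": ["lb"],
    "lb": ["app"],
    "app": ["db"],
    "db": ["disk"],
    "disk": ["served"],
}

redundant = {
    "client": ["lb1", "lb2"],
    "lb1": ["app1", "app2"],
    "lb2": ["app1", "app2"],
    "app1": ["db_primary", "db_replica"],
    "app2": ["db_primary", "db_replica"],
    "db_primary": ["served"],
    "db_replica": ["served"],
}

print(find_spofs(naive, "client", "served"))      # ['app', 'db', 'disk', 'lb']
print(find_spofs(redundant, "client", "served"))  # []
```

An empty result only means no single box in the model is fatal. Shared dependencies that were never drawn, such as DNS or a common power feed, remain invisible to it.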

Two SPOFs people consistently forget:

Reliability targets are usually expressed in nines. 99.9% ("three nines") allows roughly 8.8 hours of downtime per year. 99.99% allows about 52.6 minutes, and 99.999% about 5.3 minutes. As a rule of thumb, each extra nine costs roughly an order of magnitude more in engineering and infrastructure. Match the target to what the business actually needs: five nines is for emergency dispatch, not for a hobby SaaS.
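
The downtime budget for any availability target follows from one multiplication. A minimal sketch, assuming a 365-day year; `downtime_hours_per_year` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; leap years ignored


def downtime_hours_per_year(availability):
    """Allowed downtime per year, in hours, for an availability such as 0.999."""
    if not 0 <= availability <= 1:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1 - availability) * HOURS_PER_YEAR


for label, availability in (("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)):
    hours = downtime_hours_per_year(availability)
    print(f"{label}: {hours:.2f} h = {hours * 60:.1f} min")
```

This prints 8.76 hours for three nines, 52.6 minutes for four, and 5.3 minutes for five, so each extra nine shrinks the budget tenfold.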

What this gives you for the rest of the series

This module is the vocabulary. Every other module references these ideas, often without re-explaining them.

A quick gist of what carries forward:

The next module looks at data storage in detail — relational, NoSQL, indexing, sharding, replication, object storage, and the newer NewSQL hybrids. Everything we cover after that is patterns and components that live on top of a storage layer you chose deliberately.

