High-Level System Design / Module 3 — Caching

Caching

Why caching exists, the strategies (cache-aside, read-through, write-through, write-behind), Redis, eviction, invalidation, stampedes, and CDNs.


Why caches exist at all

The first module of this series listed the latency numbers. A round trip to your database in the same datacentre is roughly 500 microseconds. A read from local memory is roughly 100 nanoseconds. The difference is a factor of 5,000. If you can answer a query without crossing the network, you save your user's time and you save your database's life.

That is the entire reason caches exist. A cache is a faster, smaller copy of data that lives closer to the consumer. Faster (usually in-memory). Smaller (you can't fit your whole database in RAM). Closer (in the same process, the same machine, or the same datacentre as the code that needs it).

There are several layers where a cache typically sits in a real system:

text
   Browser cache         (closest, smallest)
        │
        ▼
   CDN cache             (geographically near user)
        │
        ▼
   API/edge cache        (e.g. Cloudflare Workers)
        │
        ▼
   In-process cache      (LRU map inside one process)
        │
        ▼
   Application cache     (Redis, Memcached; shared across app servers)
        │
        ▼
   Database              (source of truth)

Each layer trades capacity for speed. The browser cache serves bytes from local disk or memory. The CDN cache is a short network hop away, usually on the same continent as the user. The database is wherever it lives, with a query optimiser working hard for every single request. Most real performance work is about getting more requests answered by the layers nearer the top.

The rule of thumb that matters: a single Redis instance can serve roughly 100,000 reads per second on commodity hardware. A single Postgres server might serve 5,000 to 20,000 read queries per second. Caching the right thing in the right layer can be the difference between needing 50 database servers and needing one.
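
The "50 servers versus one" claim is plain arithmetic on the cache hit ratio. The sketch below works it through for an illustrative workload; the 500,000 reads per second and the 98% hit ratio are assumptions chosen to make the numbers round, not measurements, and `db_servers_needed` is a hypothetical helper.

```python
def db_servers_needed(reads_per_sec: int, hit_percent: int, reads_per_db_server: int) -> int:
    """Database servers needed to absorb the reads that miss the cache.

    Integer arithmetic is used on purpose so that round inputs give
    round answers instead of floating-point near-misses.
    """
    missed_x100 = reads_per_sec * (100 - hit_percent)
    # Ceiling division: a partially loaded server is still a server.
    return max(1, -(-missed_x100 // (100 * reads_per_db_server)))


# Assumed workload: 500,000 reads/s, each database server handles 10,000 reads/s.
without_cache = db_servers_needed(500_000, 0, 10_000)   # 50
with_cache = db_servers_needed(500_000, 98, 10_000)     # 1
```

At that hit ratio the cache tier itself absorbs 490,000 reads per second, which at roughly 100,000 reads per second per Redis instance still means several cache nodes; the saving is that cache nodes are far cheaper to add than database servers.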

Caching strategies

There are four core patterns for how data flows between application, cache, and database. Picking the right one for each piece of data is the single most important caching decision.

1. Cache-aside (lazy loading). The application is in charge of the cache. On a read: check the cache; if miss, load from the database, store in cache, return. On a write: update the database, then either update or invalidate the cache.

text
   READ:
   App ──► Cache ──► (hit?) ──► return value
             │
             └──── (miss) ──► DB ──► put in cache ──► return

   WRITE:
   App ──► DB ──► invalidate cache entry

Most common pattern. Simple. Cache only contains what was actually requested. Stale-data risk is bounded by how aggressively you invalidate.
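
A minimal sketch of the read and write paths described above, assuming an in-memory dict as the cache and a stand-in `FakeDatabase` class (both hypothetical; a real system would use a cache client and a database driver in their place):

```python
class FakeDatabase:
    """Stand-in for a real database; counts reads so cache hits are visible."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class CacheAsideStore:
    """The application owns both the cache and the load logic."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key in self.cache:            # hit: no database round trip
            return self.cache[key]
        value = self.db.get(key)         # miss: load from the source of truth
        if value is not None:
            self.cache[key] = value      # populate for the next reader
        return value

    def write(self, key, value):
        self.db.put(key, value)          # database first
        self.cache.pop(key, None)        # then invalidate the cached copy
```

Invalidating on write, rather than updating the cached value, keeps the write path simple: the next read repopulates the entry from the database.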

2. Read-through. The cache itself loads from the database on a miss. The app talks only to the cache. The cache library or middleware handles the database connection.

Difference from cache-aside is mostly who owns the load logic. Read-through centralises it inside the cache layer; cache-aside scatters it through the application. Read-through is cleaner; cache-aside is more flexible.

3. Write-through. Writes go through the cache to the database. The cache always has the latest value because every write updates both.

text
   WRITE:
   App ──► Cache ──► DB
              ▲
              ▼
           always in sync

Eliminates the read-after-write staleness problem. Costs: every write does double work (cache + DB), and writes that get cached but never read pollute the cache. Use when reads heavily outnumber writes AND every write needs to be visible immediately.
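
A minimal write-through sketch, assuming plain dicts stand in for both the cache and the database. The class name is hypothetical, and the database-then-cache ordering is one possible design choice, not something the pattern mandates:

```python
class WriteThroughCache:
    """Every write updates the database and the cache in the same call."""

    def __init__(self, db: dict):
        self.db = db        # stand-in for the real database
        self.cache = {}

    def write(self, key, value):
        # Database first: if this step raises, the cache is left untouched
        # and never gets ahead of the source of truth.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

A read immediately after a write is served from the cache with the new value, which is the property that removes read-after-write staleness.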

4. Write-behind (write-back). Writes go to the cache, get acknowledged, and are flushed to the database asynchronously in the background.

text
   WRITE:
   App ──► Cache  (acks immediately)
             │
             ▼
         (later, batched)
             ▼
            DB

Massively reduces write latency and the database's write load. The cost is danger: if the cache crashes before flushing, those writes are lost. Use only for data where this is acceptable — analytics counters, view counts, non-financial telemetry. Never for user data or money.
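
A minimal write-behind sketch for the counter case, assuming a dict stands in for the database and that something else (a timer, a background thread) calls `flush()` periodically. `WriteBehindCounter` is a hypothetical name:

```python
from collections import Counter


class WriteBehindCounter:
    """Buffers increments in memory and writes them to the database in batches."""

    def __init__(self, db: dict):
        self.db = db                 # stand-in for the real database
        self.pending = Counter()     # increments acknowledged but not yet durable

    def incr(self, key, amount=1):
        # Acknowledged immediately; nothing touches the database here.
        self.pending[key] += amount

    def flush(self):
        """Write all buffered increments; returns how many keys were written."""
        batch, self.pending = self.pending, Counter()
        for key, delta in batch.items():
            self.db[key] = self.db.get(key, 0) + delta
        return len(batch)
```

Everything in `pending` is lost if the process dies before `flush()` runs, which is exactly the risk described above. Three page views become one database write instead of three.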

The practical default: use cache-aside for most application data, write-through for cache-resident user sessions, and write-behind only for high-volume counters where the occasional lost write is acceptable.

Redis — the workhorse

Redis is the in-memory data store that has eaten the application-caching market. Memcached predates it and is still in use, but Redis offers much more — rich data types, persistence, replication, pub/sub, Lua scripting — without being noticeably slower for the simple cases.

The relevant facts:

A few patterns that show up everywhere:

text
   # Session storage
   SET session:abc123 '{"user_id":42,"exp":1729012345}' EX 3600

   # Rate limiting (fixed window)
   INCR rate:192.0.2.1:minute
   EXPIRE rate:192.0.2.1:minute 60   ← only when INCR returned 1, so later hits don't extend the window
   ↳ Reject when count > limit

   # Leaderboard
   ZADD scores 1500 "alice"
   ZREVRANGE scores 0 9 WITHSCORES   ← top 10 with scores
   ZREVRANK scores "alice"           ← Alice's rank (0 = highest score)

   # Distributed lock (simplest form — see Module 7 for the real one)
   SET lock:order-42 "server-A" NX EX 30
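
The fixed-window rate limiter is the pattern most often implemented subtly wrong, so here is its logic as a sketch. This is an in-memory model of the INCR and EXPIRE behaviour, not a Redis client; the class name and the injectable `clock` parameter are assumptions made so the behaviour can be exercised without a server.

```python
import time


class FixedWindowLimiter:
    """In-memory model of INCR + EXPIRE: a counter that resets when its window ends."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}   # key -> (count, expires_at)

    def allow(self, key):
        now = self.clock()
        count, expires_at = self.counters.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:
            # First hit of a new window: this is the only place the expiry is set.
            count, expires_at = 0, now + self.window
        count += 1   # the INCR step
        self.counters[key] = (count, expires_at)
        return count <= self.limit
```

Setting the expiry only on the first hit matters: refreshing it on every request would keep pushing the window forward, and a client that never pauses would never get its counter reset.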

Memory management is where Redis bites people. The two failure modes are: (1) the working set grows beyond available memory and Redis starts evicting things you wanted to keep, or (2) Redis runs out of memory entirely and starts refusing writes. Set maxmemory and maxmemory-policy explicitly. The default policy noeviction will fail writes when full; allkeys-lru evicts least-recently-used keys (usually what you want for a cache).

Clustering. A single Redis node tops out at the RAM of one machine and ~100k ops/sec. Redis Cluster shards keys across multiple nodes by mapping every key to one of 16,384 hash slots (CRC16 of the key, modulo 16384) and assigning ranges of slots to nodes; this is a fixed slot table, not consistent hashing. Each shard typically has a primary and one or two replicas. Cross-shard transactions and multi-key operations are restricted (the keys involved must hash to the same slot, which you achieve by putting {...} tags in the key names: user:{42}:profile and user:{42}:settings share a slot).
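
To see why the {...} tags work, the sketch below computes a key's slot the way the Redis Cluster specification describes it: CRC16 (the XMODEM variant) of the key modulo 16384, hashing only the text inside the first non-empty pair of braces when one exists. The function names are hypothetical, and this is an illustration of the rule, not the server's code.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Slot in the range 0..16383, honouring the {hash tag} rule."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:              # braces found with something between them
            raw = raw[start + 1:end]     # hash only the tag
    return crc16_xmodem(raw) % 16384


# Both keys hash only the tag "42", so they land in the same slot.
same_slot = hash_slot("user:{42}:profile") == hash_slot("user:{42}:settings")
```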

Persistence-or-not. A cache that is allowed to be empty after a restart can run persistence-free, taking the highest throughput. A cache that powers a session store probably wants AOF persistence, because losing every active user's session on a reboot is rude. Be deliberate. The default configuration takes RDB snapshots only every few minutes to an hour, depending on write volume and Redis version, which is usually wrong for both ends of the spectrum.

Eviction policies — what gets thrown out

A cache that never evicts is not a cache; it is a slow database. Eviction is the act of removing entries to make room for new ones. The policy decides which entry leaves.

The common policies:

Redis lets you configure several policies; the practical ones are allkeys-lru (most-common cache use), volatile-lru (only evict keys that have a TTL), allkeys-lfu (better for stable hot sets), and volatile-ttl (evict the soonest-expiring first).
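
For intuition about what "least recently used" means, here is a minimal exact LRU cache built on `collections.OrderedDict`. It is a sketch of the policy, not of Redis itself, which approximates LRU by sampling keys instead of tracking a full ordering.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU: the entry that has gone longest without a read or write is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest first, most recently used last

    def get(self, key, default=None):
        if key not in self.entries:
            return default
        self.entries.move_to_end(key)  # a read counts as a use
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the least recently used entry
```

With capacity 2, writing `a` and `b`, reading `a`, then writing `c` evicts `b`: the read moved `a` to the recently-used end.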

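A subtle but important detail: TTLs serve two purposes that people conflate. A freshness TTL bounds how stale a value is allowed to get. A capacity TTL exists only so that entries nobody reads any more age out and free memory.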

When you set a TTL, know which one you mean. A capacity TTL that becomes a de-facto freshness contract is how bugs are born — someone changes the cache TTL from 1 hour to 1 day for cost reasons and breaks a pricing feature that was relying on the freshness.

Invalidation — the hardest problem

"There are only two hard things in computer science: cache invalidation and naming things." The line is glib but the cache half is real. The whole point of caching is keeping a copy of something; the moment the original changes, your copy is wrong, and finding every wrong copy is non-trivial.

Three broad strategies:

1. TTL-based (passive invalidation). Every cached entry has a max age. After it expires, the next read causes a refetch. The cache and the source can diverge for up to one TTL.

Works well when: bounded staleness is acceptable, the data changes on a known cadence, and the alternative (active invalidation) is too expensive. Almost every CDN works this way.

2. Explicit invalidation (active). When the underlying data changes, the application explicitly deletes or updates the cached entry.

python
# Update the database, then invalidate the cache
db.update_user(user_id, new_data)
cache.delete(f"user:{user_id}")

Works well when: the application owns all writes and knows every cache key derived from them. Falls down when caches are derived from joins or aggregations — invalidating "the top-10 trending posts" cached value when a single post is edited is surprisingly hard.

3. Event-based / change-data-capture (CDC). A change feed (Debezium, Postgres logical replication, DynamoDB streams, Kafka events) drives invalidation. The application doesn't have to remember to invalidate — the database tells you what changed.

Works well when: writes happen from many sources (background jobs, external integrations, manual SQL fixes), and you cannot trust every writer to call the invalidation API.

A classic distributed bug: a stale write racing an invalidation (a read-then-set race between cache and database). The sequence:

text
   t1: Service A misses the cache and reads user:42 from DB → {name: "Old"}
   t2: Service B writes user:42 to DB  → {name: "New"}
   t3: Service B invalidates cache key user:42
   t4: Service A puts the value it read at t1 into the cache
   ── Cache now contains "Old" until the entry expires (forever, if it has no TTL).

The fix is one of: (a) write-through caching (the cache is always written with the database transaction), (b) use a versioned/timestamped key (read the version with the data and don't overwrite a newer one), (c) only let one writer per key set the cache, with a short window between DB write and cache write. None of these is free.
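
Option (b) can be sketched as a compare-before-write on a version number. The sketch assumes every row carries a monotonically increasing version (a hypothetical column) and that writers put the new version into the cache instead of deleting the key, since a deleted key leaves nothing to compare against. In a shared cache the compare and the write would also have to happen atomically, for example in a server-side script.

```python
class VersionedCache:
    """Refuses to overwrite a cached value with an older version of the same key."""

    def __init__(self):
        self.entries = {}   # key -> (version, value)

    def set_if_newer(self, key, version, value):
        current = self.entries.get(key)
        if current is not None and current[0] >= version:
            return False    # a newer (or equal) version is already cached
        self.entries[key] = (version, value)
        return True

    def get(self, key):
        entry = self.entries.get(key)
        return None if entry is None else entry[1]


# Replaying the race: Service A read version 1, Service B then wrote version 2.
cache = VersionedCache()
b_stored = cache.set_if_newer("user:42", 2, {"name": "New"})   # B updates the cache
a_stored = cache.set_if_newer("user:42", 1, {"name": "Old"})   # A's late write is rejected
```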

The pragmatic stance: pick the weakest invalidation that gives correct behaviour. TTLs of seconds are an acceptable fallback for most things. Hard "this must update instantly across the whole fleet" requirements should be rare, and when they exist, the right answer is usually to not cache that particular piece of data.

Cache stampedes and thundering herds

Imagine 10,000 application servers all trying to render the same page. Each of them looks up popular:posts in the cache. The TTL expires at the same instant. All 10,000 servers simultaneously discover the cache miss and slam the database with the same query.

That is a cache stampede (also called a thundering herd). It happens whenever a popular cache entry expires and many clients see the miss before any one of them has refilled it. The database — which was sitting comfortably serving zero traffic for that key — suddenly takes the full burst. Often that burst is enough to take the database down, which makes the cache miss permanent for as long as the database is down, which means every subsequent request is also a miss, which the database also cannot answer. You have a cascading failure caused by the cache.

The fixes, in increasing order of sophistication:

1. Stale-while-revalidate. Don't delete the value when it expires; mark it as stale. The first request after expiry returns the stale value AND triggers a background refresh. Subsequent requests during the refresh also get the stale value. Only one background worker talks to the database.

text
   t0: cache has fresh value, TTL = 60s
   t60: TTL expires. Mark stale, don't delete.
   t61: Request comes in. Return stale value. Spawn refresh in background.
   t61.01-t61.05: Other requests get stale value (no DB stampede).
   t61.05: Refresh completes. Cache fresh again.
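
The timeline above can be sketched with a background thread per refresh. This is a single-process illustration under stated assumptions: `SWRCache`, `loader`, and the injectable `clock` are hypothetical names, and a cold miss (no value at all) is simply loaded synchronously without stampede protection.

```python
import threading
import time


class SWRCache:
    """Serves stale values while at most one background refresh per key runs."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl
        self.clock = clock
        self.entries = {}          # key -> (value, fresh_until)
        self.refreshing = set()    # keys with a refresh currently in flight
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            entry = self.entries.get(key)
            if entry is not None:
                value, fresh_until = entry
                if self.clock() >= fresh_until and key not in self.refreshing:
                    # Stale and nobody is refreshing yet: start exactly one refresh.
                    self.refreshing.add(key)
                    threading.Thread(
                        target=self._refresh, args=(key,), daemon=True
                    ).start()
                return value       # fresh or stale, answer immediately
        return self._refresh(key)  # cold miss: nothing to serve, load inline

    def _refresh(self, key):
        try:
            value = self.loader(key)
            with self.lock:
                self.entries[key] = (value, self.clock() + self.ttl)
            return value
        finally:
            with self.lock:
                self.refreshing.discard(key)
```

Readers never wait on the database once a value exists; the price is that they may see data up to one refresh older than the TTL suggests.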

2. Probabilistic early refresh. Each request to a near-expiring cache entry has a small chance of triggering a refresh, with the probability rising as expiry approaches. By the time the TTL actually fires, the refresh has usually already happened. The best-known formulation of this is the XFetch algorithm.
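
The XFetch decision is a single comparison: refresh when `now - delta * beta * ln(rand)` reaches the expiry time, where `delta` is how long the last recomputation took and `beta` (1.0 by default) tunes how eager the refresh is. A minimal sketch, with hypothetical parameter names and an injectable random source so the behaviour can be checked:

```python
import math
import random
import time


def should_refresh_early(expiry, delta, beta=1.0, now=None, rand=random.random):
    """Return True if this request should recompute the value before it expires.

    expiry: absolute time at which the cached value expires
    delta:  how long the last recomputation took, in seconds
    beta:   values above 1.0 refresh earlier, below 1.0 later
    """
    now = time.time() if now is None else now
    # 1.0 - rand() lies in (0, 1], so the logarithm is defined and never positive;
    # subtracting it therefore pushes "now" forward by a random amount.
    return now - delta * beta * math.log(1.0 - rand()) >= expiry
```

Because the random push scales with `delta`, values that are expensive to recompute start refreshing earlier than cheap ones, and once the entry has actually expired the function always returns True.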

3. Locking / single-flight. When a cache miss happens, only one process is allowed to query the database. Others wait for it (or get a stale value or a fallback). Go's singleflight package and Java's Caffeine cache implement this. The lock can be local (per-server) or distributed (a Redis SET NX lock).
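
A per-process single-flight can be sketched with a lock and an event per in-flight key. This is an illustration in the spirit of Go's singleflight, not a port of it; the class names and the `dups` counter are assumptions.

```python
import threading


class _Call:
    """State shared between the one leader and any callers waiting on it."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.dups = 0   # how many callers piggy-backed on this call


class SingleFlight:
    """Ensures only one call per key is in flight; concurrent callers share its result."""

    def __init__(self):
        self.lock = threading.Lock()
        self.in_flight = {}   # key -> _Call

    def do(self, key, fn):
        with self.lock:
            call = self.in_flight.get(key)
            leader = call is None
            if leader:
                call = self.in_flight[key] = _Call()
            else:
                call.dups += 1
        if leader:
            try:
                call.result = fn()          # the only database query for this key
            except Exception as exc:
                call.error = exc            # waiters see the same failure
            finally:
                with self.lock:
                    del self.in_flight[key]
                call.done.set()
        else:
            call.done.wait()                # block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result
```

On a cache miss the caller would run something like `sf.do("popular:posts", load_from_db)`, where `load_from_db` is whatever function queries the database and refills the cache.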

4. Pre-warming. For very expensive computations (top-N rankings, recommendations), don't rely on a request to trigger the refresh. Run a background job on a schedule that always keeps the cache full.

5. Negative caching. Cache the absence of a value too. A request for user:99999999999 that returns nothing should cache that fact for a short time, otherwise a botnet hammering bad IDs can stampede the database with lookups for things that don't exist.
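
A minimal negative-caching sketch, assuming `loader` returns `None` when the row does not exist. The class name, the sentinel, and the two TTL values are illustrative assumptions.

```python
import time

_MISSING = object()   # sentinel stored in the cache to mean "known not to exist"


class NegativeCache:
    """Caches lookups that found nothing, with a shorter TTL than real values."""

    def __init__(self, loader, ttl=300.0, negative_ttl=30.0, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl
        self.negative_ttl = negative_ttl
        self.clock = clock
        self.entries = {}   # key -> (value or _MISSING, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            value = entry[0]
            return None if value is _MISSING else value
        value = self.loader(key)   # only reached on a miss or after expiry
        if value is None:
            self.entries[key] = (_MISSING, now + self.negative_ttl)
        else:
            self.entries[key] = (value, now + self.ttl)
        return value
```

The short negative TTL is a deliberate choice: it bounds how long a newly created record can stay invisible behind a cached "not found".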

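The simplest one-line defence against synchronized expiry is TTL jitter: when you set a TTL of 60 seconds, actually set a random value between 50 and 70 seconds. Entries that were written together no longer expire together; the burst smears across 20 seconds. It is one line of code and it removes most of the stampede pain caused by many keys expiring at once, though a single very hot key still needs one of the fixes above.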
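
As a sketch, with a hypothetical helper name and the 10-second spread from the example above:

```python
import random


def jittered_ttl(base_seconds: float, spread_seconds: float = 10.0) -> float:
    """Return the base TTL shifted by a random amount within +/- spread_seconds."""
    return base_seconds + random.uniform(-spread_seconds, spread_seconds)


# A nominal 60-second TTL becomes some value between 50 and 70 seconds.
ttl = jittered_ttl(60)
```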

CDN caching — pushing the cache to the edge

A CDN (Content Delivery Network) is a cache that lives at the geographic edge of the internet. Cloudflare, Fastly, Akamai, CloudFront, and others operate hundreds of edge locations worldwide. When a user requests a URL, the request hits the nearest edge first; if the edge has a cached response, it returns it without ever talking to your origin server.

For static content (images, CSS, JS bundles, fonts, downloads), this is a huge win — the user's request might travel 50 km to a city-level POP instead of 8,000 km to your origin. The bandwidth cost is offloaded to the CDN. Many requests never touch your servers at all.

The HTTP machinery that drives CDN caching is the Cache-Control header:

text
   Cache-Control: public, max-age=86400, s-maxage=604800, immutable
   ├── public        Cacheable by CDN and browser
   ├── max-age       Browser cache duration (seconds)
   ├── s-maxage      Shared (CDN) cache duration
   └── immutable     This content will never change — never revalidate

   Cache-Control: private, no-store
   ├── private       Don't cache in shared caches (per-user data)
   └── no-store      Don't cache anywhere (sensitive data)

The canonical pattern for SPAs: hash the static asset filenames during the build (app.a1b2c3.js), set Cache-Control: public, max-age=31536000, immutable on them, and never invalidate. The HTML that references them is Cache-Control: no-cache. Now you get free, infinite caching on bytes that will never change, and instant deploys on the entry-point.
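
On the origin side, that policy comes down to choosing a header per path. The sketch below assumes hashed filenames look like `name.<hex>.ext` with at least six hex characters; the pattern, the extension list, and the function name are assumptions to adapt to whatever your build tool emits.

```python
import re

# Assumed naming scheme: a dot, six or more hex digits, a dot, a static extension.
HASHED_ASSET = re.compile(r"\.[0-9a-f]{6,}\.(js|css|woff2|png|jpg|svg)$")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a response path."""
    if HASHED_ASSET.search(path):
        # The content hash is in the name, so the bytes behind this URL never change.
        return "public, max-age=31536000, immutable"
    # Entry points and anything unhashed: may be stored, but must be revalidated.
    return "no-cache"
```

Defaulting everything unhashed to `no-cache` is the conservative choice: a file that slipped through the build without a hash is still revalidated on every use, never pinned for a year.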

Dynamic content is harder. The strategies:

The big gotcha: cookies and personalisation kill caching. If your homepage sets a session cookie on first visit, the response is no longer cacheable by the CDN (it differs per user). Strip cookies from responses the CDN will cache, or move personalisation entirely to the client side (the API endpoint is uncached, the page shell is cached).

A real-world result: a properly-configured CDN for a typical content site offloads 90% or more of requests from the origin. The origin only sees cache misses, real users hitting truly dynamic endpoints, and bots. The cost savings and reliability improvements are dramatic. Caching is the lever, in real systems, that gives you the biggest bang for the least architectural complexity. The next module looks at the networking and infrastructure that gets those bytes to users in the first place.

