Caching
Why caching exists, the strategies (cache-aside, read-through, write-through, write-behind), Redis, eviction, invalidation, stampedes, and CDNs.
Why caches exist at all
The first module of this series listed the latency numbers. A round trip to your database in the same datacentre is roughly 500 microseconds. A read from local memory is roughly 100 nanoseconds. The difference is a factor of 5,000. If you can answer a query without crossing the network, you save your user's time and you save your database's life.
That is the entire reason caches exist. A cache is a faster, smaller copy of data that lives closer to the consumer. Faster (usually in-memory). Smaller (you can't fit your whole database in RAM). Closer (in the same process, the same machine, or the same datacentre as the code that needs it).
There are several layers where a cache typically sits in a real system:
Browser cache (closest, smallest)
│
▼
CDN cache (geographically near user)
│
▼
API/edge cache (e.g. Cloudflare Workers)
│
▼
Application cache (Redis, Memcached — shared across app servers)
│
▼
In-process cache (LRU map inside one process)
│
▼
Database (source of truth)
Each layer trades capacity for speed. The browser cache is bytes from local disk. The CDN cache is one network hop away in your continent. The database is wherever it lives, with a query optimiser working hard for every single request. Most real performance work is about getting more requests answered by the layers nearer the top.
The rule of thumb that matters: a single Redis instance can serve roughly 100,000 reads per second on commodity hardware. A single Postgres server might serve 5,000 to 20,000 read queries per second. Caching the right thing in the right layer can be the difference between needing 50 database servers and needing one.
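A rough way to see where a figure like that comes from: the database only sees cache misses, so its load scales with one minus the hit ratio. The sketch below works through the arithmetic. The 250,000 reads per second and the 5,000 queries per second per database server are illustrative assumptions, not measurements, and db_servers_needed is a hypothetical helper.

```python
import math

def db_servers_needed(total_rps: int, hit_ratio_percent: int, per_server_qps: int) -> int:
    """Database servers needed when only cache misses reach the database."""
    miss_rps = total_rps * (100 - hit_ratio_percent) // 100
    return max(1, math.ceil(miss_rps / per_server_qps))

# Illustrative numbers only: 250,000 reads/s, 5,000 queries/s per DB server.
for hit in (0, 90, 98):
    print(f"hit ratio {hit}% -> {db_servers_needed(250_000, hit, 5_000)} server(s)")
```

Under these assumed numbers, going from no cache to a 98% hit ratio is the difference between fifty servers and one.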
Caching strategies
There are four core patterns for how data flows between application, cache, and database. Picking the right one for each piece of data is the single most important caching decision.
1. Cache-aside (lazy loading). The application is in charge of the cache. On a read: check the cache; if miss, load from the database, store in cache, return. On a write: update the database, then either update or invalidate the cache.
READ:
App ──► Cache ──► (hit?) ──► return value
│
└──── (miss) ──► DB ──► put in cache ──► return
WRITE:
App ──► DB ──► invalidate cache entry
Most common pattern. Simple. Cache only contains what was actually requested. Stale-data risk is bounded by how aggressively you invalidate.
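A minimal sketch of cache-aside, assuming plain dictionaries stand in for both the cache and the database. CacheAsideStore and its db_reads counter are hypothetical names for illustration; a real system would use a cache client and a database driver in their place.

```python
from typing import Optional

class CacheAsideStore:
    """Cache-aside with plain dicts standing in for the cache and the database."""

    def __init__(self, db: dict):
        self.db = db            # hypothetical source of truth
        self.cache: dict = {}   # hypothetical cache (Redis, Memcached, ...)
        self.db_reads = 0       # how many reads fell through to the database

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:             # hit: answer from the cache
            return self.cache[key]
        self.db_reads += 1                # miss: load from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value       # populate for the next reader
        return value

    def write(self, key: str, value: str) -> None:
        self.db[key] = value              # database first
        self.cache.pop(key, None)         # then invalidate the cached copy


store = CacheAsideStore({"user:42": "Ada"})
store.read("user:42")             # miss, loads from the database
store.read("user:42")             # hit
store.write("user:42", "Grace")   # invalidates the entry
print(store.read("user:42"), store.db_reads)  # Grace 2
```

The application code owns both branches of the read, which is exactly the flexibility (and the scattering) this pattern is known for.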
2. Read-through. The cache itself loads from the database on a miss. The app talks only to the cache. The cache library or middleware handles the database connection.
Difference from cache-aside is mostly who owns the load logic. Read-through centralises it inside the cache layer; cache-aside scatters it through the application. Read-through is cleaner; cache-aside is more flexible.
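To make the ownership difference concrete, here is a small read-through sketch. It assumes the cache is constructed with a loader function; ReadThroughCache and load_user are hypothetical names, and the loader stands in for a real database query.

```python
from typing import Any, Callable, Hashable

class ReadThroughCache:
    """The cache owns the loader; callers never touch the database directly."""

    def __init__(self, loader: Callable[[Hashable], Any]):
        self._loader = loader     # hypothetical function that queries the database
        self._entries: dict = {}

    def get(self, key: Hashable) -> Any:
        if key not in self._entries:
            self._entries[key] = self._loader(key)   # miss: the cache loads it
        return self._entries[key]


load_calls = []

def load_user(key):
    """Stand-in for a database query; records each call."""
    load_calls.append(key)
    return {"id": key, "name": "Ada"}

users = ReadThroughCache(load_user)
users.get(42)
users.get(42)
print(len(load_calls))  # 1
```

The calling code contains no miss handling at all; that logic lives in one place inside the cache layer.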
3. Write-through. Writes go through the cache to the database. The cache always has the latest value because every write updates both.
WRITE:
App ──► Cache ──► DB
▲
▼
always in sync
Eliminates the read-after-write staleness problem. Costs: every write does double work (cache + DB), and writes that get cached but never read pollute the cache. Use when reads heavily outnumber writes AND every write needs to be visible immediately.
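A minimal write-through sketch, again assuming dictionaries stand in for the cache and the database. WriteThroughCache is a hypothetical class; the point is only that a single write call updates both copies before returning.

```python
class WriteThroughCache:
    """Every write updates the database and the cached copy in the same call."""

    def __init__(self, db: dict):
        self.db = db              # hypothetical source of truth
        self.cache: dict = {}

    def write(self, key, value) -> None:
        self.db[key] = value      # synchronous database write; if it raises, the cache is untouched
        self.cache[key] = value   # the cache now holds exactly what the database holds

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)  # only reached for keys never written through this cache
        if value is not None:
            self.cache[key] = value
        return value


sessions = WriteThroughCache(db={})
sessions.write("session:abc123", {"user_id": 42})
print(sessions.read("session:abc123") == sessions.db["session:abc123"])  # True
```

Note that the entry is cached even if nobody ever reads it, which is the pollution cost described above.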
4. Write-behind (write-back). Writes go to the cache, get acknowledged, and are flushed to the database asynchronously in the background.
WRITE:
App ──► Cache (acks immediately)
│
▼
(later, batched)
▼
DB
Massively reduces write latency and the database's write load. The cost is danger: if the cache crashes before flushing, those writes are lost. Use only for data where this is acceptable — analytics counters, view counts, non-financial telemetry. Never for user data or money.
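The trade-off can be sketched with a counter that buffers increments in memory and flushes them in batches. WriteBehindCounter is a hypothetical class, the dictionary stands in for the database, and a real implementation would call flush from a background timer rather than by hand.

```python
class WriteBehindCounter:
    """Buffers increments in memory and flushes them to the database in batches."""

    def __init__(self, db: dict):
        self.db = db              # hypothetical database table of counters
        self.pending: dict = {}   # increments acknowledged but not yet persisted
        self.db_writes = 0        # how many writes actually reached the database

    def incr(self, key: str, amount: int = 1) -> None:
        # Acknowledged immediately; nothing touches the database here.
        self.pending[key] = self.pending.get(key, 0) + amount

    def flush(self) -> int:
        """Persist everything buffered so far; returns the number of keys written."""
        batch, self.pending = self.pending, {}
        for key, delta in batch.items():
            self.db[key] = self.db.get(key, 0) + delta
            self.db_writes += 1
        return len(batch)


views = WriteBehindCounter(db={})
for _ in range(1000):
    views.incr("views:post:7")
print(views.db)                    # {} (a crash at this point loses all 1000 increments)
views.flush()
print(views.db, views.db_writes)   # {'views:post:7': 1000} 1
```

A thousand increments became one database write, and the window before flush is exactly where the data-loss risk lives.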
The practical default: use cache-aside for most application data, write-through for cache-resident user sessions, and write-behind only for high-volume counters where the occasional lost write is acceptable.
Redis — the workhorse
Redis is the in-memory data store that has eaten the application-caching market. Memcached predates it and is still in use, but Redis offers much more — rich data types, persistence, replication, pub/sub, Lua scripting — without being noticeably slower for the simple cases.
The relevant facts:
- Single-threaded for command processing. Each Redis instance runs commands on one thread. That sounds limiting; in practice it serves 100k+ ops/sec because the common commands are O(1) or O(log N) in-memory operations and the kernel does most of the network work. It also means each command is atomic by construction: no command can interleave with another. The flip side is that a single slow O(N) command, such as KEYS on a large keyspace, stalls every other client while it runs.
- In-memory primary, optional disk persistence. Redis can snapshot to disk (RDB) periodically or append every write to a log (AOF). Most cache use cases run with persistence off; database-replacement uses turn it on.
- Rich data types. Not just strings: also hashes (great for storing structured objects), lists, sets, sorted sets (which Memcached has no equivalent of and which are brilliant for leaderboards and rate limiters), streams, geospatial indexes, bitmaps, and HyperLogLog (probabilistic cardinality).
A few patterns that show up everywhere:
# Session storage
SET session:abc123 "{user_id:42,exp:1729012345}" EX 3600
# Rate limiting (fixed window)
INCR rate:192.0.2.1:minute
EXPIRE rate:192.0.2.1:minute 60 NX ← NX (Redis 7.0+) sets the TTL only once, so later requests don't extend the window
↳ Reject when count > limit
# Leaderboard
ZADD scores 1500 "alice"
ZREVRANGE scores 0 9 WITHSCORES ← top 10 with scores
ZREVRANK scores "alice" ← Alice's rank (0 = top score)
# Distributed lock (simplest form — see Module 7 for the real one)
SET lock:order-42 "server-A" NX EX 30
Memory management is where Redis bites people. The two failure modes are: (1) the working set grows beyond available memory and Redis starts evicting things you wanted to keep, or (2) Redis runs out of memory entirely and starts refusing writes. Set maxmemory and maxmemory-policy explicitly. The default policy noeviction will fail writes when full; allkeys-lru evicts least-recently-used keys (usually what you want for a cache).
Clustering. A single Redis node tops out at the RAM of one machine and ~100k ops/sec. Redis Cluster shards keys across multiple nodes using hash slots rather than consistent hashing: every key maps to one of 16,384 slots (CRC16 of the key, modulo 16384), and each primary owns a subset of the slots. Each shard typically has a primary and one or two replicas. Cross-shard transactions and multi-key operations are restricted: the keys involved must hash to the same slot, which you achieve by putting {...} tags in the key names. user:{42}:profile and user:{42}:settings share a slot because only the text inside the braces is hashed.
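The slot calculation is small enough to sketch. Redis Cluster takes a CRC16 of the key (the XMODEM variant) modulo 16384, and when the key contains a non-empty {...} tag only the tag is hashed. The sketch below assumes Python's binascii.crc_hqx with an initial value of 0 as that CRC16; hash_slot is a hypothetical helper, not part of any Redis client.

```python
import binascii

SLOT_COUNT = 16384

def hash_slot(key: str) -> int:
    """Approximate Redis Cluster's key-to-slot mapping, including hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:   # only a non-empty tag counts
            data = data[start + 1:end]
    # crc_hqx uses the CRC-CCITT polynomial 0x1021; initial value 0 gives CRC16/XMODEM.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


same = hash_slot("user:{42}:profile") == hash_slot("user:{42}:settings")
print(same)  # True
```

Both keys hash only the text 42, so a multi-key operation across them stays on one shard.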
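Persistence-or-not. A cache that is allowed to be empty after a restart can run persistence-free, taking the highest throughput. A cache that powers a session store probably wants AOF persistence, because losing every active user's session on a reboot is rude. Be deliberate. The stock configuration takes periodic RDB snapshots (the exact schedule depends on the Redis version and on how many keys have changed), which is usually wrong for both ends of the spectrum: wasted work for a pure cache, and too coarse for data you cannot afford to lose.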
Eviction policies — what gets thrown out
A cache that never evicts is not a cache; it is a slow database. Eviction is the act of removing entries to make room for new ones. The policy decides which entry leaves.
The common policies:
- LRU (Least Recently Used). Evict the entry that hasn't been accessed for the longest time. The default in most caches. Good when access patterns are temporally clustered — what was hot recently is likely to be hot soon. Implemented as a doubly-linked list plus a hash map; an access moves the entry to the head of the list.
- LFU (Least Frequently Used). Evict the entry that has been accessed the fewest times. Better for workloads with a stable hot set — items that were popular yesterday are likely still popular today, even if they weren't accessed in the last hour. Costs more memory (you have to track counts) and risks "cache pollution" from old items that were once hot.
- FIFO (First In, First Out). Evict the oldest-inserted entry regardless of access. Simple, cheap, rarely the right choice — but useful when entries have natural lifetimes (rotating sessions).
- Random. Evict a random entry. Sounds dumb, performs surprisingly well, and is trivially cheap. Used internally by some caches as a fallback when LRU bookkeeping is too expensive.
- TTL-based. Every entry has a time-to-live; entries are removed when they expire. Often used in combination with one of the above — TTL guarantees eventual eviction; LRU/LFU handles capacity pressure between TTLs.
Redis lets you configure several policies; the practical ones are allkeys-lru (most-common cache use), volatile-lru (only evict keys that have a TTL), allkeys-lfu (better for stable hot sets), and volatile-ttl (evict the soonest-expiring first).
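The LRU bookkeeping described above (a hash map plus a recency-ordered list) can be sketched with Python's OrderedDict, which combines the two. LRUCache is a hypothetical class for illustration. Redis itself approximates LRU by sampling keys rather than keeping an exact list, so this shows the textbook policy, not Redis internals.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()   # oldest first, newest last

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)       # an access makes the entry most recent
        return self._entries[key]

    def put(self, key, value) -> None:
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" is now more recent than "b"
lru.put("c", 3)     # over capacity: "b" is evicted
print(lru.get("b"), lru.get("a"), lru.get("c"))  # None 1 3
```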
A subtle but important detail: TTLs serve two purposes that people conflate.
- Capacity TTLs are about keeping the cache from filling up with junk. "Sessions expire in 24 hours" is this.
- Freshness TTLs are about correctness — the cached value is allowed to be stale only for so long. "Product price expires in 60 seconds" is this.
When you set a TTL, know which one you mean. A capacity TTL that becomes a de-facto freshness contract is how bugs are born — someone changes the cache TTL from 1 hour to 1 day for cost reasons and breaks a pricing feature that was relying on the freshness.
Invalidation — the hardest problem
"There are only two hard things in computer science: cache invalidation and naming things." The line is glib but the cache half is real. The whole point of caching is keeping a copy of something; the moment the original changes, your copy is wrong, and finding every wrong copy is non-trivial.
Three broad strategies:
1. TTL-based (passive invalidation). Every cached entry has a max age. After it expires, the next read causes a refetch. The cache and the source can diverge for up to one TTL.
Works well when: bounded staleness is acceptable, the data changes on a known cadence, and the alternative (active invalidation) is too expensive. Almost every CDN works this way.
2. Explicit invalidation (active). When the underlying data changes, the application explicitly deletes or updates the cached entry.
# Update the database, then invalidate the cache
db.update_user(user_id, new_data)
cache.delete(f"user:{user_id}")
Works well when: the application owns all writes and knows every cache key derived from them. Falls down when caches are derived from joins or aggregations — invalidating "the top-10 trending posts" cached value when a single post is edited is surprisingly hard.
3. Event-based / change-data-capture (CDC). A change feed (Debezium, Postgres logical replication, DynamoDB streams, Kafka events) drives invalidation. The application doesn't have to remember to invalidate — the database tells you what changed.
Works well when: writes happen from many sources (background jobs, external integrations, manual SQL fixes), and you cannot trust every writer to call the invalidation API.
A classic distributed bug: a race between a slow cache fill and a concurrent write, which leaves a stale value in the cache. The sequence:
t1: Service A misses the cache and reads user:42 from DB → {name: "Old"}
t2: Service B writes user:42 to DB → {name: "New"}
t3: Service B invalidates cache key user:42
t4: Service A puts the value it read at t1 into the cache
── Cache now contains "Old" forever (or until the TTL expires).
The fix is one of: (a) write-through caching, where the cache is updated as part of every database write; (b) versioned or timestamped entries, where the version is read along with the data and a writer never overwrites a newer version; (c) allowing only one writer per key to set the cache, and keeping the window between the DB write and the cache write short. None of these is free.
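A minimal sketch of the versioned approach, assuming every database row carries a monotonically increasing version number and that writers store the new version in the cache instead of deleting the key (a plain delete would throw away the version needed to reject the late fill). VersionedCache and set_if_newer are hypothetical names. In a shared cache such as Redis, the compare and the write would have to happen atomically, for example inside a Lua script.

```python
class VersionedCache:
    """Cache whose entries can only move forward in version."""

    def __init__(self):
        self._entries = {}   # key -> (version, value)

    def set_if_newer(self, key, version: int, value) -> bool:
        """Store the value unless the cache already holds this version or a newer one."""
        current = self._entries.get(key)
        if current is not None and current[0] >= version:
            return False     # stale fill rejected
        self._entries[key] = (version, value)
        return True

    def get(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry[1]


cache = VersionedCache()
stale_read = (1, {"name": "Old"})                       # Service A read version 1 from the DB
cache.set_if_newer("user:42", 2, {"name": "New"})       # Service B wrote version 2
accepted = cache.set_if_newer("user:42", *stale_read)   # Service A's late cache fill
print(accepted, cache.get("user:42"))  # False {'name': 'New'}
```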
The pragmatic stance: pick the weakest invalidation that gives correct behaviour. TTLs of seconds are an acceptable fallback for most things. Hard "this must update instantly across the whole fleet" requirements should be rare, and when they exist, the right answer is usually to not cache that particular piece of data.
Cache stampedes and thundering herds
Imagine 10,000 application servers all trying to render the same page. Each of them looks up popular:posts in the cache. The TTL expires at the same instant. All 10,000 servers simultaneously discover the cache miss and slam the database with the same query.
That is a cache stampede (also called a thundering herd). It happens whenever a popular cache entry expires and many clients see the miss before any one of them has refilled it. The database — which was sitting comfortably serving zero traffic for that key — suddenly takes the full burst. Often that burst is enough to take the database down, which makes the cache miss permanent for as long as the database is down, which means every subsequent request is also a miss, which the database also cannot answer. You have a cascading failure caused by the cache.
The fixes, in increasing order of sophistication:
1. Stale-while-revalidate. Don't delete the value when it expires; mark it as stale instead. The first request after expiry returns the stale value AND triggers a background refresh. Subsequent requests during the refresh also get the stale value. Only one background task talks to the database.
t0: cache has fresh value, TTL = 60s
t60: TTL expires. Mark stale, don't delete.
t61: Request comes in. Return stale value. Spawn refresh in background.
t61.01-t61.05: Other requests get stale value (no DB stampede).
t61.05: Refresh completes. Cache fresh again.
2. Probabilistic early refresh. Each request to a near-expiring cache entry has a small chance of triggering a refresh, with the probability rising as expiry approaches. By the time the TTL actually fires, the refresh has usually already happened. The best-known version of this technique is the XFetch algorithm.
3. Locking / single-flight. When a cache miss happens, only one process is allowed to query the database. Others wait for it (or get a stale value or a fallback). Go's singleflight package and Java's Caffeine cache implement this. The lock can be local (per-server) or distributed (a Redis SET NX lock).
4. Pre-warming. For very expensive computations (top-N rankings, recommendations), don't rely on a request to trigger the refresh. Run a background job on a schedule that always keeps the cache full.
5. Negative caching. Cache the absence of a value too. A request for user:99999999999 that returns nothing should cache that fact for a short time, otherwise a botnet hammering bad IDs can stampede the database with lookups for things that don't exist.
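The locking / single-flight idea from the list above can be sketched in-process with threads. This is loosely modelled on the behaviour of Go's singleflight, not a port of it: SingleFlight, _Call, and the dups counter are hypothetical names, and the sketch only deduplicates loads within one process. Deduplicating across servers would need a distributed lock.

```python
import threading

class _Call:
    """State shared by the leader and the followers of one in-flight load."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.dups = 0    # followers that joined this call instead of loading


class SingleFlight:
    """Collapse concurrent loads of the same key into a single loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}    # key -> _Call

    def do(self, key, loader):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
            else:
                call.dups += 1
        if leader:
            try:
                call.result = loader()      # only the leader hits the database
            except Exception as exc:
                call.error = exc            # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()             # wake every follower
        else:
            call.done.wait()                # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result


flights = SingleFlight()
print(flights.do("popular:posts", lambda: ["post-1", "post-2"]))  # ['post-1', 'post-2']
```

On a cache miss, each request would call do with the cache key and a function that queries the database; however many requests arrive while the first load is running, the database sees one query.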
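The simplest one-line defence is TTL jitter: when you want a TTL of 60 seconds, actually set a random value between 50 and 70 seconds. Entries that were written together (after a deploy, a cache flush, or a batch job) no longer expire in sync; the burst smears across 20 seconds. Jitter does not help when a single hot key expires, which is what the fixes above are for, but it is one line of code and it removes most of the synchronised-expiry pain.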
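As a sketch, the jitter really is about one line. jittered_ttl is a hypothetical helper; the 10-second spread mirrors the 50 to 70 second example, and the optional rng parameter exists only so the behaviour can be made repeatable.

```python
import random
from typing import Optional

def jittered_ttl(base_seconds: int, spread_seconds: int = 10,
                 rng: Optional[random.Random] = None) -> int:
    """Return the base TTL shifted by a random amount within +/- spread_seconds."""
    source = rng if rng is not None else random
    return base_seconds + source.randint(-spread_seconds, spread_seconds)


seeded = random.Random(7)   # seeded only so the demonstration is repeatable
ttls = [jittered_ttl(60, rng=seeded) for _ in range(1000)]
print(min(ttls), max(ttls))
```

The returned value is what you would pass as the expiry when writing the entry, in place of the constant.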
CDN caching — pushing the cache to the edge
A CDN (Content Delivery Network) is a cache that lives at the geographic edge of the internet. Cloudflare, Fastly, Akamai, CloudFront, and others operate hundreds of edge locations worldwide. When a user requests a URL, the request hits the nearest edge first; if the edge has a cached response, it returns it without ever talking to your origin server.
For static content (images, CSS, JS bundles, fonts, downloads), this is a huge win — the user's request might travel 50 km to a city-level POP instead of 8,000 km to your origin. The bandwidth cost is offloaded to the CDN. Many requests never touch your servers at all.
The HTTP machinery that drives CDN caching is the Cache-Control header:
Cache-Control: public, max-age=86400, s-maxage=604800, immutable
├── public Cacheable by CDN and browser
├── max-age Browser cache duration (seconds)
├── s-maxage Shared (CDN) cache duration
└── immutable This content will never change — never revalidate
Cache-Control: private, no-store
├── private Don't cache in shared caches (per-user data)
└── no-store Don't cache anywhere (sensitive data)
The canonical pattern for SPAs: hash the static asset filenames during the build (app.a1b2c3.js), set Cache-Control: public, max-age=31536000, immutable on them, and never invalidate. The HTML that references them is served with Cache-Control: no-cache, which still allows it to be stored but forces revalidation with the origin before each use. Now you get free, effectively permanent caching (max-age is one year) on bytes that will never change, and instant deploys on the entry-point.
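A sketch of both halves of that pattern: fingerprinting a filename from its content, and choosing the header from the filename. The helper names, the 8-character SHA-256 prefix, and the regular expression are assumptions for illustration; real build tools and servers each have their own conventions. Anything without a fingerprint is treated here as an entry point.

```python
import hashlib
import re
from pathlib import PurePosixPath

# Assumed convention: name.<6+ hex chars>.ext marks a fingerprinted asset.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{6,}\.[a-z0-9]+$")

def hashed_name(path: str, content: bytes) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))

def cache_control_for(path: str) -> str:
    """Long-lived caching for fingerprinted assets, revalidation for everything else."""
    if FINGERPRINTED.search(path):
        return "public, max-age=31536000, immutable"
    return "no-cache"


asset = hashed_name("static/app.js", b"console.log('hello')")
print(asset, "->", cache_control_for(asset))
print("index.html ->", cache_control_for("index.html"))
```

Because the name changes whenever the bytes change, a deploy never needs to purge the old asset; the revalidated HTML simply starts pointing at the new name.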
Dynamic content is harder. The strategies:
- Per-URL caching with short TTLs. Pages that don't change per user (blog posts, product pages) cache fine at the edge for 30 seconds to a few minutes. The hit rate of even a 30-second TTL is dramatic for high-traffic pages.
- Cache-key customisation. Include relevant headers in the cache key (country for currency display, device for layout). Be careful: every key dimension multiplies cache fragmentation.
- Edge logic. Cloudflare Workers and similar run code at the edge before the request reaches your origin. You can stitch together cached fragments, serve A/B test variants, or block bad traffic — all without an origin round trip.
- Purging. When content changes, tell the CDN. Most CDNs offer URL-by-URL purge, tag-based purge (you tag a response when serving it, then purge by tag), or whole-site purge. Purge cost grows with cache size, so tag-based purging is what you want at scale.
The big gotcha: cookies and personalisation kill caching. If your homepage sets a session cookie on first visit, the response is no longer cacheable by the CDN (it differs per user). Strip cookies from responses the CDN will cache, or move personalisation entirely to the client side (the API endpoint is uncached, the page shell is cached).
A real-world result: a properly-configured CDN for a typical content site offloads 90% or more of requests from the origin. The origin only sees cache misses, real users hitting truly dynamic endpoints, and bots. The cost savings and reliability improvements are dramatic. Caching is the lever, in real systems, that gives you the biggest bang for the least architectural complexity. The next module looks at the networking and infrastructure that gets those bytes to users in the first place.