Module 9 • Architecture • 22 min read • Updated May 21, 2026

Supplementary — Gap Fills & Additional Designs

Task queues, idempotency in depth, web crawler, search autocomplete, hotel/seat reservation, top-K — the topics that didn't fit cleanly into the main modules.

Why a supplementary module

Some topics straddle two modules and live properly in neither. Some classic designs are too niche to belong with URL shorteners and Twitter feeds but too important to skip. This module collects them.

The pattern of each section is the same as Module 8 — requirements, the key insight, architecture, and the parts that need extra care. The depth is meant to make each topic interview-ready on its own, while pulling forward the primitives from earlier modules.

The six topics here are: task and job queues (the operational reality of running async work at scale), idempotency in real depth (because the one-paragraph version everywhere undersells the difficulty), web crawler architecture, search autocomplete, hotel/seat reservation (the design problem distributed locks were invented for), and top-K / heavy-hitter computation.

Task and job queues

Module 5 covered messaging primitives. This section is about running task queues in production — the operational layer where most teams discover the gap between "we have a queue" and "our queue system works."

The shape of the problem. A web request enqueues a task; a worker picks it up; the task runs; the result is recorded or notified. The task might be: send email, transcode video, generate report, charge card, retry a failed external call, recompute analytics, expire stale data, rebuild a cache. Different tasks have very different latency, retry, and isolation requirements.

The production concerns that go beyond "PUT and GET":

Priority. A password-reset email needs to go out in seconds; a weekly digest can wait minutes. Two approaches: separate queues per priority (workers poll high first), or a single queue with priority scores (Redis sorted set; pop highest first). Per-queue is simpler and easier to reason about.
Scheduling. Some tasks should run later ("send reminder in 24 hours"). Implement with a delayed queue (Redis sorted set keyed by run_at timestamp; workers poll for due tasks).
Concurrency limits. "Only N transcodes at once" to avoid GPU thrashing. Worker pool sizing per queue.
Per-key serialisation. Some tasks for the same key must not run concurrently. "Rebuild user 42's feed" should not have two workers fighting. Use a distributed lock on the key, or shard the queue by key so only one worker handles each shard.
Dead-letter queues. A task that fails N times moves to DLQ for human attention. Without this, poison messages cycle forever and mask real errors.
Visibility into the queue. Depth metric per queue, age of oldest message, processing time percentiles, retry rate. Without these dashboards, you find out a queue is backed up when users complain.

Real-world choices:

Sidekiq / Resque (Ruby), Celery (Python), BullMQ (Node), java's Quartz/Spring Batch — language-specific libraries, often Redis-backed. Best for monolithic apps with a clear language stack.
AWS SQS + worker process — language-agnostic, fully managed, virtually unlimited scale.
Temporal / Cadence — workflow engines for tasks that are really multi-step business processes. Built-in retries, timeouts, child workflows, signals. The right choice when a "task" is actually a long-running workflow.
Kafka — overkill for simple task queues; right when the task stream is also the basis for analytics, audit, or replay.

Avoiding the at-least-once bite. Every task queue is at-least-once. Every task handler must be idempotent — which is the next section's full topic. A task that sends an email and crashes after sending but before acking will resend. A task that charges a card and crashes will double-charge. Build idempotency into the task, not on top of the queue.

Idempotency — deep dive

Module 6 mentioned idempotency keys. The full story is more involved and worth its own section, because almost every distributed-system bug at scale traces back to idempotency done badly or not at all.

The definition. An operation is idempotent if performing it once or many times produces the same end state. SET x = 5 is idempotent. x += 1 is not. INSERT INTO orders is not idempotent (you get duplicate rows). INSERT INTO orders ... ON CONFLICT DO NOTHING is idempotent.

The shape of an idempotent API:

text

   Client                      Server
     │                            │
     │── POST /charge ───────────►│
     │   Idempotency-Key: K1       │
     │   amount: 5000              │
     │                            │
     │                            │ Lookup K1 → not seen
     │                            │ Charge card → success
     │                            │ Record (K1, success, result)
     │◄── 200 OK, charge_id=X ────│
     │                            │
     │ (timeout, retry)            │
     │                            │
     │── POST /charge ───────────►│
     │   Idempotency-Key: K1       │
     │                            │ Lookup K1 → already done
     │◄── 200 OK, charge_id=X ────│ (returns stored result)

Implementation: the idempotency store. A table keyed by (idempotency_key, request_hash) storing the result. Three states: not seen, in progress, completed.

sql

CREATE TABLE idempotency (
    key            TEXT PRIMARY KEY,
    request_hash   TEXT NOT NULL,
    status         TEXT NOT NULL,    -- pending, done
    response       JSONB,
    locked_until   TIMESTAMPTZ,
    created_at     TIMESTAMPTZ DEFAULT NOW()
);

The race condition. Two simultaneous requests with the same key. Both lookup, both see "not seen," both proceed to charge — double-charge.

Fix: lock on insert. Use INSERT ... ON CONFLICT DO NOTHING RETURNING *. The losing INSERT returns no row, meaning "someone else got there first." The losing request then polls or waits for the winner to finish.

The crash-mid-operation problem. Service inserts the idempotency row as "pending," charges the card, then crashes before recording "done." On retry, the next request sees "pending" and... what?

Options:

Wait + retry. If still "pending" after lock_timeout, assume the prior attempt died, take over. Race: the prior might not be dead, just slow.
Reconcile externally. Query the downstream system (Stripe) by your own client-side reference ID to determine whether the prior actually succeeded.

The second is what production payment systems do. Every idempotency key is also passed to the external service; on uncertainty, ask the external service "did you process key K?" before retrying.

Request hash matters. If client sends Idempotency-Key K1 for {amount: 50}, then mistakenly reuses K1 for {amount: 500}, the server must reject (key reuse with different params). Compare a hash of the canonicalised request body; mismatch = 400 error.

TTL. Keep idempotency records for at least the maximum retry window of any reasonable client — typically 24 hours. Some systems keep them for days for audit.

Where to enforce. API gateway is appealing (one place, one implementation), but tasks generated from internal queues also need idempotency. The right answer is to push it into the service layer that owns the operation.

Web crawler

Requirements. Discover and fetch pages from across the web. Respect robots.txt. Re-crawl by freshness needs. Politeness (don't hammer one server). Detect duplicates. Scale to hundreds of millions of pages.

Estimation. Web is ~50B pages indexed by major engines. Even a tiny crawler targeting 100M pages at one fetch per second per worker needs ~30 worker-days. Realistic crawlers run thousands of workers in parallel.

Architecture.

text

   Seed URLs ──► URL Frontier (priority queue, partitioned by host)
                       │
                       ▼
                 Fetcher workers ──► HTTP
                       │
                       ▼
                  Parser ──► extract links ──► back to frontier
                       │
                       ▼
                  Content store (S3) + metadata DB
                       │
                       ▼
                  Indexer / downstream consumers

URL frontier. A queue of URLs to fetch. Two pressures: prioritise interesting URLs (PageRank, recency, importance hints) AND enforce politeness (no more than 1 request/sec per host).

The canonical design: a two-level queue. Top level is per-host queues, each rate-limited. Bottom level is a fairness scheduler that selects which host to fetch next, weighted by priority. New URLs enqueue into their host's bucket; fetchers poll the scheduler, which picks a host whose rate limit has lapsed.

Deduplication. Same URL might be discovered from many pages. Same content might be served at multiple URLs (canonicalisation, parameter variations).

Use: a Bloom filter or HyperLogLog for "have I seen this URL?" — probabilistic, fits in memory at scale. For content dedup, compute SHA-256 of normalised content; store hashes; skip duplicates.

Politeness. robots.txt is fetched and cached per host (with a TTL). User-Agent identifies your crawler. Crawl-delay is respected. Don't fetch faster than configured per host even if politely allowed.

Re-crawl strategy. Not all pages change at the same rate. Track an estimated change frequency per page (or per host). Frequently-changing news sites recrawled hourly; archival blogs monthly.

Storage. Raw HTML in S3, keyed by URL hash. Metadata (URL, fetch time, status code, parsed links, content hash) in a wide-column store like HBase or Cassandra — perfect for the access pattern (write-once per fetch, read by URL).

Robustness. Bad responses, redirect loops, traps (parameter cycles), giant pages, slow servers. Limit response size; limit redirect depth (~5); per-host timeout. Quarantine hosts that misbehave.

Search autocomplete

Requirements. As a user types, suggest the top N completions. Sub-100ms latency. Up-to-date (trending queries appear within minutes). Personalised optionally.

Estimation. 1B users × 5 chars typed per query × 1 autocomplete call per char = massive read traffic. The interesting bottleneck is read latency, not write or storage.

The core data structure: trie. A tree where each node is a character and each path from root spells a string. Walking the trie by user input lets you find all completions in O(prefix_length).

text

   Trie for ["cat", "car", "cars", "care", "dog"]

              (root)
              /    \
            c        d
            |        |
            a        o
          / | \      |
         t  r  ...   g
            |
            s,e

At each leaf (or every node), store the top K precomputed completions sorted by score. "User typed 'ca'" → walk to 'ca' node → return its precomputed top-K.

Why precompute. Sorting at query time means traversing the subtree and computing aggregate scores per request. At hundreds of thousands of QPS, this is too expensive. Precompute the top K at every prefix node; queries become O(1) after the prefix walk.

Storage. Two practical options:

Redis with sorted sets keyed by prefix. Each key is a prefix; the value is a sorted set of completions by score. Lookup: ZREVRANGE prefix 0 K. Trades memory (lots of redundant key-prefix entries) for query speed.
Elasticsearch's completion suggester or edge-ngram analyser. Built-in, scales well, more flexible (typo tolerance, multi-language). Slightly slower than Redis but operationally simpler.

For very large catalogues (e-commerce, music), Elasticsearch with edge-ngrams plus a Redis caching layer for top queries is common.

Updating the index. Search logs flow into Kafka. A streaming job aggregates per-query counts in a rolling window (last hour, last day, last week). Hot queries get bumped in score. Updated scores get written into the suggestion store. Latency from query event to appearing in autocomplete: minutes.

Personalisation. A user's recent searches go to the top of their personal trie. Implementation: per-user sorted set in Redis, merged with the global top-K at query time. Costs an extra Redis call per autocomplete.

Trending. Surface new queries climbing fast. Compare current-window count to baseline; high ratio = trending. Show with a small visual cue.

Hotel / seat reservation

Requirements. Users book seats / rooms / slots from a limited inventory. Prevent double-booking. Hold a seat during checkout. Release on timeout or cancel. Handle popular events with millions of concurrent users.

The hard property. Inventory is finite and contended. Two users clicking "book" on the same seat at the same instant cannot both succeed. This is the canonical use case for distributed locks — and the canonical case where naive locks fail.

Architecture.

text

   Browse ──► Inventory display (with cache)
   Click seat ──► Hold seat (30 minutes)
   Pay        ──► Convert hold to booking
   Timeout    ──► Release hold

Inventory storage. Strong consistency required. Postgres or similar. Schema:

sql

CREATE TABLE seats (
    id            UUID PRIMARY KEY,
    event_id      UUID NOT NULL,
    seat_number   TEXT NOT NULL,
    status        TEXT NOT NULL,    -- available, held, booked
    held_by       UUID,
    held_until    TIMESTAMPTZ,
    booked_by     UUID,
    UNIQUE (event_id, seat_number)
);

The hold operation. Atomically transition status='available' to status='held'. Use a conditional update:

sql

UPDATE seats
SET status = 'held', held_by = $user, held_until = NOW() + INTERVAL '30 min'
WHERE id = $seat AND status = 'available'
RETURNING *;

Only one transaction wins the row lock. The losing transactions see zero rows updated → already taken. No application-level locking needed.

The release mechanism. A background job scans WHERE status='held' AND held_until < NOW() periodically and releases. Trades a small window of "appears held longer than 30 min" for simplicity. Alternative: each hold inserts a Redis key with TTL = 30 minutes; on expiry, a key-event triggers release. Faster recovery, more moving parts.

Reading inventory under load. For a popular event, 1M users browsing simultaneously. Per-seat status changes constantly. Don't query Postgres on every page load.

Pattern: cache the inventory view in Redis, updated by an event stream from Postgres. Reads are O(1) from Redis; writes still go through Postgres. Browsers may show a seat as available for a few seconds after it was held — that's fine; the actual hold attempt will fail correctly.

Flash-sale traffic. A concert with 10k seats has 1M people clicking "book" in the same second. Even with the conditional update working correctly, 1M concurrent UPDATEs hammering one table is too much. Mitigations:

Queue the bookings. Accept all requests into a queue; workers process FIFO; the first 10k succeed, the rest get "sold out."
Pre-shard. Block seats into groups; route each user to one group's queue. Lower contention; some shards might sell out before others.
Virtual waiting room. Users get queued before they reach the booking page; release them in controlled batches. The actual booking system sees a manageable trickle.

Virtual waiting rooms (Queue-it, AWS Wait Room) are what every concert site uses. The architecture isn't fancy — it just spreads a thundering herd across minutes.

Overbooking. Hotels and airlines deliberately overbook (10-20% beyond capacity) because they know a fraction of bookings cancel. The booking system needs to support inventory = capacity * overbook_factor and handle the rare case where everyone shows up (compensation policies, denied boarding).

Top-K / heavy hitters

Requirements. Find the top K items in a high-volume stream — top searched queries, top viewed videos, top clicked ads, top accessed cache keys. Approximate is fine; the absolute exact top-K isn't worth the storage.

Why this needs care. Naive approach: hash map of (item → count); periodically sort and take top K. Works for small streams. At 1M events/sec with millions of unique items, the hash map doesn't fit in memory — and if it does, the sort is expensive.

The data-streams literature has elegant probabilistic algorithms for this.

Count-Min Sketch. A 2D array of counters; multiple hash functions. To record an event, hash it with each function and increment the corresponding cells. To query an item's count, hash it the same way and take the minimum value across cells.

text

   d hash fns × w buckets each

   h1: [.][.][3][.][.][1][.][.][.][.]
   h2: [.][2][.][.][.][.][.][.][.][.]
   h3: [.][.][.][.][1][.][.][.][.][.]

   Count("foo") = min(cell[h_i("foo")] for each i)

Memory: w × d counters total, regardless of unique-item count. Tunable: bigger w lowers overestimation, bigger d lowers probability of overestimate.

Tracking the top K from Count-Min: maintain a min-heap of size K of the items seen so far, with their estimated counts.

Misra-Gries. Deterministic algorithm. Maintain K-1 counters; for each new item: if it's already tracked, increment; if a counter is zero, evict and replace with new item at count 1; else decrement all counters. The final set is a superset of the true top-K; a second pass refines.

HyperLogLog. Not for top-K but for cardinality ("how many distinct items have we seen?"). 16 KB of memory tracks billions of distinct items with ~2% error. Redis has this built in (PFADD, PFCOUNT).

Pragmatic stack.

For streaming top-K in production:

text

   Event stream (Kafka) ──► Flink/Spark Streaming
        ──► Count-Min Sketch + heap per window
        ──► Output: top K per minute, hour, day
        ──► Stored in Redis sorted sets keyed by (entity, window)

The sketch is per-shard (partitioned by hash of item). Merging sketches is just per-cell addition — distributed-friendly. The final top-K is read from Redis on demand.

A simpler approximation that often suffices: probabilistic counting. Increment a counter with probability 1/n where n is the current count. Counters stay small even for very common items; you scale by 1/p when reporting. Loses precision but trivial to implement.

The broader lesson: at high volume, exact computation is rarely worth the cost. Probabilistic structures (Count-Min, HyperLogLog, Bloom filters, t-digest for percentiles) are the right primitive for analytics on streams. Internalise them.

These six topics close most of the gaps left by the main modules. The next and final module is bonus material: the operational disciplines (observability, chaos, DR, security) that turn an architecture diagram into a system that survives contact with reality.

⁂ Back to all modules

Supplementary — Gap Fills & Additional Designs

Why a supplementary module

Task and job queues

Idempotency — deep dive

Web crawler

Search autocomplete

Hotel / seat reservation

Top-K / heavy hitters

Continue reading

Distributed Systems

Classic System Designs

Reliability, Observability & Security