Home
High-Level System Design / Module 6 — Architecture Patterns

Architecture Patterns

Microservices vs monolith, circuit breaker, rate limiting, retry and backoff, saga, API patterns, bulkhead and timeout — the named patterns that recur in every real system.


Microservices vs monolith

The most-debated architectural question of the last decade also has the most-misunderstood answer. The honest version is: neither one is right by default, and the choice depends on team size, deploy independence, and the cost of being wrong.

A monolith is a single deployable unit. One codebase, one build, one process (replicated for scale). All the business logic lives in one place; modules call each other in-process. PHP'd version of the early Facebook, Basecamp's Rails monolith, Shopify's monolith — these handle enormous load. "Monolith" is not synonymous with "badly architected."

Microservices is one deployable unit per bounded context — a coherent slice of business logic with its own data and its own team. Each service has its own database, its own deploy pipeline, its own runtime. Services talk over the network (HTTP, gRPC, queues).

text
   Monolith                  Microservices
   ────────                  ─────────────
   ┌─────────────┐          ┌──────┐┌──────┐┌──────┐
   │  Orders     │          │Orders││Users ││Cart  │
   │  Users      │          └──────┘└──────┘└──────┘
   │  Cart       │          ┌──────┐┌──────┐┌──────┐
   │  Payments   │          │Pay   ││Ship  ││Inv   │
   │  Shipping   │          └──────┘└──────┘└──────┘
   │  Inventory  │          (each its own DB, deploy, team)
   └─────────────┘
   one DB, one deploy

The microservices pitch is independence: teams ship without coordinating, each service scales separately, a fault in one is contained. The hidden costs are also real:

A pragmatic decision framework:

Watch for the failure mode of premature decomposition: a small team with five microservices and four engineers spends most of their time on cross-service plumbing instead of product. Modular monolith first, microservices when you have evidence of organisational pain is a defensible default.

Circuit breaker

When a downstream service is dying, calling it more does not help. Each call you make is more load on the dying service AND more pending requests piled up in your service, eventually exhausting your own thread pool or memory. The cascading failure pattern is among the most common ways a distributed system goes from "one slow service" to "site down."

A circuit breaker sits between your code and the downstream call. It tracks recent failures. When failures pass a threshold, the breaker opens — and all calls fail fast for some cooldown period, without ever touching the downstream service.

text
   States:
     CLOSED  → calls pass through normally; failures counted
         │
         │ (failure threshold exceeded)
         ▼
     OPEN    → calls fail immediately for cooldown period
         │
         │ (cooldown elapsed)
         ▼
     HALF-OPEN → allow a few test calls
         │
         ├── success → back to CLOSED
         └── failure → back to OPEN

The payoff:

Real-world implementations: Hystrix (Netflix, now in maintenance), resilience4j (the modern Java standard), Polly (.NET), Envoy's outlier detection (mesh-level), opossum (Node.js). Most service meshes implement circuit breaking at the proxy level — no library required.

Tuning the breaker matters and is not obvious:

The pattern pairs naturally with fallbacks. When the breaker is open, return a sensible default instead of an error if possible — cached data, an empty list, a "degraded mode" flag. The user sees the site working, possibly with reduced functionality, instead of broken.

Rate limiting

Rate limiting bounds the number of requests a client can make in a time window. It is a hygiene tool — it protects your infrastructure from misbehaving clients, paying customers from each other, and your bill from runaway processes.

The four canonical algorithms:

1. Fixed window. Counter resets at the start of each window. "100 requests per minute, reset on the minute." Simple to implement (an INCR with a TTL in Redis). Suffers from edge bursts: a client can make 100 requests at 12:00:59 and another 100 at 12:01:00, for 200 requests in 2 seconds.

2. Sliding window log. Store the timestamp of every request. To check the limit, count requests within the last minute. Accurate but expensive — memory is proportional to request rate.

3. Sliding window counter. Hybrid: store a counter per fixed window AND interpolate based on how much of the current window has elapsed. Approximate but cheap (two counters per client). Most production rate limiters use this.

4. Token bucket. A bucket holds N tokens; tokens refill at rate R per second; each request consumes one. If the bucket is empty, the request is rejected (or queued). Smooth, allows bursts up to the bucket size, the canonical algorithm in network gear and API gateways.

text
   Token bucket: bucket size 10, refill 2 tokens/sec

   t=0s:  bucket=10  → burst of 10 requests allowed instantly
   t=1s:  bucket=2   → request 11 rejected (bucket empty after burst)
   t=2s:  bucket=4   → 4 requests allowed
   t=10s: bucket=10  → refilled to max

Where to enforce the limit matters:

What to key on: API key (for paid plans), user ID (for per-user fairness), IP address (for unauthenticated traffic — but careful with NAT and corporate gateways), client device ID, or some combination.

Response when limited. Return 429 Too Many Requests with a Retry-After header indicating how many seconds to wait. Many client libraries will respect this automatically.

A distributed setup needs a shared counter — usually Redis. The pattern INCR rate:user:42:minute plus EXPIRE rate:user:42:minute 60 is the entire rate limiter for fixed window. Token bucket needs a few more operations but is still a small Lua script in Redis.

Don't overlook adaptive rate limits: limits that tighten under load. When your latency p99 spikes, drop everyone's limits by 50% temporarily. This is what protects you from a traffic spike you didn't capacity-plan for.

Retry with backoff

Network calls fail transiently — a packet drops, a process restarts, a database briefly pauses for a checkpoint. Most of the time, retrying succeeds. But the way you retry decides whether you fix a problem or make it worse.

Three retry rules that always apply:

text
   Without jitter (every client lined up)
   ──────────────────────────────────────
   t=0      ALL fail
   t=100ms  ALL retry (storm)
   t=200ms  ALL retry (storm)
   t=400ms  ALL retry (storm)

   With jitter (smeared)
   ─────────────────────
   t=0           ALL fail
   t=50-150ms    retries trickle in
   t=120-280ms   trickle
   t=300-500ms   trickle

When NOT to retry:

Retry budgets: Google's SRE book popularised the idea of a global retry-rate cap. If retries are more than 10% of total calls, something is wrong; instead of retrying more, you back off everywhere. This prevents the failure pattern where the system spends most of its capacity retrying.

Idempotency keys make retries safe even for state-changing operations. The client generates a unique key per logical operation and sends it on every retry. The server records the key + result; on a retry of the same key, return the stored result instead of executing again. Stripe's API is the canonical reference for this design — every payment request takes an Idempotency-Key header.

Saga pattern

A traditional database transaction gives you all-or-nothing semantics: every write commits or none do. In a microservices architecture, the writes span multiple services and multiple databases — a single ACID transaction is impossible. You need a different shape.

A saga is a long-running business transaction implemented as a sequence of local transactions, each in one service. If any step fails, compensating transactions run for the previously-completed steps to roll back what they did.

The canonical example is placing an order:

text
   Forward path:
   1. Reserve inventory (Inventory service)
   2. Charge payment       (Payment service)
   3. Create shipment      (Shipping service)
   4. Send confirmation    (Notification service)

   If step 3 fails:
   ↓
   Compensate:
   1. Refund payment       (Payment service)
   2. Release inventory    (Inventory service)

Key properties:

Two styles of saga orchestration:

1. Choreography. Each service listens to events and decides whether to act. No central coordinator. Inventory service consumes OrderPlaced, reserves stock, emits InventoryReserved. Payment service consumes that, charges, emits PaymentCharged. And so on. Distributed; no single point of control; emergent.

Pros: services stay decoupled. Cons: hard to reason about the full flow; visualising what's happening for a stuck order requires hunting through logs in multiple services.

2. Orchestration. A dedicated orchestrator service drives the saga step by step. It calls inventory, calls payment, calls shipping, deciding what's next based on each response. Centralised; explicit; auditable.

Pros: the flow is in one place; failures are obvious. Cons: the orchestrator can become a god service; it's a coupling point.

Real systems mix both — orchestration for the critical happy-path workflows (orders, payments) and choreography for fan-out side effects (analytics, recommendations).

Tooling: Temporal, AWS Step Functions, Camunda, Cadence are workflow engines designed for orchestrated sagas. They handle persistence, retries, timeouts, and human-step integrations. For simpler cases, a saga can be implemented with a state machine in a database + a worker that polls and advances it.

The critical mindset shift: sagas give you eventual consistency, not atomicity. There are moments when inventory is reserved but payment is not yet charged. Your business needs to be okay with these intermediate states — or you need to design the steps so the intermediate state is at least not user-visible.

API design patterns

The way services expose themselves to each other shapes everything that touches them. Three patterns dominate today, and each one has a place.

REST. Resources identified by URL, operations identified by HTTP verb (GET, POST, PUT, PATCH, DELETE), data exchanged as JSON. The default for public web APIs. Strengths: discoverable, cacheable via HTTP semantics, every client and tool understands it, easy to debug with curl. Weaknesses: over- or under-fetching is common (the client wants 3 fields from a /users endpoint that returns 30), N+1 problems when joining related resources.

Good REST follows a few rules: nouns for resources, verbs as HTTP methods, status codes that mean what they say (200 success, 201 created, 400 bad request, 404 not found, 409 conflict, 500 server error). Versioning typically via the URL path (/v1/users) or a header (Accept: application/vnd.api.v2+json). Pagination via cursor (better) or page/offset (simpler but breaks under writes).

GraphQL. A query language where the client specifies exactly the fields it wants, possibly across multiple resources, in one request. The server resolves the query against multiple data sources. Strengths: no over-fetching, no N+1 at the network layer (the server does the joining), evolves without versioning (just add fields). Weaknesses: hard to cache at the HTTP layer (every query is unique); query complexity can be a DoS vector if unbounded; a single bad resolver makes a query slow.

GraphQL shines when: many client types need different shapes of the same data, mobile apps where bandwidth matters, dashboards with deeply nested queries. It is overkill for: simple CRUD APIs with one client.

gRPC. A binary, HTTP/2-based RPC protocol with strongly typed schemas (Protocol Buffers). Strengths: small payloads, streaming in both directions, generated client libraries in every major language, contract enforcement at compile time. Weaknesses: not directly callable from browsers (you need a proxy like grpc-web); harder to debug; JSON-vs-binary mental shift.

gRPC is the default for service-to-service inside a cluster. The performance and type-safety benefits compound when you have many services calling each other.

A few cross-cutting design rules that matter regardless of protocol:

Bulkhead and timeout patterns

Two smaller patterns that pair with everything else in this module and remove a class of failure that otherwise haunts every distributed system.

Bulkhead comes from shipbuilding. A ship's hull is divided into watertight compartments; if one floods, the others stay dry, and the ship floats. Applied to software: partition resources (thread pools, connection pools, memory) so a failure in one part can't drain shared resources.

The classic failure without bulkheads: your service has a 200-thread Tomcat pool. Downstream service B becomes slow. Requests to B pile up, each holding a thread. Within minutes all 200 threads are blocked on B. Now requests to other downstreams — completely healthy ones — also queue up because there are no free threads. You have a full outage caused by one slow dependency.

With bulkheads: a separate thread pool of, say, 20 threads for calls to B. If B is slow, those 20 threads pile up. The other 180 threads are still serving other work. The blast radius is contained.

text
   No bulkhead                       With bulkhead
   ───────────                       ─────────────
   ┌─────────────────┐               ┌──────────┐ ┌──────────┐ ┌──────────┐
   │ shared pool 200 │               │ B: 20    │ │ C: 50    │ │ D: 30    │
   │ all calls       │               └──────────┘ └──────────┘ └──────────┘
   └─────────────────┘               (B slow → only B's pool exhausts)
   (one slow → all blocked)

The principle generalises beyond thread pools — separate connection pools per downstream, separate Redis clients, separate Kubernetes namespaces for blast-radius isolation.

Timeout is the most boring and most important pattern in distributed systems. Every network call must have a timeout. No exceptions.

The default HTTP client behaviour in many languages is "wait forever." If the downstream never responds, your call thread is gone until the TCP keepalive eventually decides the connection is dead — which can take minutes.

A few non-obvious points:

Timeouts + retries + circuit breaker + bulkhead are the four patterns that together turn a brittle distributed system into a resilient one. Each one solves a different failure mode. Most outages I have ever seen could have been prevented or contained by at least one of them being in place. The next module digs into the distributed-systems primitives — consensus, locks, quorums, clocks — that some of these patterns rely on under the hood.


⁂ Back to all modules