Backend from First Principles / Module 18 — Scaling & Performance

Scaling & Performance

Vertical vs horizontal. Load balancing. Read replicas. The optimization checklist.


Vertical vs Horizontal Scaling

Vertical scaling (scale up) — Give the server more CPU, RAM, disk.
Simple. No code changes. But: there's a hard limit (the biggest available instance), it's a single point of failure, and it gets expensive fast.

Horizontal scaling (scale out) — Add more servers. Run multiple instances behind a load balancer.
Requires stateless architecture (no local state). Much higher ceiling. Handles failures better.

The combination: start vertical, then scale horizontally when you hit the ceiling.

Rule: Make your service stateless first. State (sessions, cache) goes to external services (Redis, DB). Then horizontal scaling is trivial.
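A minimal sketch of the rule, using an in-memory dict as a stand-in for Redis (the store and handler names here are hypothetical):

```python
import uuid

# Session state lives in an external shared store, not in process memory,
# so any instance behind the load balancer can serve any request.
session_store = {}  # in production: a Redis client instead of a dict

def login(username: str) -> str:
    """Create a session in the shared store and return its ID."""
    session_id = str(uuid.uuid4())
    session_store[session_id] = {"user": username}
    return session_id

def handle_request(session_id: str) -> str:
    """Any instance can resolve the session -- no local state required."""
    session = session_store.get(session_id)
    if session is None:
        return "401 Unauthorized"
    return f"200 OK, hello {session['user']}"
```

Once no instance holds state the others can't see, adding or removing instances is just a load balancer change.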


Load Balancing

A load balancer distributes requests across multiple backend instances.

Algorithms:
• Round Robin — each server in turn. Simple, even distribution.
• Least Connections — send to server with fewest active connections.
• IP Hash — same client always goes to same server (sticky sessions).
• Weighted Round Robin — servers get traffic in proportion to their capacity.
• Random — surprisingly effective for stateless services.
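The selection algorithms above can be sketched in a few lines (a toy, not a production balancer; server names are placeholders):

```python
import itertools
import hashlib
import random

class LoadBalancer:
    """Toy implementations of the common selection algorithms."""

    def __init__(self, servers):
        self.servers = servers
        self._rr = itertools.cycle(servers)          # round-robin cursor
        self.active = {s: 0 for s in servers}        # open connections per server

    def round_robin(self):
        return next(self._rr)

    def least_connections(self):
        return min(self.servers, key=lambda s: self.active[s])

    def ip_hash(self, client_ip: str):
        # Same client IP always maps to the same server (sticky).
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def pick_random(self):
        return random.choice(self.servers)
```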

Layer 4 (TCP) vs Layer 7 (HTTP):
• L4: Routes based on IP/TCP. Fast, no HTTP knowledge.
• L7: Routes based on HTTP (headers, path, cookies). Can route /api → backend, /static → CDN.
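At its core, L7 path routing is a prefix-matching table — a rough sketch, with hypothetical upstream names:

```python
# The L7 balancer inspects the HTTP path and picks an upstream pool,
# something an L4 (TCP-level) balancer cannot do.
DEFAULT_POOL = ["backend-1:8080", "backend-2:8080"]
ROUTES = [
    ("/api/", ["backend-1:8080", "backend-2:8080"]),
    ("/static/", ["cdn-edge:443"]),
]

def route(path: str):
    """Return the upstream pool for a request path (longest prefix wins)."""
    for prefix, upstreams in sorted(ROUTES, key=lambda r: -len(r[0])):
        if path.startswith(prefix):
            return upstreams
    return DEFAULT_POOL
```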

Tools: Nginx, HAProxy, AWS ALB, Envoy, Traefik.

Sticky sessions: when a user must always hit the same server (avoid if possible — breaks horizontal scaling).


Database Scaling

Read replicas — Add read-only copies of the database. Write to primary, read from replicas. Scale reads horizontally. (Most web apps are heavily read-dominated — often on the order of 90% reads.)

Connection pooling — PgBouncer between app and DB. Multiplexes thousands of app connections into a few DB connections.
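The core idea behind a pooler like PgBouncer is a bounded set of reusable connections — a minimal sketch, with SQLite standing in for Postgres:

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal pool: many clients share a fixed set of connections."""

    def __init__(self, size: int, dsn: str = ":memory:"):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(sqlite3.connect(dsn, check_same_thread=False))

    def acquire(self):
        # Blocks if all connections are in use, instead of opening a new one.
        return self._pool.get()

    def release(self, conn):
        # Connection is returned to the pool, not closed.
        self._pool.put(conn)
```

The real tool adds transaction-level multiplexing and health checks, but the economics are the same: connection setup cost is paid once, not per request.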

Query optimization — Indexes, avoiding N+1, proper EXPLAIN ANALYZE usage. Often the first step before infrastructure scaling.
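A quick demonstration of why indexes come first — SQLite's EXPLAIN QUERY PLAN stands in here for Postgres's EXPLAIN ANALYZE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"u{i}@example.com",) for i in range(1000)])

query = "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?"

# Without an index: full table scan, cost grows with table size.
plan_before = conn.execute(query, ("u42@example.com",)).fetchone()

conn.execute("CREATE INDEX idx_users_email ON users(email)")

# With the index: a direct search instead of a scan.
plan_after = conn.execute(query, ("u42@example.com",)).fetchone()
```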

Sharding (horizontal partitioning) — Split data across multiple databases by a shard key (userId, region). Complex to implement. Use only when other options are exhausted.
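Shard-key routing in its simplest form (shard names are placeholders; real systems usually prefer consistent hashing over plain modulo):

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    """Hash the shard key so the mapping is stable and evenly spread.

    Caveat: with modulo hashing, changing the shard count remaps most
    keys -- consistent hashing exists to avoid exactly that.
    """
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]
```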

CQRS — Command Query Responsibility Segregation. Separate write model (commands) from read model (queries). Different data stores optimized for each.
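A minimal sketch of the CQRS split, with an append-only list as the write model and a dict projection as the read model (all names hypothetical):

```python
# Write model: append-only log of what happened (commands).
events = []
# Read model: denormalized projection, shaped for fast queries.
order_counts = {}

def place_order(user_id: str, item: str):
    """Command: record the event, then update the read projection."""
    events.append({"type": "order_placed", "user": user_id, "item": item})
    order_counts[user_id] = order_counts.get(user_id, 0) + 1

def orders_for(user_id: str) -> int:
    """Query: served entirely from the read model, never from the log."""
    return order_counts.get(user_id, 0)
```

In a real system the two models often live in different stores (e.g. Postgres for writes, Elasticsearch for reads) and sync asynchronously.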

Read/write splitting: Route queries to replicas automatically via middleware or ORM config.
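Such middleware can be as simple as inspecting the statement type — a toy sketch with placeholder server names:

```python
class RoutingSession:
    """Route writes to the primary, fan reads out across replicas."""

    WRITE_STATEMENTS = {"INSERT", "UPDATE", "DELETE"}

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._i = 0  # round-robin cursor over replicas

    def target_for(self, sql: str):
        stmt = sql.lstrip().split()[0].upper()
        if stmt in self.WRITE_STATEMENTS:
            return self.primary          # writes must hit the primary
        self._i = (self._i + 1) % len(self.replicas)
        return self.replicas[self._i]    # reads go to a replica
```

Caveat worth remembering: replicas lag the primary slightly, so read-your-own-writes flows sometimes need to be pinned to the primary.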


Performance Optimization Checklist

In order of ROI (do first what yields most):

  1. Database indexes — Index every foreign key and common WHERE column.
  2. N+1 query elimination — Use JOINs or batch loading.
  3. Caching — Redis in front of expensive queries.
  4. Connection pooling — PgBouncer, Redis connection pool.
  5. Async where possible — Non-blocking I/O, background jobs.
  6. Pagination — Never return unlimited lists.
  7. Response compression — gzip/brotli on all text responses.
  8. HTTP/2 — Multiplexing cuts latency for many small requests.
  9. CDN — Static assets and cacheable API responses at the edge.
  10. Read replicas — Scale reads horizontally.
  11. Horizontal scaling — Add instances behind load balancer.
  12. Sharding — Last resort for data too large for one server.
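As one concrete example from the list, item 6: keyset (cursor) pagination avoids the growing cost of OFFSET — a sketch against SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)",
                 [(f"post {i}",) for i in range(100)])

def page(after_id: int = 0, limit: int = 10):
    """Keyset pagination: WHERE id > cursor uses the index directly,
    while OFFSET n still walks past n rows on every request."""
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit)).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor
```

The client passes the returned cursor back to fetch the next page, so page 1000 costs the same as page 1.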

Source & Credit

The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.
