Backend from First Principles / Module 18 — Scaling & Performance

Scaling & Performance

Vertical vs horizontal. Load balancing. Read replicas. The optimization checklist.


Vertical vs Horizontal Scaling

Vertical scaling (scale up) — Give the server more CPU, RAM, disk.
Simple. No code changes. But: there's a hard limit (the biggest available instance), it's a single point of failure, and it gets expensive fast.

Horizontal scaling (scale out) — Add more servers. Run multiple instances behind a load balancer.
Requires stateless architecture (no local state). Much higher ceiling. Handles failures better.

The combination: start vertical, then scale horizontally when you hit the ceiling.

Rule: Make your service stateless first. State (sessions, cache) goes to external services (Redis, DB). Then horizontal scaling is trivial.
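A minimal sketch of the rule, using an in-memory dict as a stand-in for Redis (the store and handler names here are hypothetical):

```python
import uuid

# Session state lives in an external shared store, not in process memory,
# so any instance behind the load balancer can serve any request.
session_store = {}  # in production: a Redis client instead of a dict

def login(username: str) -> str:
    """Create a session in the shared store and return its ID."""
    session_id = str(uuid.uuid4())
    session_store[session_id] = {"user": username}
    return session_id

def handle_request(session_id: str) -> str:
    """Any instance can resolve the session -- no local state required."""
    session = session_store.get(session_id)
    if session is None:
        return "401 Unauthorized"
    return f"200 OK, hello {session['user']}"
```

Once no instance holds state the others can't see, adding or removing instances is just a load balancer change.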


Load Balancing

A load balancer distributes requests across multiple backend instances.

Algorithms:
• Round Robin — each server in turn. Simple, even distribution.
• Least Connections — send to server with fewest active connections.
• IP Hash — same client always goes to same server (sticky sessions).
• Weighted Round Robin — servers get traffic in proportion to their capacity.
• Random — surprisingly effective for stateless services.
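The selection algorithms above can be sketched in a few lines (a toy, not a production balancer; server names are placeholders):

```python
import itertools
import hashlib
import random

class LoadBalancer:
    """Toy implementations of the common selection algorithms."""

    def __init__(self, servers):
        self.servers = servers
        self._rr = itertools.cycle(servers)          # round-robin cursor
        self.active = {s: 0 for s in servers}        # open connections per server

    def round_robin(self):
        return next(self._rr)

    def least_connections(self):
        return min(self.servers, key=lambda s: self.active[s])

    def ip_hash(self, client_ip: str):
        # Same client IP always maps to the same server (sticky).
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def pick_random(self):
        return random.choice(self.servers)
```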

Layer 4 (TCP) vs Layer 7 (HTTP):
• L4: Routes based on IP/TCP. Fast, no HTTP knowledge.
• L7: Routes based on HTTP (headers, path, cookies). Can route /api → backend, /static → CDN.
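At its core, L7 path routing is a prefix-matching table — a rough sketch, with hypothetical upstream names:

```python
# The L7 balancer inspects the HTTP path and picks an upstream pool,
# something an L4 (TCP-level) balancer cannot do.
DEFAULT_POOL = ["backend-1:8080", "backend-2:8080"]
ROUTES = [
    ("/api/", ["backend-1:8080", "backend-2:8080"]),
    ("/static/", ["cdn-edge:443"]),
]

def route(path: str):
    """Return the upstream pool for a request path (longest prefix wins)."""
    for prefix, upstreams in sorted(ROUTES, key=lambda r: -len(r[0])):
        if path.startswith(prefix):
            return upstreams
    return DEFAULT_POOL
```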

Tools: Nginx, HAProxy, AWS ALB, Envoy, Traefik.

Sticky sessions: when a user must always hit the same server (avoid if possible — breaks horizontal scaling).


Database Scaling

Read replicas — Add read-only copies of the database. Write to primary, read from replicas. Scale reads horizontally. (Most web apps are heavily read-dominated — often on the order of 90% reads.)

Connection pooling — PgBouncer between app and DB. Multiplexes thousands of app connections into a few DB connections.
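The core idea behind a pooler like PgBouncer is a bounded set of reusable connections — a minimal sketch, with SQLite standing in for Postgres:

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal pool: many clients share a fixed set of connections."""

    def __init__(self, size: int, dsn: str = ":memory:"):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(sqlite3.connect(dsn, check_same_thread=False))

    def acquire(self):
        # Blocks if all connections are in use, instead of opening a new one.
        return self._pool.get()

    def release(self, conn):
        # Connection is returned to the pool, not closed.
        self._pool.put(conn)
```

The real tool adds transaction-level multiplexing and health checks, but the economics are the same: connection setup cost is paid once, not per request.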

Query optimization — Indexes, avoiding N+1, proper EXPLAIN ANALYZE usage. Often the first step before infrastructure scaling.
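A quick demonstration of why indexes come first — SQLite's EXPLAIN QUERY PLAN stands in here for Postgres's EXPLAIN ANALYZE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"u{i}@example.com",) for i in range(1000)])

query = "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?"

# Without an index: full table scan, cost grows with table size.
plan_before = conn.execute(query, ("u42@example.com",)).fetchone()

conn.execute("CREATE INDEX idx_users_email ON users(email)")

# With the index: a direct search instead of a scan.
plan_after = conn.execute(query, ("u42@example.com",)).fetchone()
```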

Sharding (horizontal partitioning) — Split data across multiple databases by a shard key (userId, region). Complex to implement. Use only when other options are exhausted.
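Shard-key routing in its simplest form (shard names are placeholders; real systems usually prefer consistent hashing over plain modulo):

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    """Hash the shard key so the mapping is stable and evenly spread.

    Caveat: with modulo hashing, changing the shard count remaps most
    keys -- consistent hashing exists to avoid exactly that.
    """
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]
```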

CQRS — Command Query Responsibility Segregation. Separate write model (commands) from read model (queries). Different data stores optimized for each.
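A minimal sketch of the CQRS split, with an append-only list as the write model and a dict projection as the read model (all names hypothetical):

```python
# Write model: append-only log of what happened (commands).
events = []
# Read model: denormalized projection, shaped for fast queries.
order_counts = {}

def place_order(user_id: str, item: str):
    """Command: record the event, then update the read projection."""
    events.append({"type": "order_placed", "user": user_id, "item": item})
    order_counts[user_id] = order_counts.get(user_id, 0) + 1

def orders_for(user_id: str) -> int:
    """Query: served entirely from the read model, never from the log."""
    return order_counts.get(user_id, 0)
```

In a real system the two models often live in different stores (e.g. Postgres for writes, Elasticsearch for reads) and sync asynchronously.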

Read/write splitting: Route queries to replicas automatically via middleware or ORM config.
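Such middleware can be as simple as inspecting the statement type — a toy sketch with placeholder server names:

```python
class RoutingSession:
    """Route writes to the primary, fan reads out across replicas."""

    WRITE_STATEMENTS = {"INSERT", "UPDATE", "DELETE"}

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._i = 0  # round-robin cursor over replicas

    def target_for(self, sql: str):
        stmt = sql.lstrip().split()[0].upper()
        if stmt in self.WRITE_STATEMENTS:
            return self.primary          # writes must hit the primary
        self._i = (self._i + 1) % len(self.replicas)
        return self.replicas[self._i]    # reads go to a replica
```

Caveat worth remembering: replicas lag the primary slightly, so read-your-own-writes flows sometimes need to be pinned to the primary.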


Performance Optimization Checklist

In order of ROI (do first what yields most):

  1. Database indexes — Index every foreign key and common WHERE column.
  2. N+1 query elimination — Use JOINs or batch loading.
  3. Caching — Redis in front of expensive queries.
  4. Connection pooling — PgBouncer, Redis connection pool.
  5. Async where possible — Non-blocking I/O, background jobs.
  6. Pagination — Never return unlimited lists.
  7. Response compression — gzip/brotli on all text responses.
  8. HTTP/2 — Multiplexing cuts latency for many small requests.
  9. CDN — Static assets and cacheable API responses at the edge.
  10. Read replicas — Scale reads horizontally.
  11. Horizontal scaling — Add instances behind load balancer.
  12. Sharding — Last resort for data too large for one server.
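As one concrete example from the list, item 6: keyset (cursor) pagination avoids the growing cost of OFFSET — a sketch against SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)",
                 [(f"post {i}",) for i in range(100)])

def page(after_id: int = 0, limit: int = 10):
    """Keyset pagination: WHERE id > cursor uses the index directly,
    while OFFSET n still walks past n rows on every request."""
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit)).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor
```

The client passes the returned cursor back to fetch the next page, so page 1000 costs the same as page 1.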

Source & Credit

The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.
