Backend from First Principles / Module 14 — Error Handling

Error Handling

Operational vs programmer errors. Retries, circuit breakers, async pitfalls.


Error Handling Philosophy

Errors are expected. Every external call can fail: database, cache, third-party API. Design your system assuming failures happen.

Categories of errors:

Operational errors — expected runtime errors. DB connection lost, user not found, validation failed. Handle gracefully, return appropriate status.

Programmer errors — bugs. Null pointer, wrong type, logic error. Don't catch these — let them crash and alert. Fix the code.

Transient errors — temporary failures. Network blip, API timeout. Retry with backoff.

Permanent errors — will always fail. Invalid email format, user not found. No point retrying.
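A small helper can encode the transient-vs-permanent distinction so retry logic stays consistent across callers. This is a sketch — the specific error codes and status thresholds below are illustrative assumptions, not a standard:

```javascript
// Hypothetical classifier: is this error worth retrying?
// The code list and status rules are assumptions for illustration.
const TRANSIENT_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "ECONNREFUSED"]);

function isTransient(err) {
  if (TRANSIENT_CODES.has(err.code)) return true;       // network blips
  if (err.statusCode >= 500) return true;               // downstream failure, may recover
  if (err.statusCode === 429) return true;              // rate limited, back off and retry
  return false;                                         // 4xx, validation, auth: permanent
}
```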


Custom Error Classes

Use typed errors to distinguish cases and set HTTP status codes:

JavaScript
class AppError extends Error {
  constructor(message, statusCode = 500, code) {
    super(message);
    this.statusCode = statusCode;
    this.code = code;
    this.isOperational = true; // expected error
  }
}

class NotFoundError extends AppError {
  constructor(resource) {
    super(`${resource} not found`, 404, "NOT_FOUND");
  }
}

class ValidationError extends AppError {
  constructor(details) {
    super("Validation failed", 400, "VALIDATION_ERROR");
    this.details = details;
  }
}

class ConflictError extends AppError {
  constructor(message) { super(message, 409, "CONFLICT"); }
}

// In handler, throw typed errors:
throw new NotFoundError("User");
// Error middleware catches and formats the response.

Global Error Middleware

One error handler catches everything. Services/handlers just throw — they don't format HTTP responses.

JavaScript
// Express error middleware (4 args = error handler)
app.use((err, req, res, next) => {
  // Log the error
  logger.error({ err, requestId: req.requestId });

  // Operational error: send clean response
  if (err.isOperational) {
    return res.status(err.statusCode).json({
      error: {
        code: err.code,
        message: err.message,
        details: err.details
      }
    });
  }

  // Programmer error: don't leak internals
  res.status(500).json({
    error: { code: "INTERNAL_ERROR", message: "Something went wrong" }
  });

  // Optionally: process.exit(1) for unrecoverable programmer errors
});
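Some programmer errors never reach the middleware at all — a throw in a timer callback, or a promise nobody awaited. Node exposes process-level events for these as a last-resort safety net; the handlers below are a sketch (the event names are Node's real APIs, the logging is assumed):

```javascript
// Last-resort handlers for errors that escape all try/catch and middleware.
// Log, then exit so a supervisor (systemd, Kubernetes, pm2) restarts the process.
process.on("uncaughtException", (err) => {
  console.error("uncaught exception, shutting down", err);
  process.exit(1);
});

process.on("unhandledRejection", (reason) => {
  // Re-throw so the rejection flows through the uncaughtException path above.
  throw reason;
});
```

Note the asymmetry with operational errors: here we crash on purpose, because the process may be in an unknown state.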

Async Error Pitfalls

In Express 4, an error thrown inside an async route handler never reaches the error middleware on its own — the rejected promise is simply dropped on the floor unless you pass it to next() yourself. (Modern Node crashes the process on an unhandled rejection; older versions silently swallowed it. Express 5 forwards rejected promises to the error middleware automatically.)

JavaScript
// Dangerous: unhandled rejection
app.get("/users", async (req, res) => {
  const users = await db.getUsers(); // if this throws, who catches it?
  res.json(users);
});

// Safe: wrap in try-catch
app.get("/users", async (req, res, next) => {
  try {
    const users = await db.getUsers();
    res.json(users);
  } catch (err) {
    next(err); // passes to error middleware
  }
});

// Better: use a wrapper utility
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

app.get("/users", asyncHandler(async (req, res) => {
  const users = await db.getUsers();
  res.json(users);
}));

Retries with Exponential Backoff

When you call a downstream service that fails, don't give up immediately — and don't hammer it with retries either. Wait progressively longer between each attempt.

JavaScript
async function callWithRetry(fn, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxRetries - 1) throw err;
      // exponential backoff: 100ms, 200ms, 400ms, ... doubling each attempt
      const delay = Math.pow(2, attempt) * 100;
      await new Promise(res => setTimeout(res, delay));
    }
  }
}

Why exponential, not constant? If you retry every 100ms and the downstream is overloaded, your retries become part of the problem. Backing off gives the downstream room to recover.

Add jitter (small random delay) to avoid the "thundering herd" problem — when all your servers retry at exactly the same moment after a downstream blip.

Don't retry on every error. Retry on transient failures (timeouts, 503s, network errors). Don't retry on permanent ones (404s, 400s, auth failures) — they'll never succeed.
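Putting the last two points together — jitter plus a retry predicate — a variant of callWithRetry might look like this. The shouldRetry check is an assumed caller-supplied function; "full jitter" (a random delay between zero and the backoff cap) is one common strategy among several:

```javascript
// Retry with full jitter and a predicate for transient-only retries.
async function callWithRetryJitter(
  fn,
  { maxRetries = 3, baseMs = 100, shouldRetry = () => true } = {}
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on permanent errors, or when out of attempts.
      if (!shouldRetry(err) || attempt === maxRetries - 1) throw err;
      // Full jitter: uniform random delay in [0, baseMs * 2^attempt)
      const delay = Math.random() * baseMs * Math.pow(2, attempt);
      await new Promise((res) => setTimeout(res, delay));
    }
  }
}
```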


Circuit Breakers — Failing Fast

Retries are the right answer for occasional blips. They're the wrong answer when a downstream service is hard down. If a payment provider is offline and every request waits 3 seconds before timing out, your threads pile up, your connection pool exhausts, and your service goes down too. The failure cascades.

The circuit breaker pattern, borrowed from electrical engineering, prevents this.

It tracks the failure rate of calls to a downstream service, and has three states:

Text
   ┌──────────┐  error rate >    ┌──────────┐
   │  CLOSED  │  threshold       │   OPEN   │
   │ (normal) │ ───────────────► │ (failing │
   │          │                  │  fast)   │
   └──────────┘                  └──────────┘
        ▲                              │
        │                              │ timeout
        │                              ▼
        │                       ┌──────────┐
        │ test request          │ HALF-    │
        └──── succeeds ─────────│  OPEN    │
                                └──────────┘

CLOSED — the normal state. Requests pass through to the downstream service.

OPEN — too many failures crossed the threshold. The breaker "trips". Now every request fails immediately, without even trying the downstream. Your service stays responsive while the downstream recovers.

HALF-OPEN — after a cooldown period, the breaker lets ONE test request through. If it succeeds, the breaker closes back to normal. If it fails, it opens again.

JavaScript
import CircuitBreaker from 'opossum';
const breaker = new CircuitBreaker(callPaymentService, {
  timeout: 3000,                    // requests over 3s count as failure
  errorThresholdPercentage: 50,     // open if 50% of recent calls failed
  resetTimeout: 30000               // try again after 30s
});
breaker.fallback(() => ({ queued: true }));  // graceful fallback

The fallback is critical: when the breaker is open, what does your service do instead of calling the downstream? Queue the request? Return a cached value? Return a degraded response? You need a plan.
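For intuition, the three-state machine above can be hand-rolled in a few dozen lines. This count-based version is a sketch only — real libraries like opossum add rolling error-rate windows, per-request timeouts, and events:

```javascript
// Minimal circuit breaker: trips after N consecutive failures,
// fails fast while open, lets one test request through after a cooldown.
class SimpleBreaker {
  constructor(fn, { failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = "CLOSED";
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === "OPEN") {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error("circuit open: failing fast");
      }
      this.state = "HALF_OPEN"; // cooldown elapsed: allow one test request
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0;
      this.state = "CLOSED"; // success closes the breaker
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN"; // trip: fail fast from now on
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```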


Source & Credit

The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.
