DevOps & Cloud Engineering / Lesson 6 — Continuous Integration

Continuous Integration

Automated testing on every change. The shortest feedback loop in software engineering.


What CI Actually Means

Continuous Integration (CI) is a practice, not a tool. The practice: every change to the codebase is automatically built and tested, immediately, against the rest of the codebase.

Without CI:
• Developers work on branches for weeks
• Integration happens once, before a release, in panic
• Bugs from interactions between changes appear all at once
• Fixing them is hard because you're juggling many changes

With CI:
• Every push triggers a build + tests automatically
• Bugs appear within minutes of the change that caused them
• Each bug is in a small change → easy to find, easy to fix
• Branches stay short → conflicts stay small

The core insight: integration problems compound when batched. Merge ten changes at once and you debug all of their interactions together; merge one and you debug only that one. CI keeps each integration trivially small.

CI by itself doesn't deploy anything. It validates that the code COULD be deployed. The next lesson covers deployment. But you can't have CD without CI — broken main means there's nothing safe to deploy.


What a Good CI Pipeline Does

A typical CI pipeline runs these stages on every commit:

Text
1. Checkout code from Git
2. Restore caches (deps, build artifacts)
3. Install dependencies
4. Lint & format check         ← fast, fail early
5. Type check
6. Build                        ← produces deployable artifact
7. Unit tests                   ← should pass in < 5 min
8. Integration / contract tests
9. Security scans (deps + SAST)
10. Coverage report
11. Artifact publish (if main)  ← Docker image, tarball
12. Notify on failure

The most important property: it runs on EVERY push, automatically. No "did you remember to run tests?" If it's not in CI, it didn't happen.

A pipeline can be simpler — even just lint + tests is dramatically better than nothing. Start small, iterate.
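That minimal starting point can be sketched as a GitHub Actions workflow. A hedged example, assuming a Node project whose package.json defines lint and test scripts:

```yaml
# .github/workflows/ci.yml — minimal starting point: lint + tests only.
# Assumes package.json defines "lint" and "test" scripts.
name: CI

on: [push, pull_request]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'     # automatic dependency caching
      - run: npm ci        # install exactly what the lock file says
      - run: npm run lint
      - run: npm test
```

From here you grow the pipeline stage by stage: add a build step, then integration tests, then security scans.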


The Speed Imperative

A CI pipeline that takes 30 minutes is functionally broken. Developers context-switch, forget what they were testing, and have to re-engage when results come back.

Target: under 10 minutes for the full pipeline. Under 5 if you can.

How to get there:

Snippet
1. Cache aggressively
   • Dependency caches (npm/pip/maven/gradle)
   • Docker layer caches
   • Test result caches (Bazel, Turborepo)
2. Parallelize
   • Run lint, type-check, and tests concurrently
   • Split test suites across multiple runners
3. Test pyramid discipline
   • 70-80% fast unit tests (milliseconds each)
   • 15-20% integration tests (seconds)
   • 5% end-to-end tests (longer, in nightly or post-merge)
4. Skip irrelevant work
   • Documentation-only PRs don't need to run all tests
   • Path-based triggers: backend changes don't trigger frontend builds
5. Right-size your runners
   • CPU-bound tests benefit from bigger runners
   • Cloud CI charges by the minute, so a bigger runner that finishes much faster can be cheaper overall
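Several of these techniques map directly onto workflow configuration. A hedged GitHub Actions sketch; the path filter, shard count, and a test runner that supports Jest-style `--shard` are all illustrative assumptions:

```yaml
# Illustrative speed-ups; paths and shard counts are assumptions.
name: CI

on:
  pull_request:
    paths-ignore:
      - 'docs/**'            # skip CI entirely for docs-only changes

# Cancel superseded runs when new commits land on the same branch/PR
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]  # split the suite across 4 parallel runners
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'       # dependency cache
      - run: npm ci
      # Assumes a Jest-style runner that supports --shard=n/m
      - run: npm test -- --shard=${{ matrix.shard }}/4
```

With four shards, a 12-minute suite drops toward 3 minutes of wall-clock time, at the price of four runners in parallel.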

Watch the cost. A team running CI 1,000 times a week with 30-minute pipelines is burning 500 runner-hours of compute every week; on metered cloud CI, that is serious money.


Branch Protections

CI alone doesn't prevent bad code from merging. You need branch protection to enforce that CI must pass before merge.

GitHub branch protection (Settings → Branches → Add rule):
• Require pull request before merge
• Require approvals (1-2 reviewers)
• Require status checks — point to specific CI jobs
• Require branches to be up to date — the PR must rebase or merge in main if main has moved
• Restrict who can push to main

Text
main branch
  ├─ requires PR
  ├─ requires 1 approval
  ├─ requires CI: tests, lint, security to pass
  ├─ no direct pushes — even maintainers go through PR
  └─ no force pushes (history preserved)

This makes CI MEANINGFUL. Without protection, CI is just a status indicator that anyone can ignore. With it, broken code physically cannot reach main.


A Real Pipeline

Here's a complete CI pipeline for a Node.js + TypeScript project in GitHub Actions (.github/workflows/ci.yml):

YAML
name: CI

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: test
        ports: ['5432:5432']
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Type check
        run: npm run typecheck

      - name: Build
        run: npm run build

      - name: Unit tests
        run: npm run test:unit

      - name: Integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/test

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: gitleaks/gitleaks-action@v2
      - run: npm audit --audit-level=high

Key points:
• pull_request trigger — runs on every push to a PR
• services — spins up Postgres for integration tests
• cache: 'npm' — automatic dependency caching
• npm ci, not npm install — installs exactly what the lock file says and never updates it (more reproducible)
• Two parallel jobs — each runs on a separate runner

Push this once, and every PR thereafter triggers it automatically.
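Stage 11 from the earlier list ("Artifact publish, if main") is not in this workflow yet; it can be added as one more job. A sketch under assumptions (the job name, the dist/ output path, and the use of a build artifact rather than a Docker image are illustrative):

```yaml
# Appended under the existing jobs: key of the same workflow.
  publish:
    needs: build-and-test             # only runs if CI passed
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run build            # assumes build emits to dist/
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
```

The `if` condition means PRs only validate; the artifact is produced once, from the merged commit on main.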


The Flake Killer

The thing that destroys CI cultures more than anything: flaky tests.

A flaky test sometimes passes and sometimes fails on identical code. Common causes:
• Timing dependencies (race conditions)
• Tests sharing database state
• Tests dependent on external services
• Test order dependencies
• Hardcoded ports that conflict

Why flakes are catastrophic:
• Engineers learn to "just retry" instead of investigating
• Real bugs get hidden — "oh, just a flake"
• Teams lose faith in CI
• Eventually CI is bypassed entirely

Defending against flakes:

1. Quarantine flaky tests immediately. Tag them, exclude from PR blocking, but track them. The team owns fixing them.
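One way to implement quarantine in GitHub Actions is a separate, non-blocking job. A sketch, assuming flaky tests are tagged and runnable via a hypothetical test:quarantine npm script:

```yaml
# Appended under the existing jobs: key. Runs on every push, never blocks.
  quarantined-tests:
    runs-on: ubuntu-latest
    continue-on-error: true            # failures are reported, not blocking
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      # Hypothetical script that runs only tests tagged as flaky
      - run: npm run test:quarantine
```

Do not list this job among the required status checks in branch protection; it exists to keep the flakes visible while they are being fixed, not to gate merges.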

2. Use deterministic fixtures. Reset databases between tests. Don't rely on test order.

3. Mock external services. Use VCR-style libraries to record/replay HTTP responses.

4. Avoid sleep(). Use proper synchronization (await, polling with timeout, fake clocks).

5. Test failures should be loud. A failing flake is the same as a real fail — both are bugs.

6. Monitor flake rate. Track the % of CI runs that fail and pass on retry. Push it down ruthlessly.

A team with reliable CI ships fast. A team with flaky CI eventually stops trusting CI, and then it doesn't matter how good the pipeline is — nobody is paying attention.

