DevOps & Cloud Engineering / Lesson 10 — Docker — Containers from First Principles

Docker — Containers from First Principles

Images, layers, networking, multi-stage builds. The technology that made 'works on my machine' obsolete.


What a Container Actually Is

A container is a process running on a Linux host with an isolated view of its environment. That's it. It's not a VM. It's a regular Linux process with kernel features (namespaces, cgroups) restricting what it can see and how much resource it can use.

Compare to a VM:

Text
   VM                                 Container
   ┌─────────────────┐               ┌─────────────────┐
   │ Application     │               │ Application     │
   ├─────────────────┤               ├─────────────────┤
   │ Guest OS kernel │               │ (uses host      │
   │ (full Linux)    │               │  kernel)        │
   ├─────────────────┤               └─────────────────┘
   │ Virtualization  │               
   │ (hypervisor)    │               Shares host kernel
   ├─────────────────┤               Starts in milliseconds
   │ Host OS         │               Tiny memory overhead
   └─────────────────┘               Image is ~50MB-1GB
                                      
   Boots in seconds                   
   GBs of memory                      
   GBs of disk                        

Why this matters:
• A container starts in ~100ms; a VM takes tens of seconds to boot
• A container adds a few MB of memory overhead; a VM needs hundreds of MB to GBs
• You can run hundreds of containers per host
• An image of 100MB packages your whole app + its dependencies

The kernel features that make it work:
• namespaces — isolate what a process can see (PIDs, network, mounts, users)
• cgroups — limit what a process can use (CPU, memory, I/O)
• union filesystem (overlayfs on most systems) — efficient layered image storage

Docker is the most popular tool for working with containers, but containers are a Linux feature, not a Docker feature. Other runtimes (containerd, CRI-O, podman) all do the same fundamental thing.


Images vs Containers

Images and containers are like classes and instances: an image is a read-only template, and a container is a running (or stopped) instance created from it.

Lifecycle:

Bash
# Pull an image from a registry
docker pull nginx:1.27

# Run a container from the image
docker run -d -p 8080:80 --name web nginx:1.27

# Container is now running. Inspect it:
docker ps                   # running containers
docker logs web             # container's stdout/stderr
docker exec -it web bash    # shell inside the running container
docker stats web            # CPU, memory, network usage

# Stop and remove
docker stop web
docker rm web

# Or in one go
docker rm -f web

Images are stored locally and in registries:

Bash
docker images               # local images
docker pull <image>         # download
docker push <image>         # upload to registry
docker rmi <image>          # delete local image

Image names follow the pattern: [registry/]namespace/image:tag
- nginx:1.27 — Docker Hub default registry, library/nginx, tag 1.27
- gcr.io/my-project/api:v2 — Google Container Registry
- 123456.dkr.ecr.us-east-1.amazonaws.com/api:abc123 — AWS ECR

Always tag images explicitly. The default latest is a moving target and causes "it worked yesterday" bugs.


Dockerfile — Building Your Own Images

A Dockerfile is a script that builds an image. Each instruction creates a layer.

A simple Node.js app:

Dockerfile
# Use an official base image
FROM node:20-alpine

# Set working directory inside the container
WORKDIR /app

# Copy package files first — layer caching
COPY package*.json ./
RUN npm ci --omit=dev

# Copy app source
COPY . .

# Document which port the app listens on
EXPOSE 3000

# Default command when container starts
CMD ["node", "server.js"]

Build it:

Bash
docker build -t myapp:v1 .

The . is the build context — Docker tarballs everything in this directory and sends it to the daemon. Use .dockerignore to exclude things you don't want copied (node_modules, .git, secrets):

Text
# .dockerignore
node_modules
.git
.env
*.log
README.md

Key Dockerfile instructions:
FROM — base image (required; must come first, aside from optional ARGs and parser directives)
WORKDIR — set the directory for subsequent commands
COPY / ADD — copy files from build context into image
RUN — execute a shell command at build time
ENV — set environment variables
ARG — build-time variables (not in final image)
EXPOSE — document port (doesn't actually publish — that's -p at runtime)
USER — switch to a non-root user (security)
ENTRYPOINT / CMD — what runs when the container starts

ENTRYPOINT vs CMD:
• Use CMD for the typical default command
• Use ENTRYPOINT when you want the image to behave like an executable
• If both: ENTRYPOINT is the command, CMD provides default args
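A minimal sketch of how the two combine, using a stock Alpine base:

```dockerfile
FROM alpine:3.20
# The fixed executable: this image now behaves like the ping command
ENTRYPOINT ["ping"]
# Default arguments; anything passed to docker run replaces these
CMD ["-c", "3", "localhost"]
```

`docker run img` runs `ping -c 3 localhost`; `docker run img -c 1 example.com` keeps the ENTRYPOINT and swaps only the CMD; `docker run --entrypoint sh img` overrides the executable itself.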


Layers and Caching

Each instruction in your Dockerfile creates a layer. Docker caches layers — if the inputs to a layer haven't changed, it reuses the cached layer.

This makes layer ordering critical. Bad:

Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .                    # any source change invalidates this
RUN npm ci                  # ...and re-runs install every time

Every code change means re-installing dependencies. Slow.

Good:

Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./       # only invalidates when deps change
RUN npm ci                  # cache hit unless package*.json changed
COPY . .                    # source changes invalidate only this
CMD ["node", "server.js"]

Now npm ci is cached as long as package*.json hasn't changed. Builds drop from minutes to seconds.

The general rule: order instructions from least-to-most-changing. Stable things first (base image, deps), changing things last (source code).

Inspect layers:

Bash
docker history myapp:v1
docker inspect myapp:v1

Each RUN, COPY, etc. creates a new layer, and files deleted in a later layer still take up space in the earlier ones. To keep images small, combine related commands so cleanup happens in the same layer that created the files:

Dockerfile
# Bad — three layers; the apt cache is baked into the install layer
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# Good — one layer, cache cleaned in same layer
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

Multi-Stage Builds

Production images shouldn't contain build tools. Multi-stage builds let you compile in one stage, copy artifacts to a slim final image:

Dockerfile
# Stage 1: build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: runtime
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
# Don't run as root (Dockerfile comments must be on their own line)
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

The final image only contains:
• Node.js runtime (alpine variant: ~50MB)
• Production deps
• Compiled output

Build tools, dev dependencies, and source code never reach the final image. A 500MB build stage can produce a 100MB runtime image.

Even smaller: distroless images contain ONLY your app and its runtime, no shell, no package manager:

Dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /myapp

FROM gcr.io/distroless/static-debian12
COPY --from=builder /myapp /myapp
ENTRYPOINT ["/myapp"]

Resulting image is ~10-20MB and has no shell — much smaller attack surface. A common choice for Go services.

Other slim base options:
alpine — ~5MB, has busybox shell, common
debian:12-slim — ~30MB, full glibc, useful for Python/Node
distroless — minimal, no shell, secure
scratch — empty, only for fully static binaries (Go, Rust)
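For scratch, the pattern mirrors the distroless build above (this sketch reuses that builder stage). One common gotcha: an empty base ships no CA certificates, so outbound TLS fails unless you copy a bundle in:

```dockerfile
FROM scratch
# Only works for fully static binaries (hence CGO_ENABLED=0 in the build)
COPY --from=builder /myapp /myapp
# scratch has no CA bundle; copy one from the build stage for HTTPS calls
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
ENTRYPOINT ["/myapp"]
```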


Networking

By default, Docker creates a bridge network so containers can talk to each other and the host.

Bash
docker network ls               # list networks
docker network inspect bridge   # see details

Common patterns:

Publish a port to the host:

Bash
docker run -d -p 8080:80 nginx     # host:8080 → container:80
docker run -d -p 80:80 nginx       # privileged host port (below 1024)
docker run -d -P nginx             # publish all exposed ports to random host ports

Containers on the same custom network can reach each other by name:

Bash
docker network create app-net

docker run -d --name db --network app-net postgres:16
docker run -d --name api --network app-net -e DB_HOST=db myapp

# Inside the api container, "db" resolves to the postgres container's IP

DNS resolution between containers works on user-defined networks but not on the default bridge. Always create a network for multi-container apps.
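Docker Compose (a companion tool, only sketched here) encodes the same pattern declaratively: it creates a user-defined network per project, so services resolve each other by name. A sketch reusing the hypothetical myapp image from above:

```yaml
# docker-compose.yml (sketch; image names assume the examples above)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
  api:
    image: myapp:v1
    environment:
      DB_HOST: db
    ports:
      - "3000:3000"
    depends_on:
      - db
```

`docker compose up` starts both containers on a shared network where "db" resolves to the Postgres container.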

Talking to the host from inside a container:
• Linux: add --add-host=host.docker.internal:host-gateway to docker run (Docker 20.10+), or use the docker0 bridge IP (typically 172.17.0.1)
• macOS / Windows: host.docker.internal works out of the box

Volumes — persistent storage:

Bash
# Named volume
docker volume create pgdata
docker run -d -v pgdata:/var/lib/postgresql/data postgres:16

# Bind mount a host directory
docker run -d -v $(pwd)/code:/app -p 3000:3000 node:20

# Tmpfs (in-memory, gone when container stops)
docker run --tmpfs /tmp myapp

Use named volumes for persistent data (databases). Use bind mounts for development (hot-reloading code from host).


Image Tagging & Registries

Production images need consistent tagging so you know what's deployed.

Bad: only latest tag

Bash
docker build -t myapp:latest .
docker push myapp:latest

Now you can never roll back — every deploy overwrites the same tag.

Good: immutable tags + a moving alias

Bash
SHA=$(git rev-parse --short HEAD)
docker build -t myapp:$SHA -t myapp:latest .
docker push myapp:$SHA
docker push myapp:latest

Now you can pin deploys to specific commits:

YAML
# Kubernetes
image: myapp:abc1234

Common tag schemes:
• Git SHA: abc1234 — every commit gets a unique tag
• Semantic version: v1.2.3 — for releases
• Branch name: main, develop — for latest of a branch
• Combination: v1.2.3-abc1234 — version + commit
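For full immutability you can pin by content digest as well as by tag; a digest reference resolves to the exact same bytes even if someone later moves the tag. The digest below is a placeholder, read the real value from your registry or `docker images --digests`:

```yaml
# Kubernetes manifest fragment with a placeholder digest
image: myapp@sha256:<digest>
```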

Container registries:
• Docker Hub — public, free for public repos
• GitHub Container Registry (ghcr.io) — free for public, integrates with Actions
• AWS ECR — AWS-native, IAM-integrated
• Google Artifact Registry — GCP-native
• Azure Container Registry — Azure-native
• Self-hosted: Harbor (open source), JFrog Artifactory

Authentication:

Bash
# Docker Hub
docker login

# AWS ECR
aws ecr get-login-password | docker login --username AWS --password-stdin 123456.dkr.ecr.us-east-1.amazonaws.com

# Google
gcloud auth configure-docker

Production-Ready Dockerfile Template

A solid template combining everything:

Dockerfile
# syntax=docker/dockerfile:1.6
ARG NODE_VERSION=20

# ─── BUILD STAGE ───────────────────────────────────────────────
FROM node:${NODE_VERSION}-alpine AS builder
WORKDIR /app

# Install only build deps
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci

COPY . .
RUN npm run build

# ─── RUNTIME STAGE ─────────────────────────────────────────────
FROM node:${NODE_VERSION}-alpine
WORKDIR /app

# Only production deps (installed as root so the /root/.npm cache mount works)
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

# Compiled output
COPY --from=builder /app/dist ./dist

# Don't run as root
RUN addgroup -S app && adduser -S app -G app && \
    chown -R app:app /app
USER app

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --quiet --spider http://localhost:3000/health || exit 1

EXPOSE 3000
ENV NODE_ENV=production
CMD ["node", "dist/server.js"]

Notes:
# syntax=docker/dockerfile:1.6 — enable BuildKit features
--mount=type=cache — persistent cache across builds (faster CI)
• Non-root user — security
HEALTHCHECK — Docker can monitor the container's health
ENV NODE_ENV=production — many libraries optimize based on this

The next lesson covers image security — once you're building images, securing them is critical.

