DevOps & Cloud Engineering / Lesson 14 — Kubernetes in Practice

Kubernetes in Practice

Helm, autoscaling, persistent volumes, RBAC — what you actually need to run production K8s.


Helm — Templating Your YAML

K8s YAML is verbose. Multiply it across environments (dev, staging, production) and you copy-paste a lot.

Helm is the package manager for Kubernetes. You write templates with variables; Helm renders them with values.

Anatomy of a chart:

Text
my-chart/
  Chart.yaml              # metadata
  values.yaml             # default values
  values-prod.yaml        # production overrides
  templates/
    deployment.yaml       # uses {{ .Values.image.tag }}
    service.yaml
    ingress.yaml

A templated deployment:

YAML
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: app
          image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

Default values:

YAML
# values.yaml
replicaCount: 1
image:
  repository: myapp
  tag: latest
resources:
  requests:
    cpu: 100m
    memory: 128Mi

Production overrides:

YAML
# values-prod.yaml
replicaCount: 5
resources:
  requests:
    cpu: 500m
    memory: 512Mi

Install:

Bash
helm install myapp ./my-chart                          # uses defaults
helm install myapp ./my-chart -f values-prod.yaml     # production values
helm upgrade myapp ./my-chart -f values-prod.yaml     # update
helm rollback myapp 1                                  # rollback to revision 1
helm uninstall myapp                                   # remove

Use community charts from Artifact Hub (https://artifacthub.io):

Bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgres bitnami/postgresql --set auth.password=secret

Alternative: Kustomize (built into kubectl) — patches base YAML with overlays. Simpler than Helm for small overlays, but Helm wins for distributable charts.
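
For comparison, a minimal Kustomize layout for the same dev/prod split (the file names and the patch file are illustrative):

Text
base/
  kustomization.yaml      # lists deployment.yaml, service.yaml
  deployment.yaml
  service.yaml
overlays/
  prod/
    kustomization.yaml    # points at ../../base, applies patches

YAML
# overlays/prod/kustomization.yaml
resources:
  - ../../base
replicas:
  - name: myapp                     # bump replica count without touching base
    count: 5
patches:
  - path: resources-patch.yaml      # strategic-merge patch for CPU/memory
    target:
      kind: Deployment
      name: myapp

Apply with kubectl apply -k overlays/prod — no extra tooling needed.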


Autoscaling

K8s offers three kinds of autoscaling:

1. Horizontal Pod Autoscaler (HPA) — adds/removes Pods based on metrics

YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # keep avg CPU at 70% of requested CPU

Custom metrics — scale on requests/sec or queue depth:

YAML
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

Requires a metrics adapter (Prometheus Adapter is the standard).

2. Vertical Pod Autoscaler (VPA) — adjusts resource requests/limits
   Useful for finding the right size, but restart on changes makes it tricky for stateful workloads.

3. Cluster Autoscaler — adds/removes NODES from the cluster
   When Pods can't be scheduled (pending due to no capacity), add a node. When nodes are underutilized, remove them.
   Managed K8s (EKS, GKE) often handles this automatically.
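
A recommend-only VPA for the myapp Deployment might look like this (assumes the VPA components are installed in the cluster — they are not part of core K8s):

YAML
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # recommendations only; "Auto" applies them by restarting Pods

kubectl describe vpa myapp then shows the recommended requests.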

KEDA (Kubernetes Event-Driven Autoscaling) — scale based on external events (Kafka lag, queue depth, cron schedules). Increasingly popular in 2026 for event-driven workloads.
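
A sketch of a KEDA ScaledObject scaling a worker Deployment on Kafka consumer lag (the Deployment name, broker address, topic, and consumer group are placeholders for your setup):

YAML
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker              # the Deployment to scale
  minReplicaCount: 0          # KEDA can scale all the way to zero
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: worker-group
        topic: orders
        lagThreshold: "100"   # target lag per replica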


Persistent Storage

Pods are ephemeral. For data that must survive Pod restarts, use PersistentVolumes.

The flow:
1. Cluster admin (or cloud provider) defines StorageClasses (e.g., AWS gp3, GCP SSD).
2. App developer requests storage via a PersistentVolumeClaim (PVC).
3. K8s provisions a matching PersistentVolume (PV) — typically a cloud disk.
4. Pod mounts the PVC as a volume.

Example for a database:

YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp3

---
apiVersion: apps/v1
kind: StatefulSet           # not Deployment — see below
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels: { app: postgres }
  template:
    metadata:
      labels: { app: postgres }
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata: { name: data }
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi

StatefulSet vs Deployment:
• Deployment — Pods are interchangeable. Each gets a name with a random suffix.
• StatefulSet — Pods have stable identities (myapp-0, myapp-1) and stable storage. Used for databases, queues, anything that cares about its identity.

Honest note: running stateful systems (Postgres, Kafka) in K8s is doable but operationally complex. Most teams use managed services (RDS, MSK, Cloud SQL) for stateful infrastructure and only run stateless apps on K8s. This is fine and recommended.


RBAC — Who Can Do What

Role-Based Access Control limits what users (and ServiceAccounts) can do.

Three concepts:
• Role / ClusterRole — a set of permissions
• RoleBinding / ClusterRoleBinding — assign a Role to a user/group/SA
• ServiceAccount — identity for processes (your Pods)

Read-only access for a developer:

YAML
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: viewer
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alice-viewer
subjects:
  - kind: User
    name: alice@example.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: viewer
  apiGroup: rbac.authorization.k8s.io

For your apps — give them only the API access they need:

YAML
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myapp-config-reader
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]                   # resourceNames can't restrict "list" or "watch"
    resourceNames: ["app-config"]    # specific config, not all

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myapp
subjects:
  - kind: ServiceAccount
    name: myapp
    namespace: default   # ServiceAccount subjects need an explicit namespace
roleRef:
  kind: Role
  name: myapp-config-reader
  apiGroup: rbac.authorization.k8s.io

Reference the ServiceAccount in your Pod spec:

YAML
spec:
  serviceAccountName: myapp

Every Pod gets the namespace's default ServiceAccount (and its API token) unless you say otherwise. Always create a dedicated one and grant minimum permissions.
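
If the app never talks to the K8s API at all, you can go further and skip the token mount entirely:

YAML
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp
automountServiceAccountToken: false   # Pods using this SA get no API token

The same field also exists on the Pod spec for a per-Pod override.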

Tools that audit RBAC:
• kubectl-who-can — "who can delete pods in production?"
• rbac-lookup — comprehensive RBAC inspection


Deploys & Rollouts

When you change a Deployment's image, K8s does a rolling update by default:

YAML
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # at most 25% of Pods can be down
      maxSurge: 25%         # at most 25% extra Pods during rollout

Trigger a deploy by changing the image:

Bash
kubectl set image deployment/myapp app=myapp:v2
# OR
kubectl apply -f deployment.yaml   # with new image tag

Watch the rollout:

Bash
kubectl rollout status deployment/myapp
kubectl rollout history deployment/myapp
kubectl rollout undo deployment/myapp           # rollback to previous
kubectl rollout undo deployment/myapp --to-revision=2

For more advanced strategies (canary, blue-green, automated rollback):
• Argo Rollouts — purpose-built progressive delivery
• Flagger — works with service meshes (Istio, Linkerd)

Argo Rollouts example for canary:

YAML
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
      analysis:
        templates:
          - templateName: error-rate-check    # if error rate spikes, rollback

Production K8s deploys go through tools like ArgoCD or Flux (GitOps) — covered in a later lesson — rather than direct kubectl. Git is the source of truth; the cluster reconciles to match.


Networking & Service Mesh

We covered Services and Ingress in the previous lesson. Two more pieces:

NetworkPolicy — firewall rules between Pods
By default, every Pod can reach every other Pod. NetworkPolicies restrict this.

YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-from-api-only
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - port: 5432

Now only Pods labeled app=api can reach the postgres Pods on port 5432. Defense in depth.

NetworkPolicies require a CNI plugin that supports them (Calico, Cilium). Some default cloud K8s installs (basic GKE, basic EKS) don't support them out of the box.
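
A common pattern is to pair allow-rules like the one above with a namespace-wide default deny:

YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}           # empty selector = every Pod in the namespace
  policyTypes: [Ingress]    # Ingress listed but no rules → all inbound denied

With this in place, traffic flows only where an explicit allow policy exists.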

Service Mesh — for advanced traffic management
A service mesh adds a sidecar proxy to every Pod. The proxy handles encryption (mTLS between services), retries, traffic splitting, observability — without app code changes.

Popular meshes:
• Istio — feature-rich, complex, heavy
• Linkerd — simpler, lighter
• Cilium Service Mesh — uses eBPF, no sidecars, very efficient

When to adopt:
• You have many services (10+)
• You need mTLS everywhere (compliance)
• You want fine-grained traffic control (canary by header, region-based routing)
• You can afford the operational complexity

When NOT to:
• You have 2 services. Don't.
• Your team is still figuring out basic K8s.
• You haven't measured what mesh would actually solve.

Service meshes are one of those things that look essential at conferences and feel like overhead in production. Adopt deliberately.


Operating K8s — What Goes Wrong

The most common K8s production issues:

1. ImagePullBackOff — can't pull the image
   - Wrong tag, missing image, registry auth issue
   - Check: `kubectl describe pod <name>`

2. CrashLoopBackOff — container starts and immediately dies
   - App bug, missing config, can't reach dependency
   - Check: `kubectl logs <pod> --previous`

3. Pending Pods — can't schedule
   - No node has enough resources
   - Node selector / taints/tolerations mismatch
   - Cluster autoscaler should add nodes; if not, investigate

4. OOMKilled — container killed for using too much memory
   - Memory limit too low or app has a leak
   - Check: `kubectl describe pod <name>` (Last State: Terminated, Reason: OOMKilled)

5. Probe failures
   - Readiness fails → Pod marked NotReady, taken out of Service endpoints
   - Liveness fails → Pod restarts repeatedly
   - Often: probe too aggressive, app slow to start, or actual bug

6. Resource exhaustion on nodes
   - Even Pods that fit can be evicted if a node runs out of memory
   - QoS class matters here — Guaranteed survives longest, BestEffort dies first

7. Networking weirdness
   - DNS issues — `kubectl exec ... nslookup myservice`
   - NetworkPolicy blocking unexpected traffic
   - Service selector doesn't match Pod labels
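
For issue 5, a startupProbe keeps the liveness probe from killing a slow-starting app. A sketch (paths, port, and timings are illustrative):

YAML
containers:
  - name: app
    startupProbe:                 # allows up to 30 × 10s = 5 min to start...
      httpGet: { path: /healthz, port: 8080 }
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:                # ...before liveness checks begin
      httpGet: { path: /healthz, port: 8080 }
      periodSeconds: 10
    readinessProbe:
      httpGet: { path: /ready, port: 8080 }
      periodSeconds: 5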

The debugging mantra:

Bash
kubectl get pods                  # broad view
kubectl describe pod <name>       # what's going on
kubectl logs <pod>                # what did it say
kubectl logs <pod> --previous     # what did the LAST instance say
kubectl get events --sort-by='.metadata.creationTimestamp'
kubectl exec -it <pod> -- sh      # poke around inside

Honest take on K8s: it's powerful but it has a steep learning curve and a long tail of operational concerns. Use managed K8s (EKS, GKE, AKS) — the cloud provider handles the control plane and most of the upgrade pain. Even better, consider whether simpler platforms (Cloud Run, ECS, Fly.io, Render) cover your needs.

