Kubernetes in Practice
Helm, autoscaling, persistent volumes, RBAC — what you actually need to run production K8s.
Helm — Templating Your YAML
K8s YAML is verbose. Multiply it across environments (dev, staging, production) and you end up copy-pasting a lot of nearly identical manifests.
Helm is the package manager for Kubernetes. You write templates with variables; Helm renders them with values.
Anatomy of a chart:
my-chart/
  Chart.yaml            # metadata
  values.yaml           # default values
  values-prod.yaml      # production overrides
  templates/
    deployment.yaml     # uses {{ .Values.image.tag }}
    service.yaml
    ingress.yaml
A templated deployment:
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
        - name: app
          image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
Default values:
# values.yaml
replicaCount: 1
image:
  repository: myapp
  tag: latest
resources:
  requests:
    cpu: 100m
    memory: 128Mi
Production overrides:
# values-prod.yaml
replicaCount: 5
resources:
  requests:
    cpu: 500m
    memory: 512Mi
Install:
helm install myapp ./my-chart # uses defaults
helm install myapp ./my-chart -f values-prod.yaml # production values
helm upgrade myapp ./my-chart -f values-prod.yaml # update
helm rollback myapp 1 # rollback to revision 1
helm uninstall myapp # remove
Use community charts from Artifact Hub (https://artifacthub.io):
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgres bitnami/postgresql --set auth.password=secret
Alternative: Kustomize (built into kubectl) — patches base YAML with overlays. Simpler than Helm for small overlays, but Helm wins for distributable charts.
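To make the comparison concrete, a minimal Kustomize overlay sketch (the directory layout and names are illustrative, not from this lesson's chart):
# overlays/prod/kustomization.yaml, layered on a base/ with the shared manifests
resources:
  - ../../base
replicas:
  - name: myapp        # bump the Deployment named myapp to 5 replicas
    count: 5
images:
  - name: myapp        # pin the image tag for production
    newTag: v2.1.0
Apply with kubectl apply -k overlays/prod.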
Autoscaling
K8s offers three kinds of autoscaling:
1. Horizontal Pod Autoscaler (HPA) — adds/removes Pods based on metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale to keep avg CPU at 70%
Custom metrics — scale on requests/sec or queue depth:
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
Requires a metrics adapter (Prometheus Adapter is the standard).
2. Vertical Pod Autoscaler (VPA) — adjusts resource requests/limits
Useful for finding the right resource requests, but applying a new recommendation restarts the Pod, which makes it tricky for stateful workloads. Many teams run it in recommendation-only mode.
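A minimal VPA manifest, assuming the VPA controller is installed in the cluster (it isn't part of core K8s); updateMode "Off" surfaces recommendations without evicting anything:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # recommend only; read the suggestions with kubectl describe vpa myapp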
3. Cluster Autoscaler — adds/removes NODES from the cluster
When Pods can't be scheduled (pending due to no capacity), add a node. When nodes are underutilized, remove them.
Managed K8s (EKS, GKE) often handles this automatically.
KEDA (Kubernetes Event-Driven Autoscaling) — scale based on external events (Kafka lag, queue depth, cron schedules). Increasingly popular in 2026 for event-driven workloads.
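A sketch of what that looks like, assuming KEDA is installed and a consumer Deployment reads from Kafka (the names, broker address, and threshold are made up):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
spec:
  scaleTargetRef:
    name: orders-consumer        # the Deployment to scale
  minReplicaCount: 0             # scale to zero when there's no lag
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: orders
        topic: orders
        lagThreshold: "100"      # target lag per replica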
Persistent Storage
Pods are ephemeral. For data that must survive Pod restarts, use PersistentVolumes.
The flow:
1. Cluster admin (or cloud provider) defines StorageClasses (e.g., AWS gp3, GCP SSD).
2. App developer requests storage via a PersistentVolumeClaim (PVC).
3. K8s provisions a matching PersistentVolume (PV) — typically a cloud disk.
4. Pod mounts the PVC as a volume.
Example for a database:
# a standalone PVC: how an app requests storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp3
---
# a StatefulSet creates its own PVCs from volumeClaimTemplates (one per replica),
# so it doesn't reference the claim above; that one is shown for the shape of a PVC
apiVersion: apps/v1
kind: StatefulSet # not Deployment — see below
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels: { app: postgres }
  template:
    metadata:
      labels: { app: postgres }
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata: { name: data }
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
StatefulSet vs Deployment:
• Deployment — Pods are interchangeable. Each gets a random name.
• StatefulSet — Pods have stable identities (myapp-0, myapp-1) and stable storage. Used for databases, queues, anything that cares about its identity.
Honest note: running stateful systems (Postgres, Kafka) in K8s is doable but operationally complex. Most teams use managed services (RDS, MSK, Cloud SQL) for stateful infrastructure and only run stateless apps on K8s. This is fine and recommended.
RBAC — Who Can Do What
Role-Based Access Control limits what users (and ServiceAccounts) can do.
Three concepts:
• Role / ClusterRole — a set of permissions
• RoleBinding / ClusterRoleBinding — assign a Role to a user/group/SA
• ServiceAccount — identity for processes (your Pods)
Read-only access for a developer:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: viewer
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alice-viewer
subjects:
  - kind: User
    name: alice@example.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: viewer
  apiGroup: rbac.authorization.k8s.io
For your apps — give them only the API access they need:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myapp-config-reader
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]                   # resourceNames restricts get, not list
    resourceNames: ["app-config"]    # this specific ConfigMap, not all of them
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myapp
subjects:
  - kind: ServiceAccount
    name: myapp
    namespace: default               # the namespace the app runs in
roleRef:
  kind: Role
  name: myapp-config-reader
  apiGroup: rbac.authorization.k8s.io
Reference the ServiceAccount in your Pod spec:
spec:
  serviceAccountName: myapp
Pods run as the namespace's default ServiceAccount unless you specify one, and its token is mounted into every container. Always create a dedicated ServiceAccount per app and grant it only the permissions it needs.
Tools that audit RBAC:
• kubectl-who-can — "who can delete pods in production?"
• rbac-lookup — comprehensive RBAC inspection
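Typical usage once the plugins are installed (e.g. via krew); the names here are illustrative:
kubectl who-can delete pods -n production   # who holds "delete" on pods in production?
kubectl rbac-lookup alice                   # which roles are bound to alice, and where?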
Deploys & Rollouts
When you change a Deployment's image, K8s does a rolling update by default:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # at most 25% of Pods can be down
      maxSurge: 25%         # at most 25% extra Pods during rollout
Trigger a deploy by changing the image:
kubectl set image deployment/myapp app=myapp:v2
# OR
kubectl apply -f deployment.yaml # with the new image tag
Watch the rollout:
kubectl rollout status deployment/myapp
kubectl rollout history deployment/myapp
kubectl rollout undo deployment/myapp # rollback to previous
kubectl rollout undo deployment/myapp --to-revision=2
For more advanced strategies (canary, blue-green, automated rollback):
• Argo Rollouts — purpose-built progressive delivery
• Flagger — works with service meshes (Istio, Linkerd)
Argo Rollouts example for canary:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
      analysis:
        templates:
          - templateName: error-rate-check # if error rate spikes, rollback
Production K8s deploys go through tools like ArgoCD or Flux (GitOps) — covered in a later lesson — rather than direct kubectl. Git is the source of truth; the cluster reconciles to match.
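For a taste of the GitOps shape, a minimal ArgoCD Application that deploys the Helm chart from earlier; the repo URL and paths are hypothetical:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-repo   # hypothetical Git repo
    targetRevision: main
    path: charts/my-chart
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc             # the cluster ArgoCD runs in
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift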
Networking & Service Mesh
We covered Services and Ingress in the previous lesson. Two more pieces:
NetworkPolicy — firewall rules between Pods
By default, every Pod can reach every other Pod. NetworkPolicies restrict this.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-from-api-only
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - port: 5432
Now only Pods labeled app=api can reach the postgres Pods on port 5432. Defense in depth.
NetworkPolicies require a CNI plugin that supports them (Calico, Cilium). Some default cloud K8s installs (basic GKE, basic EKS) don't support them out of the box.
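A common companion pattern, if your CNI supports policies, is a namespace-wide default deny for ingress, with explicit allows like the one above layered on top:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}          # empty selector matches every Pod in the namespace
  policyTypes: [Ingress]   # no ingress rules listed, so all inbound traffic is denied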
Service Mesh — for advanced traffic management
A service mesh adds a sidecar proxy to every Pod. The proxy handles encryption (mTLS between services), retries, traffic splitting, observability — without app code changes.
Popular meshes:
• Istio — feature-rich, complex, heavy
• Linkerd — simpler, lighter
• Cilium Service Mesh — uses eBPF, no sidecars, very efficient
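Joining a sidecar-based mesh is usually a namespace-level opt-in rather than an app change. With Istio, for example, labeling the namespace tells the injector to add the sidecar to new Pods (Linkerd uses a similar per-namespace annotation):
kubectl label namespace production istio-injection=enabled
kubectl rollout restart deployment/myapp -n production   # recreate Pods so they pick up the sidecar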
When to adopt:
• You have many services (10+)
• You need mTLS everywhere (compliance)
• You want fine-grained traffic control (canary by header, region-based routing)
• You can afford the operational complexity
When NOT to:
• You have 2 services. Don't.
• Your team is still figuring out basic K8s.
• You haven't measured what mesh would actually solve.
Service meshes are one of those things that look essential at conferences and feel like overhead in production. Adopt deliberately.
Operating K8s — What Goes Wrong
The most common K8s production issues:
1. ImagePullBackOff — can't pull the image
- Wrong tag, missing image, registry auth issue
- Check: `kubectl describe pod <name>`
2. CrashLoopBackOff — container starts and immediately dies
- App bug, missing config, can't reach dependency
- Check: `kubectl logs <pod> --previous`
3. Pending Pods — can't schedule
- No node has enough resources
- Node selector / taints/tolerations mismatch
- Cluster autoscaler should add nodes; if not, investigate
4. OOMKilled — container killed for using too much memory
- Memory limit too low or app has a leak
- Check: `kubectl describe pod <name>` (Last State: Terminated, Reason: OOMKilled)
5. Probe failures
- Readiness fails → Pod marked NotReady, taken out of Service
- Liveness fails → Pod restarts repeatedly
- Often: probe too aggressive, app slow to start, or an actual bug (probe settings are sketched just after this list)
6. Resource exhaustion on nodes
- Even Pods that fit can be evicted if the node itself runs out of memory
- QoS class matters here — Guaranteed survives, BestEffort dies first
7. Networking weirdness
- DNS issues — `kubectl exec ... nslookup myservice`
- NetworkPolicy blocking unexpected traffic
- Service selector doesn't match Pod labels
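For the probe failures in point 5, the knobs live on the container spec. A sketch, assuming a /healthz endpoint on port 8080:
containers:
  - name: app
    readinessProbe:
      httpGet: { path: /healthz, port: 8080 }
      periodSeconds: 10
      failureThreshold: 3      # three misses before the Pod is pulled from the Service
    livenessProbe:
      httpGet: { path: /healthz, port: 8080 }
      initialDelaySeconds: 30  # give a slow-starting app room before restarts kick in
      periodSeconds: 10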
The debugging mantra:
kubectl get pods # broad view
kubectl describe pod <name> # what's going on
kubectl logs <pod> # what did it say
kubectl logs <pod> --previous # what did the LAST instance say
kubectl get events --sort-by='.metadata.creationTimestamp'
kubectl exec -it <pod> -- sh # poke around inside
Honest take on K8s: it's powerful but it has a steep learning curve and a long tail of operational concerns. Use managed K8s (EKS, GKE, AKS) — the cloud provider handles the control plane and most of the upgrade pain. Even better, consider whether simpler platforms (Cloud Run, ECS, Fly.io, Render) cover your needs.