Shell Scripting & Automation
Bash scripting from variables to traps. The glue that holds DevOps automation together.
Why Bash Still Matters
In 2026, you might wonder: why learn bash when Python, Go, and TypeScript exist? Because a shell is the one tool you can count on almost everywhere — every Linux server, nearly every Docker image (minimal images may ship only POSIX sh), every CI runner, every Kubernetes init container.
When you SSH into a fresh EC2 instance to debug a problem, bash is what's there. When you write a Dockerfile RUN step, a shell runs it (/bin/sh -c by default). When CI runs a "before" hook, that's a shell. When cron fires a scheduled job, a shell runs the command line.
You don't need to write 1000-line bash programs (you shouldn't — switch to Python or Go around 200 lines). But you DO need to be comfortable enough that quick automation, glue scripts, and ad-hoc fixes flow naturally.
This lesson takes you from variables to trap handlers — enough to write production-quality automation.
Script Anatomy & The Shebang
Every shell script starts with a shebang line — #! followed by the interpreter path:
#!/bin/bash
echo "Hello, world"
Variants you'll see:
#!/bin/sh # POSIX sh — most portable, fewer features
#!/bin/bash # bash — most common on Linux servers
#!/usr/bin/env bash # finds bash in PATH — more portable across systems
#!/usr/bin/env python3 # works for Python too
Convention: use #!/usr/bin/env bash for portability, or #!/bin/bash if you control the environment.
To run the script:
chmod +x script.sh # mark executable (one time)
./script.sh # run it
bash script.sh # also works — runs even without +x
Strict mode — start every serious script with this:
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
What each does:
• -e — exit immediately on any command failure (no silent broken scripts)
• -u — error on referencing undefined variables (catches typos)
• -o pipefail — a pipeline fails if any command in it fails (without this, only the last command's status counts)
• IFS=$'\n\t' — safer word splitting (no surprises with spaces in filenames)
This single boilerplate is the difference between hobbyist and production bash. Adopt it.
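To see concretely what -o pipefail changes, here is a small demo you can paste into a shell (strict mode's -e is deliberately left off so both results print):

```shell
#!/usr/bin/env bash
# Without pipefail, only the LAST command's exit status counts,
# so the failing `false` at the head of the pipeline is masked:
set +o pipefail
false | wc -l > /dev/null
echo "without pipefail: $?"   # → without pipefail: 0

# With pipefail, any failing command fails the whole pipeline:
set -o pipefail
false | wc -l > /dev/null
echo "with pipefail: $?"      # → with pipefail: 1
```

This is exactly the failure mode that bites in deploy scripts like `curl … | tar xz` — without pipefail, a failed download can still report success.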
Variables & Quoting
Variables are simple but full of subtle traps.
Defining and using:
NAME="alice" # no spaces around =
PORT=8080
MESSAGE="hello $NAME"
echo "$MESSAGE" # → hello alice
Always quote variable expansions:
FILE="my file.txt"
rm $FILE # WRONG — runs `rm my file.txt` = two arguments
rm "$FILE" # CORRECT — single argument
Single quotes vs double quotes:
• Double quotes — variables expand: "$NAME" → alice
• Single quotes — literal: '$NAME' → $NAME
Curly braces for clarity / disambiguation:
echo "${NAME}_backup" # → alice_backup
echo "$NAME_backup" # WRONG — bash looks for variable NAME_backup
Default values and parameter expansion:
NAME=${NAME:-"default"} # use NAME if set, otherwise "default"
NAME=${NAME:?"NAME required"} # error if NAME unset
FILE="path/to/document.tar.gz"
echo "${FILE##*/}" # → document.tar.gz (basename)
echo "${FILE%/*}" # → path/to (dirname)
echo "${FILE%.*}" # → path/to/document.tar (strip last extension)
echo "${FILE%%.*}" # → path/to/document (strip all extensions)
Command substitution — capture command output:
DATE=$(date +%Y-%m-%d)
USER_COUNT=$(wc -l < /etc/passwd)
LATEST=$(ls -t /var/log/*.log | head -1)
# Old-style backticks also work but $(...) is preferred (nestable):
DATE=`date +%Y-%m-%d`
Environment variables — passed to child processes:
export DATABASE_URL="postgres://..." # available to programs you run
unset DATABASE_URL # remove it
env # list all environment variables
printenv DATABASE_URL # print one
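To see the difference between a plain shell variable and an exported one, spawn a child shell (the variable names here are illustrative):

```shell
#!/usr/bin/env bash
LOCAL_VAR="shell only"          # plain variable — NOT passed to child processes
export SHARED_VAR="inherited"   # exported — children receive it in their environment

# A child bash only sees the exported variable:
bash -c 'echo "SHARED_VAR=${SHARED_VAR:-<unset>} LOCAL_VAR=${LOCAL_VAR:-<unset>}"'
# → SHARED_VAR=inherited LOCAL_VAR=<unset>
```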
Conditionals
Bash's conditionals are quirky but straightforward once you've seen them.
If statements:
if [ "$NAME" = "alice" ]; then
echo "hi alice"
elif [ "$NAME" = "bob" ]; then
echo "hi bob"
else
echo "who are you"
fi
The double-bracket version ([[ ]]) is bash-specific and safer:
if [[ "$NAME" == "alice" ]]; then
echo "matches"
fi
Common test conditions:
# String tests
[[ -z "$VAR" ]] # empty / unset
[[ -n "$VAR" ]] # non-empty
[[ "$A" == "$B" ]] # equal
[[ "$A" != "$B" ]] # not equal
[[ "$A" =~ ^[0-9]+$ ]] # regex match (only in [[ ]])
# Numeric tests (note: -eq, -lt, NOT == or <)
[[ "$N" -eq 0 ]] # equal
[[ "$N" -gt 10 ]] # greater than
[[ "$N" -lt 5 ]] # less than
[[ "$N" -ge 0 ]] # greater or equal
[[ "$N" -ne 0 ]] # not equal
# File tests
[[ -e "$FILE" ]] # exists
[[ -f "$FILE" ]] # is regular file
[[ -d "$DIR" ]] # is directory
[[ -r "$FILE" ]] # readable
[[ -w "$FILE" ]] # writable
[[ -x "$FILE" ]] # executable
[[ -s "$FILE" ]] # exists and is non-empty
# Combining
[[ -f "$FILE" && -r "$FILE" ]] # AND
[[ -z "$A" || -z "$B" ]] # OR
[[ ! -d "$DIR" ]] # NOT
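These tests compose nicely into small validators. A sketch — `valid_port` is a hypothetical helper, not a standard command:

```shell
#!/usr/bin/env bash
# Validate a TCP port number using the string, regex, and numeric tests above.
valid_port() {
    local port="$1"
    [[ -n "$port" ]] || return 1            # must be non-empty
    [[ "$port" =~ ^[0-9]+$ ]] || return 1   # digits only (regex needs [[ ]])
    [[ "$port" -ge 1 && "$port" -le 65535 ]]
}

valid_port 8080  && echo "8080 ok"          # → 8080 ok
valid_port "abc" || echo "abc rejected"     # → abc rejected
valid_port 70000 || echo "70000 rejected"   # → 70000 rejected
```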
Case statements — cleaner than long if/elif chains:
case "$ENVIRONMENT" in
production|prod)
DB_URL="prod-db.example.com"
;;
staging|stage)
DB_URL="stage-db.example.com"
;;
dev|development|"")
DB_URL="localhost"
;;
*)
echo "Unknown environment: $ENVIRONMENT" >&2
exit 1
;;
esac
Exit codes — every command returns one:
some_command
echo "$?" # 0 = success, anything else = failure
if some_command; then
echo "succeeded"
fi
# One-liner: AND/OR shortcuts
some_command && echo "succeeded"
some_command || echo "failed"
mkdir -p /backup || exit 1
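One caution about the shortcut form: `A && B || C` is not a true if/else. `C` runs whenever `B` fails, even if `A` succeeded:

```shell
#!/usr/bin/env bash
# C runs because B (false) failed, even though A (true) succeeded:
true && false || echo "fallback ran despite A succeeding"

# A real if/else branches only on A:
if true; then
    echo "then branch"
else
    echo "else branch"
fi
```

Use `&& … ||` only when the middle command cannot fail (like an echo); otherwise reach for a real `if`.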
Loops
For loop over a list:
for FRUIT in apple banana cherry; do
echo "I like $FRUIT"
done
# Over files
for FILE in /var/log/*.log; do
echo "Processing $FILE"
gzip "$FILE"
done
# C-style — bash specific
for ((i=0; i<10; i++)); do
echo "$i"
done
# Range
for i in {1..10}; do echo $i; done
for i in {0..100..5}; do echo $i; done # step of 5
# From command output
for SVC in $(systemctl list-units --type=service --state=failed --no-legend | awk '{print $1}'); do
echo "Failed: $SVC"
done
While loop:
COUNT=0
while [[ $COUNT -lt 5 ]]; do
echo "count is $COUNT"
COUNT=$((COUNT + 1))
done
# Read a file line by line — the IDIOMATIC way
while IFS= read -r LINE; do
echo "got: $LINE"
done < input.txt
# Loop until something succeeds (with timeout)
TIMEOUT=60
ELAPSED=0
until curl -sf http://localhost:8080/health > /dev/null; do
if [[ $ELAPSED -ge $TIMEOUT ]]; then
echo "Service didn't come up in $TIMEOUT seconds"
exit 1
fi
sleep 1
ELAPSED=$((ELAPSED + 1))
done
echo "Service is healthy"
Watch out: don't pipe into a while loop without subshell awareness:
COUNT=0
cat file | while read line; do
COUNT=$((COUNT + 1))
done
echo $COUNT # ← still 0! pipe creates a subshell
# Better: use process substitution (and keep IFS= read -r, as above)
COUNT=0
while IFS= read -r line; do
COUNT=$((COUNT + 1))
done < <(cat file)
echo "$COUNT" # ← actually counted
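Another idiomatic fix in bash 4+ is mapfile (also spelled readarray), which loads the file into an array in the current shell — no pipe, no subshell, so nothing is lost:

```shell
#!/usr/bin/env bash
printf 'alpha\nbeta\ngamma\n' > /tmp/demo-lines.txt   # illustrative input file

# mapfile -t reads every line into an array, stripping trailing newlines.
mapfile -t LINES < /tmp/demo-lines.txt
echo "count: ${#LINES[@]}"     # → count: 3
for LINE in "${LINES[@]}"; do
    echo "got: $LINE"
done

rm -f /tmp/demo-lines.txt
```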
Functions & Arguments
Functions in bash:
greet() {
local NAME="$1"
local GREETING="${2:-Hello}"
echo "$GREETING, $NAME"
}
greet "Alice" # → Hello, Alice
greet "Bob" "Hi" # → Hi, Bob
Key points:
• local keeps variables function-scoped (without it, they're global)
• $1, $2, ... are positional arguments
• $@ is all arguments as separate strings (use "$@" to preserve spaces)
• $* is all arguments as one string
• $# is the count of arguments
• $0 is the script name
• Return a value via echo and capture with $(...). The return keyword sets exit status only.
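The "$@" vs "$*" distinction is easiest to see when forwarding an argument that contains a space:

```shell
#!/usr/bin/env bash
show() { echo "count: $#"; }

pass_all()  { show "$@"; }   # "$@" — each argument stays a separate word
pass_star() { show "$*"; }   # "$*" — everything collapses into ONE word

pass_all  "one two" three    # → count: 2
pass_star "one two" three    # → count: 1
```

Almost always you want `"$@"` when forwarding arguments; `"$*"` is mainly for building a single display string.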
Script arguments work the same:
#!/usr/bin/env bash
echo "Script: $0"
echo "First arg: $1"
echo "All args: $@"
echo "Count: $#"
Argument parsing — for serious scripts, use getopts or a library. For quick scripts:
ENVIRONMENT="dev"
DRY_RUN=false
while [[ $# -gt 0 ]]; do
case "$1" in
-e|--env)
ENVIRONMENT="$2"
shift 2
;;
--dry-run)
DRY_RUN=true
shift
;;
-h|--help)
echo "Usage: $0 [-e env] [--dry-run]"
exit 0
;;
*)
echo "Unknown: $1" >&2
exit 1
;;
esac
done
echo "Environment: $ENVIRONMENT, Dry run: $DRY_RUN"
Error Handling & Traps
set -euo pipefail (strict mode from earlier) catches most failures. For finer control:
Manual error handling:
some_command || {
echo "Command failed!" >&2
exit 1
}
# Or check explicitly
if ! some_command; then
echo "failed" >&2
exit 1
fi
stderr — error messages should go to stderr, not stdout:
echo "Something went wrong" >&2 # the >&2 redirects to stderr
This matters because callers may pipe stdout into another command — they want results, not error noise.
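A small sketch of why the separation pays off — the caller captures only results while diagnostics pass through (function name is illustrative):

```shell
#!/usr/bin/env bash
report() {
    echo "progress: scanning..." >&2   # diagnostic → stderr
    echo "result-data"                 # actual result → stdout
}

# Capture only the result; discard (or redirect) the diagnostics:
RESULT=$(report 2>/dev/null)
echo "captured: $RESULT"               # → captured: result-data
```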
Trap — clean up on exit / signal:
TEMPDIR=$(mktemp -d)
cleanup() {
rm -rf "$TEMPDIR"
echo "Cleaned up $TEMPDIR"
}
trap cleanup EXIT # runs on normal exit, error, or interrupt
# ... do work in $TEMPDIR ...
# ... if anything fails (because of set -e), trap still fires
Traps fire on signals too:
trap 'echo Interrupted; exit 130' INT TERM
This is how you write scripts that:
• Always clean up temp files
• Always release locks
• Always log "exit cleanly" messages
• Handle Ctrl-C gracefully
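For instance, a sketch of the "always release locks" pattern — the lock path is illustrative, and mkdir is used because directory creation is atomic:

```shell
#!/usr/bin/env bash
set -euo pipefail

LOCKDIR="/tmp/myjob.lock"   # illustrative path

# mkdir fails if the directory exists, so it doubles as an atomic lock.
if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "Another instance holds the lock" >&2
    exit 1
fi
trap 'rmdir "$LOCKDIR"' EXIT   # released on success, failure, or Ctrl-C

echo "lock acquired, working..."
# ... critical section ...
```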
Real-world example combining everything:
#!/usr/bin/env bash
set -euo pipefail
# Defaults
ENV="${1:-dev}"
LOG_DIR="/var/log/myapp"
WORK_DIR=$(mktemp -d)
# Cleanup on exit
trap 'rm -rf "$WORK_DIR"' EXIT
# Helpers
log() {
echo "[$(date +%FT%T)] $*" >&2
}
die() {
log "FATAL: $*"
exit 1
}
# Validate
[[ "$ENV" =~ ^(dev|staging|prod)$ ]] || die "Invalid env: $ENV"
[[ -d "$LOG_DIR" ]] || die "Log dir missing: $LOG_DIR"
# Work
log "Backing up logs from $ENV"
cd "$LOG_DIR"
tar czf "$WORK_DIR/logs-$(date +%F).tar.gz" *.log
log "Backup created"
# Upload (assume aws cli configured)
aws s3 cp "$WORK_DIR/logs-$(date +%F).tar.gz" "s3://my-backups/$ENV/" \
|| die "Upload failed"
log "Done"
When to Stop Using Bash
Bash is great for short scripts that orchestrate other commands. It's bad for:
- Complex data manipulation (use Python/Go)
- Anything with many edge cases (bash gets unreadable fast)
- Scripts that need unit tests (bash testing is painful)
- Anything over ~200 lines (you've outgrown bash)
When you hit those limits, write Python (with the same set -e discipline):
#!/usr/bin/env python3
# Backup script — Python version
import argparse
import subprocess
import sys
import tempfile
from pathlib import Path
from datetime import datetime
def run(cmd):
# Run a command, exit on failure
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode:
print(f"FAILED: {' '.join(cmd)}\n{result.stderr}", file=sys.stderr)
sys.exit(1)
return result.stdout
def main():
p = argparse.ArgumentParser()
p.add_argument('--env', default='dev', choices=['dev','staging','prod'])
args = p.parse_args()
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / f"logs-{datetime.now():%F}.tar.gz"
run(['tar', 'czf', str(archive), '-C', '/var/log/myapp', '.'])
run(['aws', 's3', 'cp', str(archive), f's3://my-backups/{args.env}/'])
print("Done")
if __name__ == '__main__':
main()
The same job, but more readable, testable, and easier to extend.
Rule of thumb: write the bash version first. When you reach for "I need a hash map / I need real error handling / this is getting complex" — that's the signal to rewrite in Python.
The next lesson covers Git — version control deeper than add/commit/push. Most DevOps automation is glue code that orchestrates Git, builds, deployments, and monitoring. Understanding all those primitives unlocks everything that follows.