DevOps & Cloud Engineering / Lesson 4 — Shell Scripting & Automation

Shell Scripting & Automation

Bash scripting from variables to traps. The glue that holds DevOps automation together.


Why Bash Still Matters

In 2026, you might wonder: why learn bash when Python, Go, and TypeScript exist? Because a Bourne-style shell is the only language guaranteed to be nearly everywhere — almost every Linux server, Docker image, CI runner, and Kubernetes init container ships bash or at least POSIX sh.

When you SSH into a fresh EC2 instance to debug a problem, bash is what's there. When you write a Dockerfile RUN step, a shell executes it (/bin/sh -c by default). When CI runs a "before" hook, that's a shell. When a systemd unit calls a wrapper script, a shell launches your app.

You don't need to write 1000-line bash programs (you shouldn't — switch to Python or Go around 200 lines). But you DO need to be comfortable enough that quick automation, glue scripts, and ad-hoc fixes flow naturally.

This lesson takes you from variables to trap handlers — enough to write production-quality automation.


Script Anatomy & The Shebang

Every shell script starts with a shebang line — #! followed by the interpreter path:

Bash
#!/bin/bash
echo "Hello, world"

Variants you'll see:

Bash
#!/bin/sh                # POSIX sh — most portable, fewer features
#!/bin/bash              # bash — most common on Linux servers
#!/usr/bin/env bash      # finds bash in PATH — more portable across systems
#!/usr/bin/env python3   # works for Python too

Convention: use #!/usr/bin/env bash for portability, or #!/bin/bash if you control the environment.

To run the script:

Bash
chmod +x script.sh        # mark executable (one time)
./script.sh               # run it
bash script.sh            # also works — runs even without +x

Strict mode — start every serious script with this:

Bash
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

What each does:
-e — exit immediately on any command failure (no silent broken scripts)
-u — error on referencing undefined variables (catches typos)
-o pipefail — a pipeline fails if any command in it fails (without this, only the last command's status counts)
IFS=$'\n\t' — safer word splitting (no surprises with spaces in filenames)

This single boilerplate is the difference between hobbyist and production bash. Adopt it.
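A quick demonstration of what -o pipefail changes (run interactively, without set -e):

```shell
# Without pipefail, only the last command's exit status counts:
false | true
echo $?              # → 0 — the failure of `false` is hidden

# With pipefail, the pipeline reports the failure:
set -o pipefail
false | true
echo $?              # → 1
```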


Variables & Quoting

Variables are simple but full of subtle traps.

Defining and using:

Bash
NAME="alice"             # no spaces around =
PORT=8080
MESSAGE="hello $NAME"
echo "$MESSAGE"          # → hello alice

Always quote variable expansions:

Bash
FILE="my file.txt"
rm $FILE                 # WRONG — runs `rm my file.txt` = two arguments
rm "$FILE"               # CORRECT — single argument

Single quotes vs double quotes:
• Double quotes — variables expand: "$NAME" → alice
• Single quotes — literal: '$NAME' → $NAME

Curly braces for clarity / disambiguation:

Bash
echo "${NAME}_backup"    # → alice_backup
echo "$NAME_backup"      # WRONG — bash looks for variable NAME_backup

Default values and parameter expansion:

Bash
NAME=${NAME:-"default"}        # use NAME if set, otherwise "default"
NAME=${NAME:?"NAME required"}  # error if NAME unset

FILE="path/to/document.tar.gz"
echo "${FILE##*/}"             # → document.tar.gz   (basename)
echo "${FILE%/*}"              # → path/to           (dirname)
echo "${FILE%.*}"              # → path/to/document.tar  (strip last extension)
echo "${FILE%%.*}"             # → path/to/document      (strip all extensions)

Command substitution — capture command output:

Bash
DATE=$(date +%Y-%m-%d)
USER_COUNT=$(wc -l < /etc/passwd)
LATEST=$(ls -t /var/log/*.log | head -1)

# Old-style backticks also work but $(...) is preferred (nestable):
DATE=`date +%Y-%m-%d`
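Because $(...) nests, one expansion can feed another — the path here is just for illustration:

```shell
# innermost runs first: dirname → /var/log, then basename → log
PARENT=$(basename "$(dirname "/var/log/nginx")")
echo "$PARENT"       # → log
```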

Environment variables — passed to child processes:

Bash
export DATABASE_URL="postgres://..."   # available to programs you run
unset DATABASE_URL                     # remove it

env                                    # list all environment variables
printenv DATABASE_URL                  # print one

Conditionals

Bash's conditionals are quirky but straightforward once you've seen them.

If statements:

Bash
if [ "$NAME" = "alice" ]; then
    echo "hi alice"
elif [ "$NAME" = "bob" ]; then
    echo "hi bob"
else
    echo "who are you"
fi

The double-bracket version ([[ ]]) is bash-specific and safer:

Bash
if [[ "$NAME" == "alice" ]]; then
    echo "matches"
fi
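One reason it's safer: [[ ]] doesn't word-split or glob unquoted variables, so empty or multi-word values can't break the test:

```shell
VAR="two words"
# [ $VAR = "x" ] would expand to [ two words = x ] — a "too many arguments" error
if [[ $VAR == "x" ]]; then echo "match"; else echo "no match"; fi    # → no match
```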

Common test conditions:

Bash
# String tests
[[ -z "$VAR" ]]              # empty / unset
[[ -n "$VAR" ]]              # non-empty
[[ "$A" == "$B" ]]           # equal
[[ "$A" != "$B" ]]           # not equal
[[ "$A" =~ ^[0-9]+$ ]]       # regex match (only in [[ ]])

# Numeric tests (note: -eq, -lt, NOT == or <)
[[ "$N" -eq 0 ]]             # equal
[[ "$N" -gt 10 ]]            # greater than
[[ "$N" -lt 5 ]]             # less than
[[ "$N" -ge 0 ]]             # greater or equal
[[ "$N" -ne 0 ]]             # not equal

# File tests
[[ -e "$FILE" ]]             # exists
[[ -f "$FILE" ]]             # is regular file
[[ -d "$DIR" ]]              # is directory
[[ -r "$FILE" ]]             # readable
[[ -w "$FILE" ]]             # writable
[[ -x "$FILE" ]]             # executable
[[ -s "$FILE" ]]             # exists and is non-empty

# Combining
[[ -f "$FILE" && -r "$FILE" ]]   # AND
[[ -z "$A" || -z "$B" ]]         # OR
[[ ! -d "$DIR" ]]                # NOT

Case statements — cleaner than long if/elif chains:

Bash
case "$ENVIRONMENT" in
    production|prod)
        DB_URL="prod-db.example.com"
        ;;
    staging|stage)
        DB_URL="stage-db.example.com"
        ;;
    dev|development|"")
        DB_URL="localhost"
        ;;
    *)
        echo "Unknown environment: $ENVIRONMENT" >&2
        exit 1
        ;;
esac

Exit codes — every command returns one:

Bash
some_command
echo "$?"                # 0 = success, anything else = failure

if some_command; then
    echo "succeeded"
fi

# One-liner: AND/OR shortcuts
some_command && echo "succeeded"
some_command || echo "failed"
mkdir -p /backup || exit 1

Loops

For loop over a list:

Bash
for FRUIT in apple banana cherry; do
    echo "I like $FRUIT"
done

# Over files
for FILE in /var/log/*.log; do
    echo "Processing $FILE"
    gzip "$FILE"
done

# C-style — bash specific
for ((i=0; i<10; i++)); do
    echo "$i"
done

# Range
for i in {1..10}; do echo $i; done
for i in {0..100..5}; do echo $i; done   # step of 5

# From command output
for SVC in $(systemctl list-units --type=service --state=failed --no-legend | awk '{print $1}'); do
    echo "Failed: $SVC"
done

While loop:

Bash
COUNT=0
while [[ $COUNT -lt 5 ]]; do
    echo "count is $COUNT"
    COUNT=$((COUNT + 1))
done

# Read a file line by line — the IDIOMATIC way
while IFS= read -r LINE; do
    echo "got: $LINE"
done < input.txt

# Loop until something succeeds (with timeout)
TIMEOUT=60
ELAPSED=0
until curl -sf http://localhost:8080/health > /dev/null; do
    if [[ $ELAPSED -ge $TIMEOUT ]]; then
        echo "Service didn't come up in $TIMEOUT seconds"
        exit 1
    fi
    sleep 1
    ELAPSED=$((ELAPSED + 1))
done
echo "Service is healthy"

Watch out: don't pipe into a while loop without subshell awareness:

Bash
COUNT=0
cat file | while read line; do
    COUNT=$((COUNT + 1))
done
echo $COUNT     # ← still 0! pipe creates a subshell

# Better: use process substitution
COUNT=0
while IFS= read -r line; do
    COUNT=$((COUNT + 1))
done < <(cat file)
echo $COUNT     # ← actually counted

Functions & Arguments

Functions in bash:

Bash
greet() {
    local NAME="$1"
    local GREETING="${2:-Hello}"
    echo "$GREETING, $NAME"
}

greet "Alice"             # → Hello, Alice
greet "Bob" "Hi"          # → Hi, Bob

Key points:
• local keeps variables function-scoped (without it, they're global)
• $1, $2, ... are positional arguments
• $@ is all arguments as separate strings (use "$@" to preserve spaces)
• $* is all arguments as one string
• $# is the count of arguments
• $0 is the script name
• Return a value via echo and capture with $(...). The return keyword sets exit status only.
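A quick way to see these in action — `show` and `join` are just illustrative names:

```shell
show() {
    echo "count: $#"
    for a in "$@"; do echo "[$a]"; done    # "$@" keeps "one two" as a single argument
}
show "one two" three
# → count: 2
# → [one two]
# → [three]

join() { echo "as one string: $*"; }
join a b c                                 # → as one string: a b c
```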

Script arguments work the same:

Bash
#!/usr/bin/env bash
echo "Script: $0"
echo "First arg: $1"
echo "All args: $@"
echo "Count: $#"

Argument parsing — for serious scripts, use getopts or a library. For quick scripts:

Bash
ENVIRONMENT="dev"
DRY_RUN=false
while [[ $# -gt 0 ]]; do
    case "$1" in
        -e|--env)
            ENVIRONMENT="$2"
            shift 2
            ;;
        --dry-run)
            DRY_RUN=true
            shift
            ;;
        -h|--help)
            echo "Usage: $0 [-e env] [--dry-run]"
            exit 0
            ;;
        *)
            echo "Unknown: $1" >&2
            exit 1
            ;;
    esac
done

echo "Environment: $ENVIRONMENT, Dry run: $DRY_RUN"
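The same parser sketched with getopts — note it handles short options only, so -n stands in for --dry-run here; long options still need the manual loop above:

```shell
ENVIRONMENT="dev"
DRY_RUN=false
while getopts ":e:nh" opt; do
    case "$opt" in
        e) ENVIRONMENT="$OPTARG" ;;
        n) DRY_RUN=true ;;
        h) echo "Usage: $0 [-e env] [-n]"; exit 0 ;;
        \?) echo "Unknown option: -$OPTARG" >&2; exit 1 ;;
        :)  echo "Option -$OPTARG needs a value" >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))          # remaining positional args are now $1, $2, ...
echo "Environment: $ENVIRONMENT, Dry run: $DRY_RUN"
```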

Error Handling & Traps

set -euo pipefail (strict mode from earlier) catches most failures. For finer control:

Manual error handling:

Bash
some_command || {
    echo "Command failed!" >&2
    exit 1
}

# Or check explicitly
if ! some_command; then
    echo "failed" >&2
    exit 1
fi

stderr — error messages should go to stderr, not stdout:

Bash
echo "Something went wrong" >&2     # the >&2 redirects to stderr

This matters because callers may pipe stdout into another command — they want results, not error noise.
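A small illustration — fetch_count is a hypothetical function whose result gets captured while its progress message passes through to the terminal:

```shell
fetch_count() {
    echo "connecting..." >&2     # progress noise → stderr
    echo "42"                    # the actual result → stdout
}
COUNT=$(fetch_count)             # captures only "42"; "connecting..." still reaches the terminal
echo "count is $COUNT"           # → count is 42
```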

Trap — clean up on exit / signal:

Bash
TEMPDIR=$(mktemp -d)

cleanup() {
    rm -rf "$TEMPDIR"
    echo "Cleaned up $TEMPDIR"
}

trap cleanup EXIT          # runs on normal exit, error, or interrupt

# ... do work in $TEMPDIR ...
# ... if anything fails (because of set -e), trap still fires

Traps fire on signals too:

Bash
trap 'echo Interrupted; exit 130' INT TERM

This is how you write scripts that:
• Always clean up temp files
• Always release locks
• Always log "exit cleanly" messages
• Handle Ctrl-C gracefully

Real-world example combining everything:

Bash
#!/usr/bin/env bash
set -euo pipefail

# Defaults
ENV="${1:-dev}"
LOG_DIR="/var/log/myapp"
WORK_DIR=$(mktemp -d)

# Cleanup on exit
trap 'rm -rf "$WORK_DIR"' EXIT

# Helpers
log() {
    echo "[$(date +%FT%T)] $*" >&2
}

die() {
    log "FATAL: $*"
    exit 1
}

# Validate
[[ "$ENV" =~ ^(dev|staging|prod)$ ]] || die "Invalid env: $ENV"
[[ -d "$LOG_DIR" ]] || die "Log dir missing: $LOG_DIR"

# Work
log "Backing up logs from $ENV"
cd "$LOG_DIR"
tar czf "$WORK_DIR/logs-$(date +%F).tar.gz" *.log
log "Backup created"

# Upload (assume aws cli configured)
aws s3 cp "$WORK_DIR/logs-$(date +%F).tar.gz" "s3://my-backups/$ENV/" \
    || die "Upload failed"

log "Done"

When to Stop Using Bash

Bash is great for short scripts that orchestrate other commands. It's bad for:
• Real data structures — hash maps and nested lists are clumsy at best
• Robust error handling beyond exit codes
• Anything that grows much past ~200 lines

When you hit those limits, write Python (with the same set -e discipline):

Python
#!/usr/bin/env python3
# Backup script — Python version
import argparse
import subprocess
import sys
import tempfile
from pathlib import Path
from datetime import datetime

def run(cmd):
    # Run a command, exit on failure
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode:
        print(f"FAILED: {' '.join(cmd)}\n{result.stderr}", file=sys.stderr)
        sys.exit(1)
    return result.stdout

def main():
    p = argparse.ArgumentParser()
    p.add_argument('--env', default='dev', choices=['dev','staging','prod'])
    args = p.parse_args()

    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / f"logs-{datetime.now():%F}.tar.gz"
        run(['tar', 'czf', str(archive), '-C', '/var/log/myapp', '.'])
        run(['aws', 's3', 'cp', str(archive), f's3://my-backups/{args.env}/'])
    print("Done")

if __name__ == '__main__':
    main()

Same script, more readable, testable, easier to extend with new features.

Rule of thumb: write the bash version first. When you reach for "I need a hash map / I need real error handling / this is getting complex" — that's the signal to rewrite in Python.

The next lesson covers Git — version control deeper than add/commit/push. Most DevOps automation is glue code that orchestrates Git, builds, deployments, and monitoring. Understanding all those primitives unlocks everything that follows.

