DevOps & Cloud Engineering / Lesson 20 — Terraform — Infrastructure as Code

Terraform — Infrastructure as Code

Declarative infrastructure. Providers, state, modules — and why your cloud should be defined by code review.


Why Infrastructure as Code

Without IaC, infrastructure lives in:
• People's heads ("only Bob knows how the load balancer is configured")
• Click-ops in cloud consoles (untracked changes, no review)
• Hand-written scripts that drift over time
• Confluence pages that go out of date

The problems compound:
• Different environments diverge (production has settings nobody copied to staging)
• Changes break things mysteriously (no audit trail)
• Recovering from disaster requires reconstructing from memory
• New environments take days/weeks to set up
• Reviewing changes is impossible

Infrastructure as Code (IaC) fixes this. You describe your infrastructure in code, store it in Git, review it like application code, and apply it via automation.

The benefits:
• Every change is a Git commit — full history, rollback possible
• Code review catches mistakes before they hit production
• Same code creates dev, staging, production — no drift
• Disaster recovery: re-run the code
• New environments take minutes
• Onboarding: "read these files, that's our infrastructure"

The 2026 standard is Terraform (or its open-source fork, OpenTofu). Pulumi, AWS CDK, and configuration management tools (covered in Module 22) are alternatives. This lesson focuses on Terraform because it is the most widely used.


Terraform Mental Model

Terraform is declarative: you describe the desired state, Terraform figures out how to get there.

The core loop:
1. You write .tf files describing what you want (an EC2 instance, an S3 bucket, etc.)
2. Run terraform plan — Terraform compares your code to what currently exists in the cloud and shows the diff
3. Review the plan
4. Run terraform apply — Terraform makes the changes

Terraform tracks state — what it has created. The state file is critical: it maps your code's resources (aws_instance.web) to real cloud resources (i-1234567890abcdef0).

A trivial Terraform file:

HCL
# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "my-web-server"
  }
}

Run:

Bash
terraform init      # download provider plugins
terraform plan      # show what would change
terraform apply     # make it so
terraform destroy   # remove everything Terraform created

The init step downloads the AWS provider — code that knows how to translate Terraform resource definitions into AWS API calls. Providers exist for AWS, GCP, Azure, Kubernetes, GitHub, Datadog, hundreds more.

This means Terraform isn't AWS-specific. The same tool manages your AWS account, your DNS at Cloudflare, your GitHub repos, your Datadog monitors — all in one place.
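A minimal sketch of that multi-provider idea, managing an AWS instance and its DNS record at Cloudflare in one configuration (the Cloudflare zone ID and record here are hypothetical placeholders):

HCL
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "cloudflare" {}   # reads CLOUDFLARE_API_TOKEN from the environment

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}

# Cloudflare DNS record pointing at the AWS instance: one apply updates both clouds
resource "cloudflare_record" "web" {
  zone_id = "YOUR_ZONE_ID"   # placeholder
  name    = "web"
  type    = "A"
  value   = aws_instance.web.public_ip
}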


Resources, Variables, Outputs

Terraform's core building blocks:

Resources — things you create

HCL
resource "aws_s3_bucket" "uploads" {
  bucket = "my-app-uploads"
}

resource "aws_s3_bucket_public_access_block" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Reference between resources via <type>.<name>.<attribute>. Terraform builds a dependency graph from these references.

Variables — make code reusable

HCL
variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"
}

variable "instance_type" {
  type    = string
  default = "t3.medium"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = var.instance_type
  tags = {
    Name        = "${var.environment}-web"
    Environment = var.environment
  }
}

Pass values:

Bash
terraform apply -var="environment=production"
# OR via a tfvars file
terraform apply -var-file="prod.tfvars"
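A tfvars file is just variable assignments in HCL syntax. A hypothetical prod.tfvars for the variables above:

HCL
# prod.tfvars (hypothetical)
environment   = "production"
instance_type = "t3.large"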

Outputs — expose values for humans or other tools

HCL
output "instance_public_ip" {
  value       = aws_instance.web.public_ip
  description = "Public IP of the web server"
}

output "bucket_name" {
  value = aws_s3_bucket.uploads.id
}

After apply, Terraform prints the outputs. You can also query them:

Bash
terraform output instance_public_ip

Data sources — reference existing things you didn't create

HCL
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]   # Canonical's AWS account
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami = data.aws_ami.ubuntu.id     # always uses latest Ubuntu 22.04
  ...
}


State Management

The state file is critical. It contains:
• Mapping from your code to cloud resources
• Resource attributes (some not retrievable from APIs)
• Sensitive values (passwords, keys — yes, plain text)

Default: state in a local file (terraform.tfstate). FINE for learning, AWFUL for production:
• Two engineers running terraform apply simultaneously corrupt state
• State file lost = Terraform thinks nothing exists, tries to create everything fresh
• Plain-text secrets in a file you might commit

Production setup: REMOTE STATE with locking.

AWS S3 + DynamoDB lock:

HCL
terraform {
  backend "s3" {
    bucket         = "my-tf-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-state-lock"
    encrypt        = true
  }
}

S3 stores state. DynamoDB enforces locking — only one apply at a time. Encryption protects secrets at rest.

GCS for GCP:

HCL
terraform {
  backend "gcs" {
    bucket = "my-tf-state"
    prefix = "production"
  }
}

Terraform Cloud (HashiCorp's hosted service) — the free tier handles state, locking, and run history. Often the easiest option for small teams.

Critical state operations:

Bash
terraform state list                # all tracked resources
terraform state show aws_instance.web    # details of one
terraform state rm aws_instance.web      # stop tracking (doesn't delete from cloud)
terraform import aws_instance.web i-1234abcd   # adopt existing resource
terraform refresh                   # sync state from real infrastructure (deprecated; use terraform plan -refresh-only)

Importing existing infrastructure into Terraform is the path most teams take. You don't have to start fresh — terraform import lets you bring existing cloud resources under management one at a time.
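On Terraform 1.5+, imports can also be declared in code with an import block, so the adoption itself goes through plan, apply, and code review (the instance ID below is a placeholder):

HCL
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"   # placeholder ID of the existing instance
}

resource "aws_instance" "web" {
  # attributes must match the real instance;
  # `terraform plan -generate-config-out=generated.tf` can draft this block
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}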


Modules — Reuse and Composition

Modules are the unit of reuse in Terraform. A module is just a directory containing .tf files.

Example: a reusable VPC module.

Text
modules/vpc/
  main.tf       # the actual resources
  variables.tf  # inputs
  outputs.tf    # values to expose

HCL
# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
  tags = {
    Name = var.name
  }
}

resource "aws_subnet" "private" {
  count = length(var.private_subnets)
  
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = var.azs[count.index]
}

# ... etc
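The module's inputs and outputs might look like this — a sketch consistent with the resources above:

HCL
# modules/vpc/variables.tf
variable "name" {
  type = string
}

variable "cidr_block" {
  type = string
}

variable "azs" {
  type = list(string)
}

variable "private_subnets" {
  type    = list(string)
  default = []
}

variable "public_subnets" {
  type    = list(string)
  default = []
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.this.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}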

Use the module:

HCL
# main.tf
module "production_vpc" {
  source          = "./modules/vpc"
  name            = "production"
  cidr_block      = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

Modules compose. Your environment-specific code becomes a thin shell over reusable modules.

Public modules — the Terraform Registry has thousands. Don't reinvent the wheel:

HCL
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"
  
  name = "production"
  cidr = "10.0.0.0/16"
  # ... rest
}

The terraform-aws-modules organization on GitHub maintains battle-tested modules for VPC, EKS, RDS, ALB, etc. Use them as starting points. They handle 90% of normal use cases and you can fork for the rest.

Anti-pattern: gigantic modules that "do everything." Better: small focused modules (one for VPC, one for ECS, one for monitoring) that you compose at the top level.


Workspaces & Environments

How do you manage dev, staging, and production with the same Terraform code?

Two main approaches:

1. Separate directories per environment (recommended)

Text
infrastructure/
  modules/
    vpc/
    eks/
    rds/
  environments/
    dev/
      main.tf      # imports modules with dev parameters
      backend.tf   # state in dev S3 bucket
    staging/
      main.tf
      backend.tf
    production/
      main.tf
      backend.tf

Each environment has its own state, its own backend. Changes to one can't accidentally affect another. Code is shared via modules.

2. Workspaces — same code, different state files:

Bash
terraform workspace new dev
terraform workspace new production
terraform workspace select production
terraform apply

Workspaces share code AND backend config — only the state file path differs. Less isolation, but quicker to set up. Best for small teams or simple cases.
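Inside the configuration, workspace-aware code typically branches on the built-in terraform.workspace value, for example:

HCL
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = terraform.workspace == "production" ? "t3.large" : "t3.micro"

  tags = {
    Name = "${terraform.workspace}-web"
  }
}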

Most production setups use approach 1 — separate directories with shared modules. Clearer boundaries, harder to make mistakes.


Terraform in CI/CD

Manual terraform apply from a developer's laptop is risky:
• Whose AWS credentials? Long-lived keys?
• Did they run plan first?
• Where's the audit trail?

Production Terraform runs in CI:

YAML
# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write     # OIDC
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123:role/tf-plan
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/production
      - run: terraform plan -no-color
        working-directory: terraform/production
        # POST plan output as PR comment

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production    # required reviewer
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123:role/tf-apply
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/production
      - run: terraform apply -auto-approve
        working-directory: terraform/production

The pattern:
• PRs trigger plan. Plan output posted as PR comment for review.
• Merging to main triggers apply (with environment protection requiring approval).
• Plan and apply use SEPARATE IAM roles. Plan gets read-only; apply gets write.
• Atlantis or Spacelift can automate this even further.

For more advanced needs: Terraform Cloud, Spacelift, env0, Scalr — purpose-built CI for Terraform with policy enforcement, drift detection, and cost estimation.


Terraform Best Practices

Things that separate amateur Terraform from production-grade:

1. Pin provider and module versions

HCL
required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.20"   # accept 5.20.x updates, not 6.x
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.1"
}

Otherwise updates break you at the worst times.

2. Use formatters and linters

Bash
terraform fmt -recursive    # format all files in place (use -check in CI to fail instead)
terraform validate          # catch syntax and type errors
tflint                      # provider-aware linting
checkov                     # security scanning

Run in CI. Reject PRs that aren't formatted.

3. Plan EVERY change in CI before apply
Never apply without a reviewed plan.

4. Don't store secrets in Terraform
• Use AWS Secrets Manager / GCP Secret Manager + a data source to read them
• Or use SOPS to encrypt YAML files
• Or pass secrets in via environment variables at apply time
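A sketch of the first option: reading a database password from AWS Secrets Manager via a data source (the secret name is hypothetical). Note the value still ends up in state, which is one more reason remote state must be encrypted:

HCL
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "myapp/production/db-password"   # hypothetical secret name
}

resource "aws_db_instance" "main" {
  # ... other arguments ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}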

5. Tag EVERYTHING

HCL
provider "aws" {
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = "myapp"
      Owner       = "team-platform"
    }
  }
}

Cost allocation, security audits, ownership tracking — all need tags.

6. Run terraform plan regularly (drift detection)
Manual changes happen ("just a quick fix in the console"). Drift detection catches them. terraform plan with no expected changes should always show "No changes."
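One way to automate this, sketched as a scheduled GitHub Actions workflow (terraform plan -detailed-exitcode exits with code 2 when changes are pending, failing the job):

YAML
# .github/workflows/drift.yml (sketch; credentials setup omitted)
name: Drift detection

on:
  schedule:
    - cron: '0 6 * * *'   # every day at 06:00 UTC

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/production
      - run: terraform plan -detailed-exitcode   # exit 2 = drift, job fails
        working-directory: terraform/production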

7. Refactor with moved blocks
When you reorganize code, use moved blocks instead of destroy-and-recreate:

HCL
moved {
  from = aws_instance.web
  to   = module.web.aws_instance.this
}

8. Limit blast radius with separate state files
A typo in a giant monolith Terraform setup can take down everything. Split by domain (networking, compute, data) so each apply affects a smaller surface.
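Split states can still reference each other via the terraform_remote_state data source. A sketch assuming a networking state stored in the S3 backend from earlier, which exposes a private_subnet_ids output:

HCL
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
}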

The next lesson covers code-first IaC alternatives (Pulumi, CDK) — Terraform's declarative HCL is the standard, but sometimes you want full programming languages.

