Terraform — Infrastructure as Code
Declarative infrastructure. Providers, state, modules — and why your cloud should be defined by code review.
Why Infrastructure as Code
Without IaC, infrastructure lives in:
• People's heads ("only Bob knows how the load balancer is configured")
• Click-ops in cloud consoles (untracked changes, no review)
• Hand-written scripts that drift over time
• Confluence pages that go out of date
The problems compound:
• Different environments diverge (production has settings nobody copied to staging)
• Changes break things mysteriously (no audit trail)
• Recovering from disaster requires reconstructing from memory
• New environments take days/weeks to set up
• Reviewing changes is impossible
Infrastructure as Code (IaC) fixes this. You describe your infrastructure in code, store it in Git, review it like application code, and apply it via automation.
The benefits:
• Every change is a Git commit — full history, rollback possible
• Code review catches mistakes before they hit production
• Same code creates dev, staging, production — no drift
• Disaster recovery: re-run the code
• New environments take minutes
• Onboarding: "read these files, that's our infrastructure"
The 2026 standard is Terraform (or its open-source fork, OpenTofu). Pulumi, AWS CDK, and configuration management tools (covered in Module 22) are alternatives. This lesson focuses on Terraform because it's the most widely used.
Terraform Mental Model
Terraform is declarative: you describe the desired state, Terraform figures out how to get there.
The core loop:
1. You write .tf files describing what you want (an EC2 instance, an S3 bucket, etc.)
2. Run terraform plan — Terraform compares your code to what currently exists in the cloud and shows the diff
3. Review the plan
4. Run terraform apply — Terraform makes the changes
Terraform tracks state — what it has created. The state file is critical: it maps your code's resources (aws_instance.web) to real cloud resources (i-1234567890abcdef0).
A trivial Terraform file:
# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "my-web-server"
  }
}
Run:
terraform init # download provider plugins
terraform plan # show what would change
terraform apply # make it so
terraform destroy # remove everything Terraform created
The init step downloads the AWS provider — code that knows how to translate Terraform resource definitions into AWS API calls. Providers exist for AWS, GCP, Azure, Kubernetes, GitHub, Datadog, hundreds more.
This means Terraform isn't AWS-specific. The same tool manages your AWS account, your DNS at Cloudflare, your GitHub repos, your Datadog monitors — all in one place.
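For instance, a sketch of managing a GitHub repository from the same codebase. The org and repo names are hypothetical; `integrations/github` is the provider's registry source:

```hcl
terraform {
  required_providers {
    github = {
      source  = "integrations/github"
      version = "~> 6.0"
    }
  }
}

provider "github" {
  owner = "my-org" # assumed organization name
}

# A Git repo managed exactly like an EC2 instance: declared, planned, applied
resource "github_repository" "app" {
  name       = "my-app"
  visibility = "private"
}
```

The workflow is identical: `plan` shows the repo would be created, `apply` calls the GitHub API instead of AWS.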
Resources, Variables, Outputs
Terraform's core building blocks:
Resources — things you create
resource "aws_s3_bucket" "uploads" {
  bucket = "my-app-uploads"
}

resource "aws_s3_bucket_public_access_block" "uploads" {
  bucket                  = aws_s3_bucket.uploads.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
Reference between resources via <type>.<name>.<attribute>. Terraform builds a dependency graph from these references.
Variables — make code reusable
variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"
}

variable "instance_type" {
  type    = string
  default = "t3.medium"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = var.instance_type

  tags = {
    Name        = "${var.environment}-web"
    Environment = var.environment
  }
}
Pass values:
terraform apply -var="environment=production"
# OR via a tfvars file
terraform apply -var-file="prod.tfvars"
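A tfvars file is just variable assignments. A hypothetical prod.tfvars matching the variables above:

```hcl
# prod.tfvars -- values for the production environment
environment   = "production"
instance_type = "t3.large"
```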
Outputs — expose values for humans or other tools
output "instance_public_ip" {
  value       = aws_instance.web.public_ip
  description = "Public IP of the web server"
}

output "bucket_name" {
  value = aws_s3_bucket.uploads.id
}
After apply, Terraform prints the outputs. You can also query them:
terraform output instance_public_ip
Data sources — reference existing things you didn't create
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami = data.aws_ami.ubuntu.id # always resolves to the latest Ubuntu 22.04
  # ...
}
State Management
The state file is critical. It contains:
• Mapping from your code to cloud resources
• Resource attributes (some not retrievable from APIs)
• Sensitive values (passwords, keys — yes, plain text)
Default: state in a local file (terraform.tfstate). FINE for learning, AWFUL for production:
• Two engineers running terraform apply simultaneously can corrupt state
• A lost state file means Terraform thinks nothing exists and tries to create everything fresh
• Plain-text secrets sit in a file you might accidentally commit
Production setup: REMOTE STATE with locking.
AWS S3 + DynamoDB lock:
terraform {
  backend "s3" {
    bucket         = "my-tf-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-state-lock"
    encrypt        = true
  }
}
S3 stores state. DynamoDB enforces locking — only one apply at a time. Encryption protects secrets at rest. (Recent Terraform releases, 1.10 and later, can also lock natively in S3 via the use_lockfile backend setting, removing the DynamoDB dependency.)
GCS for GCP:
terraform {
  backend "gcs" {
    bucket = "my-tf-state"
    prefix = "production"
  }
}
Terraform Cloud (HashiCorp's hosted offering, since renamed HCP Terraform) — the free tier handles state, locking, and run history. Often the easiest option for small teams.
Critical state operations:
terraform state list                          # all tracked resources
terraform state show aws_instance.web        # details of one
terraform state rm aws_instance.web          # stop tracking (doesn't delete from cloud)
terraform import aws_instance.web i-1234abcd # adopt an existing resource
terraform refresh                            # sync state with reality (deprecated; use terraform plan -refresh-only)
Importing existing infrastructure into Terraform is the path most teams take. You don't have to start fresh — terraform import lets you bring existing cloud resources under management one at a time.
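Since Terraform 1.5 there is also a config-driven alternative to the CLI command: an import block. A sketch, reusing the instance ID from earlier:

```hcl
import {
  to = aws_instance.web          # the address the resource will have in code
  id = "i-1234567890abcdef0"     # the existing cloud resource to adopt
}
```

Running `terraform plan -generate-config-out=generated.tf` against this will even draft the matching resource block for you, which you then clean up and commit.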
Modules — Reuse and Composition
Modules are the unit of reuse in Terraform. A module is just a directory containing .tf files.
Example: a reusable VPC module.
modules/vpc/
  main.tf       # the actual resources
  variables.tf  # inputs
  outputs.tf    # values to expose
# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name = var.name
  }
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnets)
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = var.azs[count.index]
}

# ... etc
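The module's inputs and outputs live alongside it. A plausible sketch of the variables the resources above reference, plus one output (names here are illustrative):

```hcl
# modules/vpc/variables.tf
variable "name" {
  type = string
}

variable "cidr_block" {
  type = string
}

variable "azs" {
  type = list(string)
}

variable "private_subnets" {
  type    = list(string)
  default = []
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.this.id
}
```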
Use the module:
# main.tf
module "production_vpc" {
source = "./modules/vpc"
name = "production"
cidr_block = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
public_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}
Modules compose. Your environment-specific code becomes a thin shell over reusable modules.
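Composition works through module outputs. A hypothetical sketch in which a cluster module consumes the VPC module (the `./modules/eks` path and the output names are assumptions for illustration):

```hcl
module "production_cluster" {
  source = "./modules/eks" # assumed module path

  # Wire one module's outputs into another's inputs
  vpc_id  = module.production_vpc.vpc_id
  subnets = module.production_vpc.private_subnet_ids # assumed output name
}
```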
Public modules — the Terraform Registry has thousands. Don't reinvent the wheel:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "production"
  cidr = "10.0.0.0/16"
  # ... rest
}
The terraform-aws-modules organization on GitHub maintains battle-tested modules for VPC, EKS, RDS, ALB, etc. Use them as starting points. They handle 90% of normal use cases and you can fork for the rest.
Anti-pattern: gigantic modules that "do everything." Better: small focused modules (one for VPC, one for ECS, one for monitoring) that you compose at the top level.
Workspaces & Environments
How do you manage dev, staging, and production with the same Terraform code?
Two main approaches:
1. Separate directories per environment (recommended)
infrastructure/
  modules/
    vpc/
    eks/
    rds/
  environments/
    dev/
      main.tf      # instantiates modules with dev parameters
      backend.tf   # state in dev S3 bucket
    staging/
      main.tf
      backend.tf
    production/
      main.tf
      backend.tf
Each environment has its own state, its own backend. Changes to one can't accidentally affect another. Code is shared via modules.
2. Workspaces — same code, different state files:
terraform workspace new dev
terraform workspace new production
terraform workspace select production
terraform apply
Workspaces share code AND backend config — only the state file path differs. Less isolation, but quicker to set up. Best for small teams or simple cases.
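Inside the code, the active workspace is available as terraform.workspace, which is how one configuration produces per-environment differences. A sketch:

```hcl
resource "aws_instance" "web" {
  ami = "ami-0c55b159cbfafe1f0"

  # terraform.workspace holds the selected workspace name,
  # so production gets a bigger instance than dev
  instance_type = terraform.workspace == "production" ? "t3.large" : "t3.micro"

  tags = {
    Name = "${terraform.workspace}-web"
  }
}
```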
Most production setups use approach 1 — separate directories with shared modules. Clearer boundaries, harder to make mistakes.
Terraform in CI/CD
Manual terraform apply from a developer's laptop is risky:
• Whose AWS credentials? Long-lived keys?
• Did they run plan first?
• Where's the audit trail?
Production Terraform runs in CI:
# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # OIDC
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123:role/tf-plan
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/production
      - run: terraform plan -no-color
        working-directory: terraform/production
      # POST plan output as PR comment

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production   # required reviewer
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123:role/tf-apply
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/production
      - run: terraform apply -auto-approve
        working-directory: terraform/production
The pattern:
• PRs trigger plan. Plan output posted as PR comment for review.
• Merging to main triggers apply (with environment protection requiring approval).
• Plan and apply use SEPARATE IAM roles. Plan gets read-only; apply gets write.
• Atlantis or Spacelift can automate this even further.
For more advanced needs: Terraform Cloud, Spacelift, env0, Scalr — purpose-built CI for Terraform with policy enforcement, drift detection, and cost estimation.
Terraform Best Practices
Things that separate amateur Terraform from production-grade:
1. Pin provider and module versions
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.20" # allows 5.20 and later 5.x releases, never 6.x
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.1"
}
Otherwise updates break you at the worst times.
2. Use formatters and linters
terraform fmt -recursive
terraform validate
tflint
checkov # security scanning
Run in CI. Reject PRs that aren't formatted.
3. Plan EVERY change in CI before apply
Never apply without a reviewed plan.
4. Don't store secrets in Terraform
• Use AWS Secrets Manager / GCP Secret Manager + a data source to read them
• Or use SOPS to encrypt YAML files
• Or pass secrets in via environment variables at apply time
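As a sketch of the first option, assuming a secret named myapp/db-password already exists in AWS Secrets Manager:

```hcl
# Read the secret at plan/apply time instead of hardcoding it
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "myapp/db-password" # assumed secret name
}

resource "aws_db_instance" "main" {
  # ... other arguments ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

One caveat: values read through data sources still end up in the state file, which is another reason encrypted remote state is non-negotiable.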
5. Tag EVERYTHING
provider "aws" {
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = "myapp"
      Owner       = "team-platform"
    }
  }
}
Cost allocation, security audits, ownership tracking — all need tags.
6. Run terraform plan regularly (drift detection)
Manual changes happen ("just a quick fix in the console"). Drift detection catches them. terraform plan with no expected changes should always show "No changes."
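A minimal sketch of a scheduled drift check as a CI job step, assuming the runner has Terraform installed and read-only cloud credentials configured:

```shell
# -detailed-exitcode makes plan exit 0 (no changes), 1 (error), 2 (changes pending)
terraform plan -detailed-exitcode -no-color
status=$?

if [ "$status" -eq 2 ]; then
  echo "Drift detected: live infrastructure differs from code"
  exit 1 # fail the job so someone investigates
fi
exit "$status"
```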
7. Refactor with moved blocks
When you reorganize code, use moved blocks instead of destroy-and-recreate:
moved {
  from = aws_instance.web
  to   = module.web.aws_instance.this
}
8. Limit blast radius with separate state files
A typo in a giant monolith Terraform setup can take down everything. Split by domain (networking, compute, data) so each apply affects a smaller surface.
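Split states can still share data via the terraform_remote_state data source. A sketch in which the compute stack reads the networking stack's outputs (bucket, key, and output names are assumptions for illustration):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tf-state" # assumed state bucket
    key    = "production/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "web" {
  # Consume an output published by the networking stack
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  # ...
}
```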
The next lesson covers code-first IaC alternatives (Pulumi, CDK) — Terraform's declarative HCL is the standard, but sometimes you want full programming languages.