What is DevOps, Really?
Beyond the buzzwords. The cultural shift that turned operations from a separate team into shared responsibility.
Before DevOps Existed
To understand DevOps, you have to understand the world it replaced.
For most of software's history, building and running software were separate jobs, done by separate teams who often barely talked.
Developers wrote code, tested it on their laptops, and at some point handed off a build to operations: "ship it." Operations engineers — also called sysadmins or infrastructure engineers — took that build and figured out how to deploy and run it on production servers, monitor it, and keep it alive at 3 AM when something broke.
This sounds clean. In practice it was anything but.
The classic failure mode was the wall of confusion: developers pushing changes that worked on their laptops, ops trying to deploy them and finding they fail in production for environmental reasons (different OS version, missing dependencies, network differences). Each side blamed the other.
- Devs would say: "It works on my machine."
- Ops would say: "Don't deploy on Friday."
Releases were rare and terrifying. A release window meant the entire team blocking time on a Friday night, with rollback plans, war rooms, and a lot of coffee. Big-bang releases meant big-bang failures.
The two roles also had opposing incentives:
- Devs wanted to ship fast — that's how they showed value
- Ops wanted stability — that's how they avoided pages
The tension wasn't anyone's fault. It was structural. The system was designed to fail.
Then, around 2008–2010, a few people started arguing this wall shouldn't exist.
The Cultural Shift
DevOps as a movement was named at the first DevOpsDays conference, held in Ghent, Belgium, in 2009. The core idea was simple: developers and operations should be one team with shared responsibility for the entire lifecycle of software.
That meant:
- Developers stay involved after deployment. They're paged when their service breaks. They write monitoring. They care about uptime.
- Operations engineers participate in design. They review code that affects deployability. They build tools developers use, not gates developers go through.
- The same team writes the code AND runs it. "You build it, you run it" became the mantra (originally from Werner Vogels at Amazon).
This is genuinely a cultural change, not a technology change. You can adopt every DevOps tool ever invented and still not have DevOps if your dev and ops teams don't trust each other.
The mindset shift in plain language:
OLD WORLD                        NEW WORLD (DevOps)
─────────                        ──────────────────
Dev throws code over the wall    Dev and ops share the work
Ops deploys (carefully)          Anyone can deploy (safely)
Releases are events              Releases are routine
Failures = blame                 Failures = learning
Manual processes                 Automated pipelines
Pet servers (named, special)     Cattle servers (interchangeable)
"Don't touch it, it works"       "Rebuild it from code"
The famous "pets vs cattle" analogy: in the old world, your servers had names like db-prod-01.example.com and were lovingly hand-tuned. Losing one was a tragedy. In the new world, your servers are interchangeable units; if one dies, you spin up another from the same template and don't care which physical machine it lands on.
This is why DevOps is bound up with cloud computing — clouds make cattle-style infrastructure trivially cheap to provision.
The Practices That Define DevOps
Over time, a set of practices emerged as the practical implementation of DevOps culture. You'll see all of these covered in detail in later lessons:
1. Version control for everything — code, infrastructure config, deployment scripts. If it's not in Git, it doesn't exist.
2. Continuous Integration (CI) — every code change is automatically built, tested, and validated against the main branch. Bugs are caught in minutes, not weeks.
3. Continuous Delivery / Deployment (CD) — every passing build is automatically (or one-click) deployable to production. No more release windows.
4. Infrastructure as Code (IaC) — servers, networks, databases are described in text files (Terraform, CloudFormation) and provisioned by automation. No more clicking around in the AWS console.
5. Configuration management — server configurations are codified (Ansible, Chef) so any server can be rebuilt identically.
6. Containerization — applications are packaged with their dependencies (Docker) so "works on my machine" becomes "works everywhere".
7. Orchestration — containers are managed at scale (Kubernetes) with automated scheduling, scaling, and recovery.
8. Monitoring & observability — every service emits metrics, logs, and traces. You know what's happening BEFORE customers tell you.
9. Automated testing — unit, integration, end-to-end, security, performance — all running in CI.
10. Blameless postmortems — when something fails, the team analyzes the system that allowed it, not the human who pressed enter.
These practices reinforce each other. CI is pointless without version control. CD is reckless without monitoring. IaC is dangerous without testing. They form an interconnected web — adopting them piecemeal gives you partial benefits, but the real value compounds when they all work together.
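To make practices 1, 2, and 9 concrete, here is a minimal sketch of a CI pipeline as a GitHub Actions workflow. The workflow name, repository layout, and test command are assumptions for illustration, not anything prescribed by this lesson:

```yaml
# .github/workflows/ci.yml — hypothetical example
# Runs on every push and pull request: checks out the code,
# installs dependencies, and runs the test suite.
name: ci
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # the code lives in version control (practice 1)
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt   # assumed dependency file
      - run: pytest                            # automated tests (practice 9) gate every change
```

Every change to the main branch passes through this gate automatically (practice 2) — no human has to remember to run the tests.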
DevOps vs SRE vs Platform Engineering
A few terms get used interchangeably, but they're distinct:
DevOps — the cultural and practical movement. Not a job title (despite the proliferation of "DevOps Engineer" job postings). It's a way of working.
SRE (Site Reliability Engineering) — Google's specific implementation of DevOps, formalized in their 2016 Site Reliability Engineering book. SRE puts engineering rigor on operations: error budgets, SLIs, SLOs, blameless postmortems, runbooks. SRE is more prescriptive than DevOps.
Platform Engineering — a more recent evolution. The platform team builds the internal tools and abstractions that make it easy for product teams to ship. Instead of every team rolling their own CI/CD, K8s configs, and monitoring, the platform team provides a "golden path" that handles the boring 90%.
A practical view of the progression:
1. Old world: Dev → Ops handoff (broken)
2. DevOps: Dev + Ops merge into one team (better)
3. SRE: Ops becomes engineering discipline (rigorous)
4. Platform: Ops becomes a product for devs (scaled)
In a small startup, you wear all these hats. In a 50-person company, you might have a "DevOps Engineer" who builds your CI/CD and infrastructure. In a 500-person company, you might have a Platform Engineering team that builds an internal developer platform. In a 5,000-person company, you might have specialized SRE teams paired with each major service, as Google does.
What this series teaches: the foundations and practices that apply at every scale. By the end you'll be able to set up a complete CI/CD pipeline, deploy to AWS or GCP, manage infrastructure as code, monitor production systems, and respond to incidents — whether you're building this for a side project or running it inside a Fortune 500.
What Success Looks Like
When DevOps works, here's what you see:
- Deploys are boring. They happen many times a day. Nobody schedules them. Nobody fears them.
- Recovery from failures is fast. The MTTR (mean time to recovery) is minutes, not hours.
- Engineers feel ownership. They know what their service does in production. They're proud when it runs well.
- Operational knowledge is shared. There's no single person who has to be paged because they're the only one who knows how something works.
- Infrastructure changes through code review. You don't get a Slack message at 2 AM because someone was poking around in the console.
- Postmortems lead to systemic improvements. The same outage doesn't happen twice.
The DORA research (DevOps Research and Assessment, now part of Google) measures four key metrics that distinguish high-performing teams:
1. Deployment Frequency
   - Low performers: once per month
   - Elite performers: multiple times per day
2. Lead Time for Changes (commit → production)
   - Low performers: 1–6 months
   - Elite performers: less than 1 hour
3. Mean Time to Recovery (MTTR)
   - Low performers: 1 week to 1 month
   - Elite performers: less than 1 hour
4. Change Failure Rate
   - Low performers: 46–60% of changes cause incidents
   - Elite performers: 0–15%
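The four DORA metrics are just arithmetic over deployment and incident records. A minimal sketch in Python, with invented sample data (the record format and the numbers are illustrative assumptions, not from DORA):

```python
from datetime import datetime, timedelta

# Each record: (commit time, deploy time, caused_incident, recovery_minutes)
# Two days of invented deploy history for illustration.
deploys = [
    (datetime(2024, 5, 1, 9, 0),  datetime(2024, 5, 1, 10, 0),  False, 0),
    (datetime(2024, 5, 1, 13, 0), datetime(2024, 5, 1, 13, 30), True, 45),
    (datetime(2024, 5, 2, 9, 0),  datetime(2024, 5, 2, 9, 20),  False, 0),
    (datetime(2024, 5, 2, 15, 0), datetime(2024, 5, 2, 16, 0),  False, 0),
]
days = 2

# 1. Deployment frequency: deploys per day
deploy_frequency = len(deploys) / days

# 2. Lead time for changes: average commit → production gap
lead_times = [deployed - committed for committed, deployed, _, _ in deploys]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# 3. MTTR: mean minutes to recover, over deploys that caused incidents
failures = [d for d in deploys if d[2]]
mttr = sum(d[3] for d in failures) / len(failures)

# 4. Change failure rate: fraction of deploys causing incidents
change_failure_rate = len(failures) / len(deploys)

print(deploy_frequency)     # 2.0 deploys/day
print(avg_lead_time)        # 0:42:30 average commit→production
print(mttr)                 # 45.0 minutes
print(change_failure_rate)  # 0.25
```

Real pipelines emit these records automatically (deploy events from CD, incidents from the on-call tool); the point is that the metrics are cheap to measure once the pipeline exists.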
Notice that elite performers ship MORE often AND fail LESS often. The old "stability vs speed" trade-off turns out to be a lie: done right, more frequent deploys mean smaller changes, which mean fewer bugs, which mean more stability.
This series is about building the practices that get you toward those numbers. We'll start from the bottom — the Linux fundamentals every DevOps engineer needs — and work all the way up to platform engineering and SRE practices.
The journey is long. The reward is engineering work that actually feels good — where shipping software is calm, predictable, and even fun.