Deployment Strategies in Practice
Blue-green, canary, feature flags — picking the right strategy for the risk and the team.
Why Strategy Matters
Module 7 introduced deployment strategies conceptually. This lesson is about choosing and implementing the right one for your situation.
The honest truth: most teams should start simple and add complexity only as they hit the limits of simpler approaches.
Progression for a typical team:
1. Manual deploys, occasional downtime — fine for early-stage startups
2. Rolling deploys, no downtime — fine for most established apps
3. Canary deploys with metric-based gates — when stakes are high
4. Blue-green for stateful services — when migrations are risky
5. Feature flags for fine-grained control — when you ship multiple times a day
Each step has its own complexity. Don't skip ahead unless you've earned the need.
Blue-Green Done Right
Blue-green keeps two identical environments: Blue is live and serving traffic; Green is idle and running the new version. Switch traffic from Blue to Green; if it works, you're done; if not, switch back instantly.
Before:
Internet ──► LB ──► Blue  (v1, live)
                    Green (v2, deploying)
Cutover:
Internet ──► LB ──► Green (v2, live)
                    Blue  (v1, idle)
Rollback (instant):
Internet ──► LB ──► Blue  (v1, live)
                    Green (v2, kept briefly)
How to do it on AWS:
• Two target groups behind one ALB
• Listener rules direct traffic to one or the other
• Use AWS CodeDeploy to automate the switch with health validation
• Or use weighted routing in Route 53 for DNS-level cutover
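The cutover logic above can be sketched abstractly. This is a minimal in-process stand-in for the ALB listener rule (the class and function names are illustrative, not a real AWS API):

```python
# Hypothetical in-process router standing in for the ALB listener rule.
# Each "environment" is just a callable that handles a request.

class BlueGreenRouter:
    def __init__(self, blue, green):
        self.environments = {"blue": blue, "green": green}
        self.live = "blue"  # Blue serves traffic initially

    def handle(self, request):
        # All traffic goes to whichever environment is currently live.
        return self.environments[self.live](request)

    def cutover(self):
        # Flip traffic to the other environment (the new version).
        self.live = "green" if self.live == "blue" else "blue"

    def rollback(self):
        # Rollback is the same flip in reverse: instant, no redeploy.
        self.cutover()


router = BlueGreenRouter(blue=lambda req: "v1", green=lambda req: "v2")
assert router.handle("GET /") == "v1"   # Blue live
router.cutover()
assert router.handle("GET /") == "v2"   # Green live
router.rollback()
assert router.handle("GET /") == "v1"   # back to Blue instantly
```

The key property is that both cutover and rollback are a single atomic pointer flip, which is why rollback is instant.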
Pros:
• Instant rollback (just flip)
• Validate Green fully before traffic flips
• No partial-state issues
Cons:
• Doubles infrastructure cost during deploys
• Database migrations are tricky (one schema has to serve both Blue and Green during the switch)
• Stateful sessions may not survive cutover
Best for: critical workloads where instant rollback is essential.
Canary Deploys with Metrics
Canary sends a small percentage of traffic to the new version, gradually increasing if metrics stay healthy.
The flow:
1. Deploy v2 alongside v1
2. Send 1% of traffic to v2
3. Monitor: error rate, latency, business metrics
4. Healthy after 5 minutes → 10% to v2
5. Healthy after 5 minutes → 50% to v2
6. Healthy after 5 minutes → 100% to v2
7. Unhealthy at any step → rollback to 100% v1
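The flow above can be sketched as a loop. Here `set_weight` and `is_healthy` are hypothetical stand-ins for the real traffic-shifting call and metrics query (Prometheus, CloudWatch), and the 5-minute waits are elided:

```python
# Sketch of a metric-gated canary loop. The weights are the ones from
# the flow above; the per-step wait is omitted for brevity.

def run_canary(set_weight, is_healthy, steps=(1, 10, 50, 100)):
    for weight in steps:
        set_weight(weight)       # shift this % of traffic to v2
        if not is_healthy():     # error rate, latency, business metrics
            set_weight(0)        # unhealthy: all traffic back to v1
            return "rolled back"
    return "promoted"            # 100% on v2 and still healthy

# v2 healthy throughout: canary walks 1% -> 10% -> 50% -> 100%
weights = []
assert run_canary(weights.append, lambda: True) == "promoted"
assert weights == [1, 10, 50, 100]

# v2 unhealthy at the 10% step: traffic snaps back to v1
weights = []
assert run_canary(weights.append, lambda: len(weights) < 2) == "rolled back"
assert weights == [1, 10, 0]
```

Tools like Argo Rollouts implement exactly this loop declaratively, with the metrics query defined alongside the rollout spec.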
Tools that automate this:
• Argo Rollouts (Kubernetes) — declarative canary with metric analysis
• Flagger (Kubernetes) — same idea, integrates with service meshes
• AWS CodeDeploy — for ECS, Lambda
Argo Rollouts example with Prometheus analysis:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 1
If success rate drops below threshold, the rollout halts and rolls back automatically. Engineers wake up to "canary failed at 10%, rolled back" — not "production is down."
Best for: high-traffic services, anything user-facing where bugs have real cost.
Feature Flags — Decoupling Deploy from Release
Feature flags ship the code but hide the feature behind a runtime toggle.
if feature_flag("new_checkout_flow", user_id=user.id):
    return new_checkout(cart)
else:
    return legacy_checkout(cart)
The toggle can be:
• ON for everyone
• OFF for everyone
• ON for specific users (early access, beta testers)
• ON for X% of users (percentage rollout)
• ON in specific environments
• ON for users matching criteria (paid plan, country)
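A minimal self-built evaluator covering a few of these targeting rules might look like this (the function and its parameters are illustrative; commercial SDKs implement the same idea with more rule types):

```python
# Minimal sketch of a flag evaluator: ON-for-everyone, user allowlists,
# and percentage rollout. Names and parameters are illustrative.
import hashlib

def flag_enabled(flag, user_id, *, on=False, allow_users=(), percentage=0):
    if on:
        return True                  # ON for everyone
    if user_id in allow_users:
        return True                  # ON for specific users (beta testers)
    # Percentage rollout: hash flag+user into a stable 0-99 bucket, so the
    # same user keeps the same answer as the percentage grows.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percentage

assert flag_enabled("new_checkout_flow", "u1", on=True)
assert flag_enabled("new_checkout_flow", "u2", allow_users={"u2"})
assert not flag_enabled("new_checkout_flow", "u3")            # OFF by default
assert flag_enabled("new_checkout_flow", "u4", percentage=100)
```

Hashing on flag name plus user ID (rather than user ID alone) keeps bucket assignments independent across flags, so the same 10% of users don't get every experiment at once.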
Powers this gives you:
• Deploy code Friday afternoon, enable feature Monday morning
• Roll out to 1% → 10% → 100% gradually
• Kill a bad feature instantly without redeploy
• A/B test variants
• Per-customer rollouts
Tools:
• LaunchDarkly — most popular commercial
• Flagsmith, Unleash, Split — alternatives
• OpenFeature — open standard
• Self-built — for simple cases
Watch out for:
• Flag debt — flags accumulate. Each flag = code complexity. Remove old flags ruthlessly.
• Performance — every check is a function call (or worse, a network call). Use an SDK that caches flag state locally.
• Testing — every combination of flags multiplies test scenarios.
Mature pattern:
1. Add a flag for any user-visible change
2. Deploy with flag OFF
3. Test in production with flag ON for internal users
4. Roll out gradually (1% → 10% → 50% → 100%)
5. Once stable for a week, remove the flag and the old code path
Database Migrations Without Downtime
The hardest deployments involve database changes. Old and new app versions have to work against the same schema during the rollout.
The expand-contract pattern (mentioned in Module 7, expanded here):
To rename email to email_address:
Phase 1: Expand
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);
UPDATE users SET email_address = email;
Both columns exist. App still uses email.
Phase 2: Dual-write
Deploy code that WRITES to both columns, READS from email. Old app instances still work.
Phase 3: Read from new
Deploy code that READS from email_address. WRITES still go to both.
Phase 4: Stop writing old
Deploy code that only writes to email_address. Old column ignored.
Phase 5: Contract
ALTER TABLE users DROP COLUMN email;
This is multi-week work for one column rename. But it's zero-downtime and rollback-safe at every phase.
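The expand and dual-write phases can be sketched concretely with SQLite (illustrative only; real migrations would run through a migration runner):

```python
# Sketch of expand-contract phases 1-2 for renaming email -> email_address.
# SQLite stands in for the real database; table and data are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
db.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Phase 1: Expand -- add the new column and backfill it.
db.execute("ALTER TABLE users ADD COLUMN email_address TEXT")
db.execute("UPDATE users SET email_address = email")

# Phase 2: Dual-write -- new code writes both columns, still reads email.
def update_email(user_id, value):
    db.execute(
        "UPDATE users SET email = ?, email_address = ? WHERE id = ?",
        (value, value, user_id),
    )

update_email(1, "b@example.com")
row = db.execute(
    "SELECT email, email_address FROM users WHERE id = 1"
).fetchone()
assert row == ("b@example.com", "b@example.com")  # both columns in sync
```

Because the dual-write keeps both columns identical, any mix of old readers (using email) and new readers (using email_address) sees the same data, which is what makes each phase rollback-safe.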
Other tricky migrations:
• Adding a NOT NULL column → first add nullable, backfill, then add the constraint
• Dropping a column → first stop reading, then stop writing, then drop
• Splitting a table → dual-write, gradual migration
Tools:
• Migration runners (Flyway, Liquibase, framework-native ones)
• Online schema change tools (gh-ost, pt-online-schema-change for MySQL)
The discipline: NEVER write a migration that requires a specific app version to be running. Apps and schemas evolve independently.
Environment Strategy
How many environments? What does each do?
Common setups:
Three environments (most teams):
• development — engineers' personal sandboxes, possibly local + cloud
• staging / pre-prod — production-like, integration testing, demos
• production — the real thing
Sometimes useful:
• QA — manual testing
• UAT — user acceptance testing
• Performance — load testing
• DR — disaster recovery (warm standby)
Per-PR ephemeral environments — deploy each PR to its own environment for review. Vercel and Netlify do this natively for frontends; Render and Railway have built-in support; or roll your own with Terraform + GitHub Actions.
For staging to be useful, it must MIRROR production:
• Same architecture
• Similar (anonymized) data
• Same monitoring
• Same deploy process
Anti-pattern: "staging works fine, but production has different behavior." Either staging isn't production-like enough OR production has manual changes that aren't reflected anywhere. Either way, fix it.
Avoid letting staging rot. Treat it as production for a smaller user base. If staging is broken for a week and nobody notices, it's not actually testing anything.