Cloud Networking
VPCs, peering, transit gateways, private connectivity — the network plumbing that ties cloud workloads together.
VPCs Revisited — The Cloud Boundary
We covered VPCs in Module 3 from a Linux/networking perspective. Here we go deeper from a cloud architecture perspective.
A VPC is your private network in the cloud. Inside it, you control:
• IP ranges (CIDR blocks)
• Subnets (slices in different AZs)
• Routing (which traffic goes where)
• Internet access (or not)
• Connections to other networks (other VPCs, on-premises, other clouds)
The 2026 best-practice VPC layout:
Production VPC (10.0.0.0/16)
─────────────────────────────────────────────
                 AZ-a           AZ-b           AZ-c
Public subnets:  10.0.1.0/24    10.0.2.0/24    10.0.3.0/24
                 (LBs, NAT, bastions)
Private app:     10.0.11.0/24   10.0.12.0/24   10.0.13.0/24
                 (your apps)
Private data:    10.0.21.0/24   10.0.22.0/24   10.0.23.0/24
                 (RDS, ElastiCache)
Three subnet tiers per AZ. Public for internet-facing things. Private app for your code. Private data for state. Apps reach the internet outbound through a NAT in the public subnet.
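In AWS terms, this layout is just a set of create-subnet calls. A minimal sketch for one AZ, assuming the VPC already exists (all IDs here are hypothetical placeholders):
# One AZ's three tiers; repeat per AZ with the matching CIDRs
aws ec2 create-subnet --vpc-id vpc-12345 \
    --availability-zone us-east-1a --cidr-block 10.0.1.0/24   # public
aws ec2 create-subnet --vpc-id vpc-12345 \
    --availability-zone us-east-1a --cidr-block 10.0.11.0/24  # private app
aws ec2 create-subnet --vpc-id vpc-12345 \
    --availability-zone us-east-1a --cidr-block 10.0.21.0/24  # private data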
Connecting Multiple VPCs
Real organizations have many VPCs:
• Per-environment (production, staging, dev)
• Per-team (data team's VPC, platform team's VPC)
• Per-account in multi-account setups
• Shared services VPCs (centralized DNS, monitoring)
Three primary connectivity options:
VPC Peering — direct one-to-one connection
VPC A ◄──── peering ────► VPC B
Pros: simple, low latency, no extra cost (just data transfer).
Cons: doesn't scale. N VPCs need N(N-1)/2 connections (full mesh). Non-transitive: A↔B and B↔C does NOT mean A↔C.
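Setting up peering takes three steps: request, accept, then add routes on both sides. A sketch with hypothetical IDs:
# Request the connection from VPC A (10.0.0.0/16) to VPC B (10.1.0.0/16)
aws ec2 create-vpc-peering-connection \
    --vpc-id vpc-aaaa \
    --peer-vpc-id vpc-bbbb

# Accept it on the VPC B side
aws ec2 accept-vpc-peering-connection \
    --vpc-peering-connection-id pcx-12345

# Route VPC B's CIDR through the peering connection (mirror this in VPC B)
aws ec2 create-route \
    --route-table-id rtb-aaaa \
    --destination-cidr-block 10.1.0.0/16 \
    --vpc-peering-connection-id pcx-12345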
Transit Gateway (AWS) / Network Connectivity Center (GCP) — hub-and-spoke
              Transit Gateway
                    │
        ┌───────────┼───────────┐
        │           │           │
      VPC A       VPC B       VPC C
Pros: scales to hundreds of VPCs. Each VPC connects once. Routing is flexible. Connects on-prem too.
Cons: more expensive than peering ($/hour + per-GB charges in AWS).
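The hub-and-spoke setup itself is short. A sketch, again with hypothetical IDs; each attachment needs one subnet per AZ:
# Create the hub once
aws ec2 create-transit-gateway --description "org-hub"

# Attach each VPC a single time (one subnet per AZ for the attachment ENIs)
aws ec2 create-transit-gateway-vpc-attachment \
    --transit-gateway-id tgw-12345 \
    --vpc-id vpc-aaaa \
    --subnet-ids subnet-aaaa subnet-bbbb subnet-cccc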
VPC Lattice / Cloud Service Mesh — application-layer connectivity
Newer approach: connect SERVICES, not networks. You define which apps can call which apps, regardless of network topology. Most useful at large scale.
For most small-to-medium setups: VPC Peering. For enterprise scale: Transit Gateway.
Private Connectivity to Cloud Services
When your private subnet calls AWS S3, GCP Cloud Storage, etc., the traffic goes:
1. Out through your NAT
2. To the cloud provider's public API
3. (Authenticated, encrypted, but going over the public internet)
4. Costs you NAT data processing fees + egress
Better: keep traffic on the cloud's internal network with private endpoints.
AWS VPC Endpoints:
• Gateway endpoints — for S3 and DynamoDB. Free. Add a route to your route table.
• Interface endpoints (PrivateLink) — for most other services. An ENI lands in your subnet, and you call it as if it were the service itself.
# S3 gateway endpoint — saves NAT costs and improves latency
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-12345
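An interface endpoint looks similar, but it lives in your subnets and needs a security group. A sketch for Secrets Manager (IDs hypothetical):
# Interface endpoint (PrivateLink) with private DNS, so SDK calls
# to the normal service hostname resolve to the endpoint's ENIs
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-12345 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.secretsmanager \
    --subnet-ids subnet-12345 subnet-67890 \
    --security-group-ids sg-12345 \
    --private-dns-enabled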
GCP equivalents: Private Google Access (for Google APIs) and Private Service Connect (for everything else).
For talking to OTHER organizations' services privately:
• AWS PrivateLink — the other org publishes their service via PrivateLink; you connect from your VPC.
• GCP Private Service Connect — same concept.
Common use case: Snowflake, Datadog, and MongoDB Atlas all support PrivateLink, so you can use them without your data ever traversing the public internet.
Hybrid Connectivity — Cloud to On-Prem
Many organizations have on-premises infrastructure they need to connect to cloud:
• Existing datacenters
• Office networks
• Edge devices
• Hardware that can't be virtualized (specialized servers)
Two main options:
Site-to-Site VPN
• Encrypted tunnel over the public internet
• Cheap (~$50/month base, plus data transfer)
• Easy to set up
• Limited bandwidth (typically 1.25 Gbps per tunnel in AWS)
• Latency depends on internet routing
• Good for: small offices, dev/test environments, low-bandwidth needs
Direct Connect (AWS) / Cloud Interconnect (GCP)
• Dedicated physical fiber from your datacenter to the cloud
• Predictable bandwidth (1, 10, 100 Gbps)
• Lower, more consistent latency
• More expensive ($300+/month port + per-port-hour + data transfer)
• Setup involves a cross-connect at a colocation facility
• Good for: production, high-bandwidth needs, regulated industries
Real-world hybrid pattern:
• Production traffic over Direct Connect (consistency, security)
• Backup tunnel via Site-to-Site VPN (failover)
• Both terminate in a Transit Gateway in the cloud
• Routes propagate via BGP between sites
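A sketch of the VPN backup leg, assuming the Transit Gateway already exists (IDs, IP, and ASN are hypothetical):
# Register the on-prem router by its public IP and BGP ASN
aws ec2 create-customer-gateway \
    --type ipsec.1 \
    --public-ip 203.0.113.10 \
    --bgp-asn 65000

# Terminate the VPN on the Transit Gateway; with BGP, routes
# propagate between sites without static route maintenance
aws ec2 create-vpn-connection \
    --type ipsec.1 \
    --customer-gateway-id cgw-12345 \
    --transit-gateway-id tgw-12345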
DNS in the Cloud
DNS is critical infrastructure. Cloud providers offer their own DNS services:
AWS Route 53:
• Public DNS — for your domain (example.com → IP)
• Private DNS — internal names within a VPC (db.internal → 10.0.0.5)
• Multi-region failover, weighted routing, latency-based routing
• ALIAS records for AWS resources (point apex domain to ALB without CNAME)
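A sketch of that last bullet via the CLI. Note the AliasTarget hosted-zone ID is the ALB's regional zone, not your own zone (all values here are hypothetical placeholders):
# Point the apex domain at an ALB with an ALIAS record
aws route53 change-resource-record-sets \
    --hosted-zone-id Z1EXAMPLE \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "example.com",
          "Type": "A",
          "AliasTarget": {
            "HostedZoneId": "Z2ALBZONE",
            "DNSName": "my-alb-123456.us-east-1.elb.amazonaws.com",
            "EvaluateTargetHealth": true
          }
        }
      }]
    }'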
GCP Cloud DNS:
• Similar capabilities
• Managed zones for both public and private
• Integrates with Cloud Load Balancing for global anycast
Patterns:
• Use the cloud's DNS for cloud resources. The integrations (alias records, automatic updates) save real time.
• For your registrar, use whatever you like (Namecheap, Cloudflare, AWS Route 53). Cloudflare is popular as registrar + DNS + CDN bundle.
• Set DNS records via Terraform. Manual DNS changes during incidents are how outages drag on.
Critical DNS gotcha in the cloud: TTLs during migrations. We covered this in Module 3. Plan migrations: lower the TTL days before, do the cutover, raise the TTL after.
Load Balancing Patterns
We covered load balancers in Module 3. A few patterns specific to cloud:
Global load balancing — one IP, traffic routes to nearest region
• AWS: CloudFront + multiple regional ALBs OR Global Accelerator
• GCP: Global HTTP(S) Load Balancer (genuinely global anycast IP — one of GCP's strengths)
• Cloudflare: built-in (CF runs at the edge naturally)
Multi-tier load balancing:
              Internet
                 │
                 ▼
   CDN (CloudFront / Cloudflare)
                 │  caches static, forwards dynamic
                 ▼
    Regional load balancer (ALB)
                 │  L7 routing by path/host
                 ▼
         Internal services
A CDN in front of the LB is standard. The CDN handles:
• Caching of static assets (massive cost savings)
• DDoS protection
• TLS termination at edge
• Geographic distribution
The LB handles:
• Application routing
• Backend health
• Session affinity if needed
Critical: configure the CDN to forward the X-Forwarded-For header so your app sees real client IPs, not CDN IPs.
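A quick way to verify the whole chain, assuming a hypothetical /debug/headers endpoint in your app that echoes the headers it received (substitute whatever introspection you have):
# Request through the CDN and confirm the original client IP survives
curl -s https://www.example.com/debug/headers | grep -i x-forwarded-for
# Expect the left-most entry to be the real client, with CDN/LB hops appended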
Cost-Conscious Networking
Cloud networking is where surprise bills come from. The worst offenders:
1. Egress (data leaving the cloud)
• $0.05-$0.09/GB depending on volume and destination
• 1 TB of monthly egress = $50-$90
• 100 TB = $5,000-$9,000
2. NAT Gateway data processing
• $0.045/GB of data processed (AWS), on top of egress
• Apps in private subnets that talk heavily to S3 pay NAT processing on top of the transfer itself
• Solution: VPC Gateway Endpoints (free for S3 and DynamoDB)
3. Cross-AZ traffic
• $0.01/GB in each direction within AWS
• Often invisible — your microservices chatting between AZs adds up
• Some teams go single-AZ for non-critical workloads to cut costs
4. Cross-region traffic
• $0.02-$0.09/GB depending on regions
• Multi-region setups can cost more in egress than compute
5. Idle resources
• NAT Gateway: $0.045/hour = ~$32/month each, even idle
• Elastic IPs not attached to a running instance: $0.005/hour = ~$3.65/month each
• Load balancers without backends: still charged hourly
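A quick audit for the Elastic IP case; the JMESPath filter matches addresses with no association:
# List Elastic IPs that aren't attached to anything (each bills hourly)
aws ec2 describe-addresses \
    --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
    --output table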
Cost-cutting moves:
• CloudFront in front of S3 → cuts egress in half (CDN egress is cheaper)
• VPC endpoints for AWS services → saves NAT charges
• Keep chatty services in the same AZ
• Monitor with AWS Cost Explorer / GCP Billing reports (see the sketch after this list)
• Tag everything for cost allocation (covered in FinOps lesson later)
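For the monitoring bullet, a sketch of finding where the transfer dollars go: group costs by usage type and scan for the transfer and NAT line items (dates are placeholders):
# One month of cost broken down by usage type; transfer and NAT charges
# typically appear as DataTransfer-* and NatGateway-* usage types
aws ce get-cost-and-usage \
    --time-period Start=2026-01-01,End=2026-02-01 \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=DIMENSION,Key=USAGE_TYPE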
Networking cost optimization is one of the highest-leverage activities a cloud engineer can do. A few hours of analysis + simple changes can save thousands of dollars per month at moderate scale.