Module 10: Kubernetes & Cloud Native Distributed Systems
How Kubernetes changes distributed-systems design — cluster architecture, service mesh, ingress, autoscaling, and the operational primitives that everything else now sits on top of.
5 hours. 3 hands-on labs. Free course module.
Learning Objectives
- Read a Kubernetes cluster architecture (control plane, kubelet, kube-proxy, etcd, CNI)
- Use Services, Ingress, and Gateway API correctly for distributed workloads
- Compare service meshes (Istio, Linkerd, Cilium) and pick one with eyes open
- Run StatefulSets, PVCs, and storage classes for stateful workloads
- Operate workloads with HPA, VPA, Karpenter, and PodDisruptionBudgets in production
Why This Matters
Kubernetes is now the default substrate for modern infrastructure; understanding its components is part of distributed-systems literacy. Engineers who can read a cluster architecture, reason about Service / Ingress / Gateway API, pick the right service mesh, and operate StatefulSets correctly are the engineers who get trusted with platform-engineering roles. The ones who treat Kubernetes as “just docker but bigger” eventually pay for it during the first multi-AZ incident.
Lesson Content
Kubernetes is the substrate that runs most modern distributed systems. It is, itself, a distributed system — with consensus (etcd / Raft), partitioning (resources scheduled across nodes), replication (pods), and observability built in. Understanding how Kubernetes is constructed is now part of distributed-systems literacy.
Cluster Architecture
The control plane consists of:
- kube-apiserver: the only component that writes to etcd; every other component talks to apiserver. Stateless; horizontally scalable.
- etcd: the source of truth for cluster state; Raft-replicated; 3 or 5 nodes.
- kube-scheduler: assigns pending pods to nodes based on resource fit, affinity, and topology.
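- kube-controller-manager: runs reconciliation loops (Deployment, ReplicaSet, Node, Endpoints, etc.). When several controller-manager replicas run, they leader-elect through a Lease object in the API server, so only one instance is active at a time.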
- cloud-controller-manager: runs cloud-specific controllers (load balancers, persistent volumes, node lifecycle).
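The reconciliation loops mentioned above all share one shape: observe actual state, diff it against desired state, act on the difference. The following is a minimal sketch of that shape, not the real ReplicaSet controller; `FakeCluster` and `reconcile_replicas` are hypothetical names, and the pod list lives in memory instead of behind the API server.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the pods the API server would report."""
    pods: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def create_pod(self) -> None:
        self.pods.append(f"web-{next(self._ids)}")

    def delete_pod(self) -> None:
        self.pods.pop()


def reconcile_replicas(cluster: FakeCluster, desired: int) -> int:
    """One pass of the loop: observe, diff, act. Returns the number of actions taken."""
    actual = len(cluster.pods)          # observe
    diff = desired - actual             # diff against desired state
    for _ in range(abs(diff)):          # act until the gap is closed
        if diff > 0:
            cluster.create_pod()
        else:
            cluster.delete_pod()
    return abs(diff)


cluster = FakeCluster()
print(reconcile_replicas(cluster, desired=3))  # 3: scale up from empty
cluster.pods.remove("web-1")                   # simulate a pod dying
print(reconcile_replicas(cluster, desired=3))  # 1: drift detected and repaired
print(reconcile_replicas(cluster, desired=3))  # 0: already converged
```

The loop is idempotent: running it again on a converged cluster does nothing, which is what makes it safe to re-run after a crash or a leader change.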
Each node runs:
- kubelet: the node agent; pulls container images, runs containers via the container runtime, reports node and pod status to apiserver.
- kube-proxy: implements the Service abstraction via iptables / IPVS rules. (Modern alternative: Cilium with no kube-proxy.)
- CNI plugin: pod networking (Cilium, Calico, AWS VPC CNI, etc.). Provides pod IPs, NetworkPolicy enforcement, often eBPF observability.
- Container runtime: containerd or CRI-O; runs the containers.
Service, Ingress, Gateway API
- Service: stable virtual IP and DNS name for a set of pods; load-balances traffic across them. ClusterIP exposes it inside the cluster, NodePort opens a static port on every node, and LoadBalancer provisions a cloud load balancer.
- Ingress: L7 routing for HTTP/HTTPS; needs an Ingress Controller (nginx-ingress, AWS ALB, Envoy-based, etc.). Older API; many extensions baked into annotations.
- Gateway API: the modern replacement for Ingress; richer, role-separated (Gateway/HTTPRoute/etc.), portable across implementations. The right choice for new infra.
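To make the role separation concrete, here is a rough sketch that builds a Gateway (typically owned by the platform team) and an HTTPRoute (typically owned by the application team) as Python dicts and prints them as JSON, which `kubectl apply -f` accepts. The resource names, the `example-class` GatewayClass, and the `api` Service on port 8080 are all hypothetical.

```python
import json


def gateway(name: str, gateway_class: str) -> dict:
    """Listener definition: where traffic enters the cluster."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": name},
        "spec": {
            "gatewayClassName": gateway_class,
            "listeners": [{"name": "http", "protocol": "HTTP", "port": 80}],
        },
    }


def http_route(name: str, gateway_name: str, path_prefix: str,
               service: str, port: int) -> dict:
    """Routing rule: which Service receives requests under a path prefix."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            "parentRefs": [{"name": gateway_name}],  # attaches the route to the Gateway
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
                "backendRefs": [{"name": service, "port": port}],
            }],
        },
    }


manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        gateway("shared-gateway", "example-class"),             # hypothetical names
        http_route("api-route", "shared-gateway", "/api", "api", 8080),
    ],
}
print(json.dumps(manifests, indent=2))
```

Keeping the two objects separate is the point of the design: the route can change with every application release while the Gateway, and the load balancer behind it, stays untouched.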
Service Mesh
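A service mesh puts a proxy in the path of every service-to-service call to handle mTLS, retries, circuit breaking, observability, traffic shifting, and authorization. The classic design injects a sidecar proxy into every pod (Envoy in Istio, linkerd2-proxy in Linkerd); sidecar-free designs such as Cilium do the work at the node level instead. Three major options: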
- Istio: most feature-rich; substantial complexity. Best when you need the full toolkit.
- Linkerd: simpler, performance-focused; data-plane proxy written in Rust; zero-config mTLS. Best for “mesh basics, fast”.
- Cilium service mesh: eBPF-based, no sidecar, integrated with the Cilium CNI. Best when you want one tool for networking + mesh.
Service meshes implement most of the resilience patterns from Module 7 (timeouts, retries, circuit breakers) for free at the data plane. The cost is operational complexity and the latency tax of every request going through a sidecar.
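As one illustration of resilience configured at the data plane instead of in application code, the sketch below builds an Istio VirtualService with an overall timeout and a retry policy. It assumes Istio specifically; the `orders` host and the chosen numbers are hypothetical, and the budget check is an assumption of this sketch (it ignores retry backoff), not something Istio enforces.

```python
import json


def virtual_service(host: str, timeout_ms: int, retries: int, per_try_ms: int) -> dict:
    """Build a VirtualService that retries failed calls within an overall timeout."""
    # Sketch-level sanity check: the first try plus every retry should fit
    # inside the overall timeout, otherwise later retries can never finish.
    if (1 + retries) * per_try_ms > timeout_ms:
        raise ValueError("retries cannot all complete within the overall timeout")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [{"destination": {"host": host}}],
                "timeout": f"{timeout_ms}ms",
                "retries": {
                    "attempts": retries,               # retries after the first try
                    "perTryTimeout": f"{per_try_ms}ms",
                    "retryOn": "5xx,connect-failure",
                },
            }],
        },
    }


print(json.dumps(virtual_service("orders", 2000, 3, 500), indent=2))
```

Retries multiply load on a struggling backend, so a small attempt count with a tight per-try timeout is usually a safer starting point than a generous one.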
Stateful Workloads
StatefulSets give pods stable identities (predictable name, predictable network address) and stable storage (PersistentVolumeClaims that follow the pod). The right pattern for databases, message queues, and any workload where pod identity matters.
StorageClasses define dynamic provisioning of PersistentVolumes from cloud-provider disks (EBS, PD, Azure Disk) or storage operators (Rook/Ceph, Longhorn). Choose the reclaim policy (Delete / Retain) and binding mode (Immediate / WaitForFirstConsumer) on the StorageClass, and the access mode (ReadWriteOnce / ReadWriteMany) on the claim, deliberately.
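The sketch below shows how these pieces fit together: a StorageClass, a StatefulSet with a volume claim template, and the stable names Kubernetes derives from them. It is illustrative only and not a complete Postgres deployment (no headless Service, credentials, or probes); the `ebs.csi.aws.com` provisioner assumes AWS, and the names and sizes are hypothetical.

```python
import json


def storage_class(name: str) -> dict:
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",            # assumption: AWS EBS CSI driver
        "reclaimPolicy": "Retain",                   # keep the disk if the claim is deleted
        "volumeBindingMode": "WaitForFirstConsumer", # provision in the pod's zone
    }


def stateful_set(name: str, replicas: int, storage_class_name: str, size: str) -> dict:
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives each pod a DNS name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": "postgres:16",
                    "volumeMounts": [{"name": "data",
                                      "mountPath": "/var/lib/postgresql/data"}],
                }]},
            },
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class_name,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def stable_identities(sts: dict, namespace: str = "default") -> list:
    """Pod name, DNS name, and PVC name per ordinal (default cluster domain assumed)."""
    name = sts["metadata"]["name"]
    svc = sts["spec"]["serviceName"]
    claim = sts["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    return [
        (f"{name}-{i}",
         f"{name}-{i}.{svc}.{namespace}.svc.cluster.local",
         f"{claim}-{name}-{i}")
        for i in range(sts["spec"]["replicas"])
    ]


sts = stateful_set("postgres", 3, "fast-retain", "20Gi")
print(json.dumps([storage_class("fast-retain"), sts], indent=2))
for identity in stable_identities(sts):
    print(identity)
```

Because the PVC name is derived from the pod's ordinal, a rescheduled `postgres-1` reattaches to `data-postgres-1` and not to a fresh volume.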
Autoscaling, PDBs, and Operational Sanity
- HPA: scale pods on metrics. Always set minReplicas >= 2 for HA.
- VPA: rightsize resource requests; clashes with HPA on the same metric.
- Cluster Autoscaler / Karpenter: scale nodes. Karpenter is the modern default on AWS.
- PodDisruptionBudget: cap the number of unavailable pods during voluntary disruption (drain, scale-down, eviction). Without PDBs, the autoscaler will happily evict every replica simultaneously.
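The HPA's core decision is a single ratio. The sketch below follows the documented formula, desired = ceil(current replicas × current metric / target metric), with the default 10% tolerance band and a clamp to the configured bounds. It is a simplification: the real controller also handles unready pods, missing metrics, and stabilization windows.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 2,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count the HPA would aim for, given one metric."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count alone,
    # which avoids flapping on small metric noise.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6: scale up
print(hpa_desired_replicas(4, current_metric=63, target_metric=60))  # 4: within tolerance
print(hpa_desired_replicas(4, current_metric=10, target_metric=60))  # 2: floor at minReplicas
```

The last call shows why `minReplicas` matters: the raw formula asks for a single pod, and only the floor keeps a second replica around for availability.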
Operational Practice
The Kubernetes operational discipline:
- Always run multi-AZ for production. Use topologySpreadConstraints to enforce it.
- etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
- RBAC: deny by default; explicit allow per ServiceAccount; treat cluster-admin as root.
- NetworkPolicy: default-deny per namespace; explicit allow rules.
- PodSecurity admission: restricted profile by default, exceptions audited.
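The etcd layout recommended above follows from Raft quorum arithmetic: a cluster of n members needs a majority, floor(n/2) + 1, to accept writes. A short sketch that checks whether a given member-per-AZ layout keeps quorum after losing any single AZ (the zone names are placeholders):

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with this many members."""
    return members // 2 + 1


def survives_az_loss(layout: dict) -> bool:
    """True if losing any one AZ still leaves a quorum of members."""
    total = sum(layout.values())
    return all(total - lost >= quorum(total) for lost in layout.values())


print(survives_az_loss({"az-a": 2, "az-b": 2, "az-c": 1}))  # True: 5 members, worst case leaves 3
print(survives_az_loss({"az-a": 1, "az-b": 1, "az-c": 1}))  # True: 3 members, worst case leaves 2
print(survives_az_loss({"az-a": 3, "az-b": 2}))             # False: losing az-a leaves 2 of 5
print(survives_az_loss({"az-a": 2, "az-b": 1, "az-c": 1}))  # False: 4 members need 3, az-a loss leaves 2
```

The last case is why even member counts are avoided: a fourth member raises the quorum size without raising the number of failures the cluster can tolerate.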
Module 8 of the Cloud Native Security Engineering course covers Kubernetes hardening in depth. The Kubernetes Security Simulator exercises the misconfigurations that cause real outages.
[Diagram: Service Mesh Traffic Flow]
Self-Check Quiz
- You have a Deployment with 3 replicas. The cluster autoscaler scales down a node. All 3 pods on that node get evicted. Why? (Answer: no PodDisruptionBudget. Define maxUnavailable: 1 so only one replica goes down at a time.)
- Postgres in a Deployment vs a StatefulSet: what changes? (Answer: StatefulSet gives stable pod identity (postgres-0, postgres-1) and stable PVCs that follow each pod. Required for any stateful workload.)
- Service mesh adds 1ms latency per hop. Across 5 hops you pay 5ms. When is it worth it? (Answer: when the mesh-provided features (mTLS, retries, observability, traffic shifting) are worth more than 5ms. For mature production systems, almost always.)
- You enable Istio mesh-wide STRICT mTLS on day one of rollout. What happens? (Answer: external load balancer health probes fail; non-meshed services can no longer talk to meshed services; outage. Phase: PERMISSIVE first, observe, promote namespace by namespace.)
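The phased rollout in the last answer can be sketched as a list of Istio PeerAuthentication objects: one mesh-wide PERMISSIVE policy in the root namespace, then a STRICT policy for each namespace as it is promoted. This assumes the default root namespace `istio-system`; the namespace names `payments` and `orders` are hypothetical.

```python
import json


def peer_authentication(namespace: str, mode: str) -> dict:
    """Namespace-wide mTLS policy; in the root namespace it applies mesh-wide."""
    if mode not in {"PERMISSIVE", "STRICT"}:  # the two modes this sketch uses
        raise ValueError(f"unsupported mode: {mode}")
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": mode}},
    }


def rollout_plan(root_namespace: str, promoted: list) -> list:
    """Mesh stays PERMISSIVE; only verified namespaces are promoted to STRICT."""
    plan = [peer_authentication(root_namespace, "PERMISSIVE")]
    plan += [peer_authentication(ns, "STRICT") for ns in promoted]
    return plan


# Hypothetical state partway through the rollout: two namespaces promoted so far.
print(json.dumps(rollout_plan("istio-system", ["payments", "orders"]), indent=2))
```

A namespace belongs in the promoted list only after mesh telemetry shows no remaining plaintext traffic into it.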
For Kubernetes hardening, the Kubernetes Security cheatsheet is the operational reference. For service-mesh patterns the Service Mesh cheatsheet covers Istio/Linkerd/Cilium patterns — see the service mesh glossary entry for the conceptual definition. The Kubernetes cheatsheet is the day-to-day reference for kubectl operational patterns. Practice with the Kubernetes Security Simulator.
Real-World Use Cases
- Spotify runs over 1500 microservices on Kubernetes, with in-house tooling around them such as Backstage (developer portal) and Apollo (Java service framework).
- Pinterest migrated their entire fleet to Kubernetes over 3 years; the migration was as much a culture shift as a technology one.
- Reddit runs everything on Kubernetes after a multi-year migration from EC2.
- Google's GKE Autopilot is essentially Kubernetes with the operational complexity hidden — for teams that want the API but not the infrastructure overhead.
Production Notes
- Always run multi-AZ. Use topologySpreadConstraints to enforce it; do not rely on luck.
- PodDisruptionBudgets are mandatory for production workloads. Without them, autoscalers will happily evict every replica.
- etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
- Resource requests at p95 of actual usage; do not let dev defaults of “500m CPU” ship to prod.
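A minimal sketch of how these notes translate into manifests: a Deployment that spreads replicas across zones and sets explicit resource requests, plus a matching PodDisruptionBudget. The image name and request values are hypothetical placeholders; in practice the requests would come from measured p95 usage.

```python
import json

ZONE_KEY = "topology.kubernetes.io/zone"  # well-known node label for the AZ


def deployment(name: str, image: str, replicas: int, cpu: str, memory: str) -> dict:
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Refuse to schedule if zones would differ by more than one pod.
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": ZONE_KEY,
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{
                        "name": name,
                        "image": image,
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }


def pod_disruption_budget(name: str, max_unavailable: int = 1) -> dict:
    """Limit voluntary disruptions (drain, scale-down) for pods labelled app=<name>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": name}},
        },
    }


items = [
    deployment("web", "registry.example.com/web:1.0", 3, "250m", "256Mi"),
    pod_disruption_budget("web"),
]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The PDB selector must match the Deployment's pod labels exactly; a budget that selects nothing protects nothing and fails silently.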
Common Mistakes
- Running stateful workloads as Deployments. Use StatefulSet so PVCs follow the pod identity.
- PodSecurity admission set to “privileged” in production namespaces. Use restricted with audited exceptions.
- Ingress per service in a flat namespace. Use Gateway API with role-separated Gateway/Route for new infra.
- Setting CPU limits equal to requests without measuring. The CFS quota throttles a container once it has used its quota for the period, even when the node has idle CPU; latency suffers.
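The throttling in the last item is easier to see with numbers. A CPU limit becomes a CFS quota of limit × period of CPU time per period (100 ms by default), and a multi-threaded process can burn that quota in a fraction of the period and then sit idle. The function below is a deliberately simplified model of that behaviour, not the kernel scheduler, and assumes the work parallelises perfectly across threads.

```python
from typing import Optional


def wall_time_ms(cpu_work_ms: float, threads: int,
                 limit_cores: Optional[float] = None,
                 period_ms: float = 100.0) -> float:
    """Wall-clock time to finish cpu_work_ms of CPU work under a CFS quota."""
    if limit_cores is None:
        return cpu_work_ms / threads            # no limit: all threads run freely
    quota_ms = limit_cores * period_ms          # CPU time allowed per period
    elapsed = 0.0
    remaining = cpu_work_ms
    while remaining > 0:
        burn = min(remaining, quota_ms, threads * period_ms)
        if burn == remaining:                   # work finishes inside this period
            return elapsed + burn / threads
        remaining -= burn                       # quota exhausted: throttled until
        elapsed += period_ms                    # the next period starts
    return elapsed


print(wall_time_ms(200, threads=4))                   # 50.0 ms without a limit
print(wall_time_ms(200, threads=4, limit_cores=1.0))  # 125.0 ms with a 1-core limit
print(wall_time_ms(200, threads=1, limit_cores=1.0))  # 200.0 ms: one thread is never throttled
```

In this model the 4-thread request takes 2.5 times longer under a 1-core limit even though the node may be otherwise idle, which is the latency effect the item warns about.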
Security Risks to Watch
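- Permissive defaults: PodSecurity admission enforces nothing until a namespace is labelled, and RBAC is only as tight as the bindings you create; the restricted profile must be explicit per namespace.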
- Service mesh sidecars run as elevated workloads; an unverified mesh component is a cluster-wide attack vector.
- Public LoadBalancer Services bypass NetworkPolicy; verify origin restrictions are enforced at the LB.
- Container image pulls without signature verification accept anything from the registry. Use cosign + admission policy.
Design Tradeoffs
Service mesh: Istio
Pros
- Most feature-rich
- Strong community
- Rich traffic management
Cons
- Heavy operationally
- Steeper learning curve
Service mesh: Linkerd
Pros
- Simpler
- Faster (Rust)
- Zero-config mTLS
Cons
- Fewer advanced features
Service mesh: Cilium (eBPF)
Pros
- Sidecar-free
- Integrated with CNI
- Lower latency tax
Cons
- Newer; ecosystem still maturing
No mesh; just K8s primitives
Pros
- Less operational complexity
- Lower latency
Cons
- No automatic mTLS
- Resilience patterns in app code
Production Alternatives
- Self-managed Kubernetes (kubeadm, RKE2, k0s): Full control; full operational responsibility.
- EKS / GKE / AKS: Managed control plane; you operate the workloads.
- GKE Autopilot / EKS Auto Mode: Managed control plane AND nodes; closest to “just deploy a Pod”.
- HashiCorp Nomad: Lighter alternative; works for non-container workloads.
- Cloud-specific PaaS (Cloud Run, App Runner, Container Apps): Skip Kubernetes entirely for stateless web workloads.
Think Like an Engineer
- Read every Kubernetes manifest as a contract with the scheduler. Resource requests, anti-affinity, and PDBs are the operational levers.
- Treat Kubernetes upgrades like deployment changes — staged across canary clusters before prod.
- For every workload class (web, batch, ML), define which features (HPA, PDB, topology spread) it must use as a baseline.
Production Story
A team migrated to Kubernetes and immediately deployed their stateful Postgres as a Deployment with a single replica, no PVC, “just to get something running”. Three weeks later a node was reaped during a cluster upgrade; the pod was rescheduled; Postgres started fresh on a new node with an empty disk. They lost a week of customer data. Their runbook has carried the same rule ever since: stateful workloads use StatefulSet from day one, with PVCs, with backups verified weekly.
Key Terms
- Kubernetes Service: Stable virtual IP and DNS name for a set of pods; load-balances internal traffic.
- Service mesh: Proxy-based infrastructure (usually sidecars) for mTLS, retries, and observability between services.
- StatefulSet: Workload controller for pods that need stable identity and storage.
- PodDisruptionBudget: Cap on simultaneous voluntary disruptions to a workload; prevents accidental full-replica eviction.
- Karpenter: Modern Kubernetes node autoscaler on AWS; replaces Cluster Autoscaler with faster, more flexible provisioning.
Hands-On Labs
- Lab 10.1: Kind Cluster from Scratch (90 minutes, Beginner)
  Stand up a multi-node kind cluster, deploy a 3-tier app, expose via Ingress.
  - Create kind cluster with 3 worker nodes
  - Install nginx-ingress
  - Deploy frontend / API / DB
  - Verify external access via Ingress
- Lab 10.2: Linkerd Service Mesh (90 minutes, Intermediate)
  Install Linkerd; observe automatic mTLS; verify mesh observability.
  - Install Linkerd CLI and control plane
  - Inject sidecars into namespace
  - Verify mTLS via Linkerd Viz
  - Inject failure with Toxiproxy; observe retry behaviour
- Lab 10.3: StatefulSet for a Database (60 minutes, Intermediate)
  Deploy Postgres as a StatefulSet with persistent storage; verify pod identity stability.
  - Define StatefulSet with PVC template
  - Deploy 3 replicas
  - Kill a pod; verify PVC reattaches to the same logical pod
  - Demonstrate stable network identity
Key Takeaways
- Kubernetes is itself a distributed system; understanding its components is now part of distributed-systems literacy
- Service / Ingress / Gateway API: pick Gateway API for new infra
- Service mesh is optional but powerful; pick Istio for full features, Linkerd for simplicity, Cilium for unified networking
- StatefulSets + PVCs handle stateful workloads correctly; do not run databases as Deployments
- PDBs, multi-AZ topology spread, and tested etcd backups are the operational must-haves