
Module 10: Kubernetes & Cloud Native Distributed Systems

How Kubernetes changes distributed-systems design — cluster architecture, service mesh, ingress, autoscaling, and the operational primitives that everything else now sits on top of.

5 hours. 3 hands-on labs. Free course module.

Learning Objectives

  • Read a Kubernetes cluster architecture (control plane, kubelet, kube-proxy, etcd, CNI)
  • Use Services, Ingress, and Gateway API correctly for distributed workloads
  • Compare service meshes (Istio, Linkerd, Cilium) and pick one with eyes open
  • Run StatefulSets, PVCs, and storage classes for stateful workloads
  • Operate workloads with HPA, VPA, Karpenter, and PodDisruptionBudgets in production

Why This Matters

Kubernetes is now the default substrate for modern infrastructure; understanding its components is part of distributed-systems literacy. Engineers who can read a cluster architecture, reason about Service / Ingress / Gateway API, pick the right service mesh, and operate StatefulSets correctly are the engineers who get trusted with platform-engineering roles. The ones who treat Kubernetes as “just docker but bigger” eventually pay for it during the first multi-AZ incident.

Figure: Kubernetes cluster architecture. The control plane (apiserver, scheduler, controller-manager, cloud-controller-manager, and Raft-replicated etcd) manages three nodes. Each node runs kubelet and kube-proxy / CNI and hosts application pods (Node 1: app-A, app-B; Node 2: app-A, app-C; Node 3: app-B, app-C).

Lesson Content

Kubernetes is the substrate that runs most modern distributed systems. It is itself a distributed system. It has consensus (etcd / Raft), partitioning (pods scheduled across nodes), replication (controllers keeping N pod replicas running), and observability primitives (events, status conditions) built in. Understanding how Kubernetes is constructed is now part of distributed-systems literacy.

Cluster Architecture

The control plane consists of:

  • kube-apiserver: the only component that reads from and writes to etcd; every other component talks to the apiserver. Stateless; horizontally scalable.
  • etcd: the source of truth for cluster state; Raft-replicated; 3 or 5 nodes.
  • kube-scheduler: assigns pending pods to nodes based on resource fit, affinity, and topology.
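  • kube-controller-manager: runs reconciliation loops (Deployment, ReplicaSet, Node, Endpoints, etc.). When several replicas run, they leader-elect through a Lease object (stored in etcd via the apiserver), so only one instance drives the controllers at a time.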
  • cloud-controller-manager: runs cloud-specific controllers (cloud load balancers for Services, routes, node lifecycle). Volume provisioning is handled by CSI drivers.
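All the reconciliation loops in kube-controller-manager share the same observe, diff, act shape. The sketch below is a toy, single-pass version of a ReplicaSet-style loop. It runs against a hypothetical in-memory `ToyCluster`, which is not a real Kubernetes client. Real controllers watch the apiserver through informers and work queues instead of scanning a Python set.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical in-memory cluster state; stands in for the apiserver."""
    pods: set = field(default_factory=set)
    next_id: int = 0

    def create_pod(self, prefix: str) -> str:
        name = f"{prefix}-{self.next_id}"
        self.next_id += 1
        self.pods.add(name)
        return name

    def delete_pod(self, name: str) -> None:
        self.pods.discard(name)


def reconcile(cluster: ToyCluster, prefix: str, desired: int) -> list:
    """One observe-diff-act pass of a ReplicaSet-style controller.

    Returns a description of every action taken (empty once converged).
    """
    # Observe: which pods does this controller own right now?
    actual = sorted(p for p in cluster.pods if p.startswith(prefix + "-"))
    actions = []
    # Diff and act: create the missing replicas or delete the surplus.
    if len(actual) < desired:
        for _ in range(desired - len(actual)):
            actions.append("create " + cluster.create_pod(prefix))
    elif len(actual) > desired:
        for name in actual[desired:]:
            cluster.delete_pod(name)
            actions.append("delete " + name)
    return actions


cluster = ToyCluster()
print(reconcile(cluster, "web", 3))  # creates three replicas
cluster.delete_pod("web-1")          # simulate a lost pod
print(reconcile(cluster, "web", 3))  # replaces the missing replica
print(reconcile(cluster, "web", 3))  # converged: no actions
```

Each pass acts only on the difference between desired and observed state. Running it again after convergence therefore does nothing, and that property lets controllers retry safely after crashes or conflicting updates.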

Each node runs:

  • kubelet: the node agent; asks the container runtime (via the CRI) to pull images and run containers, and reports node and pod status to the apiserver.
  • kube-proxy: implements the Service abstraction via iptables, IPVS, or (in newer releases) nftables rules. (Modern alternative: Cilium's eBPF-based kube-proxy replacement.)
  • CNI plugin: pod networking (Cilium, Calico, AWS VPC CNI, etc.). Provides pod IPs, NetworkPolicy enforcement, often eBPF observability.
  • Container runtime: containerd or CRI-O; runs the containers.
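In iptables mode, kube-proxy load-balances a Service by walking a chain of per-endpoint rules. Each rule matches a new connection with some probability, using the `statistic` match in random mode. Later packets on that connection follow conntrack. For the choice to be uniform, rule i of n (counting from 0) has to match with probability 1/(n - i), which makes the last rule unconditional. The sketch below only checks that arithmetic with exact fractions. It does not generate or read real iptables rules.

```python
from fractions import Fraction


def rule_probabilities(n_endpoints: int) -> list:
    """Match probability programmed on each endpoint rule, in chain order.

    Rule i (0-based) matches with probability 1/(n - i); the last one is 1.
    """
    return [Fraction(1, n_endpoints - i) for i in range(n_endpoints)]


def selection_probabilities(n_endpoints: int) -> list:
    """Overall chance that a new connection lands on each endpoint."""
    not_yet_matched = Fraction(1)
    result = []
    for p in rule_probabilities(n_endpoints):
        # This rule is only evaluated if every earlier rule missed.
        result.append(not_yet_matched * p)
        not_yet_matched *= 1 - p
    return result


print(rule_probabilities(3))       # rules: 1/3, then 1/2, then unconditional
print(selection_probabilities(3))  # every endpoint ends up with 1/3
```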

Service, Ingress, Gateway API

  • Service: stable virtual IP and DNS name for a set of pods; load-balances traffic across them. ClusterIP for in-cluster access, NodePort to expose a static port on every node, LoadBalancer to provision a cloud load balancer.
  • Ingress: L7 routing for HTTP/HTTPS; needs an Ingress Controller (nginx-ingress, AWS ALB, Envoy-based, etc.). Older API; implementation-specific features live in annotations, so manifests rarely port cleanly between controllers.
  • Gateway API: the modern replacement for Ingress; richer, role-separated (Gateway/HTTPRoute/etc.), portable across implementations. The right choice for new infra.
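Gateway API separates roles. The sketch below builds an HTTPRoute that an application team might own. The route attaches to a Gateway owned by the platform team and splits traffic between two Services by weight. The fields (`parentRefs`, `hostnames`, `rules`, `backendRefs` with `weight`) are intended to match the `gateway.networking.k8s.io/v1` HTTPRoute schema; check them against the Gateway API reference for your installed version. The Gateway, namespace, Service, and host names are hypothetical. The output is JSON, which `kubectl apply -f` accepts as readily as YAML.

```python
import json


def weighted_httproute(name, namespace, gateway, hostname, stable_svc,
                       canary_svc, canary_weight, port=80, gateway_namespace=None):
    """HTTPRoute sending canary_weight% of traffic to canary_svc."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    parent_ref = {"name": gateway}
    if gateway_namespace is not None:
        # Needed when the shared Gateway lives in another namespace.
        parent_ref["namespace"] = gateway_namespace
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "parentRefs": [parent_ref],
            "hostnames": [hostname],
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                # Weights are relative; here they are chosen to sum to 100.
                "backendRefs": [
                    {"name": stable_svc, "port": port, "weight": 100 - canary_weight},
                    {"name": canary_svc, "port": port, "weight": canary_weight},
                ],
            }],
        },
    }


route = weighted_httproute("api", "shop", "shared-gateway", "api.example.com",
                           "api-stable", "api-canary", canary_weight=10,
                           gateway_namespace="infra")
print(json.dumps(route, indent=2))
```

For a cross-namespace attachment like this one to work, the Gateway's listener must also allow routes from the `shop` namespace.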

Service Mesh

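A service mesh handles service-to-service concerns outside application code: mTLS, retries, circuit breaking, observability, traffic shifting, authorization. The classic design injects a sidecar proxy into every pod (Envoy for Istio; Linkerd's own Rust proxy), while Cilium does the same work with eBPF and a per-node proxy. Three major options: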

  • Istio: most feature-rich; substantial complexity. Best when you need the full toolkit.
  • Linkerd: simpler, performance-focused; lightweight data-plane proxy written in Rust; mTLS on by default with zero configuration. Best for “mesh basics, fast”.
  • Cilium service mesh: eBPF-based, no per-pod sidecar (L7 features use a shared per-node Envoy), integrated with the Cilium CNI. Best when you want one tool for networking + mesh.

Service meshes implement most of the resilience patterns from Module 7 (timeouts, retries, circuit breakers) in the data plane, without application code changes. The cost is operational complexity plus a latency tax on every request that passes through the proxy layer.

Stateful Workloads

StatefulSets give pods stable identities (predictable name, predictable network address) and stable storage (PersistentVolumeClaims that follow the pod). The right pattern for databases, message queues, and any workload where pod identity matters.
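StatefulSet identities are deterministic. The sketch below derives each replica's pod name, DNS name, and PVC name under these illustrative assumptions: a StatefulSet `postgres`, a `serviceName` pointing at a headless Service also called `postgres`, a volumeClaimTemplate named `data`, and the default `cluster.local` cluster domain.

```python
def statefulset_identities(sts_name, service_name, namespace, claim_template,
                           replicas, cluster_domain="cluster.local"):
    """Stable per-replica names a StatefulSet produces (ordinals 0..replicas-1)."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{sts_name}-{ordinal}"
        identities.append({
            "pod": pod,
            # Per-pod DNS record published through the headless Service.
            "dns": f"{pod}.{service_name}.{namespace}.svc.{cluster_domain}",
            # PVC from volumeClaimTemplates: <template>-<statefulset>-<ordinal>.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


for ident in statefulset_identities("postgres", "postgres", "db", "data", 3):
    print(ident["pod"], ident["dns"], ident["pvc"])
```

When `postgres-1` is deleted, its replacement gets the same name and the same DNS record, and it re-binds `data-postgres-1`. Pods in a Deployment get random name suffixes and have no per-pod claim.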

StorageClasses define dynamic provisioning of PersistentVolumes from cloud-provider disks (EBS, PD, Azure Disk) or storage operators (Rook/Ceph, Longhorn). Choose access mode (ReadWriteOnce / ReadWriteMany), reclaim policy (Delete / Retain), and binding mode (Immediate / WaitForFirstConsumer) deliberately.
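Here is a minimal sketch of a deliberately configured StorageClass and a claim against it, written as Python dicts serialized to JSON. The class name `db-ssd`, the namespace, and the size are hypothetical. `ebs.csi.aws.com` with `type: gp3` is just one example of a CSI provisioner and parameter set, so substitute your own driver's. The access mode is requested on the PersistentVolumeClaim; the reclaim policy and binding mode belong to the StorageClass.

```python
import json


def storage_class(name, provisioner, parameters,
                  reclaim_policy="Retain", binding_mode="WaitForFirstConsumer"):
    """StorageClass with reclaim policy and binding mode chosen explicitly."""
    if reclaim_policy not in ("Delete", "Retain"):
        raise ValueError("reclaim_policy must be Delete or Retain")
    if binding_mode not in ("Immediate", "WaitForFirstConsumer"):
        raise ValueError("binding_mode must be Immediate or WaitForFirstConsumer")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
        "reclaimPolicy": reclaim_policy,
        "volumeBindingMode": binding_mode,
    }


def claim(name, namespace, class_name, size, access_mode="ReadWriteOnce"):
    """PersistentVolumeClaim; the access mode is requested here, not on the class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": class_name,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


sc = storage_class("db-ssd", "ebs.csi.aws.com", {"type": "gp3"})
pvc = claim("data", "db", "db-ssd", "20Gi")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [sc, pvc]}, indent=2))
```

`WaitForFirstConsumer` holds off provisioning until a pod that uses the claim is scheduled, so a zonal disk is created in the pod's AZ. `Retain` keeps the underlying disk after the claim is deleted, which is usually the safer default for databases.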

Autoscaling, PDBs, and Operational Sanity

  • HPA: scale pods on metrics. Always set minReplicas >= 2 for HA.
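  • VPA: rightsizes resource requests. Do not combine it with an HPA that scales the same workload on CPU or memory; both controllers react to the same signal and fight each other.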
  • Cluster Autoscaler / Karpenter: scale nodes. Karpenter provisions right-sized nodes directly instead of resizing node groups, and is the common choice on AWS.
  • PodDisruptionBudget: cap the number of unavailable pods during voluntary disruption (drain, scale-down, eviction). Without PDBs, the autoscaler will happily evict every replica simultaneously.
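Two small pieces of arithmetic sit behind this list. The sketch below models both in simplified form; it is not the controllers' actual code. The first is the HPA scaling rule from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The second is the number of voluntary evictions a PDB permits: current healthy pods minus desired healthy pods, floored at zero. Percentage-valued PDB fields, HPA stabilization windows, and multiple metrics are left out on purpose.

```python
import math


def hpa_desired_replicas(current, current_metric, target_metric,
                         min_replicas=2, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule for a single metric (no stabilization window)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: do not scale
    else:
        desired = math.ceil(current * ratio)
    # Clamp to the HPA's configured bounds.
    return max(min_replicas, min(max_replicas, desired))


def pdb_disruptions_allowed(expected_pods, current_healthy,
                            max_unavailable=None, min_available=None):
    """Voluntary evictions a PDB permits right now (integer fields only)."""
    if (max_unavailable is None) == (min_available is None):
        raise ValueError("set exactly one of max_unavailable / min_available")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)


print(hpa_desired_replicas(4, 90, 60))   # ratio 1.5: scale from 4 to 6
print(hpa_desired_replicas(4, 63, 60))   # within 10% tolerance: stay at 4
print(pdb_disruptions_allowed(3, 3, max_unavailable=1))  # one eviction allowed
print(pdb_disruptions_allowed(3, 2, max_unavailable=1))  # none: drain must wait
```

With `maxUnavailable: 1`, a drain can evict one of three healthy replicas. Until the replacement is Ready, the budget is exhausted and the eviction API refuses further evictions, so the drain waits rather than taking every replica down.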

Operational Practice

The Kubernetes operational discipline:

  • Always run multi-AZ for production. Use topologySpreadConstraints to enforce it.
  • etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
  • RBAC: deny by default; explicit allow per ServiceAccount; treat cluster-admin as root.
  • NetworkPolicy: default-deny per namespace; explicit allow rules.
  • PodSecurity admission: restricted profile by default, exceptions audited.
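The multi-AZ rules above come down to failure-domain arithmetic, and the sketch below checks two of them. The first check asks whether an etcd member layout keeps a Raft majority (quorum = floor(n/2) + 1) after losing any single zone. The second computes the zone skew that a `topologySpreadConstraints` entry with `topologyKey: topology.kubernetes.io/zone` bounds through `maxSkew`, simplified here to max minus min across the listed zones. Zone names and member counts are hypothetical inputs.

```python
from collections import Counter


def quorum(members: int) -> int:
    """Raft majority needed to elect a leader and commit writes."""
    return members // 2 + 1


def survives_zone_loss(members_per_zone: dict) -> bool:
    """True if losing any single zone still leaves a majority of members."""
    total = sum(members_per_zone.values())
    return all(total - lost >= quorum(total) for lost in members_per_zone.values())


def zone_skew(pod_zones: list, zones: list) -> int:
    """Simplified skew: max minus min matching pods across the listed zones."""
    counts = Counter(pod_zones)
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone)


print(survives_zone_loss({"az-a": 2, "az-b": 2, "az-c": 1}))  # 5 members over 3 zones
print(survives_zone_loss({"az-a": 3, "az-b": 2}))             # 5 members over 2 zones
print(zone_skew(["az-a", "az-a", "az-a"], ["az-a", "az-b", "az-c"]))
```

Five members spread 2/2/1 over three zones survive the loss of any one zone, because 3 of 5 remain. The same five members over two zones do not. An even member count adds no fault tolerance: four members, like three, tolerate only one failure.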

Module 8 of the Cloud Native Security Engineering course covers Kubernetes hardening in depth. The Kubernetes Security Simulator exercises the misconfigurations that cause real outages.

Service Mesh Traffic Flow

Figure: Service mesh traffic flow (sidecar-based). In Pod A, app-A talks over localhost to its Envoy sidecar. That sidecar sends the request over mTLS, with retries and telemetry, to the Envoy sidecar in Pod B, which hands it to app-B. Application code stays simple because the sidecars, not the app, handle mTLS, retries, circuit breaking, and observability.

Self-Check Quiz

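  1. You have a Deployment with 3 replicas, all scheduled on the same node. The cluster autoscaler scales that node down, and all 3 pods are evicted at once. Why? (Answer: there is no PodDisruptionBudget, and nothing spreads the replicas across nodes. Define a PDB with maxUnavailable: 1 so evictions proceed one replica at a time, and add topologySpreadConstraints or pod anti-affinity so the replicas do not share a node.)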
  2. Postgres in a Deployment vs StatefulSet — what changes? (Answer: StatefulSet gives stable pod identity (postgres-0, postgres-1) and stable PVCs that follow each pod. Required for any stateful workload.)
  3. Service mesh adds 1ms latency per hop. Across 5 hops you pay 5ms. When is it worth it? (Answer: when the mesh-provided features (mTLS, retries, observability, traffic shifting) are worth more than 5ms. For mature production systems, almost always.)
  4. You enable Istio mesh-wide STRICT mTLS on day one of rollout. What happens? (Answer: plaintext clients can no longer reach meshed services; this includes non-meshed services and health checks from external load balancers. Result: an outage. Fix: start in PERMISSIVE, observe, then promote to STRICT namespace by namespace.)
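The phased rollout in question 4 can be expressed as one Istio `PeerAuthentication` per namespace. In the sketch below, a resource named `default` with no workload selector sets the mTLS mode for its whole namespace. Every namespace starts in `PERMISSIVE`, which accepts both plaintext and mTLS. Only namespaces you have verified (for example, mesh telemetry shows no remaining plaintext callers) are promoted to `STRICT`. The namespace names are hypothetical. Check the `security.istio.io/v1beta1` API version against your Istio release; newer releases also serve `v1`.

```python
import json


def peer_authentication(namespace: str, mode: str) -> dict:
    """Namespace-wide Istio mTLS policy (named 'default', no selector)."""
    if mode not in ("PERMISSIVE", "STRICT"):
        raise ValueError("mode must be PERMISSIVE or STRICT")
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": mode}},
    }


def rollout_step(namespaces: list, verified: set) -> list:
    """PERMISSIVE everywhere; STRICT only where plaintext callers are gone."""
    return [
        peer_authentication(ns, "STRICT" if ns in verified else "PERMISSIVE")
        for ns in namespaces
    ]


plan = rollout_step(["payments", "catalog", "search"], verified={"catalog"})
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": plan}, indent=2))
```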

For Kubernetes hardening, the Kubernetes Security cheatsheet is the operational reference. For service-mesh patterns the Service Mesh cheatsheet covers Istio/Linkerd/Cilium patterns — see the service mesh glossary entry for the conceptual definition. The Kubernetes cheatsheet is the day-to-day reference for kubectl operational patterns. Practice with the Kubernetes Security Simulator.

Real-World Use Cases

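  • Spotify runs over 1500 microservices on Kubernetes, backed by in-house tooling such as Backstage, its developer portal and service catalogue (since open-sourced). Backstage is a catalogue, not a service mesh.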
  • Pinterest migrated their entire fleet to Kubernetes over 3 years; the migration was as much a culture shift as a technology one.
  • Reddit runs everything on Kubernetes after a multi-year migration from EC2.
  • Google's GKE Autopilot is essentially Kubernetes with the operational complexity hidden — for teams that want the API but not the infrastructure overhead.

Production Notes

  • Always run multi-AZ. Use topologySpreadConstraints to enforce it; do not rely on luck.
  • PodDisruptionBudgets are mandatory for production workloads. Without them, autoscalers will happily evict every replica.
  • Set resource requests near the p95 of observed usage; do not let development defaults such as “500m CPU” ship to production unexamined.
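A minimal sketch of the "requests at p95" rule. It assumes you already have a list of observed CPU usage samples, in millicores, for one container; how those samples are collected (for example, exported from your metrics system) is an assumption, and the sample values below are made up. The nearest-rank percentile is computed in integer arithmetic to avoid floating-point rounding at the rank boundary.

```python
def p95_request_millicores(samples: list) -> str:
    """Nearest-rank p95 of observed CPU usage, formatted as a request value."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return f"{ordered[rank - 1]}m"


# Hypothetical per-minute usage samples for one container, in millicores.
usage = [120, 135, 150, 90, 410, 140, 160, 155, 130, 145,
         125, 170, 180, 150, 138, 142, 148, 152, 165, 600]
print(p95_request_millicores(usage))
```

The single 600m spike does not set the request, but a burst pattern that recurs in more than 5% of samples will.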

Common Mistakes

  • Running stateful workloads as Deployments. Use StatefulSet so PVCs follow the pod identity.
  • PodSecurity admission set to “privileged” in production namespaces. Use restricted with audited exceptions.
  • Ingress per service in a flat namespace. Use Gateway API with role-separated Gateway/Route for new infra.
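  • Setting tight CPU limits (for example, limits = requests) on latency-sensitive services. CFS throttling stalls the container once it exhausts its CPU quota for the period, even when other cores are free; tail latency suffers.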
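Here is the throttling arithmetic as a sketch. Under the CFS bandwidth controller, a CPU limit becomes a quota of CPU time per enforcement period (100 ms by default), and all of the container's threads draw from that one quota. The model below takes the worst case: every thread stays runnable for the whole period, and the node has enough idle cores to run them in parallel.

```python
def cfs_quota_ms(limit_millicores: int, period_ms: float = 100.0) -> float:
    """CPU time per period granted by a CPU limit (1000m = one full core)."""
    return limit_millicores / 1000 * period_ms


def throttled_ms(limit_millicores: int, busy_threads: int,
                 period_ms: float = 100.0) -> float:
    """Wall-clock ms per period spent throttled, if all threads stay runnable."""
    if busy_threads <= 0:
        return 0.0
    quota = cfs_quota_ms(limit_millicores, period_ms)
    # Threads run in parallel, so the quota drains busy_threads times faster.
    ran_for = min(period_ms, quota / busy_threads)
    return period_ms - ran_for


print(cfs_quota_ms(500))      # 500m limit: 50 ms of CPU per 100 ms period
print(throttled_ms(500, 4))   # four busy threads: quota gone early, then stalled
print(throttled_ms(2000, 1))  # limit exceeds demand: no throttling
```

A 500m limit grants 50 ms of CPU per 100 ms period. Four busy threads use that up in 12.5 ms of wall-clock time, and the container then sits throttled for the remaining 87.5 ms even if the node has idle cores. Any request being served during that stall waits it out.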

Security Risks to Watch

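  • Overly permissive RBAC bindings, and Pod Security left unenforced. By default Pod Security Admission enforces nothing, so the restricted profile must be set explicitly per namespace (or as a cluster-wide default).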
  • Service mesh components (sidecar init containers, mesh CNI plugins, control planes) often run with elevated privileges; an unverified mesh component is a cluster-wide attack vector.
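  • Public LoadBalancer Services expose pods to the internet, and NetworkPolicy may not see the real client IP once traffic has been source-NATed. Enforce origin restrictions at the LB as well (for example, with loadBalancerSourceRanges).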
  • Container image pulls without signature verification accept anything from the registry. Use cosign + admission policy.
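Below is a sketch of the explicit per-namespace baseline. Namespace labels (`pod-security.kubernetes.io/enforce` and related keys) turn on Pod Security Admission. A NetworkPolicy with an empty `podSelector` and no rules denies all ingress and egress for every pod in the namespace. The namespace name is hypothetical. The output is a `v1` `List` in JSON, which `kubectl apply -f` accepts.

```python
import json


def namespace_baseline(name: str) -> dict:
    """Namespace with restricted Pod Security plus a default-deny NetworkPolicy."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                # Reject pods that violate the restricted Pod Security Standard.
                "pod-security.kubernetes.io/enforce": "restricted",
                "pod-security.kubernetes.io/enforce-version": "latest",
                # Also record violations in the audit log.
                "pod-security.kubernetes.io/audit": "restricted",
            },
        },
    }
    default_deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": name},
        # Empty podSelector selects every pod; listing both policy types with
        # no rules allows no ingress and no egress.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, default_deny]}


print(json.dumps(namespace_baseline("payments"), indent=2))
```

Denying egress also blocks DNS, so the first explicit allow rule is usually egress to the cluster DNS service. NetworkPolicy is enforced only if the CNI plugin supports it.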

Design Tradeoffs

Service mesh: Istio

Pros

  • Most feature-rich
  • Strong community
  • Rich traffic management

Cons

  • Heavy operationally
  • Steeper learning curve

Service mesh: Linkerd

Pros

  • Simpler
  • Lightweight Rust proxy; low latency overhead
  • Zero-config mTLS

Cons

  • Fewer advanced features

Service mesh: Cilium (eBPF)

Pros

  • Sidecar-free
  • Integrated with CNI
  • Lower latency tax

Cons

  • Newer; ecosystem still maturing

No mesh; just K8s primitives

Pros

  • Less operational complexity
  • Lower latency

Cons

  • No automatic mTLS
  • Resilience patterns in app code

Production Alternatives

  • Self-managed Kubernetes (kubeadm, RKE2, k0s): Full control; full operational responsibility.
  • EKS / GKE / AKS: Managed control plane; you operate the workloads.
  • GKE Autopilot / EKS Auto Mode: Managed control plane AND nodes; closest to “just deploy a Pod”.
  • HashiCorp Nomad: Lighter alternative; works for non-container workloads.
  • Cloud-specific PaaS (Cloud Run, App Runner, Container Apps): Skip Kubernetes entirely for stateless web workloads.

Think Like an Engineer

  • Read every Kubernetes manifest as a contract with the scheduler. Resource requests, anti-affinity, and PDBs are the operational levers.
  • Treat Kubernetes upgrades like deployment changes — staged across canary clusters before prod.
  • For every workload class (web, batch, ML), define which features (HPA, PDB, topology spread) it must use as a baseline.

Production Story

A team migrated to Kubernetes and immediately deployed their stateful Postgres as a Deployment with a single replica and no PVC, “just to get something running”. Three weeks later a node was reaped during a cluster upgrade. The pod was rescheduled, and Postgres started fresh on a new node with an empty disk. They lost a week of customer data. The rule that went into the runbook afterwards, and has stayed there: stateful workloads use StatefulSet from day one, with PVCs, with backups verified weekly.

Key Terms

Kubernetes Service
Stable virtual IP and DNS name for a set of pods; load-balances internal traffic.
Service mesh
Infrastructure layer (usually sidecar proxies; sometimes eBPF plus per-node proxies) that provides mTLS, retries, and observability between services.
StatefulSet
Workload controller for pods that need stable identity and storage.
PodDisruptionBudget
Cap on simultaneous voluntary disruptions to a workload; prevents accidental full-replica eviction.
Karpenter
Open-source Kubernetes node autoscaler that originated at AWS; an alternative to Cluster Autoscaler that provisions right-sized nodes directly for pending pods, giving faster, more flexible provisioning.

Hands-On Labs

  1. Lab 10.1 — Kind Cluster from Scratch

    Stand up a multi-node kind cluster, deploy a 3-tier app, expose via Ingress.

    90 minutes - Beginner

    • Create kind cluster with 3 worker nodes
    • Install nginx-ingress
    • Deploy frontend / API / DB
    • Verify external access via Ingress


  2. Lab 10.2 — Linkerd Service Mesh

    Install Linkerd; observe automatic mTLS; verify mesh observability.

    90 minutes - Intermediate

    • Install Linkerd CLI and control plane
    • Inject sidecars into namespace
    • Verify mTLS via Linkerd Viz
    • Inject failure with Toxiproxy; observe retry behaviour


  3. Lab 10.3 — StatefulSet for a Database

    Deploy Postgres as a StatefulSet with persistent storage; verify pod identity stability.

    60 minutes - Intermediate

    • Define StatefulSet with PVC template
    • Deploy 3 replicas
    • Kill a pod; verify PVC reattaches to the same logical pod
    • Demonstrate stable network identity


Key Takeaways

  • Kubernetes is itself a distributed system; understanding its components is now part of distributed-systems literacy
  • Service / Ingress / Gateway API: Services for stable in-cluster addressing; Gateway API rather than Ingress for new infra
  • Service mesh is optional but powerful; pick Istio for full features, Linkerd for simplicity, Cilium for unified networking
  • StatefulSets + PVCs handle stateful workloads correctly; do not run databases as Deployments
  • PDBs, multi-AZ topology spread, and tested etcd backups are the operational must-haves