Module 10 of 12

Kubernetes & Cloud Native Distributed Systems

How Kubernetes changes distributed-systems design — cluster architecture, service mesh, ingress, autoscaling, and the operational primitives that everything else now sits on top of.

5 hours · 3 labs · Free

Learning objectives

  • Read a Kubernetes cluster architecture (control plane, kubelet, kube-proxy, etcd, CNI)
  • Use Services, Ingress, and Gateway API correctly for distributed workloads
  • Compare service meshes (Istio, Linkerd, Cilium) and pick one with eyes open
  • Run StatefulSets, PVCs, and storage classes for stateful workloads
  • Operate workloads with HPA, VPA, Karpenter, and PodDisruptionBudgets in production

Before

  • Pets-not-cattle; long-lived nodes that can't be replaced
  • Shell scripts deploying directly to VMs; no declarative state
  • No autoscaling; capacity provisioned for peak, idle most of the time
  • Stateful workloads as single VMs; failure = data loss

After

  • Cattle-not-pets; nodes are interchangeable and routinely cycled
  • GitOps with Argo CD or Flux; declarative state, audited changes
  • HPA + Karpenter; capacity scales with demand within minutes
  • StatefulSets + PVCs + tested backups; node failure = pod reschedule, not data loss

[Figure: Kubernetes cluster architecture. The control plane (apiserver, scheduler, controller-manager, cloud-controller-manager, and etcd with Raft) sits above three nodes. Each node runs kubelet and kube-proxy / CNI. Node 1 hosts pods app-A and app-B, Node 2 hosts app-A and app-C, Node 3 hosts app-B and app-C.]

Kubernetes is the substrate that runs most modern distributed systems. It is, itself, a distributed system — with consensus (etcd / Raft), partitioning (resources scheduled across nodes), replication (pods), and observability built in. Understanding how Kubernetes is constructed is now part of distributed-systems literacy.

Cluster Architecture

The control plane consists of:

  • kube-apiserver: the only component that writes to etcd; every other component talks to apiserver. Stateless; horizontally scalable.
  • etcd: the source of truth for cluster state; Raft-replicated; 3 or 5 nodes.
  • kube-scheduler: assigns pending pods to nodes based on resource fit, affinity, and topology.
  • kube-controller-manager: runs reconciliation loops (Deployment, ReplicaSet, Node, Endpoints, etc.). When several replicas run, one is elected leader through a Lease object held in the apiserver (and therefore stored in etcd).
  • cloud-controller-manager: runs cloud-specific controllers (node lifecycle, routes, Service load balancers).
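
The reconciliation loops mentioned above all follow one pattern: observe actual state, compare it with desired state, and act to close the gap. The following is a minimal sketch of that pattern, assuming a hypothetical in-memory `ReplicaSetState` rather than the real Kubernetes client libraries:

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaSetState:
    """Hypothetical in-memory stand-in for a ReplicaSet and its pods."""
    desired_replicas: int
    pods: list = field(default_factory=list)
    next_id: int = 0


def reconcile(state: ReplicaSetState) -> list:
    """Run one reconciliation pass and return the actions taken."""
    actions = []
    # Too few pods: create until actual matches desired.
    while len(state.pods) < state.desired_replicas:
        name = f"pod-{state.next_id}"
        state.next_id += 1
        state.pods.append(name)
        actions.append(f"create {name}")
    # Too many pods: delete the newest until actual matches desired.
    while len(state.pods) > state.desired_replicas:
        actions.append(f"delete {state.pods.pop()}")
    return actions


state = ReplicaSetState(desired_replicas=3)
print(reconcile(state))      # first pass creates three pods
state.pods.remove("pod-1")   # simulate a pod crash
print(reconcile(state))      # next pass replaces the missing pod
print(reconcile(state))      # converged: nothing to do
```

The loop is level-triggered: it never needs to know why a pod disappeared, only that the observed count differs from the desired one.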

Each node runs:

  • kubelet: the node agent; pulls container images, runs containers via the container runtime, reports node and pod status to apiserver.
  • kube-proxy: implements Service abstraction via iptables / IPVS rules. (Modern alternative: Cilium with no kube-proxy.)
  • CNI plugin: pod networking (Cilium, Calico, AWS VPC CNI, etc.). Provides pod IPs, NetworkPolicy enforcement, often eBPF observability.
  • Container runtime: containerd or CRI-O; runs the containers.
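
In iptables mode, kube-proxy spreads new connections across a Service's endpoints with a chain of rules that each match with a fixed probability. The sketch below reproduces only that arithmetic as a simplified model; the function names are illustrative and no real iptables rules are generated:

```python
from fractions import Fraction


def rule_probabilities(n_endpoints: int) -> list:
    """Match probability of each rule in a chain of n endpoint rules.

    Rule i is only reached if rules 0..i-1 did not match, so it must
    match with probability 1 / (n - i) for every endpoint to end up
    with an equal share.
    """
    return [Fraction(1, n_endpoints - i) for i in range(n_endpoints)]


def effective_share(probs: list) -> list:
    """Overall fraction of connections that land on each endpoint."""
    shares, remaining = [], Fraction(1)
    for p in probs:
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


probs = rule_probabilities(3)
print([str(p) for p in probs])                   # per-rule probabilities
print([str(s) for s in effective_share(probs)])  # resulting share per endpoint
```

For three endpoints the rules match with probability 1/3, 1/2, and 1, which works out to an equal one-third share each.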

Service, Ingress, Gateway API

  • Service: stable virtual IP for a set of pods; load-balances traffic across them. ClusterIP for in-cluster access, NodePort to expose a fixed port on every node, LoadBalancer to provision a cloud load balancer.
  • Ingress: L7 routing for HTTP/HTTPS; needs an Ingress Controller (nginx-ingress, AWS ALB, Envoy-based, etc.). Older API; many extensions baked into annotations.
  • Gateway API: the modern replacement for Ingress; richer, role-separated (Gateway/HTTPRoute/etc.), portable across implementations. The right choice for new infra.
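
To make the role separation concrete, here is a sketch that builds an HTTPRoute as a Python dict and prints it as JSON (which `kubectl apply -f` accepts). The route, gateway, hostname, and Service names are hypothetical, and the Gateway itself is assumed to exist already, owned by a platform team:

```python
import json


def http_route(name, gateway, hostname, path_prefix, service, port):
    """Build a Gateway API HTTPRoute that sends a path prefix to a Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # The Gateway this route attaches to (managed separately).
            "parentRefs": [{"name": gateway}],
            "hostnames": [hostname],
            "rules": [{
                "matches": [
                    {"path": {"type": "PathPrefix", "value": path_prefix}}
                ],
                "backendRefs": [{"name": service, "port": port}],
            }],
        },
    }


# All names below are placeholders for illustration.
route = http_route("api-route", "shared-gateway", "api.example.com",
                   "/v1", "api", 8080)
print(json.dumps(route, indent=2))
```

The application team owns only the HTTPRoute; listeners, TLS, and the load balancer live on the Gateway, which is the split that Ingress annotations never expressed cleanly.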

Service Mesh

A service mesh typically adds a sidecar proxy to every pod (Envoy in Istio, linkerd2-proxy in Linkerd) to handle service-to-service concerns: mTLS, retries, circuit breaking, observability, traffic shifting, authorization. Three major options:

  • Istio: most feature-rich; substantial complexity. Best when you need the full toolkit.
  • Linkerd: simpler, performance-focused; data-plane proxy written in Rust; zero-config mTLS. Best for “mesh basics, fast”.
  • Cilium service mesh: eBPF-based, no sidecar, integrated with the Cilium CNI. Best when you want one tool for networking + mesh.

Service meshes implement most of the resilience patterns from Module 7 (timeouts, retries, circuit breakers) for free at the data plane. The cost is operational complexity and the latency tax of every request going through a sidecar.
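
For a sense of what the mesh takes out of application code, here is a deliberately simplified sketch of retries guarded by a circuit breaker. It is not how any particular mesh implements these features (real proxies add timeouts, backoff, and time-based recovery); the class and function names are illustrative:

```python
class CircuitBreaker:
    """Opens after `threshold` consecutive failures, then rejects calls."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn):
        if self.open:
            raise RuntimeError("circuit open: request rejected locally")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the failure count
        return result


def call_with_retries(breaker, fn, attempts: int):
    """Retry a failing call up to `attempts` times, respecting the breaker."""
    last_error = None
    for _ in range(attempts):
        try:
            return breaker.call(fn)
        except Exception as err:
            last_error = err
            if breaker.open:
                break  # stop hammering an upstream that keeps failing
    raise last_error


calls = {"n": 0}


def flaky():
    """Hypothetical upstream that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream reset")
    return "ok"


breaker = CircuitBreaker(threshold=5)
print(call_with_retries(breaker, flaky, attempts=3))  # prints "ok"
```

With a mesh, the same behaviour is configured once at the proxy and applies to every service regardless of language.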

Stateful Workloads

StatefulSets give pods stable identities (predictable name, predictable network address) and stable storage (PersistentVolumeClaims that follow the pod). They are the right pattern for databases, message queues, and any workload where pod identity matters.
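
The stable identities are predictable enough to compute. This sketch derives the pod name, per-pod DNS name, and PVC name for each ordinal, assuming a headless Service referenced by the StatefulSet's `serviceName` and the default `cluster.local` cluster domain; the `postgres` and `data` names are hypothetical:

```python
def stateful_identities(name, service, namespace, replicas, claim_template,
                        cluster_domain="cluster.local"):
    """Return the pod name, DNS name, and PVC name for each ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            # Per-pod DNS record published via the headless Service.
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            # PVCs are named <volumeClaimTemplate>-<pod>.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


for identity in stateful_identities("postgres", "postgres", "default", 3, "data"):
    print(identity)
```

When `postgres-1` is rescheduled, the replacement pod gets the same name, the same DNS record, and the same `data-postgres-1` claim, which is what a Deployment cannot offer.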

StorageClasses define dynamic provisioning of PersistentVolumes from cloud-provider disks (EBS, PD, Azure Disk) or storage operators (Rook/Ceph, Longhorn). Choose access mode (ReadWriteOnce / ReadWriteMany), reclaim policy (Delete / Retain), and binding mode (Immediate / WaitForFirstConsumer) deliberately.
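
As an illustration of making those three choices explicitly, the sketch below builds a StorageClass and a matching claim as Python dicts. It assumes the AWS EBS CSI driver (`ebs.csi.aws.com`) is installed; the class name, claim name, and size are hypothetical:

```python
import json


def storage_class(name: str) -> dict:
    """StorageClass with explicit reclaim policy and binding mode."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",   # assumption: AWS EBS CSI driver
        "parameters": {"type": "gp3"},
        "reclaimPolicy": "Retain",          # keep the disk when the PVC is deleted
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def claim(name: str, storage_class_name: str, size: str) -> dict:
    """PersistentVolumeClaim requesting a single-node read-write volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


manifests = [storage_class("fast-retain"), claim("data", "fast-retain", "20Gi")]
print(json.dumps(manifests, indent=2))
```

`WaitForFirstConsumer` delays provisioning until a pod using the claim is scheduled, so a zonal disk is created in the zone where the pod actually lands instead of a zone chosen in advance.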

Autoscaling, PDBs, and Operational Sanity

  • HPA: scale pods on metrics. Always set minReplicas >= 2 for HA.
  • VPA: rightsize resource requests; clashes with HPA on the same metric.
  • Cluster Autoscaler / Karpenter: scale nodes. Karpenter is the modern default on AWS.
  • PodDisruptionBudget: cap the number of unavailable pods during voluntary disruption (drain, scale-down, eviction). Without PDBs, a node drain or autoscaler scale-down can evict every replica on the affected nodes at once.
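
The HPA's core decision is a single ratio. The sketch below implements the documented formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 10% tolerance and min/max clamping. It leaves out stabilization windows and the handling of unready pods, so treat it as a model of the arithmetic only:

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count the HPA formula would ask for, clamped to [min, max]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas          # close enough: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% CPU against a 60% target: scale up.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
# 4 replicas at 63% against 60%: inside the tolerance band, no change.
print(desired_replicas(4, 63, 60, min_replicas=2, max_replicas=10))
# 4 replicas at 15%: the formula says 1, minReplicas keeps it at 2.
print(desired_replicas(4, 15, 60, min_replicas=2, max_replicas=10))
```

The last case shows why minReplicas >= 2 matters: without the floor, a quiet period would scale the workload down to a single pod.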

Operational Practice

The Kubernetes operational discipline:

  • Always run multi-AZ for production. Use topologySpreadConstraints to enforce it.
  • etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
  • RBAC: deny by default; explicit allow per ServiceAccount; treat cluster-admin as root.
  • NetworkPolicy: default-deny per namespace; explicit allow rules.
  • PodSecurity admission: restricted profile by default, exceptions audited.
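
The "5 nodes across 3 AZs" recommendation for etcd follows from Raft quorum arithmetic. A short sketch, with hypothetical zone names, that checks whether a member layout keeps quorum when any single zone is lost:

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with this many members."""
    return members // 2 + 1


def survives_zone_loss(members_per_zone: dict) -> dict:
    """For each zone, does the cluster keep quorum if that zone is lost?"""
    total = sum(members_per_zone.values())
    needed = quorum(total)
    return {zone: total - count >= needed
            for zone, count in members_per_zone.items()}


print(quorum(3), quorum(5))
# 5 members spread 2/2/1 over three zones: any one zone can fail.
print(survives_zone_loss({"a": 2, "b": 2, "c": 1}))
# 5 members over only two zones: losing the larger zone loses quorum.
print(survives_zone_loss({"a": 3, "b": 2}))
```

Five members tolerate two failures, and a 2/2/1 spread ensures no single zone holds more than two of them.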

Module 8 of the Cloud Native Security Engineering course covers Kubernetes hardening in depth. The Kubernetes Security Simulator exercises the misconfigurations that cause real outages.

Service Mesh Traffic Flow

[Figure: Service mesh traffic flow (sidecar-based). In Pod A, app-A talks over localhost to its Envoy sidecar; that sidecar connects to the Envoy sidecar in Pod B, which forwards to app-B. mTLS, retries, and telemetry are handled in the sidecars, not in app code.]

Application code stays simple. The sidecar handles mTLS, retries, circuit breaking, observability.

Self-Check Quiz

  1. You have a Deployment with 3 replicas. The cluster autoscaler scales down a node. All 3 pods on that node get evicted. Why? (Answer: no PodDisruptionBudget. Define maxUnavailable: 1 so only one replica goes down at a time.)
  2. Postgres in a Deployment vs StatefulSet — what changes? (Answer: StatefulSet gives stable pod identity (postgres-0, postgres-1) and stable PVCs that follow each pod. Required for any stateful workload.)
  3. Service mesh adds 1ms latency per hop. Across 5 hops you pay 5ms. When is it worth it? (Answer: when the mesh-provided features (mTLS, retries, observability, traffic shifting) are worth more than 5ms. For mature production systems, almost always.)
  4. You enable Istio mesh-wide STRICT mTLS on day one of rollout. What happens? (Answer: external load balancer health probes fail; non-meshed services can no longer talk to meshed services; outage. Phase: PERMISSIVE first, observe, promote namespace by namespace.)
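
Question 1 can be worked through numerically. The sketch below models how a PodDisruptionBudget gates evictions during a drain, using integer budgets only (real PDBs also accept percentages) and assuming evicted pods have not yet been replaced by healthy ones; the function names are illustrative:

```python
def disruptions_allowed(expected, healthy, *, min_available=None,
                        max_unavailable=None):
    """Voluntary evictions a PDB would currently permit."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected - max_unavailable
    return max(0, healthy - desired_healthy)


def drain_node(replicas, pdb_max_unavailable):
    """Count evictions that succeed when all replicas sit on one drained node.

    pdb_max_unavailable=None models a workload with no PDB at all.
    """
    healthy = replicas
    evicted = 0
    for _ in range(replicas):
        if pdb_max_unavailable is not None:
            allowed = disruptions_allowed(
                replicas, healthy, max_unavailable=pdb_max_unavailable)
            if allowed == 0:
                break  # eviction refused until a replacement is healthy
        healthy -= 1
        evicted += 1
    return evicted


print(drain_node(3, pdb_max_unavailable=None))  # no PDB: all 3 evicted at once
print(drain_node(3, pdb_max_unavailable=1))     # PDB: only 1 eviction proceeds
```

With the budget in place, the drain has to wait for each replacement pod to become healthy before the next eviction is allowed.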

For Kubernetes hardening, the Kubernetes Security cheatsheet is the operational reference. For service-mesh patterns the Service Mesh cheatsheet covers Istio/Linkerd/Cilium patterns — see the service mesh glossary entry for the conceptual definition. The Kubernetes cheatsheet is the day-to-day reference for kubectl operational patterns. Practice with the Kubernetes Security Simulator.

Real world

Where this shows up

  • Spotify runs over 1500 microservices on Kubernetes, supported by internal tooling such as Backstage (its developer portal) and Apollo (its Java service framework).
  • Pinterest migrated their entire fleet to Kubernetes over 3 years; the migration was as much a culture shift as a technology one.
  • Reddit runs everything on Kubernetes after a multi-year migration from EC2.
  • Google's GKE Autopilot is essentially Kubernetes with the operational complexity hidden — for teams that want the API but not the infrastructure overhead.

Production notes

Keep these close

  • Always run multi-AZ. Use topologySpreadConstraints to enforce it; do not rely on luck.
  • PodDisruptionBudgets are mandatory for production workloads. Without them, a node drain or autoscaler scale-down can evict every replica at once.
  • etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
  • Resource requests at p95 of actual usage; do not let dev defaults of “500m CPU” ship to prod.
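
To show what enforcing multi-AZ with topologySpreadConstraints means in practice, the sketch below pairs a constraint (as it would appear in a pod spec) with a small model of the skew check the scheduler applies under `DoNotSchedule`. The `app: web` label and zone names are hypothetical, and the model ignores node affinity and taints:

```python
# Constraint as it would appear under spec.template.spec of a Deployment.
spread_constraint = {
    "maxSkew": 1,
    "topologyKey": "topology.kubernetes.io/zone",
    "whenUnsatisfiable": "DoNotSchedule",
    "labelSelector": {"matchLabels": {"app": "web"}},  # hypothetical label
}


def can_schedule(pods_per_zone: dict, target_zone: str, max_skew: int = 1) -> bool:
    """Would placing one more matching pod in target_zone respect max_skew?

    Skew is the pod count in the target zone (after placement) minus the
    smallest count in any zone.
    """
    global_min = min(pods_per_zone.values())
    return pods_per_zone[target_zone] + 1 - global_min <= max_skew


current = {"zone-a": 2, "zone-b": 1, "zone-c": 0}
for zone in current:
    print(zone, can_schedule(current, zone, spread_constraint["maxSkew"]))
```

With counts of 2/1/0, only the empty zone is eligible, so the next pod is forced into zone-c rather than piling onto an already loaded zone.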

Common mistakes

What usually breaks

  • Running stateful workloads as Deployments. Use StatefulSet so PVCs follow the pod identity.
  • PodSecurity admission set to “privileged” in production namespaces. Use restricted with audited exceptions.
  • Ingress per service in a flat namespace. Use Gateway API with role-separated Gateway/Route for new infra.
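  • Setting tight CPU limits (for example limits = requests) on latency-sensitive services. The CFS quota throttles a container once it has used its allowance for the current period, even when the node has idle CPU; tail latency suffers.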
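
The throttling effect is easy to underestimate, so here is a simplified model of it. It assumes the default 100 ms CFS period and a burst of CPU work that parallelizes perfectly across threads; the function name is illustrative:

```python
def wall_time_ms(cpu_work_ms, cpu_limit_cores, threads=1, period_ms=100.0):
    """Wall-clock time to finish a burst of CPU work under a CFS quota."""
    if cpu_limit_cores <= 0 or threads < 1:
        raise ValueError("limit must be positive and threads >= 1")
    quota_ms = cpu_limit_cores * period_ms          # CPU time allowed per period
    capacity = min(quota_ms, threads * period_ms)   # CPU time usable per period
    elapsed = 0.0
    remaining = float(cpu_work_ms)
    while True:
        if remaining <= capacity:
            # Finishes inside this period, running on all threads.
            return elapsed + remaining / threads
        remaining -= capacity
        elapsed += period_ms  # quota exhausted: throttled until next period


# 120 ms of single-threaded work with a 500m limit.
print(wall_time_ms(120, cpu_limit_cores=0.5))             # → 220.0
# 200 ms of work on 4 threads with a 1-core limit.
print(wall_time_ms(200, cpu_limit_cores=1.0, threads=4))  # → 125.0
```

Unthrottled, those bursts would take 120 ms and 50 ms respectively. The multi-threaded case is the worse surprise: four threads burn the whole quota in 25 ms and then sit idle for the remaining 75 ms of the period.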

Security risks

Threats to watch

  • Over-broad RBAC bindings and unenforced PodSecurity; a namespace is unrestricted until the restricted profile is applied to it explicitly.
  • Service mesh sidecars run as elevated workloads; an unverified mesh component is a cluster-wide attack vector.
  • Public LoadBalancer Services bypass NetworkPolicy; verify origin restrictions are enforced at the LB.
  • Container image pulls without signature verification accept anything from the registry. Use cosign + admission policy.
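
Because the restrictive settings are opt-in per namespace, it helps to stamp them out together. The sketch below generates a namespace labelled for the restricted Pod Security profile plus a default-deny NetworkPolicy, as JSON that `kubectl apply -f` accepts. The namespace and policy names are hypothetical, and NetworkPolicy only takes effect if the CNI plugin enforces it:

```python
import json


def namespace_baseline(namespace: str) -> list:
    """Namespace with restricted Pod Security plus default-deny networking."""
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {
                "name": namespace,
                "labels": {
                    # Pod Security admission: reject non-compliant pods.
                    "pod-security.kubernetes.io/enforce": "restricted",
                    "pod-security.kubernetes.io/warn": "restricted",
                },
            },
        },
        {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "default-deny-all", "namespace": namespace},
            "spec": {
                "podSelector": {},                  # empty selector: every pod
                "policyTypes": ["Ingress", "Egress"],
            },
        },
    ]


print(json.dumps(namespace_baseline("payments"), indent=2))
```

Denying egress by default also blocks DNS lookups, so the first explicit allow rule usually needs to permit traffic to the cluster DNS service.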

Tradeoffs

Design choices you should be able to defend

Service mesh: Istio

Pros

  • Most feature-rich
  • Strong community
  • Rich traffic management

Cons

  • Heavy operationally
  • Steeper learning curve

Service mesh: Linkerd

Pros

  • Simpler
  • Faster (Rust)
  • Zero-config mTLS

Cons

  • Fewer advanced features

Service mesh: Cilium (eBPF)

Pros

  • Sidecar-free
  • Integrated with CNI
  • Lower latency tax

Cons

  • Newer; ecosystem still maturing

No mesh; just K8s primitives

Pros

  • Less operational complexity
  • Lower latency

Cons

  • No automatic mTLS
  • Resilience patterns in app code

Alternatives

Other production approaches

Self-managed Kubernetes (kubeadm, RKE2, k0s)

Full control; full operational responsibility.

EKS / GKE / AKS

Managed control plane; you operate the workloads.

GKE Autopilot / EKS Auto Mode

Managed control plane AND nodes; closest to “just deploy a Pod”.

HashiCorp Nomad

Lighter alternative; works for non-container workloads.

Cloud-specific PaaS (Cloud Run, App Runner, Container Apps)

Skip Kubernetes entirely for stateless web workloads.

Think like an engineer

Questions to answer before shipping

  • Read every Kubernetes manifest as a contract with the scheduler. Resource requests, anti-affinity, and PDBs are the operational levers.
  • Treat Kubernetes upgrades like deployment changes — staged across canary clusters before prod.
  • For every workload class (web, batch, ML), define which features (HPA, PDB, topology spread) it must use as a baseline.
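
A per-class baseline is easier to hold to when it is checked mechanically. The sketch below lints a Deployment manifest (as a dict) against one possible baseline for a web workload; the specific rules and the function name are illustrative choices, not a standard:

```python
def baseline_violations(deployment: dict) -> list:
    """Return human-readable baseline violations for a Deployment manifest."""
    spec = deployment.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})
    problems = []
    # Deployments default to 1 replica when the field is omitted.
    if spec.get("replicas", 1) < 2:
        problems.append("replicas < 2")
    for container in pod_spec.get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        for resource in ("cpu", "memory"):
            if resource not in requests:
                problems.append(
                    f"container {container.get('name')}: no {resource} request")
    if not pod_spec.get("topologySpreadConstraints"):
        problems.append("no topologySpreadConstraints")
    return problems


# Hypothetical manifest that misses most of the baseline.
web = {
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [
            {"name": "web", "resources": {"requests": {"cpu": "250m"}}},
        ]}},
    },
}
for problem in baseline_violations(web):
    print(problem)
```

A check like this fits naturally in CI or an admission policy, so the baseline is enforced before a manifest reaches the scheduler.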

Key terms

Vocabulary used in this module

Kubernetes Service

Stable virtual IP and DNS name for a set of pods; load-balances internal traffic.

Service mesh

Infrastructure layer (usually sidecar-based) for mTLS, retries, and observability between services.

StatefulSet

Workload controller for pods that need stable identity and storage.

PodDisruptionBudget

Cap on simultaneous voluntary disruptions to a workload; prevents accidental full-replica eviction.

Karpenter

Modern Kubernetes node autoscaler on AWS; replaces Cluster Autoscaler with faster, more flexible provisioning.

Labs

Hands-on labs

90 minutes · Beginner

Lab 10.1 — Kind Cluster from Scratch

Stand up a multi-node kind cluster, deploy a 3-tier app, expose via Ingress.

  1. Create kind cluster with 3 worker nodes
  2. Install nginx-ingress
  3. Deploy frontend / API / DB
  4. Verify external access via Ingress
View lab on GitHub

90 minutes · Intermediate

Lab 10.2 — Linkerd Service Mesh

Install Linkerd; observe automatic mTLS; verify mesh observability.

  1. Install Linkerd CLI and control plane
  2. Inject sidecars into namespace
  3. Verify mTLS via Linkerd Viz
  4. Inject failure with Toxiproxy; observe retry behaviour
View lab on GitHub

60 minutes · Intermediate

Lab 10.3 — StatefulSet for a Database

Deploy Postgres as a StatefulSet with persistent storage; verify pod identity stability.

  1. Define StatefulSet with PVC template
  2. Deploy 3 replicas
  3. Kill a pod; verify PVC reattaches to the same logical pod
  4. Demonstrate stable network identity
View lab on GitHub

Recap

Key takeaways

  • Kubernetes is itself a distributed system; understanding its components is now part of distributed-systems literacy
  • Service / Ingress / Gateway API: pick Gateway API for new infra
  • Service mesh is optional but powerful; pick Istio for full features, Linkerd for simplicity, Cilium for unified networking
  • StatefulSets + PVCs handle stateful workloads correctly; do not run databases as Deployments
  • PDBs, multi-AZ topology spread, and tested etcd backups are the operational must-haves
