Module 10 of 12

Kubernetes & Cloud Native Distributed Systems

How Kubernetes changes distributed-systems design - cluster architecture, service mesh, ingress, autoscaling, and the operational primitives that everything else now sits on top of.

5 hours · 3 labs · Free


Start here

Learning objectives

Read a Kubernetes cluster architecture (control plane, kubelet, kube-proxy, etcd, CNI)
Use Services, Ingress, and Gateway API correctly for distributed workloads
Compare service meshes (Istio, Linkerd, Cilium) and pick one with eyes open
Run StatefulSets, PVCs, and storage classes for stateful workloads
Operate workloads with HPA, VPA, Karpenter, and PodDisruptionBudgets in production

Before

Pets-not-cattle; long-lived nodes that can't be replaced
Shell scripts deploying directly to VMs; no declarative state
No autoscaling; capacity provisioned for peak, idle most of the time
Stateful workloads as single VMs; failure = data loss

After

Cattle-not-pets; nodes are interchangeable and routinely cycled
GitOps with Argo CD or Flux; declarative state, audited changes
HPA + Karpenter; capacity scales with demand within minutes
StatefulSets + PVCs + tested backups; node failure = pod reschedule, not data loss

Kubernetes is the substrate that runs most modern distributed systems. It is, itself, a distributed system - with consensus (etcd / Raft), partitioning (resources scheduled across nodes), replication (pods), and observability built in. Understanding how Kubernetes is constructed is now part of distributed-systems literacy.

Cluster Architecture

The control plane consists of:

kube-apiserver: the only component that writes to etcd; every other component talks to apiserver. Stateless; horizontally scalable.
etcd: the source of truth for cluster state; Raft-replicated; 3 or 5 nodes.
kube-scheduler: assigns pending pods to nodes based on resource fit, affinity, and topology.
kube-controller-manager: runs reconciliation loops (Deployment, ReplicaSet, Node, Endpoints, etc.). With several instances running for HA, only one is active at a time; the instances elect a leader through a Lease object stored via the apiserver.
cloud-controller-manager: runs cloud-specific controllers (load balancers for Services, routes, node lifecycle).
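The reconciliation loops mentioned above all follow the same shape: compare desired state with observed state and act on the difference. The following is a minimal sketch of that idea, not the real controller code; `ReplicaSetState`, `reconcile`, and `run_until_converged` are hypothetical names invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReplicaSetState:
    desired: int   # replicas requested in the spec
    observed: int  # pods that currently exist


def reconcile(state: ReplicaSetState) -> list[str]:
    """Return the actions a single reconciliation pass would take."""
    diff = state.desired - state.observed
    if diff > 0:
        return ["create-pod"] * diff
    if diff < 0:
        return ["delete-pod"] * (-diff)
    return []  # already converged; nothing to do


def run_until_converged(state: ReplicaSetState, max_passes: int = 10) -> int:
    """Apply actions pass by pass; return how many passes did any work."""
    passes = 0
    for _ in range(max_passes):
        actions = reconcile(state)
        if not actions:
            break
        for action in actions:
            state.observed += 1 if action == "create-pod" else -1
        passes += 1
    return passes


if __name__ == "__main__":
    state = ReplicaSetState(desired=3, observed=1)
    print(reconcile(state))
    print(run_until_converged(state), state.observed)
```

Because the loop is level-triggered (it looks at current state, not at events), running it again after convergence is a no-op, which is what makes controllers safe to restart.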

Each node runs:

kubelet: the node agent; pulls container images, runs containers via the container runtime, reports node and pod status to apiserver.
kube-proxy: implements Service abstraction via iptables / IPVS rules. (Modern alternative: Cilium with no kube-proxy.)
CNI plugin: pod networking (Cilium, Calico, AWS VPC CNI, etc.). Provides pod IPs, NetworkPolicy enforcement, often eBPF observability.
Container runtime: containerd or CRI-O; runs the containers.

Service, Ingress, Gateway API

Service: stable virtual IP for a set of pods; load-balances internal traffic. ClusterIP is reachable only inside the cluster, NodePort exposes the Service on a fixed port of every node, and LoadBalancer provisions a cloud load balancer in front of it.
Ingress: L7 routing for HTTP/HTTPS; needs an Ingress Controller (nginx-ingress, AWS ALB, Envoy-based, etc.). Older API; many extensions baked into annotations.
Gateway API: the modern replacement for Ingress; richer, role-separated (Gateway/HTTPRoute/etc.), portable across implementations. The right choice for new infra.
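To make the role separation concrete, here is a hedged sketch that builds an HTTPRoute attached to a Gateway owned by someone else (for example a platform team). The resource names (`shop-route`, `shared-gateway`, `api`) and the hostname are assumptions made up for the example; the field layout follows the `gateway.networking.k8s.io/v1` HTTPRoute schema. The manifest is emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def http_route(name: str, gateway: str, hostname: str,
               path_prefix: str, service: str, port: int) -> dict:
    """Build an HTTPRoute that sends a path prefix to one backend Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # parentRefs attaches this route to a Gateway it does not own.
            "parentRefs": [{"name": gateway}],
            "hostnames": [hostname],
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
                "backendRefs": [{"name": service, "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # All names below are hypothetical.
    route = http_route("shop-route", "shared-gateway", "shop.example.com",
                       "/api", "api", 8080)
    print(json.dumps(route, indent=2))
```

The application team only edits the route; listeners, TLS, and the load balancer live on the Gateway object.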

Service Mesh

A service mesh puts a proxy in the path of every service-to-service call, traditionally as a sidecar in each pod (Envoy in Istio, a purpose-built proxy in Linkerd), to handle mTLS, retries, circuit breaking, observability, traffic shifting, and authorization. Three major options:

Istio: most feature-rich; substantial complexity. Best when you need the full toolkit.
Linkerd: simpler, performance-focused; its data-plane proxy is written in Rust; zero-config mTLS. Best for “mesh basics, fast”.
Cilium service mesh: eBPF-based, no sidecar, integrated with the Cilium CNI. Best when you want one tool for networking + mesh.

Service meshes implement most of the resilience patterns from Module 7 (timeouts, retries, circuit breakers) at the data plane, without changes to application code. The cost is operational complexity and the latency tax of every request passing through a proxy.
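As a rough illustration of what the proxy does on the application's behalf, the sketch below retries a failing upstream call a bounded number of times. It is a toy model rather than any mesh's actual implementation; `UpstreamError`, `call_with_retries`, and `flaky_upstream` are hypothetical names.

```python
class UpstreamError(Exception):
    """Stand-in for a retryable failure such as a 503 or a connection reset."""


def call_with_retries(call, max_attempts=3):
    """Invoke call() up to max_attempts times; return (result, attempts_used)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call(), attempt
        except UpstreamError as err:
            last_error = err  # retryable: try again if attempts remain
    raise last_error


def flaky_upstream(failures_before_success):
    """Build a fake upstream that fails a fixed number of times, then succeeds."""
    remaining = {"n": failures_before_success}

    def call():
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise UpstreamError("503")
        return "200 OK"

    return call


if __name__ == "__main__":
    print(call_with_retries(flaky_upstream(2)))
```

The attempt cap matters: unbounded retries multiply load on an upstream that is already struggling, which is why retries are usually paired with timeouts and circuit breaking.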

Stateful Workloads

StatefulSets give pods stable identities (predictable name, predictable network address) and stable storage (PersistentVolumeClaims that follow the pod). The right pattern for databases, message queues, and any workload where pod identity matters.
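The stable identities are predictable enough to compute by hand. The sketch below derives the pod name, per-pod DNS name, and PVC name for each replica, following the documented patterns `<statefulset>-<ordinal>`, `<pod>.<service>.<namespace>.svc.<cluster-domain>`, and `<claim-template>-<pod>`. The StatefulSet name, headless Service name, namespace, and claim template name used here are assumptions for illustration.

```python
def statefulset_identities(name, service, namespace, replicas,
                           claim_template="data",
                           cluster_domain="cluster.local"):
    """Return the stable pod name, DNS name, and PVC name for each ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            # Resolvable only when `service` is the headless governing Service.
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            # One PVC per pod, created from the volumeClaimTemplate.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


if __name__ == "__main__":
    # Hypothetical names: StatefulSet "postgres", Service "postgres-headless".
    for identity in statefulset_identities("postgres", "postgres-headless", "db", 3):
        print(identity)
```

When `postgres-1` is rescheduled to another node, it comes back with the same name and DNS record and rebinds to `data-postgres-1`, which is exactly what a Deployment does not guarantee.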

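StorageClasses define dynamic provisioning of PersistentVolumes from cloud-provider disks (EBS, PD, Azure Disk) or storage operators (Rook/Ceph, Longhorn). Choose the reclaim policy (Delete / Retain) and binding mode (Immediate / WaitForFirstConsumer) on the StorageClass, and the access mode (ReadWriteOnce / ReadWriteMany) on each PersistentVolumeClaim, deliberately. WaitForFirstConsumer delays provisioning until a pod is scheduled, so a zonal disk is created in the same zone as the pod that will use it.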
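A sketch of those choices written down as manifests, assuming the AWS EBS CSI driver (`ebs.csi.aws.com`) and a `gp3` volume type; the object names and the 20Gi size are made up for the example. Reclaim policy and binding mode sit on the StorageClass, while the access mode is requested on the claim.

```python
import json


def storage_class(name: str) -> dict:
    """StorageClass that keeps the disk on PVC deletion and binds late."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",      # assumption: AWS EBS CSI driver
        "parameters": {"type": "gp3"},         # assumption: gp3 volumes
        "reclaimPolicy": "Retain",             # keep data if the PVC is deleted
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


def volume_claim(name: str, storage_class_name: str, size: str) -> dict:
    """PVC requesting a single-node read-write volume from the class above."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    sc = storage_class("db-retain")            # hypothetical name
    pvc = volume_claim("data-postgres-0", "db-retain", "20Gi")
    print(json.dumps([sc, pvc], indent=2))
```

Retain trades convenience for safety: deleted claims leave orphaned disks that someone has to clean up, but an accidental `kubectl delete pvc` does not destroy the database.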

Autoscaling, PDBs, and Operational Sanity

HPA: scale pods on metrics. Always set minReplicas >= 2 for HA.
VPA: rightsize resource requests; clashes with HPA on the same metric.
Cluster Autoscaler / Karpenter: scale nodes. Karpenter is the modern default on AWS.
PodDisruptionBudget: cap the number of unavailable pods during voluntary disruption (drain, scale-down, eviction). Without PDBs, the autoscaler will happily evict every replica simultaneously.
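The HPA's core decision can be sketched in a few lines. The documented rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default), then clamped to minReplicas and maxReplicas. The function below is a simplified model: it ignores stabilization windows, pod readiness, and multiple metrics, and the default bounds are assumptions chosen for the example.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Simplified HPA scaling decision for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas          # close enough; do not flap
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The `min_replicas=2` floor is what keeps the workload highly available when traffic drops to nearly zero.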

Operational Practice

The Kubernetes operational discipline:

Always run multi-AZ for production. Use topologySpreadConstraints to enforce it.
etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
RBAC: deny by default; explicit allow per ServiceAccount; treat cluster-admin as root.
NetworkPolicy: default-deny per namespace; explicit allow rules.
PodSecurity admission: restricted profile by default, exceptions audited.
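The etcd recommendation follows from Raft quorum arithmetic: a cluster of n members needs n // 2 + 1 of them to make progress. The sketch below checks whether a given placement of members across availability zones survives the loss of any single zone. The zone names are placeholders and the helper names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with this many members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)


def survives_any_az_loss(placement: dict) -> bool:
    """True if losing any one zone still leaves a quorum of members."""
    total = sum(placement.values())
    needed = quorum(total)
    return all(total - lost >= needed for lost in placement.values())


if __name__ == "__main__":
    print(quorum(5), failures_tolerated(5))
    print(survives_any_az_loss({"az-a": 2, "az-b": 2, "az-c": 1}))
    print(survives_any_az_loss({"az-a": 3, "az-b": 2}))
```

Five members over three zones (2/2/1) keep quorum through a full zone outage, while five members over two zones do not, because one zone necessarily holds the majority. An even member count adds no fault tolerance: four members tolerate one failure, the same as three.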

Module 8 of the Cloud Native Security Engineering course covers Kubernetes hardening in depth. The Kubernetes Security Simulator exercises the misconfigurations that cause real outages.

Service Mesh Traffic Flow (diagram not included in this text version)

Self-Check Quiz

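You have a Deployment with 3 replicas, all running on the same node. The cluster autoscaler scales down that node and all 3 pods get evicted at once. Why? (Answer: no PodDisruptionBudget. Define maxUnavailable: 1 so only one replica goes down at a time, and add topologySpreadConstraints so the replicas do not share a node in the first place.)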
Postgres in a Deployment vs StatefulSet - what changes? (Answer: StatefulSet gives stable pod identity (postgres-0, postgres-1) and stable PVCs that follow each pod. Required for any stateful workload.)
Service mesh adds 1ms latency per hop. Across 5 hops you pay 5ms. When is it worth it? (Answer: when the mesh-provided features (mTLS, retries, observability, traffic shifting) are worth more than 5ms. For mature production systems, almost always.)
You enable Istio mesh-wide STRICT mTLS on day one of rollout. What happens? (Answer: external load balancer health probes fail; non-meshed services can no longer talk to meshed services; outage. Phase: PERMISSIVE first, observe, promote namespace by namespace.)
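The phased rollout from the last quiz answer can be sketched as an ordered list of PeerAuthentication manifests: a mesh-wide PERMISSIVE policy in the root namespace first, then a STRICT policy per namespace as each one is verified. This assumes the default root namespace `istio-system` and the `security.istio.io/v1beta1` API version (newer Istio releases also serve `v1`); the namespace names passed in are hypothetical.

```python
import json

VALID_MODES = {"PERMISSIVE", "STRICT"}


def peer_authentication(namespace: str, mode: str) -> dict:
    """Namespace-scoped mTLS policy; in the root namespace it is mesh-wide."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": mode}},
    }


def rollout_plan(namespaces, root_namespace="istio-system"):
    """Step 0 accepts both plaintext and mTLS; later steps tighten one namespace each."""
    plan = [peer_authentication(root_namespace, "PERMISSIVE")]
    plan += [peer_authentication(ns, "STRICT") for ns in namespaces]
    return plan


if __name__ == "__main__":
    for step in rollout_plan(["payments", "checkout"]):
        print(json.dumps(step))
```

Each step is meant to be applied separately, with time in between to confirm from mesh telemetry that no plaintext traffic is still reaching the namespace about to go STRICT.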

For Kubernetes hardening, the Kubernetes Security cheatsheet is the operational reference. For service-mesh patterns the Service Mesh cheatsheet covers Istio/Linkerd/Cilium patterns - see the service mesh glossary entry for the conceptual definition. The Kubernetes cheatsheet is the day-to-day reference for kubectl operational patterns. Practice with the Kubernetes Security Simulator.

Real world

Where this shows up

Spotify runs a large microservice fleet (reported at over 1500 services) on Kubernetes and built Backstage, its open-source developer portal, to keep that fleet manageable.
Pinterest migrated their entire fleet to Kubernetes over 3 years; the migration was as much a culture shift as a technology one.
Reddit runs everything on Kubernetes after a multi-year migration from EC2.
Google's GKE Autopilot is essentially Kubernetes with the operational complexity hidden - for teams that want the API but not the infrastructure overhead.

Production notes

Keep these close

Always run multi-AZ. Use topologySpreadConstraints to enforce it; do not rely on luck.
PodDisruptionBudgets are mandatory for production workloads. Without them, autoscalers will happily evict every replica.
etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
Resource requests at p95 of actual usage; do not let dev defaults of “500m CPU” ship to prod.
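To see how a PodDisruptionBudget gates evictions, the sketch below builds a `policy/v1` PDB with maxUnavailable and computes how many voluntary disruptions are allowed at a given moment: currently healthy pods minus the number that must stay healthy. The workload name and label are assumptions for the example.

```python
import json


def pdb_manifest(name: str, app_label: str, max_unavailable: int = 1) -> dict:
    """PodDisruptionBudget selecting pods by their `app` label."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def disruptions_allowed(expected_pods: int, healthy_pods: int,
                        max_unavailable: int) -> int:
    """Evictions the API server would currently permit for this budget."""
    desired_healthy = expected_pods - max_unavailable
    return max(0, healthy_pods - desired_healthy)


if __name__ == "__main__":
    print(json.dumps(pdb_manifest("web-pdb", "web"), indent=2))  # hypothetical names
    print(disruptions_allowed(expected_pods=3, healthy_pods=3, max_unavailable=1))
    print(disruptions_allowed(expected_pods=3, healthy_pods=2, max_unavailable=1))
```

With three replicas and maxUnavailable: 1, a node drain can evict one pod, then has to wait until its replacement is healthy before evicting the next.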

Common mistakes

What usually breaks

Running stateful workloads as Deployments. Use StatefulSet so PVCs follow the pod identity.
PodSecurity admission set to “privileged” in production namespaces. Use restricted with audited exceptions.
Ingress per service in a flat namespace. Use Gateway API with role-separated Gateway/Route for new infra.
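Setting tight CPU limits (for example limits = requests) on latency-sensitive services. CFS throttling kicks in as soon as the container has used its quota for the current period, even when the node has idle cores; latency suffers.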
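The throttling effect is easy to quantify with a simplified model. A CPU limit becomes a quota per scheduling period (100 ms by default): a 500m limit allows 50 ms of CPU time per period, after which the container is paused until the next period. The sketch assumes a single-threaded burst that starts at a period boundary; real workloads with several threads burn through the quota faster.

```python
import math

CFS_PERIOD_MS = 100.0  # default CFS period (cfs_period_us = 100000)


def wall_time_ms(cpu_work_ms: float, cpu_limit_cores: float,
                 period_ms: float = CFS_PERIOD_MS) -> float:
    """Wall-clock time for a single-threaded burst under a CPU limit."""
    if cpu_work_ms <= 0:
        return 0.0
    # One thread can use at most one core, so cap the usable quota at one period.
    quota_ms = min(cpu_limit_cores, 1.0) * period_ms
    periods = math.ceil(cpu_work_ms / quota_ms)
    work_in_last_period = cpu_work_ms - (periods - 1) * quota_ms
    return (periods - 1) * period_ms + work_in_last_period


if __name__ == "__main__":
    # A request needing 80 ms of CPU under a 500m limit versus no effective limit.
    print(wall_time_ms(80, 0.5))
    print(wall_time_ms(80, 1.0))
```

In this model the 80 ms request runs for 50 ms, sits throttled for 50 ms, then finishes the remaining 30 ms, regardless of how idle the rest of the node is.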

Security risks

Threats to watch

Over-broad RBAC grants and unenforced PodSecurity: a namespace without pod-security labels is effectively privileged, so the restricted profile must be set explicitly per namespace.
Service mesh sidecars run as elevated workloads; an unverified mesh component is a cluster-wide attack vector.
Public LoadBalancer Services can sidestep NetworkPolicy rules that match on client IP, because traffic may arrive source-NATed from a node address; verify origin restrictions are enforced at the LB.
Container image pulls without signature verification accept anything from the registry. Use cosign + admission policy.

Tradeoffs

Design choices you should be able to defend

Service mesh: Istio

Pros

Most feature-rich
Strong community
Rich traffic management

Cons

Heavy operationally
Steeper learning curve

Service mesh: Linkerd

Pros

Simpler
Faster (Rust)
Zero-config mTLS

Cons

Fewer advanced features

Service mesh: Cilium (eBPF)

Pros

Sidecar-free
Integrated with CNI
Lower latency tax

Cons

Newer; ecosystem still maturing

No mesh; just K8s primitives

Pros

Less operational complexity
Lower latency

Cons

No automatic mTLS
Resilience patterns in app code

Alternatives

Other production approaches

Self-managed Kubernetes (kubeadm, RKE2, k0s)

Full control; full operational responsibility.

EKS / GKE / AKS

Managed control plane; you operate the workloads.

GKE Autopilot / EKS Auto Mode

Managed control plane AND nodes; closest to “just deploy a Pod”.

HashiCorp Nomad

Lighter alternative; works for non-container workloads.

Cloud-specific PaaS (Cloud Run, App Runner, Container Apps)

Skip Kubernetes entirely for stateless web workloads.

Think like an engineer

Questions to answer before shipping

Read every Kubernetes manifest as a contract with the scheduler. Resource requests, anti-affinity, and PDBs are the operational levers.
Treat Kubernetes upgrades like deployment changes - staged across canary clusters before prod.
For every workload class (web, batch, ML), define which features (HPA, PDB, topology spread) it must use as a baseline.

Key terms

Vocabulary used in this module

Kubernetes Service

Stable virtual IP and DNS name for a set of pods; load-balances internal traffic.

Service mesh

Infrastructure layer (usually sidecar proxies, sometimes sidecar-free as in Cilium) for mTLS, retries, and observability between services.

StatefulSet

Workload controller for pods that need stable identity and storage.

PodDisruptionBudget

Cap on simultaneous voluntary disruptions to a workload; prevents accidental full-replica eviction.

Karpenter

Modern Kubernetes node autoscaler on AWS; replaces Cluster Autoscaler with faster, more flexible provisioning.

Labs

Hands-on labs

90 minutes · Beginner

Lab 10.1 - Kind Cluster from Scratch

Stand up a multi-node kind cluster, deploy a 3-tier app, expose via Ingress.

Create kind cluster with 3 worker nodes
Install nginx-ingress
Deploy frontend / API / DB
Verify external access via Ingress


90 minutes · Intermediate

Lab 10.2 - Linkerd Service Mesh

Install Linkerd; observe automatic mTLS; verify mesh observability.

Install Linkerd CLI and control plane
Inject sidecars into namespace
Verify mTLS via Linkerd Viz
Inject failure with Toxiproxy; observe retry behaviour


60 minutes · Intermediate

Lab 10.3 - StatefulSet for a Database

Deploy Postgres as a StatefulSet with persistent storage; verify pod identity stability.

Define StatefulSet with PVC template
Deploy 3 replicas
Kill a pod; verify PVC reattaches to the same logical pod
Demonstrate stable network identity


Recap

Key takeaways

Kubernetes is itself a distributed system; understanding its components is now part of distributed-systems literacy
Service / Ingress / Gateway API: pick Gateway API for new infra
Service mesh is optional but powerful; pick Istio for full features, Linkerd for simplicity, Cilium for unified networking
StatefulSets + PVCs handle stateful workloads correctly; do not run databases as Deployments
PDBs, multi-AZ topology spread, and tested etcd backups are the operational must-haves
