Module 10: Kubernetes & Cloud Native Distributed Systems Slides
Slide walkthrough for Module 10 of Distributed Systems Engineering: Building Scalable, Reliable & Secure Systems: How Kubernetes changes...
This slide page is the visual review companion for the full course module. Use it to recap the architecture, examples, exercises, production warnings, and takeaways after reading the lesson.
Slide Outline
- Kubernetes & Cloud Native Distributed Systems - How Kubernetes changes distributed-systems design — cluster architecture, service mesh, ingress, autoscaling, and the operational primitives that everything else now sits on top of.
- Learning Objectives - 5 outcomes for this module
- Why This Module Matters - Kubernetes is now the default substrate for modern infrastructure; understanding its components is part of distributed-s
- Before vs After - The operational shift this module teaches
- Cluster Architecture - Lesson section from the full module
- Service, Ingress, Gateway API - Lesson section from the full module
- Service Mesh - Lesson section from the full module
- Stateful Workloads - Lesson section from the full module
- Autoscaling, PDBs, and Operational Sanity - Lesson section from the full module
- Operational Practice - Lesson section from the full module
- Service Mesh Traffic Flow - Lesson section from the full module
- Self-Check Quiz - Lesson section from the full module
- Real-World Use Cases - Spotify runs over 1500 microservices on Kubernetes, supported by in-house tooling such as Backstage (a developer portal) and Apollo (a Java service framework); Pinterest migrated their entire fleet to Kubernetes over 3 years, and the migration was as much a culture shift as a technology one.
- Common Mistakes to Avoid - 4 mistakes covered
- Production Notes - 4 practical notes
- Security Risks to Watch - 4 risks covered
- Hands-On Labs - 3 hands-on labs
- Key Takeaways - 5 points to remember
Learning Objectives
- Read a Kubernetes cluster architecture (control plane, kubelet, kube-proxy, etcd, CNI)
- Use Services, Ingress, and Gateway API correctly for distributed workloads
- Compare service meshes (Istio, Linkerd, Cilium) and pick one with eyes open
- Run StatefulSets, PVCs, and storage classes for stateful workloads
- Operate workloads with HPA, VPA, Karpenter, and PodDisruptionBudgets in production
Why This Module Matters
Kubernetes is now the default substrate for modern infrastructure; understanding its components is part of distributed-systems literacy. Engineers who can read a cluster architecture, reason about Service / Ingress / Gateway API, pick the right service mesh, and operate StatefulSets correctly are the engineers who get trusted with platform-engineering roles. The ones who treat Kubernetes as “just docker but bigger” eventually pay for it during the first multi-AZ incident.
Production Notes
- Always run multi-AZ. Use topologySpreadConstraints to enforce it; do not rely on luck.
- PodDisruptionBudgets are mandatory for production workloads. Without them, voluntary disruptions such as node drains and autoscaler scale-downs can evict every replica at once.
- etcd: 5 nodes across 3 AZs, KMS-backed encryption at rest, tested backup/restore.
- Set resource requests at roughly the p95 of observed usage; do not let dev defaults such as “500m CPU” ship to prod.
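As a rough illustration of the first two notes, the sketch below builds a zone-spread constraint and a PodDisruptionBudget as plain Python dictionaries and prints them as JSON, which kubectl also accepts. The app name `checkout`, the label key `app`, and the `minAvailable` value of 2 are hypothetical; the field names follow the `policy/v1` PodDisruptionBudget and the pod-spec `topologySpreadConstraints` schemas.

```python
import json


def zone_spread_constraint(labels: dict) -> dict:
    """Pod-spec fragment asking the scheduler to balance matching pods across zones."""
    return {
        "maxSkew": 1,  # zones may differ by at most one matching pod
        "topologyKey": "topology.kubernetes.io/zone",  # well-known zone label on nodes
        "whenUnsatisfiable": "DoNotSchedule",  # hard rule: pod stays Pending rather than skewing
        "labelSelector": {"matchLabels": labels},
    }


def pod_disruption_budget(name: str, labels: dict, min_available: int) -> dict:
    """PodDisruptionBudget keeping at least `min_available` matching pods during voluntary evictions."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": labels},
        },
    }


if __name__ == "__main__":
    labels = {"app": "checkout"}  # hypothetical workload label
    print(json.dumps({"topologySpreadConstraints": [zone_spread_constraint(labels)]}, indent=2))
    print(json.dumps(pod_disruption_budget("checkout-pdb", labels, 2), indent=2))
```

The first fragment belongs under `spec.template.spec` of a Deployment or StatefulSet. A PDB only limits voluntary disruptions such as drains; it does not protect against node or zone failure, which is what the spread constraint is for.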
Common Mistakes
- Running stateful workloads as Deployments. Use StatefulSet so PVCs follow the pod identity.
- PodSecurity admission set to “privileged” in production namespaces. Use restricted with audited exceptions.
- Ingress per service in a flat namespace. Use Gateway API with role-separated Gateway/Route for new infra.
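- Setting tight CPU limits (for example, limits = requests) on latency-sensitive services. CFS throttling kicks in as soon as a container exhausts its quota for the current period, even when the node has idle cores; latency suffers.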
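The throttling effect in the last item can be approximated with a little arithmetic. The sketch below is a deliberately simplified model, not a description of the kernel scheduler: it assumes the default 100 ms CFS period, a request that starts at the beginning of a period with a full quota, and perfectly parallel threads. The function names are made up for illustration.

```python
def unthrottled_wall_ms(cpu_ms: float, threads: int) -> float:
    """Wall-clock time when the threads can run freely on idle cores."""
    return cpu_ms / threads


def throttled_wall_ms(cpu_ms: float, limit_millicores: int, threads: int,
                      period_ms: float = 100.0) -> float:
    """Wall-clock time under a CPU limit enforced as a CFS quota per period.

    The container may burn `quota` ms of CPU time per period. With several
    threads it burns the quota faster than wall-clock time passes, then sits
    throttled until the next period begins.
    """
    quota = limit_millicores / 1000 * period_ms  # CPU-ms allowed per period
    full_periods, remainder = divmod(cpu_ms, quota)
    if remainder == 0:
        # The work ends exactly when a quota is used up: the last period is partial.
        full_periods -= 1
        remainder = quota
    # Each exhausted period costs a whole period of wall time; the final
    # slice runs in parallel without hitting the quota.
    return full_periods * period_ms + remainder / threads


if __name__ == "__main__":
    # 200 ms of CPU work, 4 threads, limit of 1000m (one core's worth per period)
    print(unthrottled_wall_ms(200, 4))      # 50.0
    print(throttled_wall_ms(200, 1000, 4))  # 125.0
```

In this model the request takes 125 ms instead of 50 ms even though three other cores were free the whole time, which is the latency cost the item warns about.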
Key Takeaways
- Kubernetes is itself a distributed system; understanding its components is now part of distributed-systems literacy
- Service / Ingress / Gateway API: pick Gateway API for new infra
- Service mesh is optional but powerful; pick Istio for full features, Linkerd for simplicity, Cilium for unified networking
- StatefulSets + PVCs handle stateful workloads correctly; do not run databases as Deployments
- PDBs, multi-AZ topology spread, and tested etcd backups are the operational must-haves
Hands-On Labs
Lab 10.1 — Kind Cluster from Scratch
Stand up a multi-node kind cluster, deploy a 3-tier app, expose via Ingress.
90 minutes - Beginner
- Create kind cluster with 3 worker nodes
- Install nginx-ingress
- Deploy frontend / API / DB
- Verify external access via Ingress
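For the first step, a kind cluster layout is described in a small config file. The sketch below generates one with a single control-plane node and a configurable number of workers; the output file name `kind-config.yaml` is an arbitrary choice, and the extra port mappings that an ingress controller typically needs on kind are left out.

```python
def kind_config(workers: int) -> str:
    """Return a kind cluster config with one control-plane node and `workers` worker nodes."""
    lines = [
        "kind: Cluster",
        "apiVersion: kind.x-k8s.io/v1alpha4",
        "nodes:",
        "- role: control-plane",
    ]
    lines += ["- role: worker"] * workers
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # File name is an assumption for this lab, not something kind requires.
    with open("kind-config.yaml", "w") as f:
        f.write(kind_config(3))
    print(kind_config(3))
```

The generated file can then be passed to `kind create cluster --config kind-config.yaml`.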
Lab 10.2 — Linkerd Service Mesh
Install Linkerd; observe automatic mTLS; verify mesh observability.
90 minutes - Intermediate
- Install Linkerd CLI and control plane
- Inject sidecars into namespace
- Verify mTLS via Linkerd Viz
- Inject failure with Toxiproxy; observe retry behaviour
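The injection step usually comes down to one annotation on the namespace. A minimal sketch, assuming a hypothetical namespace called `lab-mesh`, that emits the manifest as JSON for kubectl:

```python
import json


def mesh_namespace(name: str) -> dict:
    """Namespace manifest that opts its pods into Linkerd proxy injection."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Linkerd's proxy injector looks for this annotation when pods are created.
            "annotations": {"linkerd.io/inject": "enabled"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(mesh_namespace("lab-mesh"), indent=2))  # "lab-mesh" is hypothetical
```

Injection happens when a pod is created, so pods that were already running in the namespace need a restart before they pick up the proxy.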
Lab 10.3 — StatefulSet for a Database
Deploy Postgres as a StatefulSet with persistent storage; verify pod identity stability.
60 minutes - Intermediate
- Define StatefulSet with PVC template
- Deploy 3 replicas
- Kill a pod; verify PVC reattaches to the same logical pod
- Demonstrate stable network identity
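What "stable identity" means in the last two steps can be written down as naming rules. The sketch below derives the pod name, per-pod DNS name, and PVC name for each replica; the StatefulSet name `postgres`, headless Service `postgres-headless`, claim template `data`, and namespace `default` are hypothetical, and `cluster.local` is assumed as the cluster domain.

```python
def statefulset_identities(sts_name: str, service: str, namespace: str,
                           replicas: int, claim_template: str,
                           cluster_domain: str = "cluster.local") -> list:
    """Names Kubernetes derives for each replica of a StatefulSet.

    Pods are numbered by ordinal, so a recreated pod gets the same name,
    the same DNS record (via the headless Service), and the same PVC.
    """
    identities = []
    for ordinal in range(replicas):
        pod = f"{sts_name}-{ordinal}"
        identities.append({
            "pod": pod,
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


if __name__ == "__main__":
    for ident in statefulset_identities("postgres", "postgres-headless", "default", 3, "data"):
        print(ident)
```

After deleting `postgres-1`, the replacement pod should come back with the same name and bind to `data-postgres-1` again, which is the behaviour the lab asks you to verify.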