
Module 6: Scalability Engineering Slides

Slide walkthrough for Module 6 of Distributed Systems Engineering: Building Scalable, Reliable & Secure Systems: Horizontal scaling, autoscaling, caching...

This slide page is the visual review companion for the full course module. Use it to recap the architecture, examples, exercises, production warnings, and takeaways after reading the lesson.

Slide Outline

  1. Scalability Engineering - Horizontal scaling, autoscaling, caching, CDNs, rate limiting — how production systems handle 10x and 100x traffic without 10x and 100x cost.
  2. Learning Objectives - 5 outcomes for this module
  3. Why This Module Matters - Scalability engineering separates the engineers who can ship a system that works at 1k RPS from the ones who can ship a system that works at 1M RPS.
  4. Before vs After - The operational shift this module teaches
  5. Horizontal vs Vertical Scaling - Lesson section from the full module
  6. Caching as a Scaling Lever - Lesson section from the full module
  7. CDN - Caching at the Edge - Lesson section from the full module
  8. Autoscaling on Kubernetes - Lesson section from the full module
  9. Distributed Rate Limiting - Lesson section from the full module
  10. Identifying the Bottleneck - Lesson section from the full module
  11. Cache Hierarchy in Practice - Lesson section from the full module
  12. Distributed Rate Limiter Architecture - Lesson section from the full module
  13. Real-World Use Cases - AWS DynamoDB's burst capacity is a literal token-bucket implementation visible to users. Cloudflare absorbs trillions of requests per day at the edge with a layered cache that serves most reads before any origin is involved.
  14. Common Mistakes to Avoid - 3 mistakes covered
  15. Production Notes - 3 practical notes
  16. Security Risks to Watch - 4 risks covered
  17. Hands-On Labs - 3 hands-on labs
  18. Key Takeaways - 5 points to remember

Learning Objectives

  • Design stateless services that scale horizontally without coordination
  • Pick the right caching strategy (cache-aside, write-through, write-back) for the workload
  • Configure Kubernetes HPA, VPA, and Cluster Autoscaler so they actually work
  • Implement distributed rate limiting that keeps working across multiple regions
  • Identify the scalability bottleneck before it becomes the outage

Why This Module Matters

Scalability engineering separates the engineers who can ship a system that works at 1k RPS from the ones who can ship a system that works at 1M RPS. Most architectures hit a single bottleneck early; the engineering skill is identifying that bottleneck before it bites and moving it before users feel it. Once you internalise the “every system has a bottleneck” mindset, you stop being surprised when the database connection pool exhausts under load you thought was easy.

Production Notes

  • Profile workloads BEFORE setting resource requests. Most workloads request 2-3x what they use; right-sizing is direct cost savings.
  • Scale on the metric closest to user latency, not CPU. CPU at 30% while latency is degraded means CPU is not your bottleneck.
  • For Karpenter on AWS, configure aggressive node consolidation, but pair it with PodDisruptionBudgets so that consolidation does not cause outages.
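
The second note can be made concrete with the replica calculation that the Kubernetes HPA documents: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below is a simplified Python model of that formula, not the controller itself; the function name, the replica bounds, and all numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified model of the documented HPA replica formula.

    current_value and target_value are per-pod averages of the same metric
    (CPU utilisation, requests per second, queue depth, ...).
    """
    ratio = current_value / target_value
    # Within tolerance of the target, the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, desired))


# CPU at 30% against a 60% target asks for FEWER pods, even if latency is bad.
cpu_decision = desired_replicas(4, current_value=30, target_value=60)

# The same service measured on RPS per pod (250 observed, 100 targeted).
rps_decision = desired_replicas(4, current_value=250, target_value=100,
                                max_replicas=20)
```

With these made-up numbers the CPU-based decision shrinks the deployment from 4 to 2 replicas, while the RPS-based decision grows it to 10. The controller is the same in both cases, so the choice of metric decides the outcome.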

Common Mistakes

  • Setting HPA on CPU when the database connection pool is the actual bottleneck.
  • Caching everything by default; sometimes the database is fast enough and the cache is just extra failure surface.
  • Running Cluster Autoscaler without PodDisruptionBudgets; nodes scale down and take working pods with them.
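
The first mistake is easy to check with arithmetic: every replica the HPA adds opens its own connection pool, so the replica ceiling has to fit inside the database's connection limit. A minimal sketch of that budget check follows; the function names and every number are hypothetical, and a real setup may also have a connection proxy in between.

```python
def max_safe_replicas(db_max_connections, pool_size_per_pod, reserved=0):
    """Largest replica count whose pools fit in the database connection limit.

    reserved is the number of connections held back for migrations,
    admin sessions, and other clients (an assumption of this sketch).
    """
    usable = db_max_connections - reserved
    return max(usable // pool_size_per_pod, 0)


def pool_overcommitted(replicas, pool_size_per_pod, db_max_connections,
                       reserved=0):
    """True if the replicas could together exceed the connection budget."""
    return replicas * pool_size_per_pod > db_max_connections - reserved


# Hypothetical numbers: a database capped at 200 connections, 20 reserved,
# and each pod configured with a pool of 20 connections.
ceiling = max_safe_replicas(200, 20, reserved=20)
risky = pool_overcommitted(20, 20, 200, reserved=20)
```

Here the ceiling is 9 replicas, so an HPA allowed to reach 20 replicas would exhaust the database long before CPU becomes the limit.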

Key Takeaways

  • Stateless services are the foundation of horizontal scale — remove state-leaking patterns first
  • Multi-layer caching multiplies capacity; pick a strategy per layer deliberately
  • Scale on the metric closest to user latency, not on CPU when CPU is not the bottleneck
  • Distributed rate limiting requires consensus or aggregation — pick the trade-off
  • Every system has a bottleneck; the engineering work is moving it before it bites
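
As an illustration of choosing a caching strategy deliberately, here is a minimal cache-aside sketch with per-key locking, the stampede protection that Lab 6.2 builds. It is an in-process stand-in under stated assumptions: a Python dict plays the cache and `threading.Lock` plays the per-key lock, whereas the lab uses Redis for both. The class name, the loader callback, and the TTL are hypothetical.

```python
import threading
import time


class CacheAside:
    """Cache-aside with one lock per key, so only one caller recomputes."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader            # hypothetical "read from database" callback
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}               # key -> (value, expires_at)
        self._key_locks = {}             # key -> threading.Lock
        self._guard = threading.Lock()   # protects _key_locks itself

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is not None:            # fast path: cache hit, no locking
            return entry[0]
        with self._lock_for(key):
            # Re-check after acquiring the lock: another caller may have
            # refilled the entry while this one was waiting.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = self._loader(key)    # the single recompute
            self._entries[key] = (value, self._clock() + self._ttl)
            return value
```

The re-check inside the lock is the design choice that matters. Without it, every waiting caller would still hit the database once the lock is released. Note that this sketch never evicts per-key locks, which a long-running service would need to handle.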

Hands-On Labs

  1. Lab 6.1 — HPA on Custom Metrics

    Configure HPA based on RPS or queue depth via Prometheus Adapter; observe scale-up under load.

    90 minutes - Intermediate

    • Deploy app + Prometheus + Prometheus Adapter
    • Define HPA on RPS metric
    • Generate load; watch replicas scale up
    • Cool down; watch scale down

    View lab files on GitHub

  2. Lab 6.2 — Cache-Aside with Stampede Protection

    Implement cache-aside with per-key locking to prevent thundering herd.

    60 minutes - Intermediate

    • Implement naive cache-aside
    • Reproduce stampede on cache expiry
    • Add per-key Redis lock for recompute
    • Verify single recompute under load

    View lab files on GitHub

  3. Lab 6.3 — Distributed Rate Limiter (Redis Lua)

    Implement an atomic token-bucket rate limiter as a Redis Lua script; load test it.

    60 minutes - Advanced

    • Write Lua script for atomic token bucket update
    • Hit it from many concurrent clients
    • Verify the rate is enforced globally

    View lab files on GitHub
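
The token-bucket update that Lab 6.3 makes atomic can be modelled in a few lines. The sketch below is an in-process Python model of the arithmetic only (refill by elapsed time, cap at capacity, spend if enough tokens remain). It is not the lab's Lua script, and a local lock stands in for the atomicity that Redis gives a script. The class name and parameters are hypothetical.

```python
import threading
import time


class TokenBucket:
    """In-process model of a token bucket: refill, cap, then try to spend."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second      # tokens added per second
        self.capacity = capacity         # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()
        self._lock = threading.Lock()    # stand-in for script atomicity

    def allow(self, cost=1.0):
        with self._lock:
            now = self._clock()
            elapsed = max(0.0, now - self._last)
            # Refill for the time that has passed, never above capacity.
            self._tokens = min(self.capacity,
                               self._tokens + elapsed * self.rate)
            self._last = now
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_second=2, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]          # 3 allowed, then rejected
fake_now[0] = 1.0                                   # one second passes: +2 tokens
after_refill = [bucket.allow() for _ in range(3)]   # 2 allowed, then rejected
```

Refill and spend happen in one step because two clients that read the same token count and both decrement it would let the limit be exceeded. That is the reason the lab moves this logic into a single server-side script rather than separate read and write commands.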
