Module 8 of 8

Production Design: Security, Performance, and Scale

Turn the pattern into a production-grade design with latency budgets, failure modes, audits, and rollout plans.

120 minutes · 1 exercise · Free

Start here

Learning objectives

  • Evaluate the performance cost of JWT validation and external authorization
  • Design high availability for Envoy, JWKS, IdP, and auth services
  • Choose fail-closed, fail-open, and degraded-mode behavior deliberately
  • Write a production rollout checklist for centralized auth migration

Before

  • Product-specific auth logic with unknown failure modes
  • No shared audit contract
  • Static secrets and broad credentials
  • No auth latency budget

After

  • Envoy-enforced identity and policy checks
  • Shared audit fields across products
  • Short-lived tokens and scoped federated credentials
  • Measured decision latency and rollback plan

Figure: A central Envoy edge checks identity before traffic reaches product services. Boxes: Traffic (browser or service); Envoy (enforcement point, HA); Auth (IdP or policy service, when needed); Products (Kubernetes app). Caption: Authenticate once, validate every request, authorize before forwarding.

Performance Model

JWT validation is usually fast once JWKS keys are cached, because Envoy verifies the token signature locally instead of calling the IdP. External authorization is different: it adds a network hop on the request path, so auth service latency becomes user-facing latency. Use short timeouts, horizontal scaling, connection reuse, circuit breakers, and careful caching where policy allows it.
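To make the caching point concrete, here is a minimal sketch of Envoy's jwt_authn filter with a remote JWKS source. It is a fragment of an HTTP connection manager's filter chain, not a complete config. The provider name, issuer, audience, JWKS URL, idp_jwks cluster, and the /api/ prefix are placeholders, not values from this course.

```yaml
http_filters:
- name: envoy.filters.http.jwt_authn
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
    providers:
      platform_idp:                        # hypothetical provider name
        issuer: https://idp.example.com    # placeholder; must match the token's iss claim
        audiences:
        - api.example.com                  # placeholder; tokens for other audiences are rejected
        remote_jwks:
          http_uri:
            uri: https://idp.example.com/.well-known/jwks.json  # placeholder JWKS URL
            cluster: idp_jwks              # hypothetical cluster, defined elsewhere
            timeout: 1s
          cache_duration: 300s             # reuse fetched keys instead of refetching per request
    rules:
    - match:
        prefix: /api/                      # placeholder route prefix
      requires:
        provider_name: platform_idp
```

With keys cached, most requests are validated without any call to the IdP. The cache duration is a tradeoff: longer values reduce JWKS traffic, shorter values pick up key rotation sooner.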

Production Resilience Example

This cluster snippet shows the types of controls you should think about for the central auth service: timeouts, circuit breakers, outlier detection, and stable Kubernetes service discovery.

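clusters:
- name: central_authz_service
  # STRICT_DNS keeps re-resolving the Kubernetes service name.
  type: STRICT_DNS
  # Envoy durations are written in seconds with an "s" suffix; 0.3s is 300 ms.
  connect_timeout: 0.3s
  # Caps on connections, queued requests, and in-flight requests to the auth service.
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 500
      max_pending_requests: 1000
      max_requests: 2000
  # Eject a host after 5 consecutive 5xx responses, for at least 30 seconds.
  outlier_detection:
    consecutive_5xx: 5
    interval: 5s
    base_ejection_time: 30s
  load_assignment:
    cluster_name: central_authz_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: authz.platform.svc.cluster.local
              port_value: 9000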

Fail-Closed vs Fail-Open

Security-sensitive routes should usually fail closed: if Envoy cannot validate identity or cannot reach the auth service, deny the request. Some read-only, low-risk routes may have a documented degraded mode, but it must be explicit. Never let a timeout silently become broad access.
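A fail-closed setup might look like the following sketch of Envoy's ext_authz filter. It assumes the auth service speaks plain HTTP on the central_authz_service cluster; the 0.25s timeout and the 503 status are illustrative choices, not recommendations from this module.

```yaml
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    http_service:
      server_uri:
        uri: http://authz.platform.svc.cluster.local:9000
        cluster: central_authz_service
        timeout: 0.25s            # illustrative per-check timeout on the request path
    # Fail closed: a timeout or auth service error denies the request.
    failure_mode_allow: false
    # Return 503 on auth errors so outages are distinguishable from policy denials.
    status_on_error:
      code: ServiceUnavailable
```

Setting failure_mode_allow to true would turn every auth outage into an allow, which is exactly the silent bypass to avoid on sensitive routes. If a degraded mode is needed, keep it on a separate, clearly classified set of routes.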

Scaling the Auth Layer

  • Run multiple Envoy replicas across nodes and zones.
  • Run multiple auth service replicas with autoscaling and health checks.
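  • Cache JWKS keys and, where policy allows, simple policy decisions, with lifetimes short enough that key rotation and revocation still take effect quickly.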
  • Keep IdP calls out of the hot path after login when possible.
  • Use metrics for allow rate, deny rate, error rate, decision latency, token failures, and route-level spikes.
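For the auth service replicas, a Kubernetes sketch could combine an autoscaler with a disruption budget. The Deployment name, the app: authz label, the replica counts, and the CPU target below are assumptions for illustration; only the platform namespace and authz service name echo the cluster address used earlier.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: authz
  namespace: platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: authz              # hypothetical Deployment name
  minReplicas: 3             # illustrative floor so one lost pod is not an outage
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: authz
  namespace: platform
spec:
  minAvailable: 2            # keep at least two replicas during voluntary disruptions
  selector:
    matchLabels:
      app: authz             # hypothetical pod label
```

CPU is only one possible scaling signal. If decision latency is the real constraint, scaling on a request-rate or latency metric may fit better, at the cost of extra metrics plumbing.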

Advantages

  • Consistent auth behavior across products.
  • Better audit logs and incident response.
  • Less duplicated token parsing and SSO code.
  • Easier migration from static secrets to short-lived tokens and federated credentials.

Disadvantages

  • Central auth becomes critical path infrastructure.
  • Bad central policy can break many products at once.
  • Teams need clear ownership for route policy, product policy, and data policy.
  • Performance work is required when every request asks for a decision.

Production Rollout Plan

  1. Inventory products, routes, token types, current auth logic, and risk level.
  2. Start with one low-risk product behind Envoy in report-only or shadow mode.
  3. Enable JWT validation for API routes with explicit issuer and audience.
  4. Add ext_authz for routes that need central policy or browser SSO decisions.
  5. Strip client-supplied identity headers at Envoy and set them again only from validated identity, so products can trust what they receive.
  6. Add dashboards and alerts before migrating high-risk products.
  7. Document emergency rollback and key-rotation procedures.
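One way to stage step 4 is a percentage rollout on the ext_authz filter, sketched below. This is not shadow mode: sampled requests are enforced, and unsampled requests skip the check, so it only makes sense while the product's own auth logic is still in place. The runtime key name and the 10 percent starting value are hypothetical, and overriding the key assumes a runtime layer is configured.

```yaml
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    http_service:
      server_uri:
        uri: http://authz.platform.svc.cluster.local:9000
        cluster: central_authz_service
        timeout: 0.25s                        # illustrative timeout
    failure_mode_allow: false
    # Send only a fraction of requests through the check during migration.
    filter_enabled:
      runtime_key: authz.rollout_percent      # hypothetical runtime key
      default_value:
        numerator: 10                         # start at 10 percent
        denominator: HUNDRED
```

Raising the runtime value widens enforcement without a config push, and setting it to 0 is a fast rollback lever that can go into the emergency procedure from step 7.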

Real world

Where this shows up

  • A Kubernetes platform with many internal products that should share one login and one enforcement layer
  • Developer portals, data tools, admin consoles, APIs, and service-to-service calls protected at the edge
  • Migration from product-specific auth logic to centralized policy checks without rewriting every product first

Production notes

Keep these close

  • Set an explicit latency budget for auth decisions
  • Test IdP outage, JWKS failure, auth service 5xx, and Envoy config rollback
  • Separate policy authoring from emergency break-glass access

Common mistakes

What usually breaks

  • Making every request call the IdP directly
  • Failing open on sensitive routes because it feels more available
  • Migrating every product at once without shadow mode or route-level metrics

Security risks

Threats to watch

  • Auth service compromise can affect all protected products
  • Poorly protected break-glass flows can bypass the whole design
  • Missing audit data can turn an incident into guesswork
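A shared audit contract can start with a structured Envoy access log, as in this sketch. The field names and the x-auth-subject header are assumptions for illustration; the header would need to be set by the auth layer from validated identity, never accepted from the client.

```yaml
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        timestamp: "%START_TIME%"
        request_id: "%REQ(X-REQUEST-ID)%"
        method: "%REQ(:METHOD)%"
        path: "%REQ(:PATH)%"
        subject: "%REQ(X-AUTH-SUBJECT)%"      # hypothetical identity header
        response_code: "%RESPONSE_CODE%"
        response_flags: "%RESPONSE_FLAGS%"    # UAEX marks a denial by external authorization
        duration_ms: "%DURATION%"
        upstream_cluster: "%UPSTREAM_CLUSTER%"
```

Logging the same fields for every product lets an incident responder answer who called what, when, and with which decision, without reading product-specific log formats.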

Tradeoffs

Design choices you should be able to defend

Fail closed by default

Pros

  • Safer for sensitive systems
  • Clear security posture
  • Prevents timeout-based bypass

Cons

  • Auth outages can become product outages
  • Requires strong reliability engineering

Selective degraded mode

Pros

  • Can protect read-only availability
  • Useful for low-risk status pages

Cons

  • Easy to misuse
  • Needs explicit route classification and audit

Think like an engineer

Questions to answer before shipping

  • Which identity is making the request: a human, a service, or a federated cloud principal?
  • Which check belongs at Envoy, and which check must remain inside the product domain model?
  • What happens when the identity provider, JWKS endpoint, or auth service is slow or unavailable?

Key terms

Vocabulary used in this module

Critical path

A component that must work for user requests to complete.

Fail closed

Deny access when the security decision cannot be made.

Circuit breaker

A control that limits calls to an unhealthy dependency before it causes broader failure.

Shadow mode

Running a new policy path without enforcing it yet, so teams can observe decisions safely.

Exercises

Practice inside the lesson

60-90 minutes · Production grade

Write the Production Auth Design Review

Produce a review-ready design for centralized Envoy authentication and authorization.

  1. Define route groups and token types
  2. Choose JWT validation rules and JWKS cache behavior
  3. Choose ext_authz timeout, failure mode, and scaling plan
  4. Write the identity header contract and spoofing protections
  5. Write the rollout plan, rollback plan, and audit dashboard requirements

Recap

Key takeaways

  • Central auth improves consistency but becomes critical infrastructure
  • JWT validation is usually fast once JWKS keys are cached; external authorization must be engineered for latency and availability
  • Production readiness means rollout, rollback, observability, and failure behavior are designed before migration
