Module 12 of 13

Building a Complete Zero Trust Platform

Capstone project: assemble everything into a production architecture

4 hours | 1 lab | Free

Start here

Learning objectives

  • Design an end-to-end zero trust platform architecture
  • Deploy SPIRE with Envoy mTLS and OPA authorization
  • Implement federation across two clusters
  • Create a reference architecture for your organization

[Figure: Zero Trust Platform, capstone architecture. A SPIRE Server (HA) acts as the identity authority. Cluster A (Production) runs Frontend, API, Orders, and Payments, each with an Envoy sidecar, plus OPA (policy engine) and SPIRE Agents (DaemonSet). Cluster B (Staging) runs Frontend and API, each with an Envoy sidecar, plus OPA (policy engine) and SPIRE Agents (DaemonSet). Services communicate over mTLS.]

This is the capstone module. You will combine everything from the previous 11 modules into a complete, production-style zero trust platform. By the end, you will have a fully functional multi-cluster deployment with SPIRE, Envoy, OPA, and federation.

Architecture Overview

The capstone project deploys a microservice e-commerce application across two Kubernetes clusters. SPIRE issues an identity to every service, Envoy encrypts all service-to-service traffic with mTLS, OPA policies enforce authorization, SPIFFE federation enables cross-cluster communication, and Prometheus and Grafana provide monitoring.
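
To make the federation piece concrete, here is a minimal Python sketch of the decision a workload makes when a peer presents an identity: the peer is only verifiable if its trust domain is the local one or one whose bundle has been federated. The trust domain names (`prod.example.org`, `staging.example.org`) and both function names are assumptions for illustration. In the real platform SPIRE and Envoy perform this check using trust bundles, not application code.

```python
from urllib.parse import urlparse


def trust_domain_of(spiffe_id: str) -> str:
    """Return the trust domain portion of a SPIFFE ID (the URI authority)."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc


def peer_is_verifiable(peer_id: str, local_domain: str, federated_domains: set) -> bool:
    """True if we hold a trust bundle that could verify the peer's SVID.

    Hypothetical model: we hold a bundle for our own trust domain and for
    every trust domain we have federated with, and for nothing else.
    """
    domain = trust_domain_of(peer_id)
    return domain == local_domain or domain in federated_domains


# Assumed names: production federates with staging only.
federated = {"staging.example.org"}
staging_frontend = "spiffe://staging.example.org/ns/shop/sa/frontend"
unknown_peer = "spiffe://other.example.net/ns/shop/sa/frontend"

staging_ok = peer_is_verifiable(staging_frontend, "prod.example.org", federated)
unknown_ok = peer_is_verifiable(unknown_peer, "prod.example.org", federated)
```

Passing this check only means the peer's certificate can be validated. Whether the call is permitted is a separate decision that belongs to the OPA policy layer.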

What You Will Build

  1. Deploy SPIRE in HA mode on both clusters
  2. Configure automatic workload registration via the SPIRE Controller Manager
  3. Deploy Envoy sidecars for transparent mTLS
  4. Write and deploy OPA policies for service-to-service authorization
  5. Configure federation between the two clusters
  6. Deploy monitoring with SPIRE-specific dashboards
  7. Test failure scenarios: what happens when an identity expires or a policy changes?
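
Before running the failure tests in step 7, it helps to reason about timing. The sketch below models when an SVID would be renewed and how long a workload keeps a valid identity if the SPIRE Server becomes unreachable. It assumes agents attempt renewal at roughly half of the SVID lifetime and that an already-issued SVID stays valid until it expires. Check both assumptions against the SPIRE version you deploy. The function names and the one-hour TTL are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(issued_at: datetime, ttl: timedelta) -> datetime:
    """Assumed renewal point: half of the SVID lifetime after issuance."""
    return issued_at + ttl / 2


def outage_budget(issued_at: datetime, ttl: timedelta, outage_start: datetime) -> timedelta:
    """How long the current SVID remains valid once renewals stop working."""
    expires_at = issued_at + ttl
    return max(expires_at - outage_start, timedelta(0))


# Illustrative values: an SVID issued at 12:00 UTC with a 1-hour TTL.
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(hours=1)

renew_at = renewal_time(issued, ttl)
# Server goes down at 12:20, before the renewal attempt would have happened.
budget = outage_budget(issued, ttl, issued + timedelta(minutes=20))
```

Under these assumptions, a workload tolerates a server outage of somewhere between half the TTL and the full TTL, depending on how recently its SVID was renewed. That range is a useful input when choosing certificate TTLs and alerting thresholds.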

Reference Architecture

This architecture serves as a template you can adapt for your organization. It documents the key decisions: trust domain naming, SPIFFE ID schema, attestation plugin choices, certificate TTL settings, policy structure, and monitoring and alerting thresholds.
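
One way to pin down the SPIFFE ID schema decision is to encode it in a small helper that every team uses. The sketch below assumes a `/ns/<namespace>/sa/<service-account>` path layout, which is a common Kubernetes convention but only one possible choice. The function name and the example trust domain are hypothetical. The character rules follow the SPIFFE ID specification as I understand it (lowercase trust domain, path segments limited to letters, digits, dots, dashes, and underscores), so verify them against the specification before relying on them.

```python
import re

# Assumed character rules; confirm against the SPIFFE ID specification.
TRUST_DOMAIN_RE = re.compile(r"[a-z0-9._-]{1,255}")
SEGMENT_RE = re.compile(r"[A-Za-z0-9._-]+")


def workload_id(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build a SPIFFE ID using the assumed /ns/<namespace>/sa/<account> schema."""
    if not TRUST_DOMAIN_RE.fullmatch(trust_domain):
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    for segment in (namespace, service_account):
        # "." and ".." are rejected explicitly because they pass the regex.
        if segment in (".", "..") or not SEGMENT_RE.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


orders_id = workload_id("prod.example.org", "shop", "orders")
```

Generating IDs through one function keeps registration entries and OPA policies aligned on the same naming scheme, which makes the schema easier to document and audit.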

Common Pitfalls

  • Starting too big: Deploy SPIRE for one critical service first, then expand
  • Ignoring day-two operations: Monitoring and runbooks are not optional
  • Over-complicated policies: Start with broad allow rules, tighten incrementally
  • Not testing failure modes: What happens when SPIRE Server is down? Test it.

Real world

Where this shows up

  • Multi-cluster production deployment with HA and federation
  • Complete zero trust stack: identity + encryption + authorization
  • Production-style monitoring and incident response
  • Threat modeling: simulating a compromised service and verifying containment

Common mistakes

What usually breaks

  • Building everything at once instead of layering: identity first, then encryption, then authorization
  • Not testing failure scenarios: what happens when SPIRE Server goes down?
  • Skipping monitoring: deploying without dashboards means flying blind
  • Not documenting the architecture decisions for your team

Think like an engineer

Questions to answer before shipping

  • How would you present this architecture to a VP of Engineering for approval?
  • What is the total cost of running this stack? (compute, storage, operational overhead)
  • How would you migrate an existing service mesh to use SPIRE instead of its built-in CA?
  • What compliance frameworks (SOC 2, PCI-DSS, HIPAA) does this architecture help satisfy?

Labs

Hands-on labs

Capstone: Build a Zero Trust Kubernetes Platform

Deploy the complete zero trust stack end-to-end.

  1. Create two kind clusters, one for production and one for staging
  2. Deploy SPIRE Server (HA) and Agents on both clusters
  3. Deploy a microservice application with Envoy sidecars
  4. Configure OPA policies for service authorization
  5. Set up SPIFFE federation between clusters
  6. Deploy Prometheus monitoring with SPIRE dashboards
  7. Test by simulating a compromised service attempting unauthorized access
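
The containment test in step 7 can be reasoned about before touching a cluster. The following Python model is a sketch of a default-deny allow-list, not the lab's actual policy, which would be written in Rego and evaluated by OPA. The call graph (frontend to api, api to orders, orders to payments), the trust domain, and the ID layout are all assumptions chosen to match the services in the architecture diagram.

```python
# Assumed call graph: only these (source, destination) pairs are permitted.
ALLOWED_CALLS = {
    ("frontend", "api"),
    ("api", "orders"),
    ("orders", "payments"),
}

TRUST_DOMAIN = "prod.example.org"  # hypothetical trust domain


def service_name(spiffe_id: str) -> str:
    """Take the last path segment as the service name (assumed ID layout)."""
    return spiffe_id.rsplit("/", 1)[-1]


def is_allowed(source_id: str, destination_id: str) -> bool:
    """Default deny: allow only known pairs inside the expected trust domain."""
    prefix = f"spiffe://{TRUST_DOMAIN}/"
    if not (source_id.startswith(prefix) and destination_id.startswith(prefix)):
        return False
    return (service_name(source_id), service_name(destination_id)) in ALLOWED_CALLS


def sid(name: str) -> str:
    """Helper for building test IDs under the assumed schema."""
    return f"spiffe://{TRUST_DOMAIN}/ns/shop/sa/{name}"


# A compromised frontend tries to reach payments directly.
legit_call = is_allowed(sid("orders"), sid("payments"))
lateral_move = is_allowed(sid("frontend"), sid("payments"))
```

In this model the compromised frontend still holds a valid identity, so mTLS alone would not stop it. The request is rejected because no rule permits that pair, which is the behavior the lab asks you to verify.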

Recap

Key takeaways

  • Zero trust is a system: identity (SPIRE) + encryption (Envoy) + authorization (OPA)
  • Start with one critical service path and expand incrementally
  • Document your trust domain schema, SPIFFE ID naming, and policy structure
  • Test failure modes: expired certs, server downtime, policy misconfiguration
  • This reference architecture is your template for production deployments
