Module 8 of 12

Distributed Security & Zero Trust

How modern distributed systems authenticate workload-to-workload traffic: mTLS, SPIFFE/SPIRE, OPA, and the Zero Trust patterns that replace network-perimeter security.

5 hours · 3 labs · Free


Start here

Learning objectives

Explain Zero Trust as an architectural principle, not a product
Bootstrap mTLS between services with short-lived, automatically-rotated credentials
Use SPIFFE/SPIRE to issue cryptographic workload identity at scale
Enforce authorization with OPA / Rego at admission and at request time
Federate trust across clusters and clouds without leaking secrets

Before

Long-lived shared secrets in env vars; rotation is manual and slow
Trust based on network location (“it's inside the VPC, so it's safe”)
Authorization rules scattered in service code; impossible to audit
No federation across clusters; the only options are VPN tunnels or a shared global secret manager

After

Short-lived SVIDs auto-rotated by SPIRE; leak window measured in minutes
Authentication based on cryptographic identity, not network position
Centralised authorization as Rego policy in version control + CI
SPIFFE federation via bundle endpoints; cross-cluster identity flows automatically

The classical security model assumed a trusted internal network behind a firewall. That assumption broke the moment one application talked to another over the internet, and it broke entirely with cloud-native architectures where workloads spin up and down across clusters, regions, and clouds in seconds. Zero Trust is the response: do not trust any caller based on network position; verify identity, posture, and policy on every request.

This module is the load-bearing security wall of distributed-systems engineering. After this you should be able to design how every internal API call authenticates, authorises, and audits itself, even across cluster and cloud boundaries.

Zero Trust in One Sentence

“Never trust, always verify, assume breach.” That is the operational summary. The architectural translation: every caller has a cryptographically verifiable identity; every authorization decision uses that identity plus context; every channel is encrypted; every action is logged; and the system is designed so a compromised component does not give the attacker the keys to the kingdom.

mTLS - The Secure Channel

Mutual TLS is the foundation: both client and server present certificates and verify each other's identity. Unlike server-side TLS (where only the server is identified), mTLS gives you bidirectional cryptographic identity on every connection.

The catch: mTLS is hard at scale because of credential management. Long-lived certificates leak, get committed to git, and never rotate. Short-lived certificates require an identity issuance system. That system is what SPIFFE/SPIRE provides.
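
As a minimal sketch of what "both sides present certificates" means in practice, the following uses Python's standard ssl module to build a server context that rejects clients without a verifiable certificate. The function names and the file-path parameters are hypothetical; in a SPIRE deployment the certificate, key, and trust bundle would normally come from the Workload API rather than from static files.

```python
import ssl
from typing import Optional


def new_mtls_server_context() -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what turns ordinary TLS into mutual TLS: the handshake
    # fails unless the client presents a certificate the server can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def load_identity(ctx: ssl.SSLContext, cert_file: str, key_file: str,
                  trust_bundle_file: str) -> None:
    """Attach this service's certificate and the CAs it trusts for peers.

    All three paths are placeholders for illustration.
    """
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=trust_bundle_file)


def peer_spiffe_id(peer_cert: dict) -> Optional[str]:
    """Return the SPIFFE ID from a dict shaped like SSLSocket.getpeercert()."""
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "URI" and value.startswith("spiffe://"):
            return value
    return None
```

After a successful handshake, a server would pass the result of getpeercert() on the accepted socket to peer_spiffe_id and make its authorization decision on that identity rather than on the peer's IP address.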

SPIFFE / SPIRE

SPIFFE (Secure Production Identity Framework For Everyone) is a CNCF specification defining a universal format for workload identity:

SPIFFE ID: a URI like spiffe://example.com/ns/orders/sa/orders-api that uniquely names a workload.
SVID (SPIFFE Verifiable Identity Document): a cryptographic document (X.509 certificate or JWT) that proves the holder owns the SPIFFE ID. Short-lived (minutes to an hour) and auto-rotated.
Workload API: a Unix-socket API workloads use to fetch their current SVID. No application code touches secrets directly.

SPIRE is the reference implementation: a SPIRE Server issues SVIDs after a SPIRE Agent attests the workload via selectors (Kubernetes namespace, ServiceAccount, container image hash, etc.). The result: every workload has a unique cryptographic identity, automatically issued and rotated, with no shared secrets.
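
To make the SPIFFE ID format concrete, here is a small sketch that splits an ID into its trust domain and workload path using only the standard library. The SpiffeID class and parse_spiffe_id function are illustrative names, and the checks are a simplified subset of the validation rules in the SPIFFE specification, not a complete implementation.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SpiffeID:
    trust_domain: str  # e.g. "example.com"
    path: str          # e.g. "/ns/orders/sa/orders-api"


def parse_spiffe_id(raw: str) -> SpiffeID:
    """Parse a SPIFFE ID, rejecting URIs that are clearly not valid IDs."""
    parts = urlsplit(raw)
    if parts.scheme != "spiffe":
        raise ValueError("scheme must be 'spiffe'")
    # A SPIFFE ID carries no user info and no port in the trust domain.
    if not parts.netloc or "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("trust domain is missing or malformed")
    # Query strings and fragments are not part of a SPIFFE ID.
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    return SpiffeID(trust_domain=parts.netloc, path=parts.path)
```

Parsing "spiffe://example.com/ns/orders/sa/orders-api" this way yields the trust domain "example.com" and the path "/ns/orders/sa/orders-api", which can then be compared field by field.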

The free Mastering SPIFFE & SPIRE course goes 13 modules deep on this topic. This module gives you the architectural picture; that course gives you the deployment.

Authorization with OPA

Authentication answers “who is calling?”; authorization answers “is this caller allowed to do this?”. OPA (Open Policy Agent) is the CNCF-graduated policy engine that lets you express authz rules as code in the Rego language and evaluate them at admission time (a Kubernetes admission webhook, typically via OPA Gatekeeper) or at request time (Envoy ext_authz, application middleware).

Sample Rego rule: “a workload from spiffe://example.com/ns/billing/sa/charger may call POST /charges if its tenant_id matches the charge's tenant_id”. The rule lives in version control, runs in CI, ships independently of application code.
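
The decision logic of that sample rule can be modelled in a few lines. This is a Python sketch of the rule's behaviour, not Rego, and the shape of the request dictionary (spiffe_id, method, path, caller_tenant_id, charge_tenant_id) is an assumption made for illustration.

```python
ALLOWED_CALLER = "spiffe://example.com/ns/billing/sa/charger"


def allow(request: dict) -> bool:
    """Default-deny: return True only when every condition of the rule holds."""
    caller_tenant = request.get("caller_tenant_id")
    return (
        request.get("spiffe_id") == ALLOWED_CALLER
        and request.get("method") == "POST"
        and request.get("path") == "/charges"
        # A missing tenant must never match a missing tenant.
        and caller_tenant is not None
        and caller_tenant == request.get("charge_tenant_id")
    )
```

Anything the rule does not explicitly describe, including a request with missing fields, falls through to a deny.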

API Security

For external API security - user authentication, token formats, OAuth, JWT - the patterns are different. Module 9 of the Cloud Native Security Engineering course covers these. The API Attack & Defense Simulator is the hands-on exercise.

For service-to-service inside your infrastructure: SPIFFE workload identity + mTLS + OPA authz is the production architecture. For human-to-API: OAuth + JWT + scope-based policy. The two patterns coexist; do not blur them.

Federation Across Trust Domains

Multi-cluster and multi-cloud distributed systems need workloads in one cluster to authenticate workloads in another. SPIFFE federation is the mechanism: each trust domain (cluster) exposes its trust bundle via a bundle endpoint; federated peers fetch and trust each other's bundles. SVIDs issued in one cluster are verifiable in another.

This is how you build cross-cluster service-to-service security without VPNs, shared secrets, or per-cluster identity sprawl. The Zero Trust Network Builder simulator walks through SPIFFE federation scenarios in production form.
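
The refresh behaviour behind federation can be sketched as a cache of trust bundles keyed by trust domain. The FederatedBundles class, the injected fetch callable, and the max_age_seconds parameter are all hypothetical; in a SPIRE deployment the server performs this polling itself once federation with a bundle endpoint is configured, so the sketch only makes the mechanism visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BundleEntry:
    bundle: bytes
    fetched_at: float  # seconds, on whatever clock the caller uses


class FederatedBundles:
    """Caches each peer's trust bundle and re-fetches it once it is too old."""

    def __init__(self, fetch: Callable[[str], bytes], max_age_seconds: float):
        self._fetch = fetch  # stand-in for a call to the peer's bundle endpoint
        self._max_age = max_age_seconds
        self._entries: Dict[str, BundleEntry] = {}

    def bundle_for(self, trust_domain: str, now: float) -> bytes:
        entry = self._entries.get(trust_domain)
        if entry is None or now - entry.fetched_at >= self._max_age:
            # A static copy would skip this step and keep an outdated bundle.
            entry = BundleEntry(self._fetch(trust_domain), now)
            self._entries[trust_domain] = entry
        return entry.bundle
```

Passing the clock in as an argument keeps the example deterministic; a real service would read a monotonic clock instead.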

Operational Practice

Issue SVIDs valid for 1 hour or less; rotate automatically; never let credentials accumulate validity beyond what an attacker could exploit.
Authorization decisions log every allow/deny with the principal's SPIFFE ID; this is your audit trail.
Default-deny at the policy layer; explicit allow rules for known patterns; everything else rejected.
Treat the workload identity provider (SPIRE) as a tier-0 dependency; HA cluster, backups, tested restoration.
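
The logging and default-deny practices above can be sketched as a single enforcement function. The enforce name, the request dictionary shape, and the use of 200 to mean "forward to the upstream" are assumptions for illustration; the point is that a deny maps to 403, an evaluation failure fails closed with 500, and every outcome produces an audit line carrying the caller's SPIFFE ID.

```python
import json
from typing import Callable, Tuple


def enforce(request: dict, policy: Callable[[dict], bool]) -> Tuple[int, str]:
    """Evaluate the policy and return (http_status, audit_log_line)."""
    try:
        allowed = bool(policy(request))
        status = 200 if allowed else 403  # 403: the policy said no
        decision = "allow" if allowed else "deny"
    except Exception:
        # The policy engine itself failed: fail closed and report it as a
        # server error so triage can tell it apart from a deliberate deny.
        status, decision = 500, "error"
    audit_line = json.dumps(
        {
            "decision": decision,
            "spiffe_id": request.get("spiffe_id"),
            "method": request.get("method"),
            "path": request.get("path"),
        },
        sort_keys=True,
    )
    return status, audit_line
```

A caller would write audit_line to its log sink before acting on the status, so that denied and failed requests are recorded as reliably as allowed ones.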

mTLS Handshake Sequence

Self-Check Quiz

You issue SVIDs valid for 24 hours. The security team objects. Why? (Answer: the longer the validity, the larger the blast radius if a credential leaks. A common default for X.509-SVIDs is a 1-hour TTL with rotation at around the 30-minute mark. Short-lived = self-healing.)
Your OPA policy denies a request. The application returns 500. What is wrong? (Answer: should return 403. 500 is “something broke”; 403 is “policy denied”. The distinction matters for triage.)
How do you authorise “only orders-service can call payments-service” in OPA? (Answer: input.peer.spiffe_id == "spiffe://example.com/ns/orders/sa/orders-svc" - or use a path-prefix match for groups of allowed callers.)
SPIFFE federation between two clusters fails 24 hours after rotation. What happened? (Answer: stale trust bundle. The federation peer needs to refresh from the bundle endpoint regularly. Static bundle copies always fail this way.)
Your service mesh (Istio) provides automatic mTLS. Do you still need SPIFFE? (Answer: Istio uses SPIFFE-style identity internally; explicit SPIFFE/SPIRE is needed for non-mesh workloads, federation across clusters, or richer authz.)
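
The reasoning behind the first question can be put in numbers. This sketch assumes rotation at a fixed fraction of the TTL and no revocation mechanism that would cut a leaked credential's life short; the function names are illustrative.

```python
from datetime import timedelta


def rotate_after(ttl: timedelta, fraction: float = 0.5) -> timedelta:
    """How long after issuance to fetch a replacement credential."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return ttl * fraction


def worst_case_exposure(ttl: timedelta) -> timedelta:
    """A credential stolen right after issuance stays usable for the full TTL."""
    return ttl


def exposure_ratio(long_ttl: timedelta, short_ttl: timedelta) -> float:
    """How many times longer the long-lived credential can be abused."""
    return worst_case_exposure(long_ttl) / worst_case_exposure(short_ttl)
```

Under these assumptions a 1-hour SVID is replaced after 30 minutes, and a leaked 24-hour SVID is usable for up to 24 times as long as a leaked 1-hour one.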

For implementation depth, take the free Mastering SPIFFE & SPIRE course. Reference the glossary on key primitives: SPIFFE, SPIRE, SVID, mTLS, workload identity, Zero Trust, OPA, and service mesh. The SPIFFE/SPIRE cheatsheet, OPA / Rego cheatsheet, and API Security cheatsheet are the operational quick references. Practice with the Zero Trust Network Builder.

Real world

Where this shows up

Bloomberg, Pinterest, Anthem, and Yahoo all run SPIRE in production for service identity at scale.
Netflix uses an internal SPIFFE-style identity system across thousands of services.
Most service meshes (Istio, Linkerd) implement SPIFFE-style identity internally even when not labelled as such.
Open Policy Agent powers Kubernetes admission control for many organisations via Gatekeeper. (Kyverno is a separate policy engine with its own policy language; it does not run OPA.)

Production notes

Keep these close

Issue SVIDs valid for 1 hour or less; rotate automatically. Long-lived credentials are accumulated risk.
Default-deny at the policy layer; explicit allow rules; everything else rejected.
Treat SPIRE Server as tier-0: HA, KMS-backed encryption at rest, tested restoration runbook.
Log every authz decision with the principal's SPIFFE ID. That log is your audit trail.

Common mistakes

What usually breaks

Long-lived (24h+) certificates as a “safety margin”. The opposite is true - longer = larger blast radius if leaked.
Enforcement points returning HTTP 500 instead of 403 when OPA denies a request. Triage gets confused; production stays on fire.
Substring matching on SPIFFE IDs (strings.Contains(id, "orders")) instead of structured comparison. Trivial to bypass.
Static trust-bundle copies for federation. They go stale at the next CA rotation.
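
The substring-matching mistake is easy to demonstrate. In this sketch the expected trust domain and path are example values, and substring_check is the Python equivalent of the pattern criticised above; structured_check compares the parsed trust domain and path exactly.

```python
from urllib.parse import urlsplit

EXPECTED_TRUST_DOMAIN = "example.com"
EXPECTED_PATH = "/ns/orders/sa/orders-svc"


def substring_check(spiffe_id: str) -> bool:
    """The anti-pattern: accepts any ID that merely mentions 'orders'."""
    return "orders" in spiffe_id


def structured_check(spiffe_id: str) -> bool:
    """Compare scheme, trust domain, and path as separate, exact fields."""
    parts = urlsplit(spiffe_id)
    return (
        parts.scheme == "spiffe"
        and parts.netloc == EXPECTED_TRUST_DOMAIN
        and parts.path == EXPECTED_PATH
    )
```

An ID such as spiffe://example.com/ns/attacker/sa/orders-exporter passes the substring check but fails the structured one, as does the correct path presented under a different trust domain.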

Security risks

Threats to watch

Long-lived shared secrets are accumulating risk. Every leak compounds.
OPA policies are code - they need code review, CI, version control. Untested Rego is worse than no policy.
SPIFFE federation across mutually-untrusted clusters requires careful trust-bundle handling. Static copies go stale and keep trusting CA keys the peer has already rotated out.
The workload identity provider becomes the most-attacked component. Treat its operational hardening like the database tier.

Tradeoffs

Design choices you should be able to defend

Service-mesh-managed mTLS (Istio, Linkerd)

Pros

Zero application changes
Automatic rotation
Policy via mesh CRDs

Cons

Sidecar latency
Mesh operational complexity

SPIFFE/SPIRE direct integration

Pros

Works for non-mesh workloads
Cross-cluster federation
Richer authz options

Cons

Application code changes
Operate SPIRE

Long-lived secrets + manual rotation

Pros

No new infra

Cons

Accumulating risk
Manual rotation always lags
Wide blast radius on leak

Alternatives

Other production approaches

SPIFFE / SPIRE (vendor-neutral)

CNCF spec + reference implementation; works for non-mesh workloads, federation, and both Kubernetes and non-Kubernetes environments.

Istio mesh-managed identity

SPIFFE-style identity hidden inside Istio; simpler if you already run Istio.

Linkerd identity

Built-in mTLS using Linkerd's identity service; simplest mesh option.

AWS IAM Roles Anywhere / GCP Workload Identity Federation

Cloud-provider identity for workloads running outside that provider's cloud; less portable.

Vault PKI engine

HashiCorp Vault as a CA for short-lived certs; works without SPIFFE conventions.

Think like an engineer

Questions to answer before shipping

Treat workload identity as your most-attacked component. Operate it with the rigor you give the database tier.
For every service-to-service call ask: who is calling, with what identity, against what policy, and where is the audit log?
Authorization rules in version-controlled code (Rego) beat authorization rules in service code; CI catches regressions.

Key terms

Vocabulary used in this module

Zero Trust

Security model that drops the assumption of a trusted internal network; verifies every request.

mTLS

Mutual TLS; both client and server authenticate via certificates.

SPIFFE

CNCF spec defining a universal workload identity format (SPIFFE ID + SVID).

SPIRE

CNCF reference implementation of SPIFFE; issues SVIDs after attesting workloads.

OPA

Open Policy Agent; CNCF policy engine for declarative authorization in Rego.

Labs

Hands-on labs

120 minutes · Intermediate

Lab 8.1 - mTLS Between Two Services with SPIFFE

Deploy two services on Kubernetes; bootstrap mTLS using SPIRE-issued SVIDs.

Install SPIRE on kind cluster
Register workloads with SPIRE selectors
Implement mTLS server using go-spiffe
Verify peer identity on every connection

View lab on GitHub

90 minutes · Advanced

Lab 8.2 - OPA Authorization at Envoy

Add OPA ext_authz to Envoy; enforce SPIFFE-ID-based access policy.

Deploy Envoy + OPA sidecar pattern
Write Rego policy: only orders-api can call payments-api
Send authorized and unauthorized calls; verify deny path

View lab on GitHub

120 minutes · Advanced

Lab 8.3 - SPIFFE Federation Across Two Clusters

Stand up two kind clusters; federate trust; have a workload in cluster A authenticate to a workload in cluster B.

Stand up two kind clusters
Install SPIRE in each with distinct trust domains
Configure bundle endpoint exchange
Cross-cluster mTLS verified by SPIFFE ID

View lab on GitHub

Recap

Key takeaways

Zero Trust is an architectural principle: never trust caller location, always verify identity
mTLS gives bidirectional cryptographic identity; SPIFFE/SPIRE makes it scalable
Workload identity replaces shared secrets and long-lived credentials
OPA / Rego puts authorization policy into version control and CI
Federation extends Zero Trust across clusters and clouds without identity sprawl
