Single-server SPIRE works for development. Production requires high availability, multi-cluster trust, and sometimes hierarchical deployments. This module covers the advanced architectures that large organizations deploy.
High Availability SPIRE
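A single SPIRE Server is a single point of failure. HA SPIRE runs multiple server replicas against a shared database (PostgreSQL or MySQL), with a load balancer in front of the server API. The replicas do not elect a leader for CA operations: each replica holds its own signing keys and publishes its CA certificates to the common trust bundle through the shared datastore, so any replica can issue SVIDs.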
# HA SPIRE Server configuration:
server {
    bind_address = "0.0.0.0"
    bind_port = "8081"
    trust_domain = "example.org"
    # Local working directory for this replica; shared state lives in the datastore below
    data_dir = "/run/spire/data"
}
plugins {
    # Shared datastore (not SQLite!)
    DataStore "sql" {
        plugin_data {
            database_type = "postgres"
            connection_string = "host=db.internal dbname=spire sslmode=verify-full"
        }
    }
}
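The load balancer in front of the server API needs a way to tell healthy replicas from unhealthy ones. A minimal sketch, assuming the top-level `health_checks` block of the SPIRE Server configuration; the bind address and port here are illustrative choices, not values from this module:

```hcl
# Sketch: expose HTTP liveness/readiness endpoints on each replica.
# Port 8080 is an assumption; pick any port the load balancer can reach.
health_checks {
    listener_enabled = true
    bind_address = "0.0.0.0"
    bind_port = "8080"
    live_path = "/live"
    ready_path = "/ready"
}
```

Pointing the load balancer's health probe at the readiness path lets it stop routing to a replica that has lost its database connection.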
Nested SPIRE
Nested SPIRE creates a hierarchy where a child SPIRE Server gets its CA certificate from a parent SPIRE Server. This is useful for multi-team deployments where each team manages their own SPIRE Server, organizations with multiple environments (dev/staging/prod), and compliance requirements that mandate separate CA hierarchies.
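On the child server, the link to the parent is typically configured through the `UpstreamAuthority "spire"` plugin. The fragment below is a sketch under assumptions: the parent address and socket path are hypothetical, and the field names should be checked against the plugin documentation for your SPIRE version.

```hcl
# Sketch: child SPIRE Server obtaining its intermediate CA from a parent server.
plugins {
    UpstreamAuthority "spire" {
        plugin_data {
            # Hypothetical address of the parent SPIRE Server
            server_address = "parent-spire.internal"
            server_port = "8081"
            # Workload API socket of a parent-domain agent running next to the child
            # (assumed path; use the one from your agent configuration)
            workload_api_socket = "/run/spire/sockets/agent.sock"
        }
    }
}
```

The child authenticates to the parent as an ordinary workload, so the parent needs a registration entry for the child server. In current SPIRE releases that entry is created with the `-downstream` flag, which permits the holder to request a signing certificate.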
SPIFFE Federation
Federation allows workloads in different trust domains to verify each other’s identities. Each SPIRE Server shares its trust bundle with the other, enabling cross-domain mTLS.
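# Server A configuration (federates with cluster-b):
server {
    trust_domain = "cluster-a.company.org"
    federation {
        # Endpoint where Server A publishes its own bundle
        bundle_endpoint {
            address = "0.0.0.0"
            port = 8443
        }
        # Where to fetch cluster-b's bundle; the URL and SPIFFE ID are placeholders
        federates_with "cluster-b.company.org" {
            bundle_endpoint_url = "https://spire.cluster-b.company.org:8443"
            bundle_endpoint_profile "https_spiffe" {
                endpoint_spiffe_id = "spiffe://cluster-b.company.org/spire/server"
            }
        }
    }
}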
# Register the federated trust domain:
spire-server bundle set -id spiffe://cluster-b.company.org \
-path /path/to/cluster-b-bundle.pem
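Loading the foreign bundle is not enough on its own: a workload only receives another domain's bundle if its registration entry asks for it. A minimal sketch, assuming the `-federatesWith` flag of `spire-server entry create`; the helper name and the example IDs and selector are hypothetical.

```shell
# Hypothetical helper: register a workload that should also trust cluster-b.
# Arguments: parent (agent) SPIFFE ID, workload SPIFFE ID, workload selector.
register_federated_workload() {
    parent_id="$1"
    spiffe_id="$2"
    selector="$3"
    spire-server entry create \
        -parentID "$parent_id" \
        -spiffeID "$spiffe_id" \
        -selector "$selector" \
        -federatesWith "spiffe://cluster-b.company.org"
}

# Example call (IDs and selector are illustrative):
# register_federated_workload \
#     spiffe://cluster-a.company.org/agent/node1 \
#     spiffe://cluster-a.company.org/api \
#     k8s:sa:api
```

The same step has to be repeated on Server B for workloads that need to verify cluster-a identities.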
Multi-Cloud Architectures
SPIRE works across AWS, GCP, Azure, and on-premises environments because identity is derived from attestation rather than from any one cloud's identity constructs. Each environment uses its own attestation plugins, but all of them can participate in the same trust domain (or federate across domains).
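As an illustration, a single SPIRE Server can load several node attestor plugins side by side, one per environment. This is a sketch under assumptions: the project ID, cluster name, and service account are hypothetical, and the field names follow the SPIRE 1.x plugin documentation as best I recall, so verify them against your version.

```hcl
# Sketch: one server accepting agents from AWS, GCP, and a Kubernetes cluster.
plugins {
    # AWS: agents prove identity with the EC2 instance identity document
    NodeAttestor "aws_iid" {
        plugin_data {}
    }
    # GCP: agents prove identity with the instance identity token
    NodeAttestor "gcp_iit" {
        plugin_data {
            projectid_allow_list = ["example-project"]   # hypothetical project
        }
    }
    # Kubernetes: agents prove identity with a projected service account token
    NodeAttestor "k8s_psat" {
        plugin_data {
            clusters = {
                "onprem-cluster" = {                      # hypothetical cluster name
                    service_account_allow_list = ["spire:spire-agent"]
                }
            }
        }
    }
}
```

Each agent is configured with only the attestor that matches its own environment; the workloads behind it still receive SVIDs in the shared trust domain.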
Migration Strategy: Adopting SPIFFE Incrementally
Most companies cannot switch to SPIFFE overnight. A phased migration path that keeps every step reversible:
- Phase 1 — Deploy SPIRE alongside existing identity: Run SPIRE in parallel without changing any service. Just get SVIDs flowing.
- Phase 2 — Enable mTLS on one critical path: Pick one service-to-service connection (e.g., API → database proxy). Add SPIRE-based mTLS. Keep the old auth as fallback.
- Phase 3 — Expand incrementally: Service by service, switch from shared secrets to SVID-based authentication. Each switch is independent and reversible.
- Phase 4 — Remove legacy auth: Once all services use SVIDs, remove the old shared secrets, API keys, and static certificates.
- Phase 5 — Add authorization: Deploy OPA policies for fine-grained access control on top of the identity layer.
Key principle: coexistence, not replacement. SPIRE can run alongside existing PKI, Vault, and service mesh CAs during migration. You do not need to rip and replace.
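For Phase 1, "SVIDs flowing" can be checked without touching any service by fetching an SVID directly from the agent's Workload API. A minimal sketch, assuming the `spire-agent api fetch x509` command; the helper name, socket path, and output directory are hypothetical defaults.

```shell
# Hypothetical helper: confirm that a process on this node can obtain an X.509-SVID.
# Arguments: Workload API socket path, output directory (both optional).
fetch_svid() {
    socket="${1:-/run/spire/sockets/agent.sock}"   # assumed path; see your agent config
    out_dir="${2:-/tmp/svid-check}"
    mkdir -p "$out_dir" || return 1
    # On success the agent CLI writes the SVID, its key, and the bundle into $out_dir
    spire-agent api fetch x509 -socketPath "$socket" -write "$out_dir"
}
```

The call only succeeds if a registration entry matches the calling process, so a failure here usually points at missing or mismatched selectors rather than at the service itself.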
Incident Thinking: What Happens If...
- SPIRE Server fails? Agents cache SVIDs locally. Existing workloads continue with cached certificates until TTL expires. New workloads cannot get SVIDs until the server recovers. This is why HA is critical.
- Datastore becomes unavailable? Server cannot create or modify registration entries but continues serving cached entries. Recovery requires datastore restoration.
- Trust bundle expires? If the CA certificates in the bundle expire, SVID verification fails across the entire trust domain. This is a catastrophic event: monitor the CA TTL and rotate well before expiry.
- Federation breaks? Cross-cluster communication fails but intra-cluster communication continues. Each trust domain is independent.
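- Compromised agent obtains rogue SVIDs? An agent does not sign SVIDs itself; it can only request them from the server, and only for registration entries mapped to its node. Blast radius is limited to the workloads registered on that node. Evict or ban the agent on the server to revoke its attestation and stop it.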
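Two of these scenarios map to simple operational commands. The sketch below assumes the `spire-server bundle show` and `spire-server agent ban` commands plus a local `openssl`; the helper names and the 30-day default are hypothetical.

```shell
# Hypothetical helper: warn if the first CA certificate in the server's bundle
# expires within the given number of days (default 30).
check_bundle_expiry() {
    days="${1:-30}"
    if spire-server bundle show -format pem \
        | openssl x509 -noout -checkend "$((days * 86400))" >/dev/null; then
        echo "ok: CA certificate valid for more than $days days"
    else
        echo "warning: CA certificate expires within $days days (or could not be read)"
        return 1
    fi
}

# Hypothetical helper: ban a compromised agent by SPIFFE ID so it cannot
# re-attest with the same identity.
ban_agent() {
    spire-server agent ban -spiffeID "$1"
}
```

`openssl x509` reads only the first certificate from the input, so a bundle holding several CA certificates would need to be split before every entry can be checked.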