Service Mesh Cheatsheet (Istio / Envoy)
Operational reference for Istio service mesh and Envoy proxy. mTLS configuration, AuthorizationPolicy patterns, traffic management, and the diagnostic commands that actually pinpoint why a request was rejected.
Mesh-wide mTLS
PeerAuthentication mode: STRICT
Require mTLS on all incoming requests. The recommended production default once mesh adoption is complete.
Warning: Roll out via PERMISSIVE first to validate every workload talks mTLS, then promote to STRICT. Going straight to STRICT breaks any workload not yet in the mesh.
PeerAuthentication mode: PERMISSIVE
Accept both mTLS and plaintext. The migration mode while you onboard services.
Production note: Use Kiali or Istio metrics to verify every connection is mTLS before promoting to STRICT.
PeerAuthentication mode: DISABLE
Do not use mTLS. Used to selectively exempt workloads that cannot speak mTLS (e.g. health probes).
Warning: Each DISABLE creates a hole in the security model. Document and review these regularly.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
Mesh-wide STRICT. Apply in the istio-system namespace (the mesh root namespace by default) and it is inherited across the mesh.
Production note: Override per-namespace by applying a PeerAuthentication in that namespace; per-workload via a selector with matchLabels.
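The per-namespace and per-workload overrides mentioned in the note can be sketched as follows. This is a minimal illustration, not a recommended layout: the namespace `legacy`, the label `app: metrics-exporter`, and port 9090 are all hypothetical.

```yaml
# Namespace-wide override: the hypothetical "legacy" namespace stays PERMISSIVE
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: legacy
spec:
  mtls:
    mode: PERMISSIVE
---
# Workload-level override: applies only to pods labelled app=metrics-exporter.
# portLevelMtls requires a selector; here one port is exempted, the rest stay STRICT.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: metrics-exporter-plaintext   # hypothetical name
  namespace: legacy
spec:
  selector:
    matchLabels:
      app: metrics-exporter
  mtls:
    mode: STRICT
  portLevelMtls:
    9090:
      mode: DISABLE
```

The narrowest matching policy wins: workload-level over namespace-level over mesh-wide.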
AuthorizationPolicy (Istio L4/L7 authz)
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: foo
spec: {}
An empty AuthorizationPolicy denies all requests to workloads in the namespace by default. Then add explicit allow rules.
Production note: Default-deny is the Zero Trust pattern. Build an allowlist of who can talk to whom from there.
spec:
  selector: { matchLabels: { app: payments } }
  action: ALLOW
  rules:
  - from:
    - source:
        principals: [cluster.local/ns/marketing/sa/marketing-api]
Allow the marketing-api ServiceAccount to call payments. The principal is a SPIFFE-style identity of the form <trust-domain>/ns/<namespace>/sa/<service-account>.
Production note: For external callers (other clusters), use requestPrincipals (JWT subject) or namespaces.
rules:
- to:
  - operation:
      methods: [GET]
      paths: [/api/v1/users/*]
L7 method/path matching. Read-only access to a path subtree.
Production note: Path matchers support * wildcard. Be careful with /* — it matches /admin/* too unless explicitly denied.
rules:
- when:
  - key: request.auth.claims[role]
    values: [admin]
JWT claim-based authz. Requires a RequestAuthentication that validates the JWT first.
Production note: Combine RequestAuthentication (validates the token) + AuthorizationPolicy (uses claims for decisions).
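The source, operation, and claim fragments above can be combined into a single rule. The sketch below assumes a hypothetical workload labelled `app: users` in namespace `foo`; the policy name is also made up. Within one rule, `from`, `to`, and `when` must all match; separate entries under `rules` are alternatives.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: users-admin-read      # hypothetical name
  namespace: foo
spec:
  selector:
    matchLabels:
      app: users              # hypothetical workload label
  action: ALLOW
  rules:
  # One rule: caller identity AND method/path AND JWT claim must all match
  - from:
    - source:
        principals: ["cluster.local/ns/marketing/sa/marketing-api"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/api/v1/users/*"]
    when:
    - key: request.auth.claims[role]
      values: ["admin"]
```

The claim condition only has something to match against if a RequestAuthentication on the same workload has validated the token.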
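action: AUDIT
Mark matching requests for auditing without changing whether they are allowed or denied. Useful for shadow-running a new policy.
Production note: AUDIT only produces output when a supported audit plugin is enabled in the mesh. Roll out new policies as AUDIT first; promote to ALLOW/DENY after the audit log shows the expected pattern.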
RequestAuthentication (JWT validation)
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth0
spec:
  jwtRules:
  - issuer: https://example.auth0.com/
    jwksUri: https://example.auth0.com/.well-known/jwks.json
    audiences: ["https://api.example.com"]
Validate JWTs from a specific issuer. Requests carrying an invalid token are rejected with HTTP 401.
Warning: RequestAuthentication alone doesn't require a token — without an AuthorizationPolicy that demands authentication, requests with no token still pass through.
kind: AuthorizationPolicy
spec:
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]
Pair with RequestAuthentication: require any valid JWT. Anonymous requests get 403.
Production note: requestPrincipals is "issuer/subject". Use specific principals for tighter authorization.
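As a sketch of the tighter variant the note suggests, the pair below scopes both resources to one workload and accepts only tokens from a single issuer. The namespace `foo`, the label `app: api`, and the name `require-jwt` are assumptions for illustration; the issuer and JWKS URL are the example values used earlier.

```yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth0
  namespace: foo                # assumed namespace
spec:
  selector:
    matchLabels:
      app: api                  # hypothetical workload label
  jwtRules:
  - issuer: "https://example.auth0.com/"
    jwksUri: "https://example.auth0.com/.well-known/jwks.json"
    audiences: ["https://api.example.com"]
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt             # hypothetical name
  namespace: foo
spec:
  selector:
    matchLabels:
      app: api                  # must select the same workload as above
  action: ALLOW
  rules:
  - from:
    - source:
        # Format is <issuer>/<subject>. The issuer here ends in "/", hence
        # the double slash; the trailing * accepts any subject from it.
        requestPrincipals: ["https://example.auth0.com//*"]
```

Replacing the wildcard with a full `<issuer>/<subject>` string restricts access to a single token subject.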
Traffic management
kind: VirtualService
spec:
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: api
        subset: canary
      weight: 100
Route by header: opt-in canary via a feature flag header.
Production note: Combine with weighted routing for percentage-based canaries: 95% baseline, 5% canary, then ramp.
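A sketch of the header opt-in combined with a 95/5 weighted split. It assumes a `baseline` subset exists alongside `canary` in the DestinationRule for host `api`; the resource name and namespace are hypothetical. On Istio releases that predate `networking.istio.io/v1`, the same spec should work under `v1beta1`.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-canary              # hypothetical name
  namespace: foo                # assumed namespace
spec:
  hosts:
  - api
  http:
  # Rules are evaluated in order: header opt-in always goes to the canary
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: api
        subset: canary
  # All other traffic: 95% baseline, 5% canary
  - route:
    - destination:
        host: api
        subset: baseline        # assumed subset name
      weight: 95
    - destination:
        host: api
        subset: canary
      weight: 5
```

Ramping the canary is then a matter of editing the two weights, which must sum to 100.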
kind: DestinationRule
spec:
  subsets:
  - name: canary
    labels:
      version: v2
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
Define subsets (versions) of a service plus connection pool limits.
Production note: Connection pool limits prevent one consumer from overloading a backend. Critical for noisy-neighbour resilience.
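For reference, a complete DestinationRule might look like the sketch below. The `baseline` subset, the `version: v1` label, and the HTTP pool values are assumptions added for illustration; only the `canary` subset and `maxConnections: 100` come from the fragment above.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: api                     # hypothetical name
  namespace: foo                # assumed namespace
spec:
  host: api
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50    # assumed value; tune per service
        maxRequestsPerConnection: 10   # assumed value
  subsets:
  - name: baseline              # assumed subset for the current version
    labels:
      version: v1
  - name: canary
    labels:
      version: v2
```

The top-level `trafficPolicy` applies to every subset unless a subset declares its own.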
kind: VirtualService
spec:
  http:
  - timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
Timeouts and retries at the mesh layer. Centralizes resilience policy.
Warning: Retries amplify load on a struggling service. Limit attempts and use jittered backoff. Don't retry on POST without idempotency keys.
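One way to follow the warning is to retry only idempotent reads and disable mesh-level retries for everything else. The sketch below assumes host `api` and a hypothetical resource name; the attempt counts and timeouts are illustrative, chosen so that the retries fit inside the overall timeout.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-resilience          # hypothetical name
  namespace: foo                # assumed namespace
spec:
  hosts:
  - api
  http:
  # Idempotent reads: retry transient failures (2 tries x 2s fits within 5s)
  - match:
    - method:
        exact: GET
    timeout: 5s
    retries:
      attempts: 2
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
    route:
    - destination:
        host: api
  # Everything else (POST, PUT, ...): explicitly disable retries
  - timeout: 5s
    retries:
      attempts: 0
    route:
    - destination:
        host: api
```

Setting `attempts: 0` matters because Istio otherwise applies a default retry policy to HTTP routes.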
kind: VirtualService
spec:
  http:
  - fault:
      delay:
        fixedDelay: 5s
        percentage:
          value: 100
Fault injection: inject latency or aborts. Use in pre-production chaos testing.
Warning: Never deploy fault injection to production. Confine it to staging clusters, or scope it with narrow match conditions in dev environments.
istioctl: diagnostics that work
istioctl analyze
Static analysis across the mesh. Catches missing dependencies, misconfigurations, and deprecated fields.
Production note: Run in CI on every Istio config change — catches 80% of policy bugs before deploy.
istioctl proxy-status
Show every Envoy sidecar's sync state with the control plane (CDS, LDS, EDS, RDS).
Production note: Sidecars stuck in STALE/NOT SENT state mean the workload sees outdated routing — common cause of "I deployed an update but the change isn't live".
istioctl proxy-config cluster <pod>.<ns> --fqdn <svc>
Show the cluster configuration the sidecar actually holds for a given service: the truth, versus what your config says.
Production note: When traffic isn't routing as expected, this is the diff to compare config-as-written vs config-as-pushed.
istioctl proxy-config listeners <pod>.<ns>
Show all Envoy listeners on the sidecar. Useful when ingress/egress isn't reaching the workload.
Production note: Listeners are bound to (port, IP) — mismatches between Service ports and listener ports are common.
istioctl proxy-config secret <pod>.<ns>
Show certs in the Envoy SDS cache. Confirm the SVID is current and the trust bundle is right.
Production note: When mTLS handshakes fail, this is where to start: is the cert there, is it valid, is the chain right.
istioctl experimental authz check <pod>.<ns>
Print the AuthorizationPolicy rules that apply to a specific pod. Maps from "who is denied" back to "which rule is doing it".
Production note: Debugging authz starts here — much faster than reading every namespace-scoped policy.
Envoy: when sidecar logs aren't enough
kubectl exec <pod> -c istio-proxy -- pilot-agent request GET stats | grep <pattern>
Envoy stats endpoint. Counters for everything Envoy does.
Production note: Useful patterns: `cluster.<name>.upstream_rq_5xx` (backend 5xx counts), `listener.<addr>.downstream_cx_total` (ingress connections).
kubectl exec <pod> -c istio-proxy -- pilot-agent request GET config_dump
Dump the full Envoy config from the running sidecar.
Production note: Pipe to jq to filter: `... | jq '.configs[] | select(.["@type"] | contains("Cluster"))'` for cluster config.
kubectl exec <pod> -c istio-proxy -- curl localhost:15000/clusters
Cluster (upstream) state: endpoints, health checks, circuit breakers.
Production note: EDS endpoints showing "unhealthy" point to a real connection failure between sidecar and backend.
level: debug scope: connection,upstream,router
Bump the Envoy log level for specific scopes. Use sparingly: the log volume is extreme.
Production note: `pilot-agent request POST "logging?upstream=debug"` toggles at runtime; reset to info when done.
Common misconfigurations
The unsafe pattern, the replacement, and the reason the two are not equivalent in production.
Risky
# AuthorizationPolicy that allows everything
spec:
  action: ALLOW
  rules:
  - {}
Hardened
# Default-deny + explicit allow
# 1) Deny all in namespace:
metadata: { name: deny-all, namespace: foo }
spec: {}
# 2) Then allow specific callers:
metadata: { name: allow-payments-from-marketing, namespace: foo }
spec:
  selector: { matchLabels: { app: payments } }
  action: ALLOW
  rules:
  - from:
    - source: { principals: [cluster.local/ns/marketing/sa/marketing-api] }
Why it matters: An ALLOW policy whose only rule is the empty rule {} matches every request, so it behaves like having no policy at all, where the default is "no policy = allow all". Note that rules: [] is different: an ALLOW policy with no rules matches nothing and therefore denies everything. The Zero Trust pattern is empty-spec deny-all + explicit allow rules per service-to-service relationship.
Risky
# RequestAuthentication only: assumes auth is required
kind: RequestAuthentication
spec:
  jwtRules:
  - issuer: https://auth.example.com/
Hardened
# RequestAuthentication + AuthorizationPolicy that DEMANDS the token
---
kind: RequestAuthentication
spec:
  jwtRules:
  - issuer: https://auth.example.com/
---
kind: AuthorizationPolicy
spec:
  action: ALLOW
  rules:
  - from:
    - source: { requestPrincipals: ["*"] }
Why it matters: RequestAuthentication validates JWTs but doesn't require their presence, so anonymous requests still pass through. The AuthorizationPolicy with requestPrincipals: ["*"] enforces "must have a valid JWT". Both are required for actual authentication.
Risky
# Roll out STRICT mTLS mesh-wide on day one:
kind: PeerAuthentication
metadata: { name: default, namespace: istio-system }
spec:
  mtls: { mode: STRICT }
Hardened
# Phase 1: PERMISSIVE
spec: { mtls: { mode: PERMISSIVE } }
# (verify in Kiali: every connection is now mTLS)
# Phase 2: STRICT per-namespace as it's confirmed
metadata: { name: default, namespace: payments }
spec: { mtls: { mode: STRICT } }
# Phase 3: STRICT mesh-wide once all namespaces are confirmed
Why it matters: STRICT immediately blocks any workload not yet talking mTLS, including ingress probes from external load balancers, sidecar-less DaemonSets, and apps that haven't adopted Istio yet. PERMISSIVE accepts both, lets you verify that mTLS is happening, and lets you promote to STRICT one namespace at a time.