Production Reference

Service Mesh Cheatsheet (Istio / Envoy)

Operational reference for Istio service mesh and Envoy proxy. mTLS configuration, AuthorizationPolicy patterns, traffic management, and the diagnostic commands that actually pinpoint why a request was rejected.


Mesh-wide mTLS

4 commands
PeerAuthentication mode: STRICT

Require mTLS on all incoming requests. The recommended production default once mesh adoption is complete.

Warning: Roll out via PERMISSIVE first to validate every workload talks mTLS, then promote to STRICT. Going straight to STRICT breaks any workload not yet in the mesh.

PeerAuthentication mode: PERMISSIVE

Accept both mTLS and plaintext. The migration mode while you onboard services.

Production note: Use Kiali or Istio metrics to verify that every connection is mTLS before promoting to STRICT.

PeerAuthentication mode: DISABLE

Disable mTLS for the selected scope. Used to selectively exempt workloads or ports that cannot speak mTLS (e.g. a port called by clients that have no sidecar).

Warning: Each DISABLE creates a hole in the security model. Document and review these regularly.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT

Mesh-wide STRICT. Applied without a selector in the root namespace (istio-system by default), it is inherited across the mesh.

Production note: Override per-namespace by applying a PeerAuthentication in that namespace; per-workload by adding spec.selector (PeerAuthentication has no workloadSelector field).
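The per-workload override described above might look like the sketch below. The policy name, the payments namespace, the app: legacy-metrics label, and port 9090 are hypothetical placeholders; selector and portLevelMtls are standard PeerAuthentication fields, and portLevelMtls only takes effect when a selector is present.

```yaml
# Sketch only: names, labels, and the port number are illustrative assumptions.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-metrics-exception   # hypothetical name
  namespace: payments              # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: legacy-metrics          # hypothetical workload label
  mtls:
    mode: STRICT                   # everything on this workload stays STRICT...
  portLevelMtls:
    9090:
      mode: DISABLE                # ...except this one port, which accepts plaintext
```

Scoping the exception to a single port keeps the hole as small as possible, which makes the regular review of DISABLE entries easier.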

AuthorizationPolicy (Istio L4/L7 authz)

5 commands
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: foo
spec: {}

Empty AuthorizationPolicy = deny all by default in the namespace. Then add explicit allow rules.

Production note: Default-deny is the Zero Trust pattern. Build an allowlist of who can talk to whom from there.

spec:
  selector: { matchLabels: { app: payments } }
  action: ALLOW
  rules:
  - from:
    - source:
        principals: [cluster.local/ns/marketing/sa/marketing-api]

Allow the marketing-api ServiceAccount to call payments. The principal is the caller's SPIFFE-style identity, <trust-domain>/ns/<namespace>/sa/<service-account>, and is only populated when the connection uses mTLS.

Production note: For external callers (other clusters), use requestPrincipals (JWT issuer/subject) or namespaces.

rules:
- to:
  - operation:
      methods: ["GET"]
      paths: ["/api/v1/users/*"]

L7 method/path matching. Read-only access to a path subtree.

Production note: Path matchers support * only as a leading or trailing wildcard (suffix or prefix match). Be careful with /*: it matches /admin/* too unless explicitly denied.
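One way to guard the /admin/* case mentioned above is a separate DENY policy, since DENY policies are evaluated before ALLOW policies. This is a minimal sketch; the policy name, namespace, and app: api label are assumptions made for illustration.

```yaml
# Sketch only: name, namespace, and selector label are illustrative assumptions.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-admin-paths           # hypothetical name
  namespace: foo
spec:
  selector:
    matchLabels:
      app: api                     # hypothetical workload label
  action: DENY
  rules:
  - to:
    - operation:
        paths: ["/admin", "/admin/*"]   # exact path plus the prefix match
```

Listing both /admin and /admin/* covers the bare path as well as everything beneath it, because the trailing * is a prefix match rather than a glob.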

rules:
- when:
  - key: request.auth.claims[role]
    values: [admin]

JWT claim-based authz. Requires a RequestAuthentication that validates the JWT first.

Production note: Combine RequestAuthentication (validates the token) + AuthorizationPolicy (uses claims for decisions).
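Put together, the combination could look like the following sketch. The issuer and JWKS URL are example values, and the resource names, namespace, and app: api label are hypothetical; the role claim is assumed to be a top-level claim in the token.

```yaml
# Sketch only: resource names, namespace, label, and issuer are illustrative assumptions.
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-example                # hypothetical name
  namespace: foo
spec:
  selector:
    matchLabels:
      app: api                     # hypothetical workload label
  jwtRules:
  - issuer: "https://example.auth0.com/"
    jwksUri: "https://example.auth0.com/.well-known/jwks.json"
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-admin-role         # hypothetical name
  namespace: foo
spec:
  selector:
    matchLabels:
      app: api
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]   # a validated JWT must be present
    when:
    - key: request.auth.claims[role]
      values: ["admin"]            # and its role claim must be admin
```

Both resources select the same workload, so the claim check only ever runs against tokens that the RequestAuthentication has already validated.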

action: AUDIT

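Mark matching requests for auditing without affecting whether they are allowed or denied. The audit decision is only acted on when a supporting audit plugin is configured in the mesh.

Production note: To shadow-run a new ALLOW or DENY policy, Istio also offers the experimental istio.io/dry-run annotation; promote the policy to enforcement after the logs show the expected pattern.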
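For shadow-running a policy before it is enforced, a dry-run sketch using Istio's experimental istio.io/dry-run annotation might look like this. The policy name, namespace, label, and path are hypothetical, and the annotation's behavior should be checked against the Istio version in use.

```yaml
# Sketch only: experimental annotation; names, label, and path are illustrative assumptions.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-internal-paths        # hypothetical name
  namespace: foo
  annotations:
    istio.io/dry-run: "true"       # evaluate and report, do not enforce
spec:
  selector:
    matchLabels:
      app: api                     # hypothetical workload label
  action: DENY
  rules:
  - to:
    - operation:
        paths: ["/internal/*"]
```

Removing the annotation (or setting it to "false") turns the same policy into an enforced one, so the rules reviewed during the shadow period are exactly the rules that go live.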

RequestAuthentication (JWT validation)

2 commands
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth0
spec:
  jwtRules:
  - issuer: "https://example.auth0.com/"
    jwksUri: "https://example.auth0.com/.well-known/jwks.json"
    audiences: ["https://api.example.com"]

Validate JWTs from a specific issuer. Failed tokens result in HTTP 401.

Warning: RequestAuthentication alone doesn't require a token — without an AuthorizationPolicy that demands authentication, requests with no token still pass through.

kind: AuthorizationPolicy
spec:
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]

Pair with RequestAuthentication: require any valid JWT. Anonymous requests get 403.

Production note: requestPrincipals is "issuer/subject". Use specific principals for tighter authorization.

Traffic management

4 commands
kind: VirtualService
spec:
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: api
        subset: canary
      weight: 100

Route by header — opt-in canary via a feature flag header.

Production note: Combine with weighted routing for percentage-based canaries: 95% baseline, 5% canary, then ramp.
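The 95/5 split mentioned above can be sketched as a second, unconditional route after the header match. The VirtualService name, the api host, and the baseline subset name are assumptions; both subsets would need to be defined in a matching DestinationRule.

```yaml
# Sketch only: resource name, host, and subset names are illustrative assumptions.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api                        # hypothetical name
spec:
  hosts:
  - api
  http:
  - match:                         # opt-in: header always goes to canary
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: api
        subset: canary
  - route:                         # everyone else: weighted split
    - destination:
        host: api
        subset: baseline
      weight: 95
    - destination:
        host: api
        subset: canary
      weight: 5
```

Routes are evaluated in order, so the header match has to come first; ramping the canary is then a matter of changing the two weights, which must sum to 100.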

kind: DestinationRule
spec:
  subsets:
  - name: canary
    labels:
      version: v2
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100

Define subsets (versions) of a service plus connection pool limits.

Production note: Connection pool limits prevent one consumer from overloading a backend. Critical for noisy-neighbour resilience.

kind: VirtualService
spec:
  http:
  - timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure

Timeouts and retries at the mesh layer. Centralizes resilience policy.

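Warning: Retries amplify load on a struggling service. Limit attempts and use jittered backoff. Don't retry on POST without idempotency keys. The overall timeout covers all attempts, so with a 5s timeout and a 2s perTryTimeout a third retry can be cut short.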
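One way to keep retries away from non-idempotent requests is to split the route by HTTP method. In this sketch the VirtualService name and the api host are hypothetical; setting attempts: 0 on the fallback route is intended to switch off the mesh's default retries for everything that is not a GET.

```yaml
# Sketch only: resource name and host are illustrative assumptions.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-retries                # hypothetical name
spec:
  hosts:
  - api
  http:
  - match:
    - method:
        exact: GET                 # idempotent reads may be retried
    timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
    route:
    - destination:
        host: api
  - timeout: 5s                    # all other methods, including POST
    retries:
      attempts: 0                  # no mesh-level retries
    route:
    - destination:
        host: api
```

Services that do implement idempotency keys could instead be matched by path or header and given their own retry policy.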

kind: VirtualService
spec:
  http:
  - fault:
      delay:
        fixedDelay: 5s
        percentage:
          value: 100

Fault injection: inject latency or aborts. Use in pre-production chaos testing.

Warning: Never deploy fault injection to production. Confine it to staging clusters, or scope it with narrow match conditions (for example a dedicated test header) in dev environments.

istioctl: diagnostics that work

6 commands
istioctl analyze

Static analysis across the mesh. Catches missing dependencies, misconfigurations, and deprecated fields.

Production note: Run in CI on every Istio config change to catch many policy and routing bugs before deploy.

istioctl proxy-status

Show every Envoy sidecar's sync state with the control plane (CDS, LDS, EDS, RDS).

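Production note: A sidecar stuck in STALE has been sent an update it never acknowledged, so the workload sees outdated routing; this is a common cause of "I deployed an update but the change isn't live". NOT SENT usually just means the control plane had nothing to send for that resource type.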

istioctl proxy-config cluster <pod>.<ns> --fqdn <svc>

See the upstream cluster configuration the sidecar actually holds for a given service. The truth, versus what your config says.

Production note: When traffic isn't routing as expected, use this output to compare config-as-written against config-as-pushed.

istioctl proxy-config listeners <pod>.<ns>

Show all Envoy listeners on the sidecar. Useful when ingress/egress isn't reaching the workload.

Production note: Listeners are bound to (port, IP) — mismatches between Service ports and listener ports are common.

istioctl proxy-config secret <pod>.<ns>

Show certs in the Envoy SDS cache. Confirm the SVID is current and the trust bundle is right.

Production note: When mTLS handshakes fail, this is where to start: is the cert there, is it valid, is the chain right.

istioctl experimental authz check <pod>.<ns>

Print the AuthorizationPolicy rules that apply to a specific pod. Maps from "who is denied" back to "which rule is doing it".

Production note: Debugging authz starts here — much faster than reading every namespace-scoped policy.

Envoy: when sidecar logs aren't enough

4 commands
kubectl exec <pod> -c istio-proxy -- pilot-agent request GET stats | grep <pattern>

Envoy stats endpoint. Counters for everything Envoy does.

Production note: Useful patterns: `cluster.<name>.upstream_rq_5xx` (backend 5xx counts), `listener.<addr>.downstream_cx_total` (ingress connections).

kubectl exec <pod> -c istio-proxy -- pilot-agent request GET config_dump

Dump the full Envoy config from the running sidecar.

Production note: Pipe to jq to filter: `... | jq '.configs[] | select(.["@type"] | contains("Cluster"))'` for cluster config.

kubectl exec <pod> -c istio-proxy -- curl localhost:15000/clusters

Cluster (upstream) state — endpoints, health checks, circuit breakers.

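Production note: Endpoints whose health_flags show anything other than healthy point to a real connection failure between sidecar and backend. On proxy images that ship without curl, `pilot-agent request GET clusters` returns the same data.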

level: debug scope: connection,upstream,router

Bump Envoy log level for specific scopes. Use sparingly — extreme volume.

Production note: `pilot-agent request POST "logging?upstream=debug"` toggles at runtime; reset to info when done.

Hardened patterns

Common misconfigurations

The unsafe pattern, the replacement, and the reason the two are not equivalent in production.


Risky

# AuthorizationPolicy that allows everything
spec:
  action: ALLOW
  rules:
  - {}

Hardened

# Default-deny + explicit allow
# 1) Deny all in namespace:
kind: AuthorizationPolicy
metadata: { name: deny-all, namespace: foo }
spec: {}
---
# 2) Then allow specific callers:
kind: AuthorizationPolicy
metadata: { name: allow-payments-from-marketing, namespace: foo }
spec:
  selector: { matchLabels: { app: payments } }
  action: ALLOW
  rules:
  - from:
    - source: { principals: [cluster.local/ns/marketing/sa/marketing-api] }

Why it matters: A rule with no conditions (rules: [{}]) matches every request, so an ALLOW policy containing one is allow-all, and because a request is admitted if any ALLOW policy matches, it cancels out every narrower policy on the same workloads. A workload with no policy at all is also "no policy = allow all". By contrast, an ALLOW policy with no rules matches nothing, which is why spec: {} works as deny-all. The Zero Trust pattern is empty-spec deny-all + explicit allow rules per service-to-service relationship.


Risky

# RequestAuthentication only — assumes auth required
kind: RequestAuthentication
spec:
  jwtRules:
  - issuer: https://auth.example.com/

Hardened

# RequestAuthentication + AuthorizationPolicy that DEMANDS the token
---
kind: RequestAuthentication
spec:
  jwtRules:
  - issuer: https://auth.example.com/
---
kind: AuthorizationPolicy
spec:
  action: ALLOW
  rules:
  - from:
    - source: { requestPrincipals: ["*"] }

Why it matters: RequestAuthentication validates JWTs but doesn't require their presence — anonymous requests still pass through. The AuthorizationPolicy with requestPrincipals: ["*"] enforces "must have a valid JWT". Both are required for actual authentication.


Risky

# Roll out STRICT mTLS mesh-wide on day one:
kind: PeerAuthentication
metadata: { name: default, namespace: istio-system }
spec:
  mtls: { mode: STRICT }

Hardened

# Phase 1: PERMISSIVE
spec: { mtls: { mode: PERMISSIVE } }
# (verify in Kiali: every connection is now mTLS)
# Phase 2: STRICT per-namespace as it's confirmed
metadata: { name: default, namespace: payments }
spec: { mtls: { mode: STRICT } }
# Phase 3: STRICT mesh-wide once all namespaces are confirmed

Why it matters: STRICT immediately blocks any workload not yet talking mTLS, including ingress probes from external load balancers, sidecar-less DaemonSets, and apps that haven't adopted Istio yet. PERMISSIVE accepts both, lets you verify that mTLS is happening, and then lets you promote to STRICT one namespace at a time.
