-Read a TCP/IP packet flow and explain what each layer does in production
-Compare HTTP/1.1, HTTP/2, and gRPC and pick the right one per workload
-Implement service discovery without inventing a worse DNS
-Design retry, timeout, and load-balancing policies that survive load
-Diagnose and prevent retry storms before they cause outages
Before
-No timeouts on outbound calls; one slow downstream stalls the entire service
-Naive retry policies that amplify failures during brownouts
-DNS caching tuned by accident; outages from stale entries
-Unpooled HTTP/1.1 connections; handshake overhead on every cold call
After
+Per-call timeouts at every layer, each shorter than the caller's deadline
+Retries with exponential backoff + jitter + budget; no amplification
+NodeLocal DNSCache + short TTLs + DNS error rate alerted
+gRPC over HTTP/2 with connection pooling; handshake cost amortised
Two services talking is not one network call. It is, on a typical Kubernetes cluster, a DNS lookup, a TCP and TLS handshake, a service-mesh sidecar interception, an L4 load-balancer pick, an L7 retry policy, an actual HTTP/2 stream, deserialization on the receiver, and an audit log on the way back. Most distributed-systems incidents are not algorithm bugs; they are network bugs that look like algorithm bugs.
This module unpacks the stack so the next time your p99 latency doubles you know which layer to suspect.
TCP/IP — What Lives Underneath
The two-line summary every backend engineer needs: TCP gives you reliability (ordered delivery, retransmission, flow control) and connection state (the three-way handshake takes 1 RTT before the first byte of payload). IP gives you routing (each packet finds its way through a graph of routers without the endpoints knowing the path).
The TCP three-way handshake (SYN → SYN/ACK → ACK) is the per-connection latency floor. TLS adds another 1–2 RTTs for the handshake (1 with TLS 1.3, 2 with TLS 1.2). So the first request on a fresh connection costs you 2–3 RTTs of pure overhead before anything useful happens. That is why connection pooling matters: amortise the handshake cost across many requests on the same connection.
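The arithmetic above can be sketched in a few lines. This is a back-of-envelope model only; the function names and the 2 ms round-trip time are assumptions for illustration, not measurements.

```python
def first_request_overhead_ms(rtt_ms: float, tls_rtts: int) -> float:
    """Handshake cost before the first request byte: 1 RTT for TCP plus the TLS RTTs."""
    tcp_rtts = 1
    return (tcp_rtts + tls_rtts) * rtt_ms


def amortised_overhead_ms(rtt_ms: float, tls_rtts: int, requests_per_connection: int) -> float:
    """Handshake cost per request when one connection is reused for many requests."""
    return first_request_overhead_ms(rtt_ms, tls_rtts) / requests_per_connection


if __name__ == "__main__":
    rtt = 2.0  # assumed round-trip time in milliseconds
    print("cold call, TLS 1.2:", first_request_overhead_ms(rtt, tls_rtts=2), "ms")
    print("cold call, TLS 1.3:", first_request_overhead_ms(rtt, tls_rtts=1), "ms")
    print("pooled, 100 requests:", amortised_overhead_ms(rtt, 2, 100), "ms per request")
```

The point of the model is the ratio, not the absolute numbers: the handshake is a fixed cost, so the more requests share a connection, the closer the per-request overhead gets to zero.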
HTTP/1.1 vs HTTP/2 vs gRPC — What Each Buys You
HTTP/1.1: text-based request/response, head-of-line blocking on a single connection. Workaround: clients open many connections in parallel. Still the right answer for cacheable static content and many web APIs.
HTTP/2: binary framing, multiplexed streams over a single connection (no head-of-line blocking at the HTTP level, although a lost TCP packet still stalls every stream on that connection), header compression (HPACK). One connection carries many concurrent requests. Required by gRPC.
gRPC: an RPC framework on top of HTTP/2 with Protocol Buffers (Protobuf) serialization. Strongly-typed interfaces, code generation in many languages, streaming RPCs, deadlines built into the protocol. The de-facto choice for service-to-service communication in modern infrastructure.
The practical guidance: use gRPC for service-to-service calls in your own infra (typed contracts, low overhead, streaming when you need it). Use HTTP/JSON for external APIs (browser-callable, tool-friendly, debuggable with curl). Avoid HTTP/1.1 for internal traffic unless you have a specific reason.
Service Discovery — How Services Find Each Other
Static IP addresses do not work in a world where pods restart, scale up, or move between nodes every few minutes. Service discovery is the indirection: clients ask “where is the orders service?” and get back the current set of healthy endpoints.
The mainstream patterns:
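DNS-based: Kubernetes Services give every service a DNS name (orders.payments.svc.cluster.local, i.e. the orders Service in the payments namespace). A regular Service name resolves to a stable ClusterIP that kube-proxy maps to the current Pods; a headless Service resolves directly to the current Pod IPs. Simple, integrates with everything, but DNS TTL caching can lag.
Service registry: Consul, etcd, ZooKeeper. Services register on startup; clients query the registry. Fast updates; richer metadata (datacentre, health, weights).
Service mesh: the sidecar proxy handles discovery on the client's behalf. In Envoy-based meshes such as Istio, the sidecar receives endpoints from the control plane over the xDS protocol; Linkerd uses its own proxy and discovery API. Clients call orders as if it were local; the sidecar resolves the actual endpoints.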
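To make the registry pattern concrete, here is a minimal in-memory sketch of "register, heartbeat, query healthy endpoints". It is not how Consul or etcd work internally; the class name, the TTL-based health rule, and the injected clock are all assumptions chosen to keep the example small and testable.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """Hypothetical registry: an endpoint is healthy if it sent a heartbeat within the TTL."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def heartbeat(self, service: str, endpoint: str) -> None:
        """Register the endpoint, or refresh its liveness timestamp."""
        self._last_seen[(service, endpoint)] = self._clock()

    def deregister(self, service: str, endpoint: str) -> None:
        """Remove the endpoint immediately, e.g. on graceful shutdown."""
        self._last_seen.pop((service, endpoint), None)

    def healthy_endpoints(self, service: str) -> List[str]:
        """Return endpoints of the service whose last heartbeat is within the TTL."""
        now = self._clock()
        return sorted(
            endpoint
            for (svc, endpoint), seen in self._last_seen.items()
            if svc == service and now - seen <= self._ttl
        )
```

The TTL plays the same role as a DNS TTL: it bounds how long a dead endpoint can keep receiving traffic, at the cost of more heartbeat load as it gets shorter.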
Load Balancing
Load balancers turn a list of endpoints into a single virtual endpoint with traffic distribution. Two layers, distinct trade-offs:
L4 load balancers (AWS NLB, kube-proxy iptables/IPVS) operate at the TCP layer. Cheap, fast, opaque to the application. Best for raw connection distribution; cannot do per-request routing.
L7 load balancers (Envoy, NGINX, AWS ALB) understand HTTP. Can do header-based routing, path matching, retries, weighted shifting, mTLS termination, observability. Add latency (~1–5ms) but unlock the production-engineering toolbox.
Algorithm choice matters. Round robin is fine for uniform endpoints (and is Envoy's default); least-request handles variable backend speed by preferring endpoints with fewer requests in flight; EWMA tracks a smoothed latency estimate and prefers fast endpoints; ring hash / consistent hash sticks the same key to the same backend (useful for cache locality).
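Of these, consistent hashing is the least obvious, so a small sketch may help. The version below is a generic hash ring with virtual nodes, not Envoy's ring hash implementation; the class name, the SHA-256 choice, and the default of 100 virtual nodes are assumptions.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring: each backend owns many points on the ring."""

    def __init__(self, backends: List[str], vnodes: int = 100) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        points = sorted((_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self._hashes = [h for h, _ in points]
        self._owners = [b for _, b in points]

    def pick(self, key: str) -> str:
        """Return the backend owning the first ring point clockwise from the key."""
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[idx]
```

The property that matters for cache locality: when a backend leaves the ring, only the keys it owned move elsewhere. Keys owned by the surviving backends stay where they were, which a plain `hash(key) % n` scheme does not guarantee.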
Retries, Timeouts, and the Storm
Two rules that, applied with discipline, prevent most outages:
Every call has a timeout. Many standard-library HTTP clients default to “wait forever”. Override it. The timeout should be shorter than your caller's timeout (so retries can fire within the deadline budget).
Every retry has exponential backoff with jitter. Wait 1s, then 2s, then 4s, each with random jitter so that clients do not retry in lockstep. The “full jitter” and “decorrelated jitter” variants described by AWS are the usual reference points.
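Both rules can be combined in one retry loop. The sketch below is illustrative rather than production-ready: `call_with_retries` and its parameters are hypothetical names, it retries on any exception (real code should retry only errors known to be safe to retry), and the clock and sleep functions are injectable purely so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def full_jitter_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Uniform random delay in [0, min(cap, base * 2**attempt)]; attempt is 0-based."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def decorrelated_jitter_delay(previous: float, base: float, cap: float,
                              rng: random.Random) -> float:
    """Next delay drawn from [base, previous * 3], clamped to cap."""
    return min(cap, rng.uniform(base, previous * 3))


def call_with_retries(
    call: Callable[[float], T],
    deadline_s: float,
    per_attempt_timeout_s: float,
    max_attempts: int = 3,
    base: float = 0.1,
    cap: float = 2.0,
    rng: Optional[random.Random] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(timeout) with full-jitter backoff, never exceeding the overall deadline."""
    if rng is None:
        rng = random.Random()
    start = clock()
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            break
        try:
            # Each attempt gets the smaller of its own timeout and the time left.
            return call(min(per_attempt_timeout_s, remaining))
        except Exception as exc:  # sketch only: retry just the retryable errors in real code
            last_error = exc
        delay = full_jitter_delay(attempt, base, cap, rng)
        # Stop if this was the last attempt or the backoff would overrun the deadline.
        if attempt == max_attempts - 1 or delay >= deadline_s - (clock() - start):
            break
        sleep(delay)
    if last_error is not None:
        raise last_error
    raise TimeoutError("deadline exhausted before the first attempt")
```

A caller would pass a function that accepts the timeout and forwards it to the underlying client, so the per-attempt timeout always shrinks to fit the caller's remaining deadline. `decorrelated_jitter_delay` is shown for comparison; it is seeded with `previous = base` and fed its own output on each subsequent retry.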
The retry storm is the canonical anti-pattern: a backend brownout causes timeouts; clients retry; their retries push more load onto the backend; the backend cannot recover; clients keep retrying. The defence is the retry budget: cap retries at a fraction of normal traffic (e.g. retries cannot exceed 10% of in-flight requests). Envoy supports this directly through its retry budget setting, and gRPC clients offer a similar token-based retry throttling.
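One way to picture a retry budget is as a token account: every first attempt deposits a fraction of a retry, and every retry spends a whole one. The sketch below follows that idea; it is not Envoy's or gRPC's implementation, the class and parameter names are hypothetical, and it is not thread-safe.

```python
class RetryBudget:
    """Hypothetical token-style retry budget; integer units avoid float rounding."""

    _UNITS_PER_RETRY = 100

    def __init__(self, retry_percent: int = 10, max_banked_retries: int = 10) -> None:
        self._deposit = retry_percent  # units earned per first attempt
        self._max_units = max_banked_retries * self._UNITS_PER_RETRY
        self._units = 0

    def record_request(self) -> None:
        """Call once per first attempt; earns retry_percent / 100 of a retry."""
        self._units = min(self._max_units, self._units + self._deposit)

    def try_acquire_retry(self) -> bool:
        """Spend one banked retry if available; otherwise the caller must fail fast."""
        if self._units >= self._UNITS_PER_RETRY:
            self._units -= self._UNITS_PER_RETRY
            return True
        return False
```

With `retry_percent=10`, retries can never exceed roughly 10% of first attempts over time, and `max_banked_retries` limits how large a burst of retries a quiet period can save up. During a brownout the budget drains quickly and most callers fail fast instead of piling on.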
The Rate Limiting Algorithms guide covers the related primitive of capping arrival rate; combined with retry budgets, it is the front-of-house resilience kit.
Connection Pooling
HTTP/2 lets one TCP connection carry many requests. For high-throughput service-to-service calls, every client should hold an open pool of connections to each upstream and reuse them. Pool sizing rules of thumb:
Min pool: p99_concurrency × 1.2 (20% headroom so requests do not queue behind busy connections under normal load).
Max pool: large enough to avoid queueing under burst, small enough to not exhaust the upstream's file descriptors.
Idle timeout: 30–60s. Short enough to recover from broken connections; long enough to amortise handshake.
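The rules of thumb above translate into a small calculation. The minimum follows the ×1.2 rule directly; the maximum uses an assumed policy (split the upstream's connection budget evenly across client instances) that is only one of several reasonable choices. All function and parameter names are hypothetical.

```python
from typing import Tuple


def min_pool_size(p99_concurrency: int, headroom_percent: int = 120) -> int:
    """p99 concurrency times 1.2, rounded up; integer maths avoids float drift."""
    return max(1, (p99_concurrency * headroom_percent + 99) // 100)


def max_pool_size(upstream_connection_budget: int, client_instances: int) -> int:
    """Assumed bound: share the upstream's connection budget evenly across clients."""
    return max(1, upstream_connection_budget // client_instances)


def pool_bounds(p99_concurrency: int, upstream_connection_budget: int,
                client_instances: int) -> Tuple[int, int]:
    """Return (min, max); the minimum is clamped so it never exceeds the maximum."""
    upper = max_pool_size(upstream_connection_budget, client_instances)
    lower = min(min_pool_size(p99_concurrency), upper)
    return lower, upper
```

If the clamp kicks in (the desired minimum exceeds the per-client share), that is a signal the upstream cannot sustain the fleet at p99 concurrency and needs either more capacity or fewer, busier connections.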
DNS as Distributed-Systems Risk
DNS is the cause of more “unexplained” outages than any other piece of distributed-systems infrastructure. Common failure modes:
Resolver caches a stale entry; service is moved; clients call dead endpoints for the TTL window.
CoreDNS / kube-dns hits a query-rate limit; lookups time out; entire mesh stalls.
External resolver (8.8.8.8) is unreachable; in-cluster lookups slow because of fallback chains.
Mitigations: short TTLs for in-cluster names (5–30s), use NodeLocal DNSCache on Kubernetes, monitor DNS error rate as a first-class metric, and prefer service-mesh discovery where a mesh is already in place (the sidecar receives endpoint changes via xDS, so endpoint freshness does not depend on DNS TTLs). For Kubernetes-specific networking patterns, see the Kubernetes cheatsheet; for service-mesh patterns, the Service Mesh cheatsheet; for API gateway and external API security, the API Security cheatsheet.
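The stale-entry failure mode and the "DNS error rate as a metric" mitigation can both be seen in a toy TTL cache. This is a sketch under stated assumptions: `TtlResolverCache` is a hypothetical name, the resolver is injected as a plain function, and the cache applies one fixed TTL rather than the TTL returned in a real DNS answer.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlResolverCache:
    """Hypothetical client-side DNS cache with a fixed TTL and basic error counting."""

    def __init__(self, resolve: Callable[[str], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.lookups = 0  # upstream lookups attempted (cache misses)
        self.errors = 0   # upstream lookups that failed

    def lookup(self, name: str) -> List[str]:
        """Return cached addresses until the TTL expires, then re-resolve."""
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[0]:
            return cached[1]  # may be stale if the service moved inside the TTL window
        self.lookups += 1
        try:
            addresses = self._resolve(name)
        except OSError:
            self.errors += 1
            raise
        self._entries[name] = (now + self._ttl, addresses)
        return addresses

    def error_rate(self) -> float:
        """Fraction of upstream lookups that failed; export this as a metric."""
        return self.errors / self.lookups if self.lookups else 0.0
```

To try it against a real resolver, something like `lambda n: sorted({info[4][0] for info in socket.getaddrinfo(n, None)})` can be passed as `resolve` (with `import socket`); `socket.gaierror` is a subclass of `OSError`, so failures are counted. The TTL is the whole trade-off in one number: it bounds how long clients can keep calling a dead endpoint, and it sets how much query load reaches the resolver.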
Diagrams (not reproduced here): TLS Handshake Sequence; DNS Resolution Flow on Kubernetes; Load Balancer Architecture; Retry Storm Propagation.
Self-Check Quiz
Why does HTTP/1.1 lead to many TCP connections per browser tab while HTTP/2 needs only one? (Answer: HTTP/1.1 has head-of-line blocking on a single connection; clients open multiple connections to parallelise. HTTP/2 multiplexes streams over one connection.)
You set retries: 3 on every internal call. A downstream brownout begins. What happens? (Answer: a retry storm; each failing request is sent up to 4 times, so total RPS to the failing service rises to as much as 4x normal, preventing recovery. Need a retry budget capping retries at, say, 10% of RPS.)
What is the right service-discovery pattern for a 50-service Kubernetes cluster? (Answer: Kubernetes Services for in-cluster DNS by default; service mesh sidecar via xDS for richer mesh-aware discovery if you already run a mesh.)
Your p99 latency tripled overnight. The application code did not change. Where do you look first? (Answer: DNS error rate, recent CoreDNS changes, NodeLocal DNSCache health. DNS is one of the most common sources of unexplained latency anomalies in Kubernetes.)
When should you choose gRPC over HTTP/JSON? (Answer: service-to-service inside your infra where you control both sides and want strongly-typed contracts; not for browser-callable APIs where HTTP/JSON wins on tooling.)
Real world
Where this shows up
-Google uses gRPC internally for nearly all service-to-service traffic (it was open-sourced from their internal Stubby framework).
-Cloudflare reduced internal latency by ~30% by moving from HTTP/1.1 to HTTP/2 with connection pooling.
-Netflix's Hystrix circuit-breaker library (now in maintenance mode, with Resilience4j as the recommended successor) was created after a single dependency outage cascaded across the entire viewing platform.
-AWS published the “decorrelated jitter” backoff algorithm after measuring how poorly synchronised retries handled their load.
Production notes
Keep these close
!Set per-call timeouts at every layer. The “wait forever” default in many standard libraries is one of the most frequent sources of production stalls.
!Run NodeLocal DNSCache on every Kubernetes cluster. The cost is one DaemonSet; the benefit is that most lookups are answered on the node instead of crossing the network to CoreDNS.
!Treat retry policies as part of the service contract; document them and review at deployment.
Common mistakes
What usually breaks
!Setting infinite retries on a non-idempotent endpoint — one downstream blip becomes duplicate side effects everywhere.
!Not setting per-call timeouts; the system inherits the default of “wait forever”.
!Using HTTP/1.1 for high-throughput internal communication; without multiplexing you need many more connections, and you pay for handshakes you do not need.
Security risks
Threats to watch
!Plaintext HTTP between services exposes payloads to passive network attackers. Default to mTLS for any internal call carrying sensitive data.
!Trusting X-Forwarded-For from an untrusted upstream is a rate-limit / IP-allowlist bypass vector.
!Service discovery without authentication lets a compromised pod register fake endpoints and intercept traffic.
!gRPC reflection enabled in production exposes internal API schema to anyone who can hit the endpoint.
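The X-Forwarded-For risk above is easiest to see in code. A common defence is to ignore the header unless the direct peer is a known proxy, then walk the list from the right and take the first address that is not one of your own proxies. The sketch below assumes that policy; `client_ip` and its parameters are hypothetical names, and the trusted ranges are examples only.

```python
import ipaddress
from typing import Iterable


def client_ip(remote_addr: str, x_forwarded_for: str,
              trusted_proxies: Iterable[str]) -> str:
    """Return the client address, trusting X-Forwarded-For only via known proxies."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False
        return any(ip in net for net in trusted)

    # A direct peer outside the trusted ranges can put anything in the header.
    if not is_trusted(remote_addr):
        return remote_addr

    last_verified = remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            return last_verified  # malformed entry: stop at the last address we verified
        if not is_trusted(hop):
            return hop  # first address appended by a trusted proxy about an outsider
        last_verified = hop
    return last_verified
```

The leftmost entries are the ones an attacker controls, which is why rate limiters and allowlists that read the first value in the header can be bypassed with a single forged line.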
Tradeoffs
Design choices you should be able to defend
gRPC for service-to-service
Pros
+Strong typing via Protobuf
+2-5x lower wire size than JSON
+Streaming RPCs
+Built-in deadlines
Cons
-Harder to debug than HTTP/JSON
-Browser support requires gRPC-Web
-Smaller ecosystem than REST
HTTP/JSON for service-to-service
Pros
+Universal tooling (curl, Postman)
+Browser-callable
+Easy to log/debug
Cons
-Verbose wire format
-No built-in deadlines
-Weak typing
Service mesh (Envoy/Istio)
Pros
+mTLS / retries / circuit breakers free
+Centralised policy
+Rich observability
Cons
-Sidecar latency tax (~1-3ms/hop)
-Operational complexity
-Learning curve
Alternatives
Other production approaches
gRPC
Strongly-typed RPC over HTTP/2; standard for service-to-service in modern infra.
Connect / Twirp
Lighter alternatives to gRPC with simpler tooling; trade ecosystem maturity for ergonomics.
GraphQL Federation
For client-facing APIs that need composition across many backend services.
NATS / message-based RPC
When you need request-response over a shared message broker instead of direct service-to-service connections; useful for IoT/edge.
Think like an engineer
Questions to answer before shipping
?Before debating gRPC vs REST, ask: who calls this API? If browsers, REST. If your own services, gRPC unless there is a reason against.
?Every retry policy is a load multiplier. Calculate the worst-case load on the downstream when every caller hits its retry cap simultaneously.
?DNS sits on the path of every new connection. Treat its latency and error rate as first-class metrics, not infrastructure noise.
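The worst-case retry calculation asked for above is short enough to write down. The model below assumes every layer in a call chain exhausts its retry cap at the same time, so the multipliers compound; the function names are hypothetical and the numbers are illustrative.

```python
from typing import Sequence


def worst_case_multiplier(retries_per_layer: Sequence[int]) -> int:
    """Product of (1 + retries) over each layer between the edge and the backend."""
    multiplier = 1
    for retries in retries_per_layer:
        multiplier *= 1 + retries
    return multiplier


def worst_case_rps(base_rps: int, retries_per_layer: Sequence[int]) -> int:
    """Load on the deepest backend if every caller hits its retry cap simultaneously."""
    return base_rps * worst_case_multiplier(retries_per_layer)


if __name__ == "__main__":
    print(worst_case_rps(1000, [3]))        # one layer with retries: 3
    print(worst_case_rps(1000, [3, 3, 3]))  # three stacked layers, each with retries: 3
```

One layer with three retries gives the familiar 4x. Three stacked layers with the same policy give 64x, which is why retries are usually best applied at a single layer and capped with a budget.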
Key terms
Vocabulary used in this module
gRPC
Open-source RPC framework using HTTP/2 + Protocol Buffers; standard for service-to-service in modern infra.
Connection pooling
Reusing TCP/HTTP connections across requests to amortise handshake cost.
Retry storm
Failure mode where retries amplify load on an already-struggling backend, preventing recovery.
Retry budget
Cap on total retries as a percentage of RPS; prevents retry storms.
Service discovery
Mechanism by which clients find healthy endpoints for a service (DNS, registry, mesh).
Labs
Hands-on labs
1
60 minutes · Beginner
Lab 2.1 — gRPC vs REST Latency Bake-off
Measure real latency and throughput of gRPC vs HTTP/JSON for the same logical workload.
-Implement the same service interface as gRPC and HTTP/JSON
-Generate identical client and server code
-Run a 5-minute load test at 100 / 1000 / 5000 RPS
-Capture p50/p95/p99 latency, throughput, CPU usage