gRPC: The High-Performance RPC Framework Every Backend Developer Should Know

A practical guide to gRPC — protocol buffers, service definitions, streaming, interceptors, and why gRPC is 10x faster than REST for service-to-service communication.

gRPC: The High-Performance RPC Framework Every Backend Developer Should Know illustration
On this page10 sections

REST has been the default for APIs for over a decade, but it wasn't designed for microservices talking to each other millions of times per second. gRPC (Google Remote Procedure Call) was built exactly for this — high-throughput, low-latency, strongly-typed communication between services. It's used by Google, Netflix, Slack, Square, and most serious microservice architectures.

What is gRPC?

gRPC is an open-source RPC (Remote Procedure Call) framework that uses HTTP/2 for transport and Protocol Buffers (protobuf) for serialization. Instead of sending JSON over HTTP/1.1 like REST, gRPC sends compact binary data over multiplexed HTTP/2 connections.

gRPC vs REST at a Glance
⚡ gRPC
📦FormatProtobuf (binary)
🌐TransportHTTP/2 (multiplexed)
📝ContractStrict (.proto file)
🔄StreamingBidirectional
🚀Speed10x faster than REST
🛠Code genAuto-generated clients
VS
🌐 REST
📦FormatJSON (text)
🌐TransportHTTP/1.1
📝ContractLoose (OpenAPI optional)
🔄StreamingRequest-response only
🚀SpeedGood for most use cases
🛠Code genManual or OpenAPI gen

How gRPC Works

gRPC Architecture
📝.protoDefine service
protocGenerate code
💻ServerImplement handlers
📡HTTP/2Binary transport
📱ClientAuto-generated stub

Step 1: Define Your Service (.proto)

Everything in gRPC starts with a .proto file — the contract between client and server:

// user_service.proto
syntax = "proto3";

package users;

// Service definition — like a REST controller
service UserService {
  // Unary RPC (request-response, like a normal REST call)
  rpc GetUser (GetUserRequest) returns (User);
  rpc CreateUser (CreateUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (ListUsersResponse);

  // Server streaming (server sends multiple responses)
  rpc WatchUsers (WatchRequest) returns (stream UserEvent);

  // Client streaming (client sends multiple requests)
  rpc UploadUsers (stream CreateUserRequest) returns (UploadSummary);

  // Bidirectional streaming (both send multiple messages)
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);
}

// Message definitions — like JSON schemas, but typed and compact
message User {
  string id = 1;
  string name = 2;
  string email = 3;
  int32 age = 4;
  Department department = 5;
  repeated string roles = 6;  // Array of strings
  google.protobuf.Timestamp created_at = 7;
}

message GetUserRequest {
  string id = 1;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
  int32 age = 3;
  Department department = 4;
}

message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;
  string filter = 3;  // e.g., "department=ENGINEERING"
}

message ListUsersResponse {
  repeated User users = 1;
  string next_page_token = 2;
  int32 total_count = 3;
}

enum Department {
  UNKNOWN = 0;
  ENGINEERING = 1;
  MARKETING = 2;
  SALES = 3;
  PRODUCT = 4;
}

message UserEvent {
  string event_type = 1;  // "created", "updated", "deleted"
  User user = 2;
}

message WatchRequest {
  repeated string departments = 1;
}

message UploadSummary {
  int32 created = 1;
  int32 failed = 2;
}

message ChatMessage {
  string sender = 1;
  string content = 2;
}

Step 2: Generate Code

# Install protobuf compiler and gRPC plugins

# Python
pip install grpcio grpcio-tools
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. user_service.proto
# Generates: user_service_pb2.py (messages) + user_service_pb2_grpc.py (service stubs)

# Go
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
protoc --go_out=. --go-grpc_out=. user_service.proto

# Node.js / TypeScript
npm install @grpc/grpc-js @grpc/proto-loader
# Or use static code generation:
npm install grpc_tools_node_protoc_ts
grpc_tools_node_protoc --js_out=. --grpc_out=. --ts_out=. user_service.proto

Step 3: Implement the Server (Python)

import grpc
from concurrent import futures
import user_service_pb2 as pb2
import user_service_pb2_grpc as pb2_grpc

# In-memory database
users_db = {}

class UserServicer(pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        user = users_db.get(request.id)
        if not user:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"User {request.id} not found")
            return pb2.User()
        return user

    def CreateUser(self, request, context):
        import uuid
        user_id = str(uuid.uuid4())
        user = pb2.User(
            id=user_id,
            name=request.name,
            email=request.email,
            age=request.age,
            department=request.department,
        )
        users_db[user_id] = user
        return user

    def ListUsers(self, request, context):
        all_users = list(users_db.values())
        return pb2.ListUsersResponse(
            users=all_users,
            total_count=len(all_users),
        )

    # Server streaming — push events to the client
    def WatchUsers(self, request, context):
        import time
        while context.is_active():
            # In production, listen to a message queue
            time.sleep(1)
            yield pb2.UserEvent(
                event_type="heartbeat",
                user=pb2.User(name="system"),
            )

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb2_grpc.add_UserServiceServicer_to_server(UserServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    print("gRPC server running on port 50051")
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

Step 4: Use the Client

import grpc
import user_service_pb2 as pb2
import user_service_pb2_grpc as pb2_grpc

# Create a channel and stub (auto-generated client)
channel = grpc.insecure_channel("localhost:50051")
stub = pb2_grpc.UserServiceStub(channel)

# Create a user — feels like calling a local function!
user = stub.CreateUser(pb2.CreateUserRequest(
    name="Alice",
    email="alice@example.com",
    age=30,
    department=pb2.ENGINEERING,
))
print(f"Created user: {user.id} - {user.name}")

# Get a user
user = stub.GetUser(pb2.GetUserRequest(id=user.id))
print(f"Got user: {user.name}, {user.email}")

# List all users
response = stub.ListUsers(pb2.ListUsersRequest(page_size=10))
for u in response.users:
    print(f"  - {u.name} ({u.email})")

# Server streaming — watch for events
for event in stub.WatchUsers(pb2.WatchRequest(departments=["ENGINEERING"])):
    print(f"Event: {event.event_type} - {event.user.name}")
    break  # Stop after first event for demo

The 4 Types of gRPC Communication

gRPC Communication Patterns
Pattern Description Use Case
UnaryClient sends 1 request, server returns 1 responseCRUD operations, auth
Server streamingClient sends 1 request, server streams N responsesReal-time feeds, logs, events
Client streamingClient streams N requests, server returns 1 responseFile upload, batch inserts
BidirectionalBoth sides stream simultaneouslyChat, multiplayer, live collab

Performance: Why gRPC is Faster

Serialization Speed (1M messages, lower is better)
JSON
MessagePack
Protobuf
FlatBuffers
  • Binary serialization: Protobuf is 3-10x smaller than JSON and 5-20x faster to serialize/deserialize.
  • HTTP/2 multiplexing: Multiple RPCs share a single TCP connection. No head-of-line blocking.
  • Header compression (HPACK): HTTP/2 compresses headers, reducing overhead for frequent calls.
  • Streaming: Long-lived connections for real-time data — no polling, no WebSocket hacks.
  • Code generation: Generated stubs are optimized for each language — no reflection, no runtime parsing.

Interceptors (Middleware for gRPC)

# gRPC interceptors work like Express middleware or Django middleware

class AuthInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        # Extract metadata (like HTTP headers)
        metadata = dict(handler_call_details.invocation_metadata)
        token = metadata.get("authorization", "")

        if not token.startswith("Bearer "):
            return grpc.unary_unary_rpc_method_handler(
                lambda req, ctx: self._unauthenticated(ctx)
            )
        # Validate token...
        return continuation(handler_call_details)

    def _unauthenticated(self, context):
        context.abort(grpc.StatusCode.UNAUTHENTICATED, "Invalid token")

class LoggingInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        method = handler_call_details.method
        print(f"gRPC call: {method}")
        return continuation(handler_call_details)

# Add interceptors to the server
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[AuthInterceptor(), LoggingInterceptor()],
)

When to Use gRPC vs REST

When to Use What
What are you building?
Public API for 3rd parties?
RESTUniversal, browser-friendly
Internal microservices?
gRPCFast, typed, streaming
Real-time / streaming?
gRPCBidirectional streams

gRPC isn't a replacement for REST — it's a complement. Use REST for public APIs where simplicity and browser compatibility matter. Use gRPC for internal service-to-service communication where performance, type safety, and streaming are critical. Many companies (including Google) use both: REST at the edge, gRPC between services.

Share this article

Stuck on implementation?

Get private, 1-on-1 help with system design, performance, scaling, or any technical challenge.

Book a Session

Related Production Resources

Course

Free learning tracks

Turn this guide into a structured production engineering path.

Lab

Interactive engineering labs

Practice the same ideas through scenario-based simulators.

Reference

Production cheatsheets

Keep the operational commands and checks nearby.

Glossary

Key terms

Review the vocabulary behind the architecture.

Discussion

Questions, corrections, or production notes? Add them here so other learners can benefit.

Continue Reading

Related practical guides from the same production engineering path.

DevOps 10 min read

Why Spark Jobs Become Slow: Shuffle, Skew, Partitions, and Memory

Spark jobs usually slow down for predictable reasons: too much shuffle, skewed keys, bad partition sizing, expensive file layouts, and memory pressure. Learn how to debug each one.

Spark Data Engineering
Backend 22 min read

Distributed Systems Algorithms: Consensus, Replication, and Coordination at Production Scale

How real distributed systems agree, replicate, and coordinate. Raft and Paxos consensus, leader election in etcd and Kafka, quorum reads in Cassandra, gossip in Redis Cluster, vector clocks, CRDTs, and the consistency models that determine what your system can promise.

Distributed Systems Consensus