Event-Driven Architecture Guide

In a traditional request/response architecture, Service A calls Service B synchronously. If B is down, A fails too. If B is slow, A is slow too. Event-driven architecture breaks this coupling: Service A publishes an event, and each interested service processes it independently, at its own pace.

Synchronous vs Event-Driven

# Synchronous: tight coupling
def create_order(data):
    order = save_order(data)
    payment_service.charge(order)        # Blocks. What if payment is down?
    inventory_service.reserve(order)      # Blocks. What if inventory is slow?
    email_service.send_confirmation(order) # Blocks. Email server timeout?
    return order  # Total time: sum of all calls

# Event-driven: loose coupling
def create_order(data):
    order = save_order(data)
    publish_event("order.created", order)  # Returns once the broker accepts the event; does not wait for consumers.
    return order  # Total time: the database write plus one publish

# Consumers process independently:
# PaymentService listens for "order.created" -> charges customer
# InventoryService listens for "order.created" -> reserves items
# EmailService listens for "order.created" -> sends confirmation

Core Concepts

Event: An immutable fact that something happened (“OrderCreated”, “PaymentProcessed”, “UserRegistered”)
Producer: The service that publishes events
Consumer: A service that subscribes to and processes events
Broker: The infrastructure that routes events (Kafka, RabbitMQ, SNS/SQS)
Topic/Queue: A named channel where events are published

Kafka vs RabbitMQ

Feature	Apache Kafka	RabbitMQ
Model	Distributed log (append-only)	Message queue (consume and delete)
Retention	Configurable (days/weeks/forever)	Until consumed (or TTL)
Replay	Yes (re-read from any offset)	Not with classic queues (once acknowledged, gone)
Throughput	Millions of events/sec	Tens of thousands/sec
Ordering	Guaranteed per partition	Per queue (with single consumer)
Complexity	High (ZooKeeper/KRaft, partitions)	Low (simple to operate)
Best For	Event streaming, audit logs, high throughput	Task queues, RPC, simple pub/sub

Rule of thumb: Use RabbitMQ for task queues and simple messaging. Use Kafka when you need event replay, high throughput, or event sourcing.
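
The difference between a retained log and a consume-and-delete queue can be sketched without a broker. The two classes below are hypothetical, simplified models written for this illustration; they are not the Kafka or RabbitMQ APIs, and real brokers add partitions, acknowledgements, and persistence.

```python
from collections import deque


class AppendOnlyLog:
    """Log-style model: events stay in the log; readers choose their offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        # Reading never removes anything, so any reader can replay.
        return self._events[offset:]


class WorkQueue:
    """Queue-style model: a message is removed once a consumer takes it."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        # Returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None


log = AppendOnlyLog()
queue = WorkQueue()
for name in ["order.created", "payment.processed"]:
    log.append(name)
    queue.put(name)

first_read = log.read_from(0)
replayed = log.read_from(0)          # same events again: replay works
taken = [queue.get(), queue.get()]   # each message is handed out once
after_drain = queue.get()            # nothing left to re-read
```

In this model, a new consumer added next month could still call read_from(0) on the log, while the queue has nothing left to give it.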

Event Design

# Good event: self-contained, immutable, past tense
{
    "event_type": "order.created",
    "event_id": "evt_abc123",
    "timestamp": "2026-04-28T10:30:00Z",
    "version": 1,
    "data": {
        "order_id": "ord_456",
        "customer_id": "cust_789",
        "items": [
            {"product_id": "prod_1", "quantity": 2, "price": 29.99}
        ],
        "total": 59.98,
        "currency": "USD"
    },
    "metadata": {
        "source": "order-service",
        "correlation_id": "req_xyz"
    }
}

# Event naming conventions:
# entity.action (past tense): order.created, payment.processed, user.registered
# Include enough data that consumers do not need to call back to the producer
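
One way to enforce this envelope in code is a frozen dataclass. This is a minimal sketch, not a standard: the Event class, its defaults, and the to_json helper are names chosen for this example, with field names that mirror the JSON above.

```python
import json
import uuid
from dataclasses import FrozenInstanceError, asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical envelope mirroring the JSON structure shown above."""

    event_type: str
    data: dict
    metadata: dict
    version: int = 1
    event_id: str = field(default_factory=lambda: f"evt_{uuid.uuid4().hex}")
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self))


event = Event(
    event_type="order.created",
    data={"order_id": "ord_456", "customer_id": "cust_789", "total": 59.98},
    metadata={"source": "order-service", "correlation_id": "req_xyz"},
)
payload = event.to_json()

# frozen=True blocks attribute reassignment. Note that the nested dicts
# are still mutable, so treat them as read-only by convention.
mutation_blocked = False
try:
    event.version = 2
except FrozenInstanceError:
    mutation_blocked = True
```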

Event Sourcing

Instead of storing the current state, store every event that led to it. The current state is derived by replaying all events. Think of it as a Git log for your data.

# Traditional: store current state
# UPDATE accounts SET balance = 150 WHERE id = 1

# Event sourcing: store events
events = [
    {"type": "account.opened",    "data": {"balance": 0}},
    {"type": "money.deposited",   "data": {"amount": 200}},
    {"type": "money.withdrawn",   "data": {"amount": 50}},
    # Current balance: replay events -> 0 + 200 - 50 = 150
]

# Benefits:
# - Complete audit trail (when did balance change and why?)
# - Replay events to rebuild state or create new projections
# - Time travel: what was the balance on March 15?
# - Debug: replay events to reproduce the exact state in which a bug occurred

# Drawbacks:
# - More complex queries (need projections for reads)
# - Event schema evolution is tricky
# - Storage grows over time (use snapshots)
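
The replay step can be sketched as a fold over the event list. The function names apply_event and replay are illustrative, and the snapshot here is just a saved balance, which is the simplest possible form of the snapshot idea mentioned above.

```python
def apply_event(balance, event):
    """Apply one event to the current balance and return the new balance."""
    kind, data = event["type"], event["data"]
    if kind == "account.opened":
        return data["balance"]
    if kind == "money.deposited":
        return balance + data["amount"]
    if kind == "money.withdrawn":
        return balance - data["amount"]
    raise ValueError(f"Unknown event type: {kind}")


def replay(events, starting_balance=0):
    """Derive state by applying events in order, optionally from a snapshot."""
    balance = starting_balance
    for event in events:
        balance = apply_event(balance, event)
    return balance


events = [
    {"type": "account.opened",  "data": {"balance": 0}},
    {"type": "money.deposited", "data": {"amount": 200}},
    {"type": "money.withdrawn", "data": {"amount": 50}},
]

current_balance = replay(events)            # full replay
balance_after_deposit = replay(events[:2])  # "time travel" to an earlier point

# Snapshot: store the state after the first two events, then replay
# only the events recorded after the snapshot.
snapshot = balance_after_deposit
from_snapshot = replay(events[2:], starting_balance=snapshot)
```

Replaying from the snapshot gives the same balance as the full replay, which is what makes snapshots a safe optimization for long event histories.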

CQRS: Command Query Responsibility Segregation

Separate the write model (commands) from the read model (queries). The write side enforces business rules and publishes events. The read side consumes those events to build views optimized for queries.

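# Write side: handles commands, enforces business rules
from datetime import datetime, timezone

class OrderCommandHandler:
    def handle_create_order(self, command):
        # Validate business rules
        if not inventory.has_stock(command.items):
            raise OutOfStockError()

        # Save to event store, then publish.
        # customer_id and timestamp are read by OrderProjection below;
        # command.customer_id is assumed to exist on the command.
        event = OrderCreatedEvent(
            order_id=generate_id(),
            customer_id=command.customer_id,
            items=command.items,
            total=calculate_total(command.items),
            timestamp=datetime.now(timezone.utc),
        )
        event_store.append(event)
        publish(event)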

# Read side: optimized projections for queries
class OrderProjection:
    def on_order_created(self, event):
        # Update a denormalized read model
        db.execute("""
            INSERT INTO order_summaries (id, customer, total, status, created_at)
            VALUES (%s, %s, %s, 'pending', %s)
        """, [event.order_id, event.customer_id, event.total, event.timestamp])

    def on_payment_processed(self, event):
        db.execute("""
            UPDATE order_summaries SET status = 'paid' WHERE id = %s
        """, [event.order_id])

# Query side: fast reads from denormalized tables
def get_order_summary(order_id):
    return db.query("SELECT * FROM order_summaries WHERE id = %s", [order_id])
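
To see the read side end to end, here is a self-contained sketch that uses an in-memory SQLite database in place of the db object above. The HANDLERS table, the project function, and the dict-shaped events are assumptions made for this example; the table and column names follow the projection above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_summaries (
        id TEXT PRIMARY KEY, customer TEXT, total REAL,
        status TEXT, created_at TEXT
    )
""")


def on_order_created(event):
    conn.execute(
        "INSERT INTO order_summaries (id, customer, total, status, created_at) "
        "VALUES (?, ?, ?, 'pending', ?)",
        (event["order_id"], event["customer_id"], event["total"], event["timestamp"]),
    )


def on_payment_processed(event):
    conn.execute(
        "UPDATE order_summaries SET status = 'paid' WHERE id = ?",
        (event["order_id"],),
    )


# Route each event type to the projection handler that understands it.
HANDLERS = {
    "order.created": on_order_created,
    "payment.processed": on_payment_processed,
}


def project(events):
    """Apply events in order to the denormalized read model."""
    for event in events:
        HANDLERS[event["type"]](event)
    conn.commit()


def get_order_summary(order_id):
    return conn.execute(
        "SELECT id, customer, total, status FROM order_summaries WHERE id = ?",
        (order_id,),
    ).fetchone()


project([
    {"type": "order.created", "order_id": "ord_456", "customer_id": "cust_789",
     "total": 59.98, "timestamp": "2026-04-28T10:30:00Z"},
    {"type": "payment.processed", "order_id": "ord_456"},
])
summary = get_order_summary("ord_456")
```

Because the read model is derived entirely from events, it can be dropped and rebuilt by running project over the full event history again.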

Dead Letter Queues

When a consumer fails to process a message after multiple retries, send it to a Dead Letter Queue (DLQ) instead of losing it or blocking the queue.

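# Kafka: configure DLQ with error handling
import time

from confluent_kafka import Consumer, Producer

dlq_producer = Producer({'bootstrap.servers': 'kafka:9092'})

# TemporaryError and PermanentError are application-defined exception classes;
# process_event is the application's own handler.
def process_with_dlq(message, max_retries=3):
    last_error = None
    attempts = 0
    for attempt in range(max_retries):
        attempts = attempt + 1
        try:
            process_event(message.value())
            return  # Success
        except TemporaryError as e:
            last_error = e  # Keep a reference: Python unbinds `e` after the except block
            time.sleep(2 ** attempt)  # Exponential backoff
        except PermanentError as e:
            last_error = e
            break  # Skip retries for permanent failures

    # Retries exhausted or permanent failure: send to DLQ
    dlq_producer.produce(
        'order-events.dlq',
        key=message.key(),
        value=message.value(),
        headers={
            'original-topic': message.topic(),
            'error': str(last_error),
            'retry-count': str(attempts),
        }
    )
    dlq_producer.flush()  # produce() is asynchronous; wait for delivery

# Monitor DLQ: alert when messages appear
# Process DLQ: fix the issue, then replay messages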
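
The retry-then-DLQ flow can also be exercised without a Kafka cluster. The sketch below is a broker-free model of the same idea: the dead_letters list stands in for the DLQ topic, and the handler, sleep, and base_delay parameters are assumptions added so the logic can be run and tested in isolation.

```python
import time


class TemporaryError(Exception):
    """Transient failure worth retrying (for example, a timeout)."""


class PermanentError(Exception):
    """Failure that retrying cannot fix (for example, a malformed payload)."""


def process_with_dlq(message, handler, dead_letters,
                     max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Return True on success; otherwise record the message in dead_letters."""
    last_error = None
    attempts = 0
    for attempt in range(max_retries):
        attempts = attempt + 1
        try:
            handler(message)
            return True
        except TemporaryError as exc:
            last_error = exc
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
        except PermanentError as exc:
            last_error = exc
            break  # retrying would not help
    dead_letters.append(
        {"message": message, "error": str(last_error), "attempts": attempts}
    )
    return False


dead_letters = []
calls = {"flaky": 0}


def flaky(message):
    # Fails twice with a transient error, then succeeds.
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TemporaryError("broker timeout")


def malformed(message):
    raise PermanentError("invalid payload")


no_sleep = lambda seconds: None  # skip real waiting in this demo
recovered = process_with_dlq("evt_1", flaky, dead_letters, sleep=no_sleep)
rejected = process_with_dlq("evt_2", malformed, dead_letters, sleep=no_sleep)
```

The transient failure recovers within the retry budget and never reaches the DLQ, while the permanent failure is dead-lettered after a single attempt.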

Idempotency: Processing Events Safely

Events can be delivered more than once (network retries, consumer restarts). Your consumers must be idempotent - processing the same event twice should leave the system in the same state as processing it once.

# Idempotent consumer using event_id deduplication
class IdempotentConsumer:
    def process(self, event):
        event_id = event['event_id']

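        # Check if already processed. Make processed_events.event_id a primary
        # key so that a concurrent duplicate fails on insert instead of
        # slipping past this check.
        if db.exists("SELECT 1 FROM processed_events WHERE event_id = %s", [event_id]):
            return  # Already processed, skip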

        # Process the event
        handle_order_created(event['data'])

        # Mark as processed (in the same transaction!)
        db.execute(
            "INSERT INTO processed_events (event_id, processed_at) VALUES (%s, NOW())",
            [event_id]
        )
        db.commit()

# Alternative: use database constraints
# INSERT INTO payments (order_id, amount) VALUES (%s, %s)
# ON CONFLICT (order_id) DO NOTHING;
# The unique constraint on order_id prevents duplicate payments
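
The constraint-based approach can be sketched with SQLite from the standard library. This is an illustration under assumptions: the handle_once function and the table layout are made up for this example, and INSERT OR IGNORE is SQLite's spelling of the conflict clause. The deduplication record and the payment are written in one transaction, so either both are stored or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE payments (order_id TEXT PRIMARY KEY, amount REAL)")


def handle_once(event):
    """Return True if the event was processed, False if it was a duplicate."""
    with conn:  # one transaction: commit on success, roll back on exception
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cursor.rowcount == 0:
            return False  # primary key already present: duplicate delivery
        conn.execute(
            "INSERT INTO payments (order_id, amount) VALUES (?, ?)",
            (event["data"]["order_id"], event["data"]["total"]),
        )
        return True


event = {
    "event_id": "evt_abc123",
    "data": {"order_id": "ord_456", "total": 59.98},
}
first_delivery = handle_once(event)
second_delivery = handle_once(event)  # simulated redelivery
payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

Inserting the marker row first lets the database decide which delivery wins, which avoids the gap between a separate check and a later insert.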

Migrating from Monolith to Event-Driven

Start with domain events inside the monolith: Emit events from your existing code without changing architecture
Add an event bus (even in-process): Replace direct function calls with event publishing
Extract one consumer at a time: Move email sending to a separate service that consumes events
Introduce a broker (Kafka/RabbitMQ): Replace the in-process event bus with external messaging
Extract more services gradually: Each extraction is independent and reversible
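
The in-process bus and the later switch to a broker can share one interface, so producer code does not change between steps. The sketch below is one possible shape, not a prescribed design: EventPublisher, InProcessBus, and BrokerPublisher are hypothetical names, and the send callable stands in for whatever real client (Kafka, RabbitMQ) is introduced later.

```python
from typing import Callable, Protocol


class EventPublisher(Protocol):
    """The only thing producer code depends on."""

    def publish(self, event_type: str, payload: dict) -> None: ...


class InProcessBus:
    """Early step: handlers run inside the monolith, in the same process."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers.get(event_type, []):
            handler(payload)


class BrokerPublisher:
    """Later step: forward events to an external broker via `send`."""

    def __init__(self, send: Callable[[str, dict], None]) -> None:
        self._send = send  # hypothetical wrapper around a real client

    def publish(self, event_type: str, payload: dict) -> None:
        self._send(event_type, payload)


def create_order(data: dict, publisher: EventPublisher) -> dict:
    order = {"order_id": "ord_1", **data}
    publisher.publish("order.created", order)
    return order


# In-process: the email handler runs inside the monolith.
emails_sent = []
bus = InProcessBus()
bus.subscribe("order.created", lambda order: emails_sent.append(order["order_id"]))
create_order({"total": 59.98}, bus)

# Broker-backed: the same create_order, with a list standing in for the broker.
outbox = []
broker = BrokerPublisher(lambda topic, payload: outbox.append((topic, payload)))
create_order({"total": 59.98}, broker)
```

Because create_order only sees the publish method, swapping the bus for the broker is a wiring change, which is what keeps each migration step reversible.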

Key Takeaways

Events decouple services in time and availability - producers and consumers do not need to be online simultaneously
Use Kafka for event streaming, RabbitMQ for task queues - Kafka retains events, RabbitMQ deletes after consumption
Events should be self-contained - include enough data that consumers never need to call back to the producer
Dead letter queues prevent data loss - failed messages go to DLQ for later investigation, not the void
Consumers must be idempotent - use event_id deduplication or database constraints
Event sourcing gives you a complete audit trail - but adds complexity, so use it where the audit trail justifies the cost
Migrate incrementally - start with events inside your monolith before splitting services

Event-driven architecture is not about technology - it is about designing systems where services communicate through facts rather than commands. When Service A says “an order was created” instead of “charge this customer,” you get a system that is more resilient, more scalable, and easier to evolve. Start with the events. The architecture follows.
