In a traditional architecture, Service A calls Service B synchronously. If B is down, A fails too. If B is slow, A is slow too. Event-driven architecture breaks this coupling: Service A publishes an event, and any interested service processes it independently, at its own pace, on its own schedule.

Synchronous vs Event-Driven

# Synchronous: tight coupling
def create_order(data):
    order = save_order(data)
    payment_service.charge(order)        # Blocks. What if payment is down?
    inventory_service.reserve(order)      # Blocks. What if inventory is slow?
    email_service.send_confirmation(order) # Blocks. Email server timeout?
    return order  # Total time: sum of all calls

# Event-driven: loose coupling
def create_order(data):
    order = save_order(data)
    publish_event("order.created", order)  # Instant. Fire and forget.
    return order  # Total time: just the database write

# Consumers process independently:
# PaymentService listens for "order.created" -> charges customer
# InventoryService listens for "order.created" -> reserves items
# EmailService listens for "order.created" -> sends confirmation

Core Concepts

  • Event: An immutable fact that something happened (“OrderCreated”, “PaymentProcessed”, “UserRegistered”)
  • Producer: The service that publishes events
  • Consumer: A service that subscribes to and processes events
  • Broker: The infrastructure that routes events (Kafka, RabbitMQ, SNS/SQS)
  • Topic/Queue: A named channel where events are published

Kafka vs RabbitMQ

Feature Apache Kafka RabbitMQ
Model Distributed log (append-only) Message queue (consume and delete)
Retention Configurable (days/weeks/forever) Until consumed (or TTL)
Replay Yes (re-read from any offset) No (once consumed, gone)
Throughput Millions of events/sec Tens of thousands/sec
Ordering Guaranteed per partition Per queue (with single consumer)
Complexity High (ZooKeeper/KRaft, partitions) Low (simple to operate)
Best For Event streaming, audit logs, high throughput Task queues, RPC, simple pub/sub

Rule of thumb: Use RabbitMQ for task queues and simple messaging. Use Kafka when you need event replay, high throughput, or event sourcing.

Event Design

# Good event: self-contained, immutable, past tense
{
    "event_type": "order.created",
    "event_id": "evt_abc123",
    "timestamp": "2026-04-28T10:30:00Z",
    "version": 1,
    "data": {
        "order_id": "ord_456",
        "customer_id": "cust_789",
        "items": [
            {"product_id": "prod_1", "quantity": 2, "price": 29.99}
        ],
        "total": 59.98,
        "currency": "USD"
    },
    "metadata": {
        "source": "order-service",
        "correlation_id": "req_xyz"
    }
}

# Event naming conventions:
# entity.action (past tense): order.created, payment.processed, user.registered
# Include enough data that consumers do not need to call back to the producer

Event Sourcing

Instead of storing the current state, store every event that led to it. The current state is derived by replaying all events. Think of it as a Git log for your data.

# Traditional: store current state
# UPDATE accounts SET balance = 150 WHERE id = 1

# Event sourcing: store events
events = [
    {"type": "account.opened",    "data": {"balance": 0}},
    {"type": "money.deposited",   "data": {"amount": 200}},
    {"type": "money.withdrawn",   "data": {"amount": 50}},
    # Current balance: replay events -> 0 + 200 - 50 = 150
]

# Benefits:
# - Complete audit trail (when did balance change and why?)
# - Replay events to rebuild state or create new projections
# - Time travel: what was the balance on March 15?
# - Debug: replay events to reproduce any bug

# Drawbacks:
# - More complex queries (need projections for reads)
# - Event schema evolution is tricky
# - Storage grows over time (use snapshots)

CQRS: Command Query Responsibility Segregation

Separate the write model (commands) from the read model (queries). Write side handles business logic and publishes events. Read side creates optimized views for queries.

# Write side: handles commands, enforces business rules
class OrderCommandHandler:
    def handle_create_order(self, command):
        # Validate business rules
        if not inventory.has_stock(command.items):
            raise OutOfStockError()

        # Save to event store
        event = OrderCreatedEvent(
            order_id=generate_id(),
            items=command.items,
            total=calculate_total(command.items),
        )
        event_store.append(event)
        publish(event)

# Read side: optimized projections for queries
class OrderProjection:
    def on_order_created(self, event):
        # Update a denormalized read model
        db.execute("""
            INSERT INTO order_summaries (id, customer, total, status, created_at)
            VALUES (%s, %s, %s, 'pending', %s)
        """, [event.order_id, event.customer_id, event.total, event.timestamp])

    def on_payment_processed(self, event):
        db.execute("""
            UPDATE order_summaries SET status = 'paid' WHERE id = %s
        """, [event.order_id])

# Query side: fast reads from denormalized tables
def get_order_summary(order_id):
    return db.query("SELECT * FROM order_summaries WHERE id = %s", [order_id])

Dead Letter Queues

When a consumer fails to process a message after multiple retries, send it to a Dead Letter Queue (DLQ) instead of losing it or blocking the queue.

# Kafka: configure DLQ with error handling
from confluent_kafka import Consumer, Producer

dlq_producer = Producer({'bootstrap.servers': 'kafka:9092'})

def process_with_dlq(message, max_retries=3):
    for attempt in range(max_retries):
        try:
            process_event(message.value())
            return  # Success
        except TemporaryError:
            time.sleep(2 ** attempt)  # Exponential backoff
        except PermanentError as e:
            break  # Skip retries for permanent failures

    # All retries failed: send to DLQ
    dlq_producer.produce(
        'order-events.dlq',
        key=message.key(),
        value=message.value(),
        headers={
            'original-topic': message.topic(),
            'error': str(e),
            'retry-count': str(max_retries),
        }
    )

# Monitor DLQ: alert when messages appear
# Process DLQ: fix the issue, then replay messages

Idempotency: Processing Events Safely

Events can be delivered more than once (network retries, consumer restarts). Your consumers must be idempotent — processing the same event twice should produce the same result.

# Idempotent consumer using event_id deduplication
class IdempotentConsumer:
    def process(self, event):
        event_id = event['event_id']

        # Check if already processed
        if db.exists("SELECT 1 FROM processed_events WHERE event_id = %s", [event_id]):
            return  # Already processed, skip

        # Process the event
        handle_order_created(event['data'])

        # Mark as processed (in the same transaction!)
        db.execute(
            "INSERT INTO processed_events (event_id, processed_at) VALUES (%s, NOW())",
            [event_id]
        )
        db.commit()

# Alternative: use database constraints
# INSERT INTO payments (order_id, amount) VALUES (%s, %s)
# ON CONFLICT (order_id) DO NOTHING;
# The unique constraint on order_id prevents duplicate payments

Migrating from Monolith to Event-Driven

  1. Start with domain events inside the monolith: Emit events from your existing code without changing architecture
  2. Add an event bus (even in-process): Replace direct function calls with event publishing
  3. Extract one consumer at a time: Move email sending to a separate service that consumes events
  4. Introduce a broker (Kafka/RabbitMQ): Replace the in-process event bus with external messaging
  5. Extract more services gradually: Each extraction is independent and reversible

Key Takeaways

  • Events decouple services in time and availability — producers and consumers do not need to be online simultaneously
  • Use Kafka for event streaming, RabbitMQ for task queues — Kafka retains events, RabbitMQ deletes after consumption
  • Events should be self-contained — include enough data that consumers never need to call back to the producer
  • Dead letter queues prevent data loss — failed messages go to DLQ for later investigation, not the void
  • Consumers must be idempotent — use event_id deduplication or database constraints
  • Event sourcing gives you a complete audit trail — but adds complexity, so use it where the audit trail justifies the cost
  • Migrate incrementally — start with events inside your monolith before splitting services

Event-driven architecture is not about technology — it is about designing systems where services communicate through facts rather than commands. When Service A says “an order was created” instead of “charge this customer,” you get a system that is more resilient, more scalable, and easier to evolve. Start with the events. The architecture follows.