Outbox pattern and CDC — solving the dual-write problem
This is the question that separates engineers who've actually run event-driven systems from those who've only drawn the diagram. The setup is everywhere: a service updates its database and publishes an event ("order placed") so other services react. Doing both reliably is harder than it looks.
The dual-write problem
You have two systems to update — the database and the message broker — and no transaction spans both. Whatever order you choose, a crash in the gap leaves them inconsistent:
- DB commit, then publish → if the service crashes after commit but before publish, the order exists but no event is ever sent. Downstream never finds out. (Lost event.)
- Publish, then DB commit → if the publish succeeds but the DB transaction rolls back, you've announced an order that doesn't exist. (Phantom event.)
Wrapping both writes in a distributed transaction (2PC) technically works, but it is slow, it ties the availability of the DB to that of the broker, and most modern brokers support it poorly or not at all. So the standard solution avoids 2PC entirely.
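To make the failure window concrete, here is a minimal Python sketch of the "DB commit, then publish" ordering. The in-memory `db` and `broker` lists, the `Crash` exception, and the `crash_before_publish` flag are illustrative stand-ins, not a real database or broker client.

```python
class Crash(Exception):
    """Simulates the process dying mid-operation."""


def place_order_dual_write(db, broker, order_id, crash_before_publish=False):
    db.append(order_id)  # step 1: the "commit" to the database succeeds
    if crash_before_publish:
        raise Crash("process died after commit, before publish")
    broker.append({"type": "OrderPlaced", "order_id": order_id})  # step 2


db, broker = [], []
place_order_dual_write(db, broker, "o-1")
try:
    place_order_dual_write(db, broker, "o-2", crash_before_publish=True)
except Crash:
    pass

# Orders that exist in the database but were never announced.
published = {event["order_id"] for event in broker}
lost = [order_id for order_id in db if order_id not in published]
print(lost)  # ['o-2']
```

Nothing in the service can repair `o-2` afterward without extra bookkeeping, because the only record that an event was owed died with the process.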
The transactional outbox pattern
The insight: make the event part of the same local database transaction as the business data. Instead of publishing to the broker directly, the service inserts the event into an outbox table in the same DB, in the same transaction as the order write. Now there is exactly one atomic commit — either both the order and the outbox row persist, or neither does. No dual write.
BEGIN;
INSERT INTO orders (id, customer_id, total) VALUES (...);
INSERT INTO outbox (id, aggregate_id, type, payload, created_at)
VALUES (gen_random_uuid(), :orderId, 'OrderPlaced', :json, now());
COMMIT; -- atomic: order + event, or nothing
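The same write can be sketched from application code. The example below is an assumption-laden illustration, not a reference implementation: it uses Python's built-in `sqlite3` as a stand-in for the production database, the column types are guesses, and `place_order` and its `fail_after_order` flag are hypothetical names added only to simulate a failure between the two inserts.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id TEXT PRIMARY KEY,
        customer_id TEXT NOT NULL,
        total REAL NOT NULL
    );
    CREATE TABLE outbox (
        id TEXT PRIMARY KEY,
        aggregate_id TEXT NOT NULL,
        type TEXT NOT NULL,
        payload TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")


def place_order(conn, order_id, customer_id, total, fail_after_order=False):
    """Write the order and its event in one local transaction."""
    payload = json.dumps({"order_id": order_id, "total": total})
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
            (order_id, customer_id, total),
        )
        if fail_after_order:
            raise RuntimeError("simulated failure between the two inserts")
        conn.execute(
            "INSERT INTO outbox (id, aggregate_id, type, payload) "
            "VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced", payload),
        )


place_order(conn, "o-1", "c-1", 42.0)
try:
    place_order(conn, "o-2", "c-1", 10.0, fail_after_order=True)
except RuntimeError:
    pass

orders = [r[0] for r in conn.execute("SELECT id FROM orders ORDER BY id")]
events = [r[0] for r in conn.execute("SELECT aggregate_id FROM outbox")]
print(orders, events)  # ['o-1'] ['o-1']
```

The failed call leaves neither an order nor an event behind, which is the whole point: the two tables can never disagree about whether `o-2` happened.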
A separate message relay process then reads unpublished rows from the outbox, publishes them to the broker, and marks them sent (or deletes them) afterward. If the relay crashes after publishing a row but before marking it, it publishes that row again on restart, so delivery is at-least-once and consumers must be idempotent. In exchange, no committed event is ever lost.
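One pass of such a relay might look like the sketch below. Several things here are assumptions rather than part of the pattern as described: the `seq` and `published_at` columns (one possible way to order rows and flag them as sent), the `relay_once` function, and the `AckLostOnce` broker stub, which delivers every message but loses the first acknowledgement. SQLite again stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed ordering column
        id TEXT NOT NULL UNIQUE,
        type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published_at TEXT                       -- assumed: NULL = not yet sent
    );
    INSERT INTO outbox (id, type, payload) VALUES
        ('e-1', 'OrderPlaced', '{"order_id": "o-1"}'),
        ('e-2', 'OrderPlaced', '{"order_id": "o-2"}');
""")


def relay_once(conn, publish, batch_size=100):
    """Publish unsent rows in insertion order; return how many were marked sent."""
    rows = conn.execute(
        "SELECT seq, id, type, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for seq, event_id, event_type, payload in rows:
        try:
            publish(event_id, event_type, payload)
        except Exception:
            break  # stop at the first failure to keep order; retry next pass
        with conn:  # mark as sent only after the broker accepted the event
            conn.execute(
                "UPDATE outbox SET published_at = CURRENT_TIMESTAMP "
                "WHERE seq = ?",
                (seq,),
            )
        sent += 1
    return sent


class AckLostOnce:
    """Hypothetical broker stub: delivers everything, loses the first ack."""

    def __init__(self):
        self.delivered = []
        self._lose_next_ack = True

    def publish(self, event_id, event_type, payload):
        self.delivered.append(event_id)
        if self._lose_next_ack:
            self._lose_next_ack = False
            raise TimeoutError("ack lost")


broker = AckLostOnce()
first = relay_once(conn, broker.publish)   # e-1 delivered, ack lost
second = relay_once(conn, broker.publish)  # e-1 redelivered, then e-2
print(first, second, broker.delivered)  # 0 2 ['e-1', 'e-1', 'e-2']
```

The duplicate `e-1` is the at-least-once guarantee in action: the relay cannot tell a lost acknowledgement from a failed publish, so it errs on the side of sending again.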
Two ways to drain the outbox
- Polling publisher. A background job SELECTs new outbox rows on an interval and publishes them. Simple, no extra infrastructure, but adds DB load and polling latency.
- Change Data Capture (CDC). Instead of polling, tail the database's transaction log (Postgres WAL, MySQL binlog). A connector reads committed changes in commit order and streams them out. Debezium running on Kafka Connect is the canonical tool: it captures inserts to the outbox table and publishes them to Kafka topics, with no polling and minimal DB overhead. CDC can even be applied to the business tables directly, but routing through an explicit outbox table gives you clean, intentional event shapes rather than leaking raw row diffs.
Trade-offs to mention
- Ordering: CDC preserves commit order; use the aggregate id as the Kafka message key so that all events for one entity land on the same partition and stay ordered downstream.
- Idempotency is still required: the relay is at-least-once, so retries produce duplicates, and consumers must deduplicate on the event id.
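- Outbox cleanup: the table grows, so prune published rows. With CDC the connector's log offset already records what has been published, so no "sent" flag is needed and a periodic TTL sweep can delete old rows.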
- Operational cost: CDC adds Debezium/Connect to operate and monitor; polling is simpler if event volume and latency tolerance allow.
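The ordering and idempotency points can be sketched together. This is an illustration under stated assumptions: `partition_for` uses SHA-256 only to get a stable hash (Kafka's default partitioner hashes the key bytes with murmur2, so actual partition numbers would differ), and `IdempotentConsumer` keeps its seen-ids set in memory, where a real consumer would need a durable store.

```python
import hashlib


def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Map an aggregate id to a partition using a stable hash of the key."""
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class IdempotentConsumer:
    """Applies each event at most once by remembering processed event ids."""

    def __init__(self):
        self.seen = set()   # assumption: in-memory; use a durable store in practice
        self.applied = []   # stands in for the consumer's real side effect

    def handle(self, event) -> bool:
        if event["id"] in self.seen:
            return False  # duplicate delivery, skip it
        self.applied.append(event["aggregate_id"])
        self.seen.add(event["id"])
        return True


# Same key always maps to the same partition, so per-entity order is kept.
same_partition = partition_for("o-1", 6) == partition_for("o-1", 6)

consumer = IdempotentConsumer()
deliveries = [
    {"id": "e-1", "aggregate_id": "o-1"},
    {"id": "e-1", "aggregate_id": "o-1"},  # redelivered after a relay retry
    {"id": "e-2", "aggregate_id": "o-2"},
]
results = [consumer.handle(event) for event in deliveries]
print(results, consumer.applied)  # [True, False, True] ['o-1', 'o-2']
```

In a real consumer, recording the event id and applying the side effect should happen in one local transaction; otherwise a crash between the two reintroduces the same gap the outbox was meant to close.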