When you move from a monolith to distributed services, you lose the safety net of a single database transaction. You can no longer wrap "create order + reserve inventory + charge payment" in one BEGIN...COMMIT. These three patterns -- Saga, Outbox, and CQRS -- exist to handle the distributed coordination problems that inevitably arise.
The saga pattern
A saga is a sequence of local transactions where each step either succeeds (triggering the next step) or fails (triggering compensating actions to undo previous steps). It replaces a distributed transaction with a chain of smaller, local ones.
Consider an order flow that spans three services:
1. Order Service -> Create order (status: pending)
2. Payment Service -> Charge credit card
3. Inventory Service -> Reserve items
If payment succeeds but inventory reservation fails, you need to refund the payment and cancel the order. A saga defines both the forward steps and the compensating actions.
Orchestration: central coordinator
In orchestration, a single saga orchestrator controls the flow. It sends commands to each service and decides what to do based on their responses.
class OrderSaga {
  async execute(orderData: OrderData) {
    // Records each completed step so a failure can be undone in reverse order
    const sagaLog: SagaStep[] = [];
    try {
      // Step 1: Create order
      const order = await orderService.createOrder(orderData);
      sagaLog.push({ service: 'order', action: 'create', id: order.id });
      // Step 2: Process payment
      const payment = await paymentService.charge({
        amount: order.total,
        customerId: order.customerId
      });
      sagaLog.push({ service: 'payment', action: 'charge', id: payment.id });
      // Step 3: Reserve inventory
      await inventoryService.reserve({
        items: order.items,
        orderId: order.id
      });
      sagaLog.push({ service: 'inventory', action: 'reserve', id: order.id });
      // All steps succeeded
      await orderService.updateStatus(order.id, 'confirmed');
      return { success: true, orderId: order.id };
    } catch (error) {
      // Compensate in reverse order
      await this.compensate(sagaLog);
      // A caught value is not guaranteed to be an Error instance
      const message = error instanceof Error ? error.message : String(error);
      return { success: false, error: message };
    }
  }
  private async compensate(sagaLog: SagaStep[]) {
    // Copy before reversing: Array.prototype.reverse() mutates the array in place
    for (const step of [...sagaLog].reverse()) {
      try {
        switch (step.service) {
          case 'payment':
            await paymentService.refund(step.id);
            break;
          case 'order':
            await orderService.cancel(step.id);
            break;
          case 'inventory':
            await inventoryService.release(step.id);
            break;
        }
      } catch (compensationError) {
        // Log and alert: compensation failed, needs manual intervention
        console.error(`Compensation failed for ${step.service}:`, compensationError);
        await alertOps({ step, error: compensationError });
      }
    }
  }
}
The orchestrator has a clear picture of the entire flow, making it easier to understand, debug, and modify. But it becomes a single point of failure and can grow into a "god service" that knows too much.
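One way to keep an orchestrator from accumulating service-specific branches is to store each step's compensation next to its forward action. The following is a minimal in-memory sketch of that idea, not the orchestrator shown above: `Step`, `SagaResult`, `runSaga`, and the simulated steps are hypothetical names introduced for illustration, and plain functions stand in for real service calls.

```typescript
// Hypothetical sketch: each step pairs a forward action with its own undo.
interface Step {
  name: string;
  run: () => Promise<void>;
  compensate: () => Promise<void>;
}

interface SagaResult {
  success: boolean;
  failedStep?: string;
  compensated: string[];
}

async function runSaga(steps: Step[]): Promise<SagaResult> {
  const completed: Step[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch {
      // Undo the completed steps, most recent first.
      const compensated: string[] = [];
      for (const done of [...completed].reverse()) {
        try {
          await done.compensate();
          compensated.push(done.name);
        } catch {
          // A failed compensation needs manual follow-up; keep undoing the rest.
        }
      }
      return { success: false, failedStep: step.name, compensated };
    }
  }
  return { success: true, compensated: [] };
}

// Simulated services: order and payment succeed, inventory reservation fails.
const trace: string[] = [];
const demoSteps: Step[] = [
  {
    name: 'order',
    run: async () => { trace.push('create order'); },
    compensate: async () => { trace.push('cancel order'); },
  },
  {
    name: 'payment',
    run: async () => { trace.push('charge card'); },
    compensate: async () => { trace.push('refund payment'); },
  },
  {
    name: 'inventory',
    run: async () => { throw new Error('out of stock'); },
    compensate: async () => { trace.push('release inventory'); },
  },
];
```

Calling `runSaga(demoSteps)` in this sketch reports `inventory` as the failed step and compensates `payment` and then `order`. Adding a step means adding one entry to the array rather than a new `case` in a switch.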
Choreography: decentralized reactions
In choreography, there is no central coordinator. Each service listens for events and reacts by performing its work, then emitting its own event.
// Order Service: create order and emit event
async function placeOrder(orderData: OrderData) {
  const order = await db.orders.create({ ...orderData, status: 'pending' });
  await eventBus.publish({
    type: 'OrderPlaced',
    // customerId is included because the Payment Service reads it below
    data: { orderId: order.id, customerId: order.customerId, items: order.items, total: order.total }
  });
}
// Payment Service: listens for OrderPlaced
eventBus.subscribe('OrderPlaced', async (event) => {
  try {
    const payment = await chargeCard(event.data.total, event.data.customerId);
    await eventBus.publish({
      type: 'PaymentCompleted',
      data: { orderId: event.data.orderId, paymentId: payment.id }
    });
  } catch (error) {
    await eventBus.publish({
      type: 'PaymentFailed',
      data: { orderId: event.data.orderId, reason: error instanceof Error ? error.message : String(error) }
    });
  }
});
// Inventory Service: listens for PaymentCompleted
eventBus.subscribe('PaymentCompleted', async (event) => {
  try {
    await reserveItems(event.data.orderId);
    await eventBus.publish({
      type: 'InventoryReserved',
      data: { orderId: event.data.orderId }
    });
  } catch (error) {
    await eventBus.publish({
      type: 'InventoryReservationFailed',
      data: { orderId: event.data.orderId }
    });
  }
});
// Order Service: also listens for failure events to compensate
// (InventoryReservationFailed needs subscribers too: refund the payment, cancel the order)
eventBus.subscribe('PaymentFailed', async (event) => {
  await db.orders.update(event.data.orderId, { status: 'cancelled' });
});
Choreography keeps services independent: each one knows only the events it consumes and emits, not the other services. But the flow becomes implicit and hard to trace. Debugging "why did this order fail?" means reading logs across multiple services.
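The choreography listing compensates only for `PaymentFailed`. The inventory failure described earlier (refund the payment and cancel the order) needs its own subscribers. Below is a minimal sketch of that path, assuming a synchronous in-memory bus: `InMemoryBus`, the `PaymentRefunded` event, and the `Map`-based stores are hypothetical stand-ins for a real broker and each service's database.

```typescript
type DomainEvent = { type: string; data: { orderId: string; paymentId?: string } };
type Handler = (event: DomainEvent) => void;

// Hypothetical synchronous bus standing in for a real message broker.
class InMemoryBus {
  private handlers = new Map<string, Handler[]>();

  subscribe(type: string, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  publish(event: DomainEvent): void {
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }
}

const bus = new InMemoryBus();
const orderStatus = new Map<string, string>();    // Order Service state
const paymentByOrder = new Map<string, string>(); // Payment Service state: orderId -> paymentId
const refunded: string[] = [];                    // payment ids that were refunded

// Payment Service: refunds the charge when inventory could not be reserved.
bus.subscribe('InventoryReservationFailed', (event) => {
  const paymentId = paymentByOrder.get(event.data.orderId);
  if (paymentId === undefined) return; // nothing was charged for this order
  refunded.push(paymentId);
  bus.publish({ type: 'PaymentRefunded', data: { orderId: event.data.orderId, paymentId } });
});

// Order Service: cancels the order on the same failure event.
bus.subscribe('InventoryReservationFailed', (event) => {
  orderStatus.set(event.data.orderId, 'cancelled');
});

// Demo: order-1 was already charged (pay-1), then the reservation fails.
orderStatus.set('order-1', 'pending');
paymentByOrder.set('order-1', 'pay-1');
bus.publish({ type: 'InventoryReservationFailed', data: { orderId: 'order-1' } });
```

Both services react to the same event independently, which is the point of choreography. It also shows the tracing cost: nothing in either handler says that a refund and a cancellation belong to one business operation.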
Orchestration vs choreography
| Aspect | Orchestration | Choreography |
|---|---|---|
| Flow visibility | Centralized, easy to follow | Distributed, hard to trace |
| Coupling | Orchestrator knows all services | Services only know events |
| Complexity | Grows in the orchestrator | Grows across all services |
| Single point of failure | Yes (the orchestrator) | No |
| Debugging | Read one service's logs | Correlate logs across services |
| Best for | Complex flows with many steps | Simple flows with 2-3 steps |
| Adding new steps | Modify orchestrator | Add new subscriber (no changes to existing) |
| Risk | God service | Spaghetti events |
In practice, many teams use a hybrid: orchestration for complex multi-step flows (order processing) and choreography for simple fan-out scenarios (send notification when user signs up).
The outbox pattern
Here is a nasty problem: your service needs to save data to its database AND publish an event. If you do them as two separate operations, either can fail independently.
// BROKEN: two separate operations
async function createOrder(orderData: OrderData) {
  // Step 1: save to database
  const order = await db.orders.create(orderData);
  // Step 2: publish event
  await eventBus.publish({ type: 'OrderPlaced', data: order });
  // What if step 2 fails? Order exists but no event was published.
  // Publishing first does not help: if the save then fails, an event exists for an order that does not.
}
The Outbox pattern solves this by writing the event to an outbox table in the same database transaction as the business data.
// CORRECT: Outbox pattern
import { randomUUID } from 'node:crypto';
async function createOrder(orderData: OrderData) {
  await db.transaction(async (trx) => {
    // Step 1: save order
    const [order] = await trx('orders').insert(orderData).returning('*');
    // Step 2: write event to outbox (same transaction!)
    await trx('outbox').insert({
      id: randomUUID(),
      event_type: 'OrderPlaced',
      payload: JSON.stringify({
        orderId: order.id,
        customerId: order.customerId,
        total: order.total
      }),
      created_at: new Date(),
      published: false
    });
  });
  // Both writes succeed or both fail. Atomicity guaranteed.
}
A separate process (often called a "relay" or "publisher") polls the outbox table and publishes unpublished events to the message broker.
// Outbox relay: runs on a timer or listens for DB changes
async function publishOutboxEvents() {
  const unpublished = await db('outbox')
    .where('published', false)
    .orderBy('created_at', 'asc')
    .limit(100);
  for (const entry of unpublished) {
    await eventBus.publish({
      type: entry.event_type,
      data: JSON.parse(entry.payload)
    });
    // A crash between the publish above and this update re-sends the event on the next run
    await db('outbox').where('id', entry.id).update({ published: true });
  }
}
// Run every 5 seconds; catch errors so a failed run does not become an unhandled rejection
setInterval(() => {
  publishOutboxEvents().catch((err) => console.error('Outbox relay run failed:', err));
}, 5000);
The outbox guarantees at-least-once delivery: if the relay crashes after publishing an event but before marking it as published, it publishes that event again on the next run. This means consumers must be idempotent (we cover this in lesson 4).
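Because the relay can deliver the same event more than once, a consumer typically records which event IDs it has already handled. The sketch below illustrates that idea under two assumptions: that each published event carries the outbox row's `id` (the relay listing does not send it, so that would need to be added), and that an in-memory `Set` is acceptable for demonstration. `OutboxEvent` and `makeIdempotentHandler` are hypothetical names; a real consumer would persist processed IDs in its own database.

```typescript
interface OutboxEvent {
  id: string; // assumed: the outbox row id travels with the event
  type: string;
  data: { orderId: string };
}

// Wraps a handler so repeated deliveries of the same event id are ignored.
function makeIdempotentHandler(handle: (event: OutboxEvent) => void) {
  const processed = new Set<string>(); // in-memory only; lost on restart
  return (event: OutboxEvent): boolean => {
    if (processed.has(event.id)) {
      return false; // duplicate delivery: skip
    }
    handle(event);
    processed.add(event.id); // mark as done only after the handler succeeded
    return true;
  };
}

// Demo: the same event is delivered twice, as after a relay retry.
const reservedOrders: string[] = [];
const onOrderPlaced = makeIdempotentHandler((event) => {
  reservedOrders.push(event.data.orderId);
});

const delivery: OutboxEvent = {
  id: 'evt-1',
  type: 'OrderPlaced',
  data: { orderId: 'order-1' },
};
const first = onOrderPlaced(delivery);  // handled
const second = onOrderPlaced(delivery); // ignored as a duplicate
```

The ID is recorded after the handler runs, so a handler that throws is retried on redelivery rather than silently skipped.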
CQRS: command query responsibility segregation
CQRS splits your application into two separate models: one optimized for writes (commands) and one optimized for reads (queries).
In a typical CRUD application, the same database model handles both reading and writing. This works fine until your read and write patterns diverge significantly.
// Without CQRS: same model for reads and writes
// Write: normalized, validated, transactional
// Read: also normalized... but we need joins across 5 tables for one dashboard view
// With CQRS: separate models
// Write model: normalized, enforces business rules
async function placeOrder(command: PlaceOrderCommand) {
  const order = await writeDb.orders.create({
    customerId: command.customerId,
    items: command.items,
    status: 'pending'
  });
  // Simplified: publishing directly has the dual-write problem described in the outbox section
  await eventBus.publish({ type: 'OrderPlaced', data: order });
}
// Read model: denormalized, optimized for the dashboard query
eventBus.subscribe('OrderPlaced', async (event) => {
  await readDb.orderSummaries.upsert({
    orderId: event.data.id,
    customerId: event.data.customerId, // stored because the dashboard query filters on it
    customerName: event.data.customerName, // pre-joined
    itemCount: event.data.items.length,
    total: event.data.total,
    status: 'pending',
    createdAt: event.data.createdAt // stored because the dashboard query sorts on it (assumes the event carries it)
  });
});
// Query: fast single-table read, no joins needed
async function getOrderDashboard(customerId: string) {
  return readDb.orderSummaries
    .where('customerId', customerId)
    .orderBy('createdAt', 'desc')
    .limit(50);
}
| Aspect | Traditional CRUD | CQRS |
|---|---|---|
| Read model | Same as write (normalized) | Separate (denormalized for queries) |
| Write model | Same as read | Separate (normalized for integrity) |
| Read performance | Joins on every query | Pre-computed, fast |
| Consistency | Immediate | Eventually consistent |
| Complexity | Low | High (two models to maintain) |
| Best for | Simple apps with balanced reads/writes | Read-heavy apps with complex queries |
CQRS pairs naturally with event-driven architecture: writes produce events, and those events update the read model. But it introduces eventual consistency -- the read model lags behind the write model by the time it takes to process events.
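The lag can be made concrete with a small in-memory sketch: a write is visible in the write model immediately, but the read model only catches up once the pending event has been processed. All names here (`writeModel`, `readModel`, `pendingEvents`, `submitOrder`, `processPending`) are hypothetical stand-ins for the two databases and the event bus.

```typescript
interface Order {
  id: string;
  customerId: string;
  itemCount: number;
}

interface OrderSummary {
  orderId: string;
  customerId: string;
  itemCount: number;
}

const writeModel = new Map<string, Order>();
const readModel = new Map<string, OrderSummary>();
const pendingEvents: Order[] = []; // stands in for events in flight on the bus

// Command side: write, then enqueue an event for the projection.
function submitOrder(order: Order): void {
  writeModel.set(order.id, order);
  pendingEvents.push(order);
}

// Projection: applies queued events to the read model; returns how many it applied.
function processPending(): number {
  let applied = 0;
  let event = pendingEvents.shift();
  while (event !== undefined) {
    readModel.set(event.id, {
      orderId: event.id,
      customerId: event.customerId,
      itemCount: event.itemCount,
    });
    applied += 1;
    event = pendingEvents.shift();
  }
  return applied;
}

submitOrder({ id: 'order-1', customerId: 'cust-1', itemCount: 2 });
const visibleBefore = readModel.has('order-1'); // read model has not caught up yet
processPending();
const visibleAfter = readModel.has('order-1');  // now the projection has run
```

In a UI, this window shows up as a just-placed order briefly missing from the dashboard.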
When to use each pattern
| Pattern | Use when | Avoid when |
|---|---|---|
| Saga (orchestration) | Complex multi-service workflows with many steps | Simple 2-service interactions |
| Saga (choreography) | Loose coupling matters, few steps | Many steps where tracing is important |
| Outbox | You need guaranteed event publishing with DB writes | You are using an event-sourced store (events are already the source of truth) |
| CQRS | Reads and writes have very different performance needs | Simple CRUD with balanced access patterns |
Start simple. Add these patterns when the pain of not having them becomes real.