
When multiple services collaborate on a business process, there are two ways to coordinate them: a central brain tells each service what to do (orchestration), or services react independently to events (choreography).

Orchestration: the central coordinator

One service (the orchestrator) controls the entire flow, calls each service in sequence, and handles errors.

// OrderOrchestrator - the conductor
class OrderOrchestrator {
  async fulfillOrder(orderId: string) {
    const order = await this.orderService.get(orderId);

    // Step 1: Charge payment
    const payment = await this.paymentService.charge(order.userId, order.total);
    if (!payment.success) {
      await this.orderService.updateStatus(orderId, 'payment_failed');
      return { success: false, reason: 'payment_failed' };
    }

    // Step 2: Reserve inventory
    try {
      await this.inventoryService.reserve(order.items);
    } catch (err) {
      // Compensate: refund the payment
      await this.paymentService.refund(payment.id);
      await this.orderService.updateStatus(orderId, 'inventory_failed');
      return { success: false, reason: 'out_of_stock' };
    }

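    // Step 3: Create shipment
    let shipment;
    try {
      shipment = await this.shippingService.createShipment(order);
    } catch (err) {
      // Compensate in reverse order: release inventory, then refund the payment
      await this.inventoryService.release(order.items);
      await this.paymentService.refund(payment.id);
      await this.orderService.updateStatus(orderId, 'shipping_failed');
      return { success: false, reason: 'shipping_failed' };
    }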

    // Step 4: Send confirmation
    await this.notificationService.sendOrderConfirmation(order, shipment);

    await this.orderService.updateStatus(orderId, 'fulfilled');
    return { success: true, shipmentId: shipment.id };
  }
}

Orchestration strengths

  • Easy to understand: the entire flow lives in one place
  • Easy to debug: one service logs the whole sequence
  • Explicit error handling: compensation logic is visible and testable
  • Consistent: the orchestrator enforces the exact order of operations

Orchestration weaknesses

  • Single point of failure: if the orchestrator dies mid-flow, the order is in a partial state
  • Coupling: the orchestrator knows about every service in the flow
  • Bottleneck: all traffic flows through one coordinator
  • Hard to extend: adding a step means modifying the orchestrator
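
The "dies mid-flow" weakness is usually softened by recording progress after each step, so that a restarted orchestrator can pick up where it left off. The following is a minimal sketch under stated assumptions, not a definitive implementation: `SagaStore`, `ResumableOrchestrator`, and the step names are hypothetical, and the in-memory map stands in for a durable database table.

```typescript
// Hypothetical sketch: checkpoint progress so a restarted orchestrator can resume.
type StepName = 'charge' | 'reserve' | 'ship';
const STEP_ORDER: StepName[] = ['charge', 'reserve', 'ship'];

// Stand-in for a durable store (a database table in a real system).
class SagaStore {
  private progress = new Map<string, StepName[]>();

  completedSteps(orderId: string): StepName[] {
    return this.progress.get(orderId) ?? [];
  }

  markCompleted(orderId: string, step: StepName): void {
    this.progress.set(orderId, [...this.completedSteps(orderId), step]);
  }
}

type StepHandlers = Record<StepName, (orderId: string) => Promise<void>>;

class ResumableOrchestrator {
  private store: SagaStore;
  private handlers: StepHandlers;

  constructor(store: SagaStore, handlers: StepHandlers) {
    this.store = store;
    this.handlers = handlers;
  }

  // Safe to call again after a crash: steps already checkpointed are skipped.
  async fulfillOrder(orderId: string): Promise<void> {
    const done = new Set(this.store.completedSteps(orderId));
    for (const step of STEP_ORDER) {
      if (done.has(step)) continue;
      await this.handlers[step](orderId);
      this.store.markCompleted(orderId, step);
    }
  }
}
```

A crash between a handler finishing and `markCompleted` being called would re-run that step on resume, so this approach only works if each handler is idempotent.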

Choreography: the reactive approach

No conductor. Each service listens for events, does its work, and publishes new events. Services don't know about each other, only about events.

// Order Service - publishes the initial event
async function createOrder(orderData: OrderInput) {
  const order = await db.insert(orders).values(orderData);
  await eventBus.publish('order.created', {
    orderId: order.id,
    userId: order.userId,
    items: order.items,
    total: order.total,
  });
  return order;
}

// Payment Service - reacts to order.created
eventBus.subscribe('order.created', async (event) => {
  const result = await chargeCustomer(event.userId, event.total);
  if (result.success) {
    await eventBus.publish('payment.completed', {
      orderId: event.orderId,
      paymentId: result.paymentId,
    });
  } else {
    await eventBus.publish('payment.failed', {
      orderId: event.orderId,
      reason: result.error,
    });
  }
});

// Inventory Service - reacts to payment.completed
eventBus.subscribe('payment.completed', async (event) => {
  const reserved = await reserveItems(event.orderId);
  if (reserved) {
    await eventBus.publish('inventory.reserved', {
      orderId: event.orderId,
    });
  } else {
    await eventBus.publish('inventory.failed', {
      orderId: event.orderId,
    });
  }
});

// Shipping Service - reacts to inventory.reserved
eventBus.subscribe('inventory.reserved', async (event) => {
  const shipment = await createShipment(event.orderId);
  await eventBus.publish('shipment.created', {
    orderId: event.orderId,
    trackingNumber: shipment.trackingNumber,
  });
});

// Notification Service - reacts to shipment.created
eventBus.subscribe('shipment.created', async (event) => {
  await sendConfirmationEmail(event.orderId, event.trackingNumber);
});
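
The handlers above assume an `eventBus` object with `publish` and `subscribe`, which the lesson does not define. As a rough sketch for local experiments, an in-memory version could look like the following. `InMemoryEventBus` is a hypothetical name, and real brokers such as Kafka or RabbitMQ differ considerably in delivery guarantees, ordering, and retries.

```typescript
// Hypothetical in-memory event bus matching the publish/subscribe calls used above.
type EventPayload = Record<string, unknown>;
type Handler = (event: EventPayload) => Promise<void> | void;

class InMemoryEventBus {
  private handlers = new Map<string, Handler[]>();

  subscribe(topic: string, handler: Handler): void {
    const existing = this.handlers.get(topic) ?? [];
    this.handlers.set(topic, [...existing, handler]);
  }

  // Delivers the event to every subscriber of the topic, one after another.
  // A failing subscriber does not prevent delivery to the others.
  async publish(topic: string, event: EventPayload): Promise<void> {
    const subscribers = this.handlers.get(topic) ?? [];
    for (const handler of subscribers) {
      try {
        await handler(event);
      } catch (err) {
        // A real broker would retry or dead-letter; this sketch only isolates the failure.
      }
    }
  }
}

const eventBus = new InMemoryEventBus();
```

Because `publish` waits for each subscriber, a chain of handlers runs to completion before the original `publish` call returns. That is convenient in tests but unlike a real broker, where delivery is asynchronous.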

Choreography strengths

  • Loose coupling: services don't know about each other
  • Easy to extend: add a new subscriber without modifying existing services
  • Resilient: no single point of failure in the coordination
  • Independently deployable: services evolve at their own pace

Choreography weaknesses

  • Hard to trace: the flow is scattered across services and event handlers
  • Hard to debug: no single place shows the complete sequence
  • Implicit flow: the overall business process is emergent, not explicit
  • Compensation is complex: each service must handle rollback events independently

Side-by-side comparison

| Dimension | Orchestration | Choreography |
|---|---|---|
| Control flow | Centralized, explicit | Distributed, emergent |
| Coupling | Orchestrator coupled to all services | Services coupled to events only |
| Debugging | Single service to inspect | Must correlate across services |
| Error handling | Centralized compensation | Each service handles its own rollback |
| Adding new steps | Modify the orchestrator | Add new event subscriber |
| Single point of failure | Yes (the orchestrator) | No (but event bus is critical) |
| Visibility | Full flow in one place | Requires distributed tracing |
| Consistency | Easier to enforce order | Eventual consistency, harder to guarantee |
| Scalability | Orchestrator can bottleneck | Each service scales independently |
| Testing | Test the orchestrator end-to-end | Test each service + integration |

Sagas: transactions across services

In microservices, there is usually no distributed transaction that spans services, because each service owns its own data. A saga is a sequence of local transactions: each step either succeeds and triggers the next, or fails and triggers compensating actions for the steps that already completed.

Orchestration saga

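interface SagaStep {
  execute: (orderId: string) => Promise<unknown>;
  compensate: (orderId: string) => Promise<unknown>;
}

class OrderSaga {
  private steps: SagaStep[] = [];

  addStep(step: SagaStep) {
    this.steps.push(step);
  }

  async execute(orderId: string) {
    // Track which steps finished so that only those are compensated
    const completed: SagaStep[] = [];
    for (const step of this.steps) {
      try {
        await step.execute(orderId);
        completed.push(step);
      } catch (err) {
        // Something failed - compensate all completed steps in reverse
        await this.compensate(orderId, completed);
        throw err;
      }
    }
  }

  private async compensate(orderId: string, completed: SagaStep[]) {
    // Walk backwards through completed steps (copy first: reverse() mutates in place)
    for (const step of [...completed].reverse()) {
      await step.compensate(orderId);
    }
  }
}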

// Define the saga steps
const saga = new OrderSaga();
saga.addStep({
  execute: (id) => paymentService.charge(id),
  compensate: (id) => paymentService.refund(id),
});
saga.addStep({
  execute: (id) => inventoryService.reserve(id),
  compensate: (id) => inventoryService.release(id),
});
saga.addStep({
  execute: (id) => shippingService.schedule(id),
  compensate: (id) => shippingService.cancel(id),
});

Choreography saga

Happy path:
order.created → payment.completed → inventory.reserved → shipment.created

Failure path (inventory fails):
order.created → payment.completed → inventory.failed → payment.refund.requested → order.cancelled

Each service listens for failure events and compensates accordingly.
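
To make the failure path concrete, here is a small self-contained sketch that wires it up as event handlers. It rests on several assumptions: the tiny `publish`/`subscribe` helpers stand in for a real event bus, the inventory handler always fails in order to force the compensation path, and the choice of which service emits `payment.refund.requested` and `order.cancelled` (the Order Service here) is a guess, since the lesson only gives the event sequence.

```typescript
// Hypothetical sketch of the inventory-failure path as event handlers.
type SagaEvent = { orderId: string; paymentId?: string };
type SagaHandler = (event: SagaEvent) => Promise<void>;

// Tiny stand-in for a real event bus, only so the sketch runs on its own.
const subscribers = new Map<string, SagaHandler[]>();
const trace: string[] = []; // records every published topic, in order

function subscribe(topic: string, handler: SagaHandler): void {
  subscribers.set(topic, [...(subscribers.get(topic) ?? []), handler]);
}

async function publish(topic: string, event: SagaEvent): Promise<void> {
  trace.push(topic);
  for (const handler of subscribers.get(topic) ?? []) {
    await handler(event);
  }
}

// Payment Service state: which payment belongs to which order, and what was refunded.
const paymentsByOrder = new Map<string, string>();
const refunded = new Set<string>();

// Payment Service: charges (simulated) and reports success.
subscribe('order.created', async (event) => {
  const paymentId = `pay_${event.orderId}`;
  paymentsByOrder.set(event.orderId, paymentId);
  await publish('payment.completed', { orderId: event.orderId, paymentId });
});

// Inventory Service: always out of stock here, to force the failure path.
subscribe('payment.completed', async (event) => {
  await publish('inventory.failed', { orderId: event.orderId });
});

// Order Service (assumed owner): requests the refund, then cancels the order.
subscribe('inventory.failed', async (event) => {
  await publish('payment.refund.requested', { orderId: event.orderId });
  await publish('order.cancelled', { orderId: event.orderId });
});

// Payment Service: compensates. The Set makes a duplicate request a no-op.
subscribe('payment.refund.requested', async (event) => {
  const paymentId = paymentsByOrder.get(event.orderId);
  if (paymentId !== undefined && !refunded.has(paymentId)) {
    refunded.add(paymentId);
  }
});
```

Calling `await publish('order.created', { orderId: 'o1' })` leaves `trace` holding the five topics of the failure path in order. Note that no single handler contains the whole flow, which is exactly the tracing difficulty described earlier.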


The hybrid approach

Many production systems combine both: orchestration for critical flows and choreography for side effects.

Order Fulfillment (orchestrated - too important for implicit flow):
  Orchestrator → Payment → Inventory → Shipping

Side effects (choreographed - loosely coupled, non-critical):
  order.fulfilled event →
    ├── Analytics Service: update metrics
    ├── Loyalty Service: award points
    ├── Email Service: send receipt
    └── Recommendation Service: update model

The orchestrator handles the critical path with explicit compensation. Side effects can fail independently. If the loyalty service is down, the user still gets their order. Orchestrate the core, choreograph the rest.
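
One way to picture the split is the sketch below, in which the critical steps run in order and may fail the order, while `order.fulfilled` subscribers are isolated from one another. All names (`onOrderFulfilled`, `publishOrderFulfilled`, `fulfillOrder`) are hypothetical, and the in-process fan-out stands in for a real event bus.

```typescript
// Hypothetical sketch: orchestrate the critical path, choreograph the side effects.
type FulfilledEvent = { orderId: string };
type SideEffect = { name: string; handle: (event: FulfilledEvent) => Promise<void> };

const sideEffects: SideEffect[] = [];

function onOrderFulfilled(name: string, handle: SideEffect['handle']): void {
  sideEffects.push({ name, handle });
}

// Fan-out: each subscriber runs in isolation, so one failure cannot affect the others.
async function publishOrderFulfilled(event: FulfilledEvent): Promise<string[]> {
  const failed: string[] = [];
  for (const effect of sideEffects) {
    try {
      await effect.handle(event);
    } catch (err) {
      failed.push(effect.name); // a real system would retry or dead-letter
    }
  }
  return failed;
}

// Critical path: explicit, ordered, and allowed to fail the whole order.
async function fulfillOrder(
  orderId: string,
  criticalSteps: Array<(orderId: string) => Promise<void>>,
): Promise<{ orderId: string; failedSideEffects: string[] }> {
  for (const step of criticalSteps) {
    await step(orderId); // an error here propagates: the order is not fulfilled
  }
  const failedSideEffects = await publishOrderFulfilled({ orderId });
  return { orderId, failedSideEffects };
}
```

With a failing loyalty subscriber registered through `onOrderFulfilled`, `fulfillOrder` still resolves and simply reports that subscriber in `failedSideEffects`.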


Choosing your pattern

| Scenario | Recommended pattern | Why |
|---|---|---|
| Multi-step transaction with rollback | Orchestration | Explicit compensation is critical |
| Many services react to one event | Choreography | Fan-out without coupling |
| Strict ordering required | Orchestration | Coordinator enforces sequence |
| Adding new reactions frequently | Choreography | No central code to modify |
| Complex business process (5+ steps) | Orchestration | Readability matters at scale |
| Notification / analytics side effects | Choreography | Non-critical, loosely coupled |
| Regulatory/compliance requirements | Orchestration | Auditability in one place |

Quick reference

| Concept | Key point |
|---|---|
| Orchestration | Central service controls the flow; easy to debug, creates coupling |
| Choreography | Services react to events; loosely coupled, hard to trace |
| Saga | Distributed transaction using compensating actions |
| Compensation | Undoing a completed step when a later step fails |
| Hybrid | Orchestrate critical flows, choreograph side effects |
| Event bus | Infrastructure that routes events between services (Kafka, RabbitMQ) |
AI pitfall
AI will confidently generate saga implementations that look clean on paper but miss critical failure scenarios. What happens if the compensation step itself fails? What if the orchestrator crashes mid-saga? These edge cases require careful thought about idempotency, timeouts, and manual intervention paths that AI rarely includes in its first draft.
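
As one illustration of the "compensation itself fails" case, the sketch below retries each compensation a bounded number of times and, if it still fails, records it for a human instead of silently giving up. It is a sketch under assumptions: `compensateWithRetry` and `manualInterventionQueue` are hypothetical names, the array stands in for a ticketing or alerting system, and each compensation is assumed to be idempotent because it may run more than once.

```typescript
// Hypothetical sketch: retry a failing compensation, then escalate to a human.
type Compensation = { name: string; run: (orderId: string) => Promise<void> };

const manualInterventionQueue: string[] = []; // stand-in for a ticket or alert

async function compensateWithRetry(
  orderId: string,
  compensations: Compensation[], // already in reverse order of execution
  maxAttempts = 3,
): Promise<void> {
  for (const compensation of compensations) {
    let succeeded = false;
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      try {
        await compensation.run(orderId); // must be idempotent: it may run more than once
        succeeded = true;
      } catch (err) {
        // Fall through and retry; a real system would back off between attempts.
      }
    }
    if (!succeeded) {
      // Do not stop: keep compensating the remaining steps, but flag this one.
      manualInterventionQueue.push(`${orderId}:${compensation.name}`);
    }
  }
}
```

The design choice worth noting is that a compensation that never succeeds does not block the others. Whether that is acceptable depends on the business process; some flows need to halt and page someone instead.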
Good to know
Most business processes that feel like they need a distributed saga can actually be modeled as a single database transaction if you keep the data in one service. Before implementing a saga, ask: "Can I restructure my service boundaries so this entire workflow fits in one service?" If yes, do that instead; it is dramatically simpler.
Edge case
Choreography-based architectures can develop "event loops" where Service A publishes an event that triggers Service B, which publishes an event that triggers Service A again. Without careful design, this creates infinite loops. Always include a correlation ID and a maximum hop count in your events to detect and break cycles.
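
A hop-count guard can be sketched as follows. The field names (`correlationId`, `hops`), the `MAX_HOPS` value, and the helper functions are assumptions for illustration. The idea is that every event derived from another event copies the correlation ID and increments the hop count, and publishing is refused once the limit is reached.

```typescript
// Hypothetical sketch: carry a correlation ID and hop count on every event.
interface EventMeta {
  correlationId: string; // same value for every event in one business flow
  hops: number;          // how many reactions deep this event is
}

const MAX_HOPS = 10; // assumed limit; tune to the longest legitimate chain

// Metadata for an event published in reaction to `parent`,
// or null when the chain is already at the limit and must stop.
function nextMeta(parent: EventMeta): EventMeta | null {
  if (parent.hops >= MAX_HOPS) {
    return null;
  }
  return { correlationId: parent.correlationId, hops: parent.hops + 1 };
}

// Simulates two services that (mistakenly) trigger each other forever
// and returns how many events were published before the guard cut in.
function simulateLoop(start: EventMeta): number {
  let meta: EventMeta | null = start;
  let published = 0;
  while (meta !== null) {
    published++;           // one service publishes an event with this metadata
    meta = nextMeta(meta); // the other service reacts and tries to publish again
  }
  return published;
}
```

Starting from `{ correlationId: 'c1', hops: 0 }`, the simulated loop stops after `MAX_HOPS + 1` events instead of running indefinitely. In a real system the refusal should also be logged with the correlation ID so that the cycle can be traced.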