
Imagine you are flipping a light switch and the bulb keeps blowing. After the third blown bulb, you stop flipping the switch and call an electrician. That is exactly what a circuit breaker does for API calls: after enough failures, it stops trying and fails immediately instead of wasting time and resources on a service that clearly is not responding.

Without a circuit breaker, your application keeps sending requests to a dead service, each one consuming a connection, holding a thread, and waiting for a timeout. Multiply that by hundreds of concurrent users and you have a cascading failure. The circuit breaker cuts the circuit before the damage spreads.
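
To put rough numbers on that, here is a back-of-the-envelope sketch: the number of requests stuck waiting at any moment is roughly the arrival rate multiplied by how long each request waits. The helper name `inFlightRequests` and the traffic and timeout figures are hypothetical, chosen only for illustration.

```typescript
// Rough estimate of how many requests are in flight (each holding a
// connection) at any moment: arrival rate multiplied by time spent waiting.
function inFlightRequests(requestsPerSecond: number, waitMs: number): number {
  return (requestsPerSecond * waitMs) / 1000;
}

// Hypothetical numbers, for illustration only.
const healthy = inFlightRequests(200, 100);    // service answers in 100 ms
const hanging = inFlightRequests(200, 10000);  // every call waits out a 10 s timeout

console.log(`healthy: ~${healthy} in flight, dead service: ~${hanging} in flight`);
```

With these assumed figures, the same traffic holds a hundred times more connections once the dependency stops answering, which is the pressure a circuit breaker is meant to relieve.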

AI pitfall
AI-generated circuit breaker configurations almost always use a single global circuit breaker for all external calls. If the recommendation engine goes down, it opens the circuit for the payment service too. Each dependency must have its own circuit breaker with its own configuration. AI rarely does this unless you explicitly ask.

The three states

A circuit breaker is a state machine with three states. Understanding the transitions between them is the core of the pattern.

| State | Behavior | Transitions to | Trigger |
|---|---|---|---|
| Closed | Requests pass through normally. Failures are counted. | Open | Failure count exceeds threshold |
| Open | All requests fail immediately (no network call). A timer starts. | Half-Open | Cooldown timer expires |
| Half-Open | One test request is allowed through. | Closed (if success) or Open (if failure) | Test request result |

┌──────────┐  failure threshold  ┌─────────────┐
│  CLOSED  │ ──────────────────→ │    OPEN     │
│ (normal) │                     │ (fail fast) │
└──────────┘                     └─────────────┘
     ↑                               │     ↑
     │ success      cooldown expires │     │ failure
     │                               ↓     │
     │                           ┌─────────────┐
     └────────────────────────── │  HALF-OPEN  │
                                 │  (testing)  │
                                 └─────────────┘

In the closed state, everything works normally. The circuit breaker silently counts failures. When the failure count crosses a threshold (say, 5 failures in 30 seconds), the circuit opens.

In the open state, the circuit breaker short-circuits every request. No HTTP call is made. The caller receives an error immediately, typically within microseconds instead of waiting 5-30 seconds for a timeout. This is the protective behavior: you fail fast and preserve your resources.

After a cooldown period (say, 30 seconds), the circuit moves to half-open. It allows one request through as a test. If that request succeeds, the circuit closes and normal traffic resumes. If it fails, the circuit opens again and the cooldown timer restarts.
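
The transitions described above can be sketched as a pure function, separate from any networking or timing code. The event names here (`THRESHOLD_EXCEEDED`, `COOLDOWN_EXPIRED`, and so on) are illustrative labels, not part of any library.

```typescript
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';
type CircuitEvent =
  | 'THRESHOLD_EXCEEDED'
  | 'COOLDOWN_EXPIRED'
  | 'TEST_SUCCEEDED'
  | 'TEST_FAILED';

// Returns the state the breaker moves to; events that do not apply to the
// current state leave it unchanged.
function nextState(state: CircuitState, event: CircuitEvent): CircuitState {
  switch (state) {
    case 'CLOSED':
      return event === 'THRESHOLD_EXCEEDED' ? 'OPEN' : 'CLOSED';
    case 'OPEN':
      return event === 'COOLDOWN_EXPIRED' ? 'HALF_OPEN' : 'OPEN';
    case 'HALF_OPEN':
      if (event === 'TEST_SUCCEEDED') return 'CLOSED';
      if (event === 'TEST_FAILED') return 'OPEN';
      return 'HALF_OPEN';
  }
}

// Walk through one failed recovery attempt followed by a successful one.
const events: CircuitEvent[] = [
  'THRESHOLD_EXCEEDED',
  'COOLDOWN_EXPIRED',
  'TEST_FAILED',
  'COOLDOWN_EXPIRED',
  'TEST_SUCCEEDED',
];
let state: CircuitState = 'CLOSED';
const visited: CircuitState[] = [state];
for (const event of events) {
  state = nextState(state, event);
  visited.push(state);
}
console.log(visited.join(' -> '));
// CLOSED -> OPEN -> HALF_OPEN -> OPEN -> HALF_OPEN -> CLOSED
```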

Good to know
The half-open state is the clever part. Without it, you would have to choose between "keep trying a dead service" (no circuit breaker) and "never try again" (open circuit with no recovery). The half-open state automatically tests recovery by letting one request through, and either closes the circuit or re-opens it based on the result.

Building a circuit breaker from scratch

Let's implement one step by step so you understand every moving part.

type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface CircuitBreakerOptions {
  failureThreshold: number;  // failures before opening
  resetTimeout: number;      // ms before trying half-open
  monitorWindow: number;     // ms window for counting failures
}

class CircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private failures: number[] = [];  // timestamps of recent failures
  private lastFailureTime: number = 0;
  private options: CircuitBreakerOptions;

  constructor(options: CircuitBreakerOptions) {
    this.options = options;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
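    // OPEN: fail immediately unless the cooldown has elapsed
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime >= this.options.resetTimeout) {
        // Cooldown elapsed: let this call through as the test request.
        // Note: other callers that arrive while HALF_OPEN also pass through.
        this.state = 'HALF_OPEN';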
      } else {
        throw new CircuitOpenError('Circuit is OPEN - failing fast');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess(): void {
    if (this.state === 'HALF_OPEN') {
      // Test request succeeded - close the circuit
      this.state = 'CLOSED';
      this.failures = [];
      console.log('Circuit CLOSED - service recovered');
    }
  }

  private onFailure(): void {
    const now = Date.now();
    this.lastFailureTime = now;

    // Remove failures outside the monitoring window
    this.failures = this.failures.filter(
      (t) => now - t < this.options.monitorWindow
    );
    this.failures.push(now);

    if (this.state === 'HALF_OPEN') {
      // Test request failed - reopen the circuit
      this.state = 'OPEN';
      console.log('Circuit OPEN - test request failed');
      return;
    }

    if (this.failures.length >= this.options.failureThreshold) {
      this.state = 'OPEN';
      console.log(
        `Circuit OPEN - ${this.failures.length} failures in ` +
        `${this.options.monitorWindow}ms`
      );
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}

class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CircuitOpenError';
  }
}
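
One gap worth noting: the class above lets every caller through while the state is HALF_OPEN, not only a single test request. Below is a minimal sketch of one way to tighten that. It is not a drop-in replacement: the decision logic is pulled into synchronous methods and the clock is injectable, so the transitions can be exercised without waiting. The names `BreakerGate`, `allowRequest`, `record`, and the `now` parameter are illustrative assumptions.

```typescript
type GateState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class BreakerGate {
  private state: GateState = 'CLOSED';
  private failures: number[] = [];  // timestamps of recent failures
  private openedAt = 0;
  private probeInFlight = false;
  private failureThreshold: number;
  private resetTimeout: number;
  private monitorWindow: number;
  private now: () => number;

  constructor(
    failureThreshold: number,
    resetTimeout: number,
    monitorWindow: number,
    now: () => number = () => Date.now()
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.monitorWindow = monitorWindow;
    this.now = now;
  }

  // Returns true if the caller may make the remote call.
  allowRequest(): boolean {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeout) {
      this.state = 'HALF_OPEN';
      this.probeInFlight = false;
    }
    if (this.state === 'CLOSED') return true;
    if (this.state === 'HALF_OPEN' && !this.probeInFlight) {
      this.probeInFlight = true;  // admit exactly one test request
      return true;
    }
    return false;
  }

  // Report the outcome of a call that allowRequest() admitted.
  record(success: boolean): void {
    const t = this.now();
    if (this.state === 'HALF_OPEN') {
      this.probeInFlight = false;
      if (success) {
        this.state = 'CLOSED';
        this.failures = [];
      } else {
        this.state = 'OPEN';
        this.openedAt = t;
      }
      return;
    }
    if (success) return;
    // Keep only failures inside the monitoring window, then add this one.
    this.failures = this.failures.filter((f) => t - f < this.monitorWindow);
    this.failures.push(t);
    if (this.failures.length >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = t;
    }
  }

  getState(): GateState {
    return this.state;
  }
}

// Exercise the transitions with a fake clock (values in ms).
let clock = 0;
const gate = new BreakerGate(2, 1000, 5000, () => clock);
gate.allowRequest(); gate.record(false);
gate.allowRequest(); gate.record(false);      // second failure opens the circuit
const whileOpen = gate.allowRequest();        // false: still cooling down
clock = 1000;
const firstProbe = gate.allowRequest();       // true: the single test request
const secondProbe = gate.allowRequest();      // false: a probe is already in flight
gate.record(true);                            // probe succeeded, circuit closes
console.log(whileOpen, firstProbe, secondProbe, gate.getState());
```

Separating the gate from the async call is a design choice made here for testability; the same `probeInFlight` flag could instead be added to the `call` method of the class above.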

Using the circuit breaker

const paymentBreaker = new CircuitBreaker({
  failureThreshold: 5,     // open after 5 failures
  resetTimeout: 30000,     // try again after 30 seconds
  monitorWindow: 60000,    // count failures in a 60-second window
});

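// Assumed shape for illustration; adapt it to your payment provider's response.
interface PaymentResult {
  status: string;
  message?: string;
}

async function chargeCustomer(amount: number): Promise<PaymentResult> {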
  try {
    return await paymentBreaker.call(() =>
      fetch('https://payments.example.com/charge', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ amount }),
      }).then((res) => {
        if (!res.ok) throw new Error(`Payment failed: ${res.status}`);
        return res.json();
      })
    );
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      // Circuit is open - don't even try, return a friendly error
      return { status: 'unavailable', message: 'Payment service temporarily unavailable. Try again shortly.' };
    }
    throw error;
  }
}

Using opossum (Node.js library)

In production, you probably do not want to maintain your own state machine. The opossum library is one of the most widely used circuit breakers for Node.js. It handles the state transitions, event emission, monitoring, and fallback logic for you.

import CircuitBreaker from 'opossum';

// Wrap any async function
const breaker = new CircuitBreaker(
  async (userId: string) => {
    const res = await fetch(`https://api.example.com/users/${userId}`);
    if (!res.ok) throw new Error(`API error: ${res.status}`);
    return res.json();
  },
  {
    timeout: 5000,            // call times out after 5s
    errorThresholdPercentage: 50,  // open at 50% failure rate
    resetTimeout: 30000,       // try half-open after 30s
    volumeThreshold: 10,       // need at least 10 calls before tripping
  }
);

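// Fallback: runs whenever fire() cannot return a real result
// (the call failed, timed out, or the circuit is open)
breaker.fallback((userId: string) => {
  // Placeholder value; a real fallback might read from a cache
  return { id: userId, name: 'Unknown', source: 'cache' };
});

// Events for monitoring
breaker.on('open', () => console.log('Circuit OPENED'));
breaker.on('halfOpen', () => console.log('Circuit HALF-OPEN'));
breaker.on('close', () => console.log('Circuit CLOSED'));
// `metrics` is assumed to be your own metrics client; it is not part of opossum
breaker.on('fallback', () => metrics.increment('circuit.fallback'));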

// Use it
const user = await breaker.fire('user-123');
Edge case
If every retry attempt passes through the circuit breaker, retries multiply the failures it counts. With 3 attempts per request and a threshold of 5, just two failed user requests (2 × 3 = 6 failures) will trip the circuit. This interaction between retry logic and circuit breaker thresholds catches many teams off guard. Account for your retry count when setting the failure threshold.
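
The arithmetic behind this, as a small sketch. It assumes each attempt is counted separately by the breaker, that the circuit opens once the count reaches the threshold (as in the from-scratch class earlier), and that all failures land inside the monitoring window. `userRequestsToTrip` is a hypothetical helper, not a library function.

```typescript
// How many failed user requests it takes to open the circuit when every
// attempt (first try plus retries) is counted by the breaker separately.
function userRequestsToTrip(failureThreshold: number, attemptsPerRequest: number): number {
  return Math.ceil(failureThreshold / attemptsPerRequest);
}

console.log(userRequestsToTrip(5, 3)); // 2: two requests x 3 attempts = 6 failures
console.log(userRequestsToTrip(5, 1)); // 5: without retries, five requests must fail
```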

Configuration guide

Getting circuit breaker configuration right is tricky. Too sensitive and you will trip on normal fluctuations. Too permissive and you will not catch real failures fast enough.

| Parameter | Too low | Too high | Recommended starting point |
|---|---|---|---|
| Failure threshold | Trips on transient errors | Sends too many requests to a dead service | 5-10 failures |
| Reset timeout | Hammers recovering service | Keeps circuit open too long | 15-60 seconds |
| Monitor window | Forgets real failure patterns | Old failures trigger false positives | 30-120 seconds |
| Volume threshold | Trips before having enough data | Needs too many calls to detect problems | 10-20 calls |
| Timeout (per call) | Rejects slow but valid responses | Holds resources too long on dead calls | 3-10 seconds |

Start conservative and tune based on your monitoring data. Every service has different characteristics: a payment API has different acceptable latency than a recommendation engine.
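
To see how a volume threshold and a percentage-based failure threshold interact, here is a simplified sketch of the trip decision. It is not opossum's actual implementation; the `shouldTrip` helper, the `WindowStats` shape, and the inclusive comparison are assumptions made for illustration.

```typescript
interface WindowStats {
  calls: number;     // calls seen in the current monitoring window
  failures: number;  // how many of those failed
}

interface TripOptions {
  errorThresholdPercentage: number;
  volumeThreshold: number;
}

// Decide whether the circuit should open for the given window.
function shouldTrip(stats: WindowStats, options: TripOptions): boolean {
  // Too few calls to judge: never trip, whatever the error rate.
  if (stats.calls < options.volumeThreshold) return false;
  const errorRate = (stats.failures / stats.calls) * 100;
  return errorRate >= options.errorThresholdPercentage;
}

const options: TripOptions = { errorThresholdPercentage: 50, volumeThreshold: 10 };

console.log(shouldTrip({ calls: 2, failures: 2 }, options));   // false: 100% errors, but only 2 calls
console.log(shouldTrip({ calls: 20, failures: 10 }, options)); // true: 50% errors over 20 calls
console.log(shouldTrip({ calls: 20, failures: 4 }, options));  // false: 20% errors
```

The first case is the reason the volume threshold exists: two failures out of two calls is a 100% error rate, but not enough data to justify cutting off a dependency.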

A common mistake is configuring a single global circuit breaker for all external calls. Each dependency should have its own circuit breaker with its own configuration. The payment service going down should not open the circuit for the notification service.
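
One way to keep breakers separate is a small registry that creates one breaker per dependency name, each with its own options. This is a sketch under assumptions: `BreakerRegistry`, the dependency names, and the override values are hypothetical, and the factory is a stand-in so the example runs on its own.

```typescript
interface BreakerOptions {
  failureThreshold: number;
  resetTimeout: number;   // ms
  monitorWindow: number;  // ms
}

const defaults: BreakerOptions = {
  failureThreshold: 5,
  resetTimeout: 30000,
  monitorWindow: 60000,
};

// Hypothetical per-dependency overrides, for illustration only.
const overrides: Record<string, Partial<BreakerOptions>> = {
  payments: { failureThreshold: 3, resetTimeout: 60000 },
  recommendations: { failureThreshold: 10, resetTimeout: 15000 },
};

class BreakerRegistry<B> {
  private breakers = new Map<string, B>();
  private create: (options: BreakerOptions) => B;

  constructor(create: (options: BreakerOptions) => B) {
    this.create = create;
  }

  // Returns the breaker for a dependency, creating it on first use.
  get(dependency: string): B {
    const existing = this.breakers.get(dependency);
    if (existing !== undefined) return existing;
    const created = this.create({ ...defaults, ...overrides[dependency] });
    this.breakers.set(dependency, created);
    return created;
  }
}

// Stand-in factory that just records the options; with the class from this
// lesson it would be (options) => new CircuitBreaker(options).
const registry = new BreakerRegistry((options) => ({ options }));

console.log(registry.get('payments').options.failureThreshold);        // 3
console.log(registry.get('recommendations').options.failureThreshold); // 10
console.log(registry.get('payments') === registry.get('payments'));    // true
```

Because each dependency name maps to its own instance, failures counted against `recommendations` can never open the circuit for `payments`.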

When not to use a circuit breaker

Circuit breakers are not always the right tool. You do not need one when:

  • The call is already fast-failing (connection refused is instant, no protection needed)
  • The service is stateless and idempotent with built-in retries (a CDN, for example)
  • You are calling a local library or in-process dependency
  • The failure mode is always corrupt data, not availability (a circuit breaker detects failure counts, not data quality)

Use circuit breakers specifically for remote calls where failures are slow (timeouts) and can cascade to exhaust your resources.