A fetch() call with no timeout will wait indefinitely. A missing timeout is the single most common reliability mistake in distributed systems, and AI reproduces it faithfully because the training data is full of timeout-free code. Without a timeout, a single hung connection can hold a thread, a socket, and a database connection hostage for minutes or hours. Multiply that across concurrent requests and you have a system that appears to be "slow" when it is actually stuck, waiting for a response that will never come.
Types of timeouts
Not all timeouts measure the same thing. Understanding the distinction between them helps you configure each one correctly.
| Timeout type | What it measures | Typical range | Typical cause when it fires |
|---|---|---|---|
| Connect timeout | Time to establish a TCP connection | 1-3 seconds | The server is unreachable or a firewall is dropping packets |
| Read timeout | Time to receive the first byte of the response body | 3-30 seconds | The server accepted the connection but is processing slowly |
| Total timeout | Wall clock time for the entire request/response cycle | 5-60 seconds | Large payloads, slow servers, or DNS resolution delays |
| Idle timeout | Time a connection can sit unused before being closed | 30-120 seconds | Connection pooling, keep-alive management |
Connect timeouts should always be short. If you cannot establish a TCP connection in 2-3 seconds, the server is either down or a network device is silently dropping packets. Waiting 30 seconds will not help; it will just waste 30 seconds.
Read timeouts depend on what the server is doing. A simple CRUD (create, read, update, delete) lookup should respond in under a second. A report generation endpoint might legitimately take 15 seconds. Set your read timeout based on what is reasonable for that specific operation.
// Using node-fetch with an agent-level socket timeout plus a total timeout
import fetch from 'node-fetch';
import { Agent } from 'https'; // https.Agent, since the URL below is https://

const agent = new Agent({
  timeout: 3000, // socket timeout: 3 seconds (fires on connect or idle inactivity)
  keepAlive: true,
  maxSockets: 50,
});

const response = await fetch('https://api.example.com/data', {
  agent,
  signal: AbortSignal.timeout(10000), // total timeout: 10 seconds
});

Axios timeout configuration
import axios from 'axios';
import { Agent } from 'https';

const client = axios.create({
  timeout: 10000, // total timeout: 10 seconds
  // Axios has no separate connect timeout; approximate one with an
  // agent-level socket timeout (httpsAgent is used for https:// URLs)
  httpsAgent: new Agent({
    timeout: 3000, // socket timeout: fires on connect or idle inactivity
  }),
});

// Per-request override for slow endpoints
const report = await client.get('/reports/annual', {
  timeout: 30000, // this one is legitimately slow
});

AbortSignal.timeout(5000) in Node.js creates a timeout on the entire fetch operation, including DNS resolution and the TLS handshake. If DNS takes 3 seconds (rare, but possible on cold starts), you only have 2 seconds left for the actual request. Account for infrastructure overhead when setting timeouts.

Timeout propagation
This is where most developers get tripped up. You set a 10-second timeout on your API gateway and think you are safe. But your handler calls three services sequentially, each with a 10-second timeout. The actual worst case is 30 seconds, three times your gateway timeout. The gateway times out after 10 seconds, returns a 504 to the user, but your handler keeps running in the background, consuming resources for another 20 seconds.
User request (10s budget)
├── Service A (10s timeout) → might use 10s
├── Service B (10s timeout) → might use 10s
└── Service C (10s timeout) → might use 10s
Worst case: 30 seconds. Gateway timed out at 10s.
The user sees an error. Your server is still working on a dead request.

The fix is timeout budgets: distribute your total timeout across all downstream calls.
class TimeoutBudget {
  private deadline: number;

  constructor(totalMs: number) {
    this.deadline = Date.now() + totalMs;
  }

  remaining(): number {
    return Math.max(0, this.deadline - Date.now());
  }

  hasExpired(): boolean {
    return Date.now() >= this.deadline;
  }

  // Create a signal that aborts when the budget is exhausted
  toAbortSignal(): AbortSignal {
    return AbortSignal.timeout(this.remaining());
  }
}
// Usage: distribute a 10-second budget across three calls
async function handleRequest(req: Request): Promise<Response> {
  const budget = new TimeoutBudget(10000); // 10 seconds total

  // Each call gets whatever time is left in the budget.
  // Critical: a timed-out fetch rejects with an abort error,
  // so catch it and fail the whole request with a 504.
  let user: unknown;
  try {
    user = await fetch('https://users.internal/api/user/123', {
      signal: budget.toAbortSignal(),
    }).then((r) => r.json());
  } catch {
    return Response.json({ error: 'Timeout' }, { status: 504 });
  }

  // Degraded: if the orders fetch times out, return partial data
  let orders: unknown[];
  try {
    orders = await fetch('https://orders.internal/api/user/123/orders', {
      signal: budget.toAbortSignal(),
    }).then((r) => r.json());
  } catch {
    return Response.json(
      { user, orders: [], note: 'Orders timed out' },
      { status: 200 } // partial success
    );
  }

  // Optional: recommendations fail silently to an empty array
  const recommendations = await fetch('https://recs.internal/api/user/123', {
    signal: budget.toAbortSignal(),
  }).then((r) => r.json()).catch(() => []);

  return Response.json({ user, orders, recommendations });
}

Notice the progression: the user fetch is critical (fail the request with a 504 if it times out), the orders fetch falls back to partial data, and the recommendations are optional (catch and return an empty array). This is graceful degradation in action.
Graceful degradation
Graceful degradation means your application still works, just with reduced functionality, when a dependency fails. Instead of showing a blank error page, you show what you can with what you have.
| Dependency fails | Bad response | Graceful degradation |
|---|---|---|
| Recommendation engine | 500 Internal Server Error | Show "Top sellers" from a static list |
| User avatar service | Broken image icon | Show initials or a default avatar |
| Search service | "Service unavailable" page | Show category browsing instead |
| Real-time pricing | Old price or no price shown | Show cached price with "prices may vary" |
| Analytics service | Entire page fails to load | Disable tracking silently, page loads fine |
async function getProductPage(productId: string): Promise<ProductPage> {
  // Critical: product data must exist
  const product = await productService.getProduct(productId);

  // Degraded: show cached price if pricing service is down
  let price: Price;
  try {
    price = await pricingService.getPrice(productId);
  } catch {
    price = (await cache.get(`price:${productId}`)) ?? product.basePrice;
  }

  // Optional: recommendations are nice but not essential
  const recommendations = await recommendationService
    .getRecommendations(productId)
    .catch(() => getFallbackRecommendations(product.category));

  // Optional: reviews are non-critical
  const reviews = await reviewService
    .getReviews(productId)
    .catch(() => ({ items: [], total: 0, note: 'Reviews temporarily unavailable' }));

  return { product, price, recommendations, reviews };
}

Cache as a safety net
Caching is the most common degradation strategy. When the live service is down, serve stale data. Stale data is almost always better than no data.
async function fetchWithCache<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlMs: number = 300000 // 5 minutes default
): Promise<T> {
  try {
    const fresh = await fetcher();
    await cache.set(key, fresh, ttlMs);
    return fresh;
  } catch (error) {
    // Live fetch failed - try the cache
    const cached = await cache.get<T>(key);
    if (cached) {
      console.warn(`Serving stale cache for ${key}: ${error}`);
      return cached;
    }
    // Nothing in cache either - now we truly fail
    throw error;
  }
}

Bulkhead pattern
The bulkhead pattern is borrowed from ship design: ships have watertight compartments (bulkheads) so that a breach in one compartment does not sink the entire ship. In software, a bulkhead isolates failures so that one misbehaving dependency cannot consume all your resources.
Without a bulkhead, all your outbound HTTP calls share the same connection pool. If the recommendation service starts hanging, it consumes all connections. Now the payment service cannot get a connection either, even though it is perfectly healthy.
// Without bulkhead: shared connection pool.
// (ConnectionPool is an illustrative stand-in for your HTTP client's pool.)
// If recs hangs, payments can't get a connection
const sharedPool = new ConnectionPool({ maxConnections: 50 });

// With bulkhead: isolated pools per dependency
const paymentPool = new ConnectionPool({ maxConnections: 20 });
const recsPool = new ConnectionPool({ maxConnections: 10 });
const notificationPool = new ConnectionPool({ maxConnections: 5 });

Semaphore-based bulkhead
When you cannot control connection pools directly, use a semaphore to limit concurrency per dependency.
class Semaphore {
  private queue: Array<() => void> = [];
  private active = 0;

  constructor(private maxConcurrency: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.maxConcurrency) {
      this.active++;
      return;
    }
    return new Promise<void>((resolve) => {
      this.queue.push(() => {
        this.active++;
        resolve();
      });
    });
  }

  release(): void {
    this.active--;
    const next = this.queue.shift();
    if (next) next();
  }
}

// Isolate each dependency
const paymentBulkhead = new Semaphore(10);
const recsBulkhead = new Semaphore(5);

async function callPaymentService(data: PaymentRequest): Promise<PaymentResult> {
  await paymentBulkhead.acquire();
  try {
    return await fetch('https://payments.internal/charge', {
      method: 'POST',
      body: JSON.stringify(data),
      signal: AbortSignal.timeout(5000),
    }).then((r) => r.json());
  } finally {
    paymentBulkhead.release();
  }
}

| Pattern | What it protects | How |
|---|---|---|
| Timeout | Individual requests from hanging | Abort after N milliseconds |
| Timeout budget | Total request time across dependencies | Distribute time budget |
| Graceful degradation | User experience when dependencies fail | Serve cached/default data |
| Bulkhead | Healthy dependencies from failing ones | Isolate resource pools |
These patterns are not alternatives; they are layers. A well-built system uses timeouts inside circuit breakers inside bulkheads, with graceful degradation as the final safety net. Each layer catches what the previous one missed.
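To make the layering concrete, here is a sketch of one dependency call that composes three of the layers: a bulkhead slot, a budget-derived timeout, and graceful degradation to an empty fallback. The `Budget` and `Bulkhead` interfaces are minimal structural stand-ins for the TimeoutBudget and Semaphore classes shown earlier, and the URL is a placeholder:

```typescript
// Minimal structural stand-ins matching the TimeoutBudget and Semaphore shapes
interface Budget {
  hasExpired(): boolean;
  toAbortSignal(): AbortSignal;
}
interface Bulkhead {
  acquire(): Promise<void>;
  release(): void;
}

async function getRecommendations(
  userId: string,
  budget: Budget,     // timeout-budget layer
  bulkhead: Bulkhead  // bulkhead layer
): Promise<string[]> {
  if (budget.hasExpired()) return []; // degrade before even trying
  await bulkhead.acquire();
  try {
    const response = await fetch(`https://recs.internal/api/user/${userId}`, {
      signal: budget.toAbortSignal(), // timeout layer: bounded by remaining budget
    });
    if (!response.ok) throw new Error(`recs returned ${response.status}`);
    return (await response.json()) as string[];
  } catch {
    return []; // graceful degradation layer: recommendations are optional
  } finally {
    bulkhead.release(); // always free the bulkhead slot
  }
}
```

Whatever fails — budget exhausted, bulkhead starved, request timed out, server erroring — the caller gets an empty array and the page still renders.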