A load balancer is a server that distributes incoming traffic across multiple backend servers so no single server gets overwhelmed. It sits between clients and your servers, detects unhealthy servers, and provides a single stable entry point regardless of how many servers are behind it.

L4 vs L7 load balancing

L4: Transport layer

An L4 load balancer works at the TCP/UDP level. It sees IP addresses and ports but does not understand HTTP. Because it never parses application data, it is extremely fast, often handling millions of connections per second.

Client (192.168.1.10:54321)
  │
  ▼
L4 Load Balancer (looks at IP + port only)
  │
  ├──▶ Server A (10.0.0.1:8080)
  └──▶ Server B (10.0.0.2:8080)

Use L4 when: you need raw throughput or are load balancing non-HTTP protocols (databases, game servers). gRPC is also commonly balanced at L4, but it runs over HTTP/2 and multiplexes many calls on one long-lived connection, so balancing individual calls requires L7.

L7: Application layer

An L7 load balancer understands HTTP. It reads headers, URLs, and cookies, enabling content-based routing.

Client sends: GET /api/users HTTP/1.1
  │            Host: app.example.com
  │            Cookie: session=abc123
  ▼
L7 Load Balancer (reads URL, headers, cookies)
  │
  ├──▶ /api/*     → API server pool
  ├──▶ /static/*  → CDN or static server
  └──▶ /admin/*   → Admin server pool

# Nginx L7 load balancing example
upstream api_servers {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

upstream static_servers {
    server 10.0.0.3:80;
    server 10.0.0.4:80;
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_servers;
    }

    location /static/ {
        proxy_pass http://static_servers;
    }
}
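
To make the routing decision concrete, here is a minimal sketch of content-based routing in plain JavaScript. It mirrors the Nginx configuration above; the `routes` table and the `selectPool` function are hypothetical names introduced for illustration, and the longest-prefix rule is a simplification of how Nginx chooses among prefix locations.

```javascript
// Hypothetical routing table, using the same pools as the Nginx config.
const routes = [
  { prefix: '/api/', pool: ['10.0.0.1:3000', '10.0.0.2:3000'] },
  { prefix: '/static/', pool: ['10.0.0.3:80', '10.0.0.4:80'] },
];

// Return the pool whose prefix matches the request path, or null if none does.
// When several prefixes match, the longest one wins.
function selectPool(path) {
  let best = null;
  for (const route of routes) {
    const matches = path.startsWith(route.prefix);
    if (matches && (best === null || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.pool : null;
}

console.log(selectPool('/api/users'));      // the API pool
console.log(selectPool('/static/app.css')); // the static pool
console.log(selectPool('/other'));          // null: no route configured
```

An L4 balancer cannot make this decision at all, because the path only becomes visible once the HTTP request has been parsed.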

| Feature | L4 load balancer | L7 load balancer |
| --- | --- | --- |
| Operates at | TCP/UDP | HTTP/HTTPS |
| Sees | IP, port, raw bytes | URLs, headers, cookies, body |
| Speed | Very fast (millions of conn/s) | Slower (must parse HTTP) |
| Routing logic | IP/port based | Content-based (URL, header, cookie) |
| SSL termination | Pass-through or terminate | Typically terminates SSL |
| Use cases | Databases, game servers, gRPC | Web apps, APIs, microservices |
| Cloud examples | AWS NLB, GCP TCP LB | AWS ALB, Cloudflare, GCP HTTP LB |

Load balancing algorithms

Round-robin

Send each request to the next server in line. Works when all servers are identical and requests take roughly the same time.

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  (back to the start)
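
The rotation above can be sketched in a few lines. This is an illustrative in-process version, not how any particular load balancer implements it; `createRoundRobin` is a hypothetical helper name.

```javascript
// Returns a function that yields the next server on each call, wrapping around.
function createRoundRobin(servers) {
  let index = 0;
  return function next() {
    const server = servers[index];
    index = (index + 1) % servers.length;
    return server;
  };
}

const next = createRoundRobin(['Server A', 'Server B', 'Server C']);
const picks = [next(), next(), next(), next()];
console.log(picks); // [ 'Server A', 'Server B', 'Server C', 'Server A' ]
```

Note that the selector keeps no information about how busy each server is, which is exactly the weakness the later algorithms address.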

Weighted round-robin

Servers with more capacity get more requests. Useful for mixed instance sizes or rolling deployments.

Server A (weight: 3) → gets 3 out of every 5 requests
Server B (weight: 2) → gets 2 out of every 5 requests
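
One way to get that 3:2 split without sending three requests in a row to Server A is the "smooth" weighted round-robin approach: each server accumulates its weight on every pick, the highest accumulator wins, and the winner is charged the total weight. The sketch below assumes that approach; `createWeightedRoundRobin` is a hypothetical helper name.

```javascript
// Smooth weighted round-robin: interleaves picks in proportion to weight.
function createWeightedRoundRobin(servers) {
  const state = servers.map((s) => ({ name: s.name, weight: s.weight, current: 0 }));
  const totalWeight = state.reduce((sum, s) => sum + s.weight, 0);

  return function next() {
    let best = null;
    for (const s of state) {
      s.current += s.weight; // every server earns its weight each round
      if (best === null || s.current > best.current) best = s;
    }
    best.current -= totalWeight; // the winner pays for being picked
    return best.name;
  };
}

const nextServer = createWeightedRoundRobin([
  { name: 'Server A', weight: 3 },
  { name: 'Server B', weight: 2 },
]);
const sequence = Array.from({ length: 5 }, () => nextServer());
console.log(sequence.join(', '));
// Server A, Server B, Server A, Server B, Server A
```

Over every window of five requests Server A receives three and Server B receives two, and the requests are interleaved rather than sent in bursts.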

Least connections

Send each request to the server with the fewest active connections. This is a good default for most web applications because it adapts to uneven workloads without extra configuration.

Server A: 12 active connections
Server B: 3 active connections  ← next request goes here
Server C: 8 active connections
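
A minimal sketch of the selection step, using the connection counts from the example above. The `pool` array and `pickLeastConnections` function are hypothetical; a real balancer would also decrement the counter when a connection closes.

```javascript
// Pick the server with the fewest active connections (first one wins on ties).
function pickLeastConnections(servers) {
  return servers.reduce((best, s) => (s.active < best.active ? s : best));
}

const pool = [
  { name: 'Server A', active: 12 },
  { name: 'Server B', active: 3 },
  { name: 'Server C', active: 8 },
];

const chosen = pickLeastConnections(pool);
chosen.active += 1; // the new connection now counts against Server B
console.log(chosen.name); // Server B
```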

IP hash

Hash the client's IP to always route the same user to the same server. This provides session affinity but creates problems with uneven distribution and failover: many users behind one NAT or proxy share an IP and land on the same server, and changing the number of servers changes the modulus, which remaps most clients.

hash("192.168.1.10") % 3 = 1 → always Server B
hash("10.0.0.55")    % 3 = 0 → always Server A
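
The sketch below shows the same idea in runnable form and prints where each client lands before and after one server is removed. FNV-1a is used only because it is short and deterministic; it is an illustrative choice, and `pickByIp` and the sample addresses are hypothetical.

```javascript
// FNV-1a (32-bit): a simple deterministic string hash, chosen for illustration.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the result an unsigned 32-bit value
  }
  return hash;
}

// Same IP and same server list always give the same server.
function pickByIp(clientIp, servers) {
  return servers[fnv1a(clientIp) % servers.length];
}

const three = ['Server A', 'Server B', 'Server C'];
const two = ['Server A', 'Server B']; // Server C removed from the pool

const clients = ['192.168.1.10', '10.0.0.55', '172.16.4.7', '203.0.113.9'];
for (const ip of clients) {
  console.log(ip, ':', pickByIp(ip, three), '->', pickByIp(ip, two));
}
```

Because the modulus changes from 3 to 2, a client can be moved to a different server even when its original server is still healthy. Consistent hashing is the usual way to limit that reshuffling.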

Comparison of algorithms

| Algorithm | How it works | Best for | Drawback |
| --- | --- | --- | --- |
| Round-robin | Cycle through servers sequentially | Uniform requests, identical servers | Ignores server load |
| Weighted round-robin | Cycle with proportional distribution | Mixed server capacities | Requires manual weight tuning |
| Least connections | Route to server with fewest active connections | Variable request durations | Slightly more overhead |
| IP hash | Hash client IP to pick server | Session affinity needs | Uneven distribution, failover issues |
| Least response time | Route to fastest-responding server | Latency-sensitive applications | Requires active measurement |
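
Least response time appears only in the comparison above, so here is a rough sketch of what "active measurement" can mean. It assumes an exponentially weighted moving average (EWMA) of observed latencies, which is one common choice rather than the only one; `ALPHA`, `recordLatency`, and `pickFastest` are hypothetical names.

```javascript
const ALPHA = 0.2; // smoothing factor (assumed value): higher reacts faster to change

// Fold one observed latency into the server's running average.
function recordLatency(server, latencyMs) {
  server.avgMs = server.samples === 0
    ? latencyMs
    : ALPHA * latencyMs + (1 - ALPHA) * server.avgMs;
  server.samples += 1;
}

// Pick the server with the lowest average latency (first one wins on ties).
function pickFastest(servers) {
  return servers.reduce((best, s) => (s.avgMs < best.avgMs ? s : best));
}

const serverA = { name: 'Server A', avgMs: 0, samples: 0 };
const serverB = { name: 'Server B', avgMs: 0, samples: 0 };

recordLatency(serverA, 100);
recordLatency(serverA, 50); // average moves only part of the way toward 50
recordLatency(serverB, 40);
recordLatency(serverB, 60);

console.log(pickFastest([serverA, serverB]).name); // Server B
```

The averaging matters: routing on the single most recent latency would make the balancer chase noise.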

Health checks

Active health checks

The load balancer periodically sends a request to each server (typically GET /health). If a server fails a configured number of consecutive checks, it is removed from the pool until it passes again.

// Typical health endpoint (assumes an Express app plus db and redis clients created elsewhere)
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1');
    await redis.ping();

    res.status(200).json({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: Date.now()
    });
  } catch (err) {
    res.status(503).json({
      status: 'unhealthy',
      error: err.message
    });
  }
});
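
# Nginx active health check configuration
# Note: the health_check directive is available in NGINX Plus only and
# belongs in a location block. Open-source Nginx offers passive checks
# through max_fails and fail_timeout on each server line.
upstream backend {
    zone backend 64k;   # shared memory zone, required for health checks
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

server {
    location / {
        proxy_pass http://backend;

        # Check every 5 seconds, mark unhealthy after 3 failures,
        # mark healthy again after 2 successes
        health_check interval=5s fails=3 passes=2 uri=/health;
    }
}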
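
The fails=3 and passes=2 thresholds in the configuration above amount to a small state machine per server. The sketch below models that logic in isolation, with the probe result passed in as a boolean; `createServerState` and `recordProbe` are hypothetical names, and real implementations add timers and timeouts around this core.

```javascript
const FAILS = 3;  // consecutive failed probes before removal from the pool
const PASSES = 2; // consecutive successful probes before re-adding

function createServerState() {
  return { healthy: true, failures: 0, successes: 0 };
}

// Record one probe result and return whether the server is currently in the pool.
function recordProbe(state, ok) {
  if (ok) {
    state.failures = 0;
    state.successes += 1;
    if (!state.healthy && state.successes >= PASSES) state.healthy = true;
  } else {
    state.successes = 0;
    state.failures += 1;
    if (state.healthy && state.failures >= FAILS) state.healthy = false;
  }
  return state.healthy;
}

const state = createServerState();
const results = [false, false, false, true, true].map((ok) => recordProbe(state, ok));
console.log(results); // [ true, true, false, false, true ]
```

Requiring several consecutive results in each direction keeps one slow probe from removing a healthy server, and keeps a server that is still flapping from rejoining too early.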

Passive health checks

The load balancer monitors real traffic instead of probing. If a server starts returning errors (5xx responses, timeouts), it is marked unhealthy. Passive checks are slower to detect problems because they depend on real requests hitting the failing server, so most production setups use both types together.
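
A passive check can be sketched as a counter fed by the outcome of every proxied request. The version below is a simplification that counts consecutive failures and then sidelines the server for a cool-down period; the thresholds and the names `createTracker`, `observeResponse`, and `isAvailable` are assumptions made for illustration.

```javascript
const MAX_FAILS = 3;            // assumed threshold of consecutive failures
const FAIL_TIMEOUT_MS = 10000;  // assumed cool-down before the server is retried

function createTracker() {
  return { failures: 0, downUntil: 0 };
}

// Call with the outcome of each real request that was proxied to the server.
function observeResponse(tracker, statusCode, timedOut, nowMs) {
  const failed = timedOut || statusCode >= 500;
  if (!failed) {
    tracker.failures = 0;
    return;
  }
  tracker.failures += 1;
  if (tracker.failures >= MAX_FAILS) {
    tracker.downUntil = nowMs + FAIL_TIMEOUT_MS; // sideline the server
    tracker.failures = 0;
  }
}

function isAvailable(tracker, nowMs) {
  return nowMs >= tracker.downUntil;
}

const tracker = createTracker();
observeResponse(tracker, 200, false, 0);
observeResponse(tracker, 502, false, 1000);
observeResponse(tracker, 503, false, 2000);
observeResponse(tracker, 0, true, 3000); // a timeout counts as a failure
console.log(isAvailable(tracker, 4000));  // false
console.log(isAvailable(tracker, 13000)); // true
```

Three real requests had to fail before the server was sidelined, which is the cost of passive detection that active probing avoids.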

| Health check type | How it works | Detection speed | Overhead |
| --- | --- | --- | --- |
| Active | LB sends periodic probes | Fast (seconds) | LB must send requests |
| Passive | LB monitors real responses | Slower (depends on traffic) | Zero additional requests |
| Combined | Both active probes + response monitoring | Fastest | Moderate |

Sticky sessions and why they are problematic

Sticky sessions route all of a user's requests to the same server, typically via a cookie.

First request:
  Client → LB → Server B
  LB sets cookie: SERVERID=B

All subsequent requests:
  Client (cookie: SERVERID=B) → LB → Server B (always)

They are a workaround for stateful applications, but they cause real problems:

  1. Uneven load: heavy sessions cluster on one server while others sit idle
  2. Failover breaks sessions: if Server B goes down, all pinned users lose their sessions
  3. Scaling is constrained: adding servers does not help users already pinned to existing ones
  4. Deployment is risky: rolling updates leave some users on old servers with no clean migration
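
The cookie-based pinning and the failover problem from the list above can be sketched together. In this illustration the `servers` map, the `routeSticky` function, and the `firstUp` fallback are all hypothetical; the `up` flag stands in for whatever the health checks report.

```javascript
// Hypothetical pool; `up` is what health checks would maintain.
const servers = {
  A: { up: true },
  B: { up: true },
  C: { up: true },
};

// Route by the SERVERID cookie when possible. `repinned` is true when an
// existing pin had to be abandoned, which is when in-memory sessions are lost.
function routeSticky(cookieServerId, fallbackPick) {
  const pinned = servers[cookieServerId];
  if (pinned && pinned.up) {
    return { serverId: cookieServerId, repinned: false };
  }
  // No cookie yet, or the pinned server is down: choose again and reset the cookie.
  return { serverId: fallbackPick(), repinned: cookieServerId !== undefined };
}

const firstUp = () => Object.keys(servers).find((id) => servers[id].up);

console.log(routeSticky('B', firstUp)); // { serverId: 'B', repinned: false }
servers.B.up = false;                   // Server B fails
console.log(routeSticky('B', firstUp)); // { serverId: 'A', repinned: true }
```

The second call succeeds at the routing level, but any session data that lived only in Server B's memory is gone.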

The fix is to externalize state. Store sessions in a shared store such as Redis, or carry them in signed tokens such as JWTs, so that any server can handle any request.

// Instead of in-memory sessions...
app.use(session({ store: new MemoryStore() })); // BAD: tied to one server

// ...use Redis-backed sessions
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false
}));
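
To see why externalized state removes the need for pinning, consider this toy model. A `Map` stands in for Redis purely so the example runs without a network dependency; `sharedSessionStore` and `createServer` are hypothetical names, and a real store would be reached over the network with its own failure modes.

```javascript
// A Map stands in for Redis here: one store shared by every server.
const sharedSessionStore = new Map();

function createServer(name) {
  return {
    // Any server can load, update, and save any session.
    handle(sessionId) {
      const session = sharedSessionStore.get(sessionId) ?? { views: 0 };
      session.views += 1;
      sharedSessionStore.set(sessionId, session);
      return `${name} served view ${session.views}`;
    },
  };
}

const serverA = createServer('Server A');
const serverB = createServer('Server B');

console.log(serverA.handle('abc123')); // Server A served view 1
console.log(serverB.handle('abc123')); // Server B served view 2
```

The second request lands on a different server and still sees the first request's state, so the load balancer is free to use any algorithm it likes.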

AI pitfall
AI-generated load balancer configs often default to round-robin. What AI gets wrong: round-robin assumes all servers are equally fast and all requests equally expensive. Least-connections is usually a better default because it sends each request to the server with the fewest in-flight connections, a reasonable proxy for available capacity.
Good to know
Health checks are arguably the most important piece of load balancer configuration. Without them, the load balancer keeps sending traffic to dead servers, turning a single-server failure into a partial outage for all users. Always configure a health check endpoint (such as /health) and an appropriate check interval.
Edge case
WebSocket connections strain the load balancing model. A WebSocket is a persistent connection to a specific server, so the load balancer cannot redistribute it once it is established; load only evens out as clients disconnect and reconnect. Plan for sticky sessions (needed when a WebSocket library falls back to HTTP long-polling) or a dedicated WebSocket server pool with a separate scaling strategy.