Load Balancing Techniques

A load balancer sits between clients and your servers. It distributes traffic, detects unhealthy servers, and provides a single stable entry point regardless of how many servers are behind it.

L4 vs L7 load balancing

L4: Transport layer

An L4 load balancer works at the TCP/UDP level. It sees IP addresses and ports but does not understand HTTP. Because it never parses application data, it is extremely fast, often handling millions of connections per second.

Client (192.168.1.10:54321)
  │
  ▼
L4 Load Balancer (looks at IP + port only)
  │
  ├──▶ Server A (10.0.0.1:8080)
  └──▶ Server B (10.0.0.2:8080)

Use L4 when: you need raw throughput or are load balancing non-HTTP protocols (databases, game servers). gRPC can also be balanced at L4, but it runs over long-lived HTTP/2 connections, so an L4 balancer spreads connections rather than individual calls.

L7: Application layer

An L7 load balancer understands HTTP. It reads headers, URLs, and cookies, enabling content-based routing.

Client sends: GET /api/users HTTP/1.1
  │            Host: app.example.com
  │            Cookie: session=abc123
  ▼
L7 Load Balancer (reads URL, headers, cookies)
  │
  ├──▶ /api/*     → API server pool
  ├──▶ /static/*  → CDN or static server
  └──▶ /admin/*   → Admin server pool

# Nginx L7 load balancing example
upstream api_servers {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

upstream static_servers {
    server 10.0.0.3:80;
    server 10.0.0.4:80;
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_servers;
    }

    location /static/ {
        proxy_pass http://static_servers;
    }
}
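The routing decision in the config above can be sketched in plain JavaScript. This is an illustration only, not how Nginx is implemented: the `routes` table, the `admin_servers` pool name, and the `pickPool` function are hypothetical, and the longest-prefix rule is a simplification of prefix matching.

```javascript
// Hypothetical routing table mirroring the diagram above: path prefix -> pool name.
const routes = [
  { prefix: '/api/', pool: 'api_servers' },
  { prefix: '/static/', pool: 'static_servers' },
  { prefix: '/admin/', pool: 'admin_servers' },
];

// Return the pool whose prefix matches the request path, preferring the
// longest matching prefix. null means no route matched.
function pickPool(path, table = routes) {
  let best = null;
  for (const route of table) {
    const matches = path.startsWith(route.prefix);
    if (matches && (best === null || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.pool : null;
}

console.log(pickPool('/api/users'));      // api_servers
console.log(pickPool('/static/app.css')); // static_servers
console.log(pickPool('/unknown'));        // null
```

An L4 balancer could not make this decision, because the request path is only visible after parsing HTTP.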

Feature	L4 load balancer	L7 load balancer
Operates at	TCP/UDP	HTTP/HTTPS
Sees	IP, port, raw bytes	URLs, headers, cookies, body
Speed	Very fast (millions of conn/s)	Slower (must parse HTTP)
Routing logic	IP/port based	Content-based (URL, header, cookie)
SSL termination	Pass-through or terminate	Typically terminates SSL
Use cases	Databases, game servers, gRPC	Web apps, APIs, microservices
Cloud examples	AWS NLB, GCP TCP LB	AWS ALB, Cloudflare, GCP HTTP LB

Load balancing algorithms

Round-robin

Send each request to the next server in line. Works when all servers are identical and requests take roughly the same time.

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  (back to the start)
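A minimal sketch of the rotation above, assuming the server list is fixed and every server is healthy. The `createRoundRobin` name is made up for this example.

```javascript
// Minimal round-robin picker: returns a function that cycles through `servers`.
function createRoundRobin(servers) {
  if (servers.length === 0) {
    throw new Error('server list must not be empty');
  }
  let next = 0; // index of the server that receives the next request
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap back to the start
    return server;
  };
}

const pick = createRoundRobin(['A', 'B', 'C']);
console.log([pick(), pick(), pick(), pick()].join(' ')); // A B C A
```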

Weighted round-robin

Servers with more capacity get more requests. Useful for mixed instance sizes or rolling deployments.

Server A (weight: 3) → gets 3 out of every 5 requests
Server B (weight: 2) → gets 2 out of every 5 requests
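One way to implement the 3:2 split above is a "smooth" weighted round-robin, which interleaves servers instead of sending bursts to the heaviest one. The sketch below is one possible approach under that assumption; `createWeightedRoundRobin` and the `{ name, weight }` shape are hypothetical.

```javascript
// Smooth weighted round-robin sketch. `servers` is [{ name, weight }].
function createWeightedRoundRobin(servers) {
  const state = servers.map((s) => ({ ...s, current: 0 }));
  const total = state.reduce((sum, s) => sum + s.weight, 0);
  return function pick() {
    let best = null;
    for (const s of state) {
      s.current += s.weight; // every server earns its weight each round
      if (best === null || s.current > best.current) best = s;
    }
    best.current -= total; // the winner pays back the total weight
    return best.name;
  };
}

const pick = createWeightedRoundRobin([
  { name: 'A', weight: 3 },
  { name: 'B', weight: 2 },
]);
const sequence = Array.from({ length: 5 }, () => pick());
console.log(sequence.join(' ')); // A B A B A
```

Over every five requests, Server A receives three and Server B receives two, matching the weights.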

Least connections

Send each request to the server with the fewest active connections. This is a good default for most web applications because it adapts to uneven workloads without manual tuning.

Server A: 12 active connections
Server B: 3 active connections  ← next request goes here
Server C: 8 active connections
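The selection above can be sketched as a small counter per server. This assumes the balancer is told when each request starts and ends; the `LeastConnections` class is a hypothetical name for illustration.

```javascript
// Least-connections sketch: track in-flight requests per server.
class LeastConnections {
  constructor(names) {
    this.active = new Map(names.map((name) => [name, 0]));
  }

  // Pick the server with the fewest active connections and count the new one.
  acquire() {
    let best = null;
    for (const [name, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = name;
    }
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Call when the request or connection finishes.
  release(name) {
    this.active.set(name, Math.max(0, this.active.get(name) - 1));
  }
}

const lb = new LeastConnections(['A', 'B', 'C']);
lb.active.set('A', 12).set('B', 3).set('C', 8); // counts from the example above
console.log(lb.acquire()); // B
```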

IP hash

Hash the client's IP to always route the same user to the same server. Provides session affinity but creates problems with uneven distribution and failover.

hash("192.168.1.10") % 3 = 1 → always Server B
hash("10.0.0.55")    % 3 = 0 → always Server A
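The sketch below shows modulo-based IP hashing and why failover hurts: when the pool shrinks, the modulus changes and many clients are remapped, not only those on the removed server. FNV-1a is used here purely as an example hash, and the client addresses are synthetic; real balancers may hash differently.

```javascript
// FNV-1a: a simple non-cryptographic 32-bit hash, chosen only for illustration.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// The same IP always maps to the same index while the pool size is unchanged.
function pickByIp(ip, servers) {
  return servers[fnv1a(ip) % servers.length];
}

const three = ['A', 'B', 'C'];
const two = ['A', 'B']; // Server C removed from the pool

// Count how many synthetic clients land on a different server after the change.
const clients = Array.from({ length: 1000 }, (_, i) => `10.0.${i >> 8}.${i & 255}`);
const moved = clients.filter((ip) => pickByIp(ip, three) !== pickByIp(ip, two)).length;
console.log(`${moved} of ${clients.length} clients changed servers`);
```

Every remapped client loses whatever state its previous server held in memory.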

Comparison of algorithms

Algorithm	How it works	Best for	Drawback
Round-robin	Cycle through servers sequentially	Uniform requests, identical servers	Ignores server load
Weighted round-robin	Cycle with proportional distribution	Mixed server capacities	Requires manual weight tuning
Least connections	Route to server with fewest active conns	Variable request durations	Slightly more overhead
IP hash	Hash client IP to pick server	Session affinity needs	Uneven distribution, failover issues
Least response time	Route to fastest-responding server	Latency-sensitive applications	Requires active measurement
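The last row of the table, least response time, needs a running latency estimate per server. A minimal sketch, assuming latencies are reported by the proxy after each response and smoothed with an exponentially weighted moving average; the class name and the `alpha` smoothing factor are assumptions.

```javascript
// Least-response-time sketch using an exponentially weighted moving average
// (EWMA) of observed latencies. `alpha` (0..1) is an assumed smoothing factor.
class LeastResponseTime {
  constructor(names, alpha = 0.3) {
    this.alpha = alpha;
    this.avgMs = new Map(names.map((name) => [name, null])); // null = no data yet
  }

  // Record one measured response time for a server.
  observe(name, ms) {
    const prev = this.avgMs.get(name);
    const next = prev === null ? ms : this.alpha * ms + (1 - this.alpha) * prev;
    this.avgMs.set(name, next);
  }

  // Pick the server with the lowest average. Unmeasured servers count as 0
  // so they receive traffic and get measured.
  pick() {
    let best = null;
    let bestAvg = Infinity;
    for (const [name, avg] of this.avgMs) {
      const value = avg === null ? 0 : avg;
      if (value < bestAvg) {
        best = name;
        bestAvg = value;
      }
    }
    return best;
  }
}

const lb = new LeastResponseTime(['A', 'B']);
lb.observe('A', 120);
lb.observe('B', 40);
console.log(lb.pick()); // B
```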

Health checks

Active health checks

The load balancer periodically sends a request to each server (typically GET /health). If a server fails consecutive checks, it is removed from the pool.

// Typical health endpoint
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1');
    await redis.ping();

    res.status(200).json({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: Date.now()
    });
  } catch (err) {
    res.status(503).json({
      status: 'unhealthy',
      error: err.message
    });
  }
});

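# Nginx active health check configuration
# Note: the health_check directive requires NGINX Plus and belongs in a
# location block; the upstream group needs a shared memory zone.
upstream backend {
    zone backend 64k;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

server {
    location / {
        proxy_pass http://backend;
        # Probe /health every 5 seconds, mark unhealthy after 3 failures,
        # mark healthy again after 2 successes
        health_check uri=/health interval=5s fails=3 passes=2;
    }
}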
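The fails/passes thresholds amount to a small state machine per server. The sketch below models only that bookkeeping and leaves out the probe itself, which a real balancer would run on a timer against GET /health. `HealthTracker` is a hypothetical name.

```javascript
// Tracks health for one server using the thresholds from the config above:
// 3 consecutive failures -> unhealthy, 2 consecutive successes -> healthy.
class HealthTracker {
  constructor({ fails = 3, passes = 2 } = {}) {
    this.fails = fails;
    this.passes = passes;
    this.healthy = true;
    this.streak = 0; // consecutive results that contradict the current state
  }

  // Feed in the result of one probe (true = probe succeeded).
  // Returns the health state after taking this result into account.
  record(ok) {
    if (ok === this.healthy) {
      this.streak = 0; // result agrees with the current state
      return this.healthy;
    }
    this.streak += 1;
    const needed = this.healthy ? this.fails : this.passes;
    if (this.streak >= needed) {
      this.healthy = !this.healthy; // flip state once the threshold is reached
      this.streak = 0;
    }
    return this.healthy;
  }
}

const tracker = new HealthTracker();
const results = [false, false, false, true, true].map((ok) => tracker.record(ok));
console.log(results); // [ true, true, false, false, true ]
```

Requiring several consecutive results in each direction keeps one slow probe from removing a server, and one lucky probe from restoring it.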

Passive health checks

The load balancer monitors real traffic instead of probing. If a server starts returning errors (5xx, timeouts), it is marked unhealthy. Passive checks detect problems more slowly, because a failure is only noticed once real requests hit the bad server, so most production setups use both types together.

Health check type	How it works	Detection speed	Overhead
Active	LB sends periodic probes	Fast (seconds)	LB must send requests
Passive	LB monitors real responses	Slower (depends on traffic)	Zero additional requests
Combined	Both active probes + response monitoring	Fastest	Moderate
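Passive checking can be sketched as a failure counter over a sliding time window, fed by the responses the balancer is already proxying. The `maxFails` and `windowMs` thresholds and the `PassiveMonitor` class are assumptions for illustration, not values from any particular load balancer.

```javascript
// Passive check sketch: count failures seen on real traffic inside a sliding
// window. `maxFails` and `windowMs` are assumed thresholds for illustration.
class PassiveMonitor {
  constructor({ maxFails = 3, windowMs = 10000 } = {}) {
    this.maxFails = maxFails;
    this.windowMs = windowMs;
    this.failures = []; // timestamps (ms) of recent failed responses
  }

  // Call for every proxied response; 5xx statuses and timeouts count as failures.
  record(status, timedOut = false, now = Date.now()) {
    if (timedOut || status >= 500) this.failures.push(now);
  }

  // Healthy while fewer than maxFails failures fall inside the window.
  isHealthy(now = Date.now()) {
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    return this.failures.length < this.maxFails;
  }
}

const monitor = new PassiveMonitor();
monitor.record(200, false, 0);
monitor.record(502, false, 1000);
monitor.record(503, false, 2000);
monitor.record(0, true, 3000); // a timeout
console.log(monitor.isHealthy(4000));  // false
console.log(monitor.isHealthy(20000)); // true (old failures aged out)
```

Note that the server is only marked unhealthy after three real requests have already failed, which is the cost of probing with user traffic.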

Sticky sessions and why they are problematic

Sticky sessions route all of a user's requests to the same server, typically via a cookie.

First request:
  Client → LB → Server B
  LB sets cookie: SERVERID=B

All subsequent requests:
  Client (cookie: SERVERID=B) → LB → Server B (always)
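The flow above can be sketched as a routing function that reads the pinning cookie. The SERVERID cookie name comes from the example; `parseCookies`, `routeSticky`, and the cookie attributes are hypothetical helpers and choices for this sketch.

```javascript
// Parse a Cookie request header into a plain object of name -> value.
function parseCookies(header = '') {
  const cookies = {};
  for (const part of header.split(';')) {
    const index = part.indexOf('=');
    if (index === -1) continue; // skip empty or malformed pieces
    cookies[part.slice(0, index).trim()] = part.slice(index + 1).trim();
  }
  return cookies;
}

// Returns the chosen server plus a Set-Cookie value when a new pin is needed.
// `pickFallback` is any selection function, e.g. round-robin.
function routeSticky(cookieHeader, healthyServers, pickFallback) {
  const pinned = parseCookies(cookieHeader).SERVERID;
  if (pinned && healthyServers.includes(pinned)) {
    return { server: pinned, setCookie: null };
  }
  // No cookie yet, or the pinned server is gone: pick again and re-pin.
  const server = pickFallback(healthyServers);
  return { server, setCookie: `SERVERID=${server}; Path=/; HttpOnly` };
}

const first = (servers) => servers[0];
console.log(routeSticky('', ['A', 'B'], first));
// { server: 'A', setCookie: 'SERVERID=A; Path=/; HttpOnly' }
console.log(routeSticky('SERVERID=B', ['A', 'B'], first));
// { server: 'B', setCookie: null }
console.log(routeSticky('SERVERID=B', ['A'], first).server); // A (B is down)
```

In the last call the user is re-pinned to Server A, which holds none of the state that Server B kept in memory.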

They are a workaround for stateful applications, but they cause real problems:

Uneven load: heavy sessions cluster on one server while others sit idle
Failover breaks sessions: if Server B goes down, all pinned users lose their sessions
Scaling is constrained: adding servers does not help users already pinned to existing ones
Deployment is risky: rolling updates leave some users on old servers with no clean migration

The fix is to externalize state. Store sessions in a shared store such as Redis, or carry session data in signed tokens such as JWTs, so that any server can handle any request.

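const session = require('express-session');

// Instead of in-memory sessions...
app.use(session({
  store: new session.MemoryStore(), // BAD: session data lives in one server's memory
  secret: process.env.SESSION_SECRET
}));

// ...use Redis-backed sessions (pick one of these two setups, not both).
// RedisStore comes from the connect-redis package; its import syntax varies by version.
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false
}));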
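The JWT alternative keeps session data in a signed token that any server can verify with a shared secret, so no server-side lookup is needed. The sketch below hand-rolls HS256 signing with Node's built-in crypto module purely to show the mechanism; `signToken`, `verifyToken`, and the demo secret are assumptions, and production code would normally use a maintained JWT library.

```javascript
const crypto = require('node:crypto');

// Encode a JSON-serializable value as base64url.
const encode = (value) => Buffer.from(JSON.stringify(value)).toString('base64url');

// Compute the HMAC-SHA256 signature of `text` as a Buffer.
const hmac = (text, secret) => crypto.createHmac('sha256', secret).update(text).digest();

// Sign a payload as an HS256 JWT. `secret` is shared by all app servers.
function signToken(payload, secret) {
  const body = `${encode({ alg: 'HS256', typ: 'JWT' })}.${encode(payload)}`;
  return `${body}.${hmac(body, secret).toString('base64url')}`;
}

// Return the claims if the signature is valid and the token is unexpired, else null.
function verifyToken(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = hmac(`${header}.${payload}`, secret);
  const actual = Buffer.from(signature, 'base64url');
  // Compare in constant time; lengths must match before timingSafeEqual.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return null;
  }
  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) return null;
  return claims;
}

const secret = 'demo-secret'; // assumption: in practice, load this from configuration
const token = signToken({ sub: 'user-42', exp: 2000 }, secret);
console.log(verifyToken(token, secret, 1000));         // { sub: 'user-42', exp: 2000 }
console.log(verifyToken(token, secret, 3000));         // null (expired)
console.log(verifyToken(token, 'other-secret', 1000)); // null (bad signature)
```

Because verification needs only the secret, the load balancer is free to send each request to any server.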

AI pitfall

AI-generated load balancer configs almost always use round-robin. What AI gets wrong: round-robin assumes all servers are equally fast and all requests equally expensive. Least connections is usually a better default because it sends each new request to the server with the fewest requests in flight, which tracks available capacity more closely.

Good to know

Health checks are the most important load balancer configuration. Without them, the load balancer sends traffic to dead servers, turning a single-server failure into a partial outage for all users. Always configure a health check endpoint (/health) and an appropriate check interval.

Edge case

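WebSocket connections complicate the load balancing model. A WebSocket is a persistent connection to a specific server, so once it is established the load balancer cannot redistribute it, and the balancing algorithm only applies at connection time. You need sticky sessions when the client library falls back to HTTP polling or must reconnect to the same server; otherwise, consider a dedicated WebSocket server pool with a separate scaling strategy.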
