A load balancer sits between clients and your servers. It distributes traffic, detects unhealthy servers, and provides a single stable entry point regardless of how many servers are behind it.
L4 vs L7 load balancing
L4: Transport layer
An L4 load balancer works at the TCP/UDP level. It sees IP addresses and ports but does not understand HTTP. Because it never parses application data, it is extremely fast and can often handle millions of connections per second.
Client (192.168.1.10:54321)
│
▼
L4 Load Balancer (looks at IP + port only)
│
├──▶ Server A (10.0.0.1:8080)
└──▶ Server B (10.0.0.2:8080)
Use L4 when: you need raw throughput, are load balancing non-HTTP protocols (databases, game servers), or want to pass traffic such as gRPC through without inspecting it.
L7: Application layer
An L7 load balancer understands HTTP. It reads headers, URLs, and cookies, enabling content-based routing.
Client sends: GET /api/users HTTP/1.1
│ Host: app.example.com
│ Cookie: session=abc123
▼
L7 Load Balancer (reads URL, headers, cookies)
│
├──▶ /api/* → API server pool
├──▶ /static/* → CDN or static server
└──▶ /admin/* → Admin server pool
# Nginx L7 load balancing example
upstream api_servers {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}
upstream static_servers {
    server 10.0.0.3:80;
    server 10.0.0.4:80;
}
server {
    listen 80;
    location /api/ {
        proxy_pass http://api_servers;
    }
    location /static/ {
        proxy_pass http://static_servers;
    }
}

| Feature | L4 load balancer | L7 load balancer |
|---|---|---|
| Operates at | TCP/UDP | HTTP/HTTPS |
| Sees | IP, port, raw bytes | URLs, headers, cookies, body |
| Speed | Very fast (millions of conn/s) | Slower (must parse HTTP) |
| Routing logic | IP/port based | Content-based (URL, header, cookie) |
| SSL termination | Pass-through or terminate | Typically terminates SSL |
| Use cases | Databases, game servers, gRPC | Web apps, APIs, microservices |
| Cloud examples | AWS NLB, GCP TCP LB | AWS ALB, Cloudflare, GCP HTTP LB |
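
The content-based routing that an L7 load balancer performs can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not how any particular product implements it: the `routes` table and the `selectPool` function are hypothetical names, and the addresses are borrowed from the Nginx example. The sketch picks the longest matching path prefix, which is also how Nginx chooses between prefix locations.

```javascript
// Hypothetical routing table: path prefix -> pool of backend addresses.
const routes = [
  { prefix: "/api/", pool: ["10.0.0.1:3000", "10.0.0.2:3000"] },
  { prefix: "/static/", pool: ["10.0.0.3:80", "10.0.0.4:80"] },
];

// Return the pool for the longest prefix that matches the request path,
// or null when no route matches.
function selectPool(path) {
  let best = null;
  for (const route of routes) {
    const matches = path.startsWith(route.prefix);
    if (matches && (best === null || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.pool : null;
}

console.log(selectPool("/api/users"));     // the API pool
console.log(selectPool("/static/app.js")); // the static pool
console.log(selectPool("/unknown"));       // null
```

An L4 load balancer cannot make this decision because the request path is never visible to it; it only sees the connection's addresses and ports.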
Load balancing algorithms
Round-robin
Send each request to the next server in line. Works when all servers are identical and requests take roughly the same time.
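
A minimal sketch of round-robin selection, assuming an in-memory list of server names. The function name `createRoundRobin` is hypothetical and the server names are placeholders.

```javascript
// Returns a function that hands out servers in a repeating cycle.
function createRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap around to the first server
    return server;
  };
}

const pick = createRoundRobin(["Server A", "Server B", "Server C"]);
const order = [pick(), pick(), pick(), pick()];
console.log(order.join(", ")); // Server A, Server B, Server C, Server A
```

The resulting order is the same cycle illustrated here: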
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (back to the start)
Weighted round-robin
Servers with more capacity get more requests. Useful for mixed instance sizes or rolling deployments.
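
One way to implement this is the variant often called smooth weighted round-robin, which interleaves servers instead of sending a burst to the heaviest one. The sketch below is an assumption about how it could be done, not a description of a specific load balancer; `createWeightedRoundRobin` and the weights are illustrative.

```javascript
// Smooth weighted round-robin: every pick adds each server's weight to a
// running score, chooses the highest score, then subtracts the total weight
// from the chosen server.
function createWeightedRoundRobin(servers) {
  const state = servers.map((s) => ({ ...s, current: 0 }));
  const totalWeight = state.reduce((sum, s) => sum + s.weight, 0);

  return function pick() {
    let best = null;
    for (const s of state) {
      s.current += s.weight;
      if (best === null || s.current > best.current) best = s;
    }
    best.current -= totalWeight;
    return best.name;
  };
}

const pick = createWeightedRoundRobin([
  { name: "Server A", weight: 3 },
  { name: "Server B", weight: 2 },
]);
const picks = Array.from({ length: 5 }, () => pick());
console.log(picks.join(", "));
// Server A, Server B, Server A, Server B, Server A
```

Over every five requests the split is 3 to 2, matching the weights: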
Server A (weight: 3) → gets 3 out of every 5 requests
Server B (weight: 2) → gets 2 out of every 5 requests
Least connections
Send each request to the server with the fewest active connections. This is often the best default for most web applications because it adapts to uneven workloads without extra configuration.
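
A minimal sketch of the selection step, assuming the balancer keeps an `activeConnections` counter per server. The field and function names are hypothetical.

```javascript
// Pick the server with the fewest active connections.
// On a tie, the server listed first wins.
function pickLeastConnections(servers) {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best
  );
}

const servers = [
  { name: "Server A", activeConnections: 12 },
  { name: "Server B", activeConnections: 3 },
  { name: "Server C", activeConnections: 8 },
];

const target = pickLeastConnections(servers);
target.activeConnections += 1; // count the connection we just assigned
console.log(target.name); // Server B
```

A real balancer would also decrement the counter when a connection closes; that bookkeeping is omitted here.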
Server A: 12 active connections
Server B: 3 active connections ← next request goes here
Server C: 8 active connections
IP hash
Hash the client's IP address so that the same user is always routed to the same server. This provides session affinity, but it has two drawbacks: load can be uneven when many users share one IP address (for example, behind a corporate NAT), and failover is disruptive because changing the number of servers changes the result of the modulo and remaps many clients.
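
The sketch below illustrates both the affinity and the failover problem. It assumes a simple FNV-1a string hash, chosen only because it is short; real load balancers use their own hash functions, so the concrete indices will not match the illustrative values in this section. The names `fnv1a` and `pickByIp` and the generated client addresses are hypothetical.

```javascript
// 32-bit FNV-1a hash of a string (an arbitrary choice for this sketch).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Same IP and same server list always give the same server.
function pickByIp(ip, servers) {
  return servers[fnv1a(ip) % servers.length];
}

const three = ["Server A", "Server B", "Server C"];
const two = ["Server A", "Server B"]; // Server C has been removed

// Count how many clients land on a different server after the change.
const total = 1000;
let moved = 0;
let wereOnC = 0;
for (let i = 0; i < total; i++) {
  const ip = `10.0.${Math.floor(i / 250)}.${i % 250}`;
  const before = pickByIp(ip, three);
  if (before === "Server C") wereOnC++;
  if (before !== pickByIp(ip, two)) moved++;
}
console.log(`${wereOnC} clients were on Server C, ${moved} of ${total} changed servers`);
```

Any difference between the two counts is the number of clients that were remapped even though their own server stayed healthy. Consistent hashing is the usual technique for reducing that collateral remapping.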
hash("192.168.1.10") % 3 = 1 → always Server B
hash("10.0.0.55") % 3 = 0 → always Server A
Comparison of algorithms
| Algorithm | How it works | Best for | Drawback |
|---|---|---|---|
| Round-robin | Cycle through servers sequentially | Uniform requests, identical servers | Ignores server load |
| Weighted round-robin | Cycle with proportional distribution | Mixed server capacities | Requires manual weight tuning |
| Least connections | Route to server with fewest active conns | Variable request durations | Slightly more overhead |
| IP hash | Hash client IP to pick server | Session affinity needs | Uneven distribution, failover issues |
| Least response time | Route to fastest-responding server | Latency-sensitive applications | Requires active measurement |
Health checks
Active health checks
The load balancer periodically sends a request to each server (typically GET /health). If a server fails a configured number of consecutive checks, it is removed from the pool.
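// Typical health endpoint
// (assumes an Express `app` and connected `db` and `redis` clients created elsewhere)
app.get('/health', async (req, res) => {
  try {
    // Confirm that the dependencies this server needs are reachable
    await db.query('SELECT 1');
    await redis.ping();
    res.status(200).json({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: Date.now()
    });
  } catch (err) {
    res.status(503).json({
      status: 'unhealthy',
      error: err.message
    });
  }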
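});
# Nginx active health check configuration.
# health_check is an NGINX Plus directive; it belongs in a location block,
# and the upstream group needs a shared memory zone.
upstream backend {
    zone backend 64k;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}
server {
    location / {
        proxy_pass http://backend;
        # Check every 5 seconds, mark unhealthy after 3 failures,
        # mark healthy again after 2 successes
        health_check interval=5s fails=3 passes=2;
    }
}
Passive health checks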
The load balancer monitors real traffic instead of sending probes. If a server starts returning errors (5xx responses, timeouts), it is marked unhealthy. Passive checks are slower to detect problems because they depend on real requests reaching the failing server, so most production setups use both types together.
| Health check type | How it works | Detection speed | Overhead |
|---|---|---|---|
| Active | LB sends periodic probes | Fast (seconds) | LB must send requests |
| Passive | LB monitors real responses | Slower (depends on traffic) | Zero additional requests |
| Combined | Both active probes + response monitoring | Fastest | Moderate |
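
The threshold logic behind both kinds of check can be sketched as a small state tracker. This is an assumption about a reasonable design, not the internals of any specific load balancer: the `HealthTracker` class is hypothetical, and its `fails` and `passes` defaults simply echo the values used in the Nginx example. The same `record` method can be fed by active probe results or by the outcome of real requests.

```javascript
// Tracks one backend's health with consecutive-failure and
// consecutive-success thresholds.
class HealthTracker {
  constructor({ fails = 3, passes = 2 } = {}) {
    this.fails = fails;
    this.passes = passes;
    this.healthy = true;
    this.failureStreak = 0;
    this.successStreak = 0;
  }

  // Record one probe result or real response; returns the current state.
  record(ok) {
    if (ok) {
      this.successStreak += 1;
      this.failureStreak = 0;
      if (!this.healthy && this.successStreak >= this.passes) {
        this.healthy = true; // enough successes in a row: back in the pool
      }
    } else {
      this.failureStreak += 1;
      this.successStreak = 0;
      if (this.healthy && this.failureStreak >= this.fails) {
        this.healthy = false; // enough failures in a row: out of the pool
      }
    }
    return this.healthy;
  }
}

const tracker = new HealthTracker();
const states = [false, false, false, true, true].map((ok) => tracker.record(ok));
console.log(states); // [ true, true, false, false, true ]
```

Requiring several consecutive results in each direction keeps a single slow response or a single lucky success from flipping a server in and out of the pool.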
Sticky sessions and why they are problematic
Sticky sessions route all of a user's requests to the same server, typically via a cookie.
First request:
Client → LB → Server B
LB sets cookie: SERVERID=B
All subsequent requests:
Client (cookie: SERVERID=B) → LB → Server B (always)
They are a workaround for stateful applications, but they cause real problems:
- Uneven load: heavy sessions cluster on one server while others sit idle
- Failover breaks sessions: if Server B goes down, all pinned users lose their sessions
- Scaling is constrained: adding servers does not help users already pinned to existing ones
- Deployment is risky: rolling updates leave some users on old servers with no clean migration
The fix is to externalize state: store sessions in a shared store such as Redis, or use self-contained tokens such as JWTs, so that any server can handle any request.
// Assumes `session` (express-session), `RedisStore` (connect-redis), and a
// connected `redisClient` are set up elsewhere.
// Instead of in-memory sessions...
app.use(session({ store: new MemoryStore() })); // BAD: tied to one server
// ...use Redis-backed sessions
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false
}));