There are exactly two directions to handle growing traffic: make the machine bigger, or add more machines. The complexity lives in choosing which approach, when, and how to handle the consequences of each.
Vertical scaling (scaling up)
Vertical scaling means upgrading the hardware of your existing server: more CPU cores, more RAM, faster SSDs. Your application code does not change; you just move it to a beefier machine.
```
Before: 1 server, 4 CPU cores, 8 GB RAM
After:  1 server, 32 CPU cores, 128 GB RAM
Code changes required: zero
Downtime: yes (typically minutes for a migration)
```
This is the default first move. It is fast, simple, and requires no architectural changes. A single PostgreSQL database on a 64 GB machine with NVMe storage can handle a surprising amount of traffic.
Vertical scaling is ideal when your application is single-threaded, your team is small, or traffic is growing but not explosively.
The ceiling
Every cloud provider has a maximum instance size, and the price curve is not linear: doubling resources often more than doubles the cost.
```
AWS EC2 pricing (approximate, us-east-1):
t3.medium    (2 vCPU, 4 GB):     ~$30/month
m6i.xlarge   (4 vCPU, 16 GB):    ~$140/month   (4x resources, ~4.7x cost)
m6i.4xlarge  (16 vCPU, 64 GB):   ~$560/month   (16x resources, ~18.7x cost)
m6i.16xlarge (64 vCPU, 256 GB):  ~$2,240/month (64x resources, ~74.7x cost)
```
At some point, one giant machine costs more than running ten smaller ones.
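The multiples in that list can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming "resources" means memory relative to the t3.medium baseline and taking the m6i.xlarge price as roughly $140/month (the figure the stated ~4.7x multiple implies). The `scaleFactors` helper is a hypothetical name, and the prices are the approximate ones quoted above, not live AWS data.

```javascript
// Approximate monthly prices from the list above (us-east-1).
// The m6i.xlarge figure of ~$140 is inferred from the stated ~4.7x multiple.
const instances = [
  { name: 't3.medium', vcpu: 2, ramGb: 4, usdPerMonth: 30 },
  { name: 'm6i.xlarge', vcpu: 4, ramGb: 16, usdPerMonth: 140 },
  { name: 'm6i.4xlarge', vcpu: 16, ramGb: 64, usdPerMonth: 560 },
  { name: 'm6i.16xlarge', vcpu: 64, ramGb: 256, usdPerMonth: 2240 },
];

// Compare an instance against a baseline: how much more memory do you get,
// and how much more do you pay for it?
function scaleFactors(baseline, instance) {
  return {
    name: instance.name,
    resources: instance.ramGb / baseline.ramGb,
    cost: instance.usdPerMonth / baseline.usdPerMonth,
  };
}

const [baseline, ...larger] = instances;
const factors = larger.map((instance) => scaleFactors(baseline, instance));

for (const f of factors) {
  console.log(`${f.name}: ${f.resources}x resources, ${f.cost.toFixed(1)}x cost`);
}
```

Whenever the cost multiple is larger than the resource multiple, you are paying a premium for keeping everything on one machine.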
Horizontal scaling (scaling out)
Horizontal scaling means running multiple copies of your application behind a load balancer.
```
Before: 1 server handling 1,000 req/s
After:  5 servers, each handling 200 req/s

                 ┌──── Server 1
                 ├──── Server 2
Load Balancer ───├──── Server 3
                 ├──── Server 4
                 └──── Server 5
```
No theoretical ceiling. Need more capacity? Add another server.
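To make the distribution concrete, here is a minimal sketch of round-robin selection, one of the simplest strategies a load balancer can use. The `createRoundRobin` function and the server names are hypothetical; real load balancers (NGINX, HAProxy, cloud load balancers) offer this and more sophisticated strategies out of the box, so you would rarely write it yourself.

```javascript
// Returns a function that hands out servers in rotation.
function createRoundRobin(servers) {
  if (servers.length === 0) {
    throw new Error('at least one server is required');
  }
  let nextIndex = 0;
  return function pickServer() {
    const server = servers[nextIndex];
    nextIndex = (nextIndex + 1) % servers.length;
    return server;
  };
}

// Simulate 1,000 incoming requests spread across 5 servers.
const pickServer = createRoundRobin([
  'server-1', 'server-2', 'server-3', 'server-4', 'server-5',
]);

const handled = new Map();
for (let i = 0; i < 1000; i += 1) {
  const server = pickServer();
  handled.set(server, (handled.get(server) ?? 0) + 1);
}

console.log(Object.fromEntries(handled));
```

Each server ends up with 200 of the 1,000 requests, matching the before/after figures above.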
The catch: state
Horizontal scaling is easy if your servers are stateless: they store no user-specific data between requests.
```javascript
// Stateless: any server can handle any request
app.get('/api/profile', async (req, res) => {
  const userId = verifyJWT(req.headers.authorization); // info is in the token
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  res.json(user);
});

// Stateful: only the server that stored the session can handle this
app.get('/api/profile', (req, res) => {
  const user = req.session.user; // session lives in this server's memory
  res.json(user);
});
```
The stateful version breaks the moment a different server receives the next request, because session data is trapped in one server's memory.
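The failure is easy to reproduce without a web framework. The sketch below is a simplified simulation, not real session middleware: `createStatefulServer` and `createStatelessServer` are hypothetical helpers, and a plain `Map` stands in for an external store such as Redis.

```javascript
// A server that keeps sessions in its own process memory (stateful).
function createStatefulServer() {
  const localSessions = new Map();
  return {
    login(sessionId, user) { localSessions.set(sessionId, user); },
    getProfile(sessionId) { return localSessions.get(sessionId) ?? null; },
  };
}

// A server that reads and writes sessions in a store shared by all servers.
function createStatelessServer(sessionStore) {
  return {
    login(sessionId, user) { sessionStore.set(sessionId, user); },
    getProfile(sessionId) { return sessionStore.get(sessionId) ?? null; },
  };
}

// Stateful: log in on server A, then the next request lands on server B.
const statefulA = createStatefulServer();
const statefulB = createStatefulServer();
statefulA.login('abc', { name: 'Ada' });
const statefulResult = statefulB.getProfile('abc'); // null: B never saw the login

// Stateless: both servers consult the same external store.
const sharedStore = new Map(); // stand-in for Redis (assumption for this sketch)
const statelessA = createStatelessServer(sharedStore);
const statelessB = createStatelessServer(sharedStore);
statelessA.login('abc', { name: 'Ada' });
const statelessResult = statelessB.getProfile('abc'); // { name: 'Ada' }

console.log({ statefulResult, statelessResult });
```

The only difference between the two factories is where the session lives, which is exactly what decides whether a second server can pick up the conversation.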
Shared-nothing architecture
The gold standard for horizontal scaling is shared-nothing architecture: each node has its own CPU and memory, shares no disk or state, and communicates only through the network.
```
Shared-nothing:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ App + ∅ │  │ App + ∅ │  │ App + ∅ │   (no shared state)
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
     └────────────┼────────────┘
                  │
            ┌─────┴─────┐
            │ Database  │   (shared data layer, not shared memory)
            └───────────┘
```
You can add or remove nodes without affecting the others. If one crashes, the rest keep serving traffic.
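As an illustration of why a crashed node does not take the others down, here is a minimal sketch of a node pool that simply skips nodes marked unhealthy. `createPool`, `markDown`, and `markUp` are hypothetical names; in practice the load balancer's health checks play this role.

```javascript
// Tracks which of a fixed set of nodes are healthy and rotates over those.
function createPool(nodes) {
  const healthy = new Set(nodes);
  let cursor = 0;
  return {
    markDown(node) { healthy.delete(node); },
    markUp(node) { healthy.add(node); },
    pick() {
      // Only nodes known at creation time and currently healthy are eligible.
      const candidates = nodes.filter((node) => healthy.has(node));
      if (candidates.length === 0) {
        throw new Error('no healthy nodes available');
      }
      const node = candidates[cursor % candidates.length];
      cursor += 1;
      return node;
    },
  };
}

const pool = createPool(['app-1', 'app-2', 'app-3']);
pool.markDown('app-2'); // simulate a crash

const picks = [pool.pick(), pool.pick(), pool.pick(), pool.pick()];
console.log(picks); // traffic alternates between app-1 and app-3
```

Because no node holds state the others depend on, removing `app-2` changes only how much capacity is left, not whether requests succeed.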
Capacity math
Before you pick a scaling strategy, you need numbers.
| Metric | How to measure | What it tells you |
|---|---|---|
| Requests per second (RPS) | Load balancer metrics, APM tools | Current demand on your system |
| Concurrent connections | netstat, server metrics | How many users are active simultaneously |
| Median response time (p50) | APM, application logs | Typical user experience |
| Tail latency (p99) | APM tools | Worst-case user experience |
| CPU utilization | System metrics (top, htop, CloudWatch) | How close you are to compute limits |
| Memory utilization | System metrics | How close you are to RAM limits |
| Disk I/O | iostat, CloudWatch | Whether storage is the bottleneck |
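
Once you have these measurements, sizing a horizontal deployment is mostly division. The sketch below is a rough planning aid, assuming throughput scales linearly with server count (which only holds if the database and other shared dependencies keep up). `planHorizontal` and its parameter names are hypothetical, and the inputs are the figures used in the worked example that follows.

```javascript
// Estimate server count, monthly cost, and blast radius for a scale-out plan.
function planHorizontal({ currentRps, growthFactor, rpsPerServer, usdPerServer }) {
  const targetRps = currentRps * growthFactor;
  const servers = Math.ceil(targetRps / rpsPerServer);
  return {
    targetRps,
    servers,
    usdPerMonth: servers * usdPerServer,
    // Fraction of total capacity lost when a single server goes down.
    capacityLostIfOneFails: 1 / servers,
  };
}

const plan = planHorizontal({
  currentRps: 500,    // measured today
  growthFactor: 10,   // projected 12-month growth
  rpsPerServer: 500,  // what one server handles at ~60% CPU
  usdPerServer: 30,   // approximate t3.medium monthly price
});

console.log(plan);
```

Planning at 500 RPS per server, the load at which the current machine sits at 60% CPU, leaves each node some headroom for spikes instead of sizing it to run flat out.
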
```
Current: 500 RPS, 1 server, 60% CPU
Growth: 10x in 12 months (projected)

Vertical path: need a server handling 5,000 RPS
  → Probably a 16-core machine, ~$560/month
  → Single point of failure

Horizontal path: need 10 servers handling 500 RPS each
  → 10x t3.medium at $30/month = $300/month
  → Built-in redundancy (one server down = 10% capacity loss)
```
Vertical vs horizontal: the full comparison
| Dimension | Vertical scaling | Horizontal scaling |
|---|---|---|
| Complexity | Low, no code changes | High, need load balancer, stateless design, health checks |
| Cost curve | Superlinear (diminishing returns) | Roughly linear (pay per node) |
| Ceiling | Hard limit (max instance size) | No theoretical limit |
| Downtime risk | Single point of failure | Redundancy built in |
| Data consistency | Simple (one database) | Complex (distributed state) |
| Migration effort | Minutes (resize instance) | Days to weeks (redesign for statelessness) |
| Best for | Databases, early-stage apps, small teams | Web servers, APIs, microservices |
| When to choose | Traffic < 10K RPS, team < 5, no redundancy requirement | Traffic > 10K RPS, high availability required, cost-sensitive at scale |
The realistic path
Almost nobody starts with horizontal scaling on day one. The typical journey:
- Start small: one server, vertical scaling as needed
- Separate concerns: move the database to its own server
- Go stateless: externalize sessions (Redis, JWT), remove server-side state
- Add a load balancer: run 2+ app servers for redundancy
- Scale out: add servers as traffic grows
- Specialize: separate read-heavy and write-heavy workloads
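The last step, separating reads from writes, is often done by sending reads to database replicas and writes to a primary. That interpretation is an assumption here, since the list does not say how to specialize. The sketch below shows the routing idea only: `createQueryRouter` is a hypothetical helper, and the connection names are placeholders rather than real database clients.

```javascript
// Routes SELECT statements to replicas in rotation; everything else goes to the primary.
function createQueryRouter(primary, replicas) {
  let nextReplica = 0;
  return function route(sql) {
    // Naive check: treats any statement starting with SELECT as a read.
    const isRead = /^\s*select\b/i.test(sql);
    if (!isRead || replicas.length === 0) {
      return primary;
    }
    const replica = replicas[nextReplica];
    nextReplica = (nextReplica + 1) % replicas.length;
    return replica;
  };
}

const route = createQueryRouter('primary', ['replica-1', 'replica-2']);

const targets = [
  route('SELECT * FROM users WHERE id = $1'),
  route('UPDATE users SET name = $1 WHERE id = $2'),
  route('select count(*) from orders'),
];

console.log(targets);
```

A real setup also has to account for replication lag and for reads that must see their own writes, such as `SELECT ... FOR UPDATE`, which this prefix check would misroute.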
You do not need to design for Google-scale on day one. You need to design so that scaling later does not require rewriting everything.