Vertical & Horizontal Scaling

There are exactly two directions to handle growing traffic: make the machine bigger, or add more machines. The complexity lives in choosing which approach, when, and how to handle the consequences of each.

Vertical scaling (scaling up)

Vertical scaling means upgrading the hardware of your existing server: more CPU cores, more RAM, faster SSDs. Your application code does not change; you just move it to a beefier machine.

Before:  1 server, 4 CPU cores, 8 GB RAM
After:   1 server, 32 CPU cores, 128 GB RAM

Code changes required: zero
Downtime: yes (typically minutes for a migration)

This is the default first move. It is fast, simple, and requires no architectural changes. A single PostgreSQL database on a 64 GB machine with NVMe storage can handle a surprising amount of traffic.

Vertical scaling is ideal when your application is single-threaded, your team is small, or traffic is growing but not explosively.

The ceiling

Every cloud provider has a maximum instance size, and the price curve is not linear: doubling resources often more than doubles the cost.

AWS EC2 pricing (approximate, us-east-1; multiples are relative to t3.medium):
t3.medium   (2 vCPU,  4 GB):   ~$30/month
m6i.xlarge  (4 vCPU,  16 GB):  ~$140/month    (4x memory, ~4.7x cost)
m6i.4xlarge (16 vCPU, 64 GB):  ~$560/month    (16x memory, ~18.7x cost)
m6i.16xlarge(64 vCPU, 256 GB): ~$2,240/month  (64x memory, ~74.7x cost)

At some point, one giant machine costs more than running ten smaller ones.
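To check the multipliers yourself, here is a minimal sketch that recomputes them from the approximate prices listed above. The m6i.xlarge price of about $140 is an assumption inferred from its "~4.7x cost" note, and `costCurve` is a hypothetical helper name, not part of any AWS tooling.

```javascript
// Approximate figures copied from the pricing list above.
// The m6i.xlarge price (~$140) is inferred from its "~4.7x cost" note.
const instances = [
  { name: 't3.medium', vcpu: 2, memoryGb: 4, monthlyUsd: 30 },
  { name: 'm6i.xlarge', vcpu: 4, memoryGb: 16, monthlyUsd: 140 },
  { name: 'm6i.4xlarge', vcpu: 16, memoryGb: 64, monthlyUsd: 560 },
  { name: 'm6i.16xlarge', vcpu: 64, memoryGb: 256, monthlyUsd: 2240 },
];

// Compare every instance against the smallest one in the list.
function costCurve(list) {
  const base = list[0];
  return list.map((inst) => ({
    name: inst.name,
    memoryMultiple: inst.memoryGb / base.memoryGb,
    costMultiple: Number((inst.monthlyUsd / base.monthlyUsd).toFixed(1)),
    usdPerGb: Number((inst.monthlyUsd / inst.memoryGb).toFixed(2)),
  }));
}

for (const row of costCurve(instances)) {
  console.log(row);
}
```

With these particular numbers, the price per GB is flat across the three m6i sizes, and the jump comes from moving off the burstable t3 class. Run the same calculation on current prices for the instance families you actually use before drawing conclusions.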

Horizontal scaling (scaling out)

Horizontal scaling means running multiple copies of your application behind a load balancer.

Before:  1 server handling 1,000 req/s
After:   5 servers, each handling 200 req/s

                 ┌──── Server 1
                 ├──── Server 2
Load Balancer ───├──── Server 3
                 ├──── Server 4
                 └──── Server 5

No theoretical ceiling. Need more capacity? Add another server.
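The simplest way a load balancer can spread requests is round-robin: hand each new request to the next server in the list. The sketch below shows only that selection logic, under the assumption of five equally sized servers; `createRoundRobin` and the server names are hypothetical, and real load balancers add health checks, weights, and connection handling.

```javascript
// Returns a function that hands out servers in rotation.
function createRoundRobin(servers) {
  if (servers.length === 0) {
    throw new Error('need at least one server');
  }
  let index = 0;
  return function next() {
    const server = servers[index];
    index = (index + 1) % servers.length; // wrap around after the last server
    return server;
  };
}

const pick = createRoundRobin([
  'server-1', 'server-2', 'server-3', 'server-4', 'server-5',
]);

// Six requests: the sixth wraps back to the first server.
const firstSix = Array.from({ length: 6 }, () => pick());
console.log(firstSix);
```

Adding capacity is then a matter of putting one more entry in the list, which is why this model has no built-in ceiling.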

The catch: state

Horizontal scaling is easy if your servers are stateless: they store no user-specific data between requests.

// Stateless: any server can handle any request
app.get('/api/profile', async (req, res) => {
  const userId = verifyJWT(req.headers.authorization); // info is in the token
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  res.json(user);
});

// Stateful: only the server that stored the session can handle this
app.get('/api/profile', (req, res) => {
  const user = req.session.user; // session lives in this server's memory
  res.json(user);
});

The stateful version breaks the moment a different server receives the next request, because session data is trapped in one server's memory.
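You can reproduce that failure without any web framework. In this sketch, two hypothetical classes model app servers: one keeps sessions in its own memory, the other reads them from a store that lives outside the process. A plain `Map` stands in for an external store such as Redis, so this illustrates the idea rather than showing a production setup.

```javascript
// Each instance keeps sessions in its own process memory.
class StatefulServer {
  constructor() {
    this.sessions = new Map();
  }
  login(sessionId, user) {
    this.sessions.set(sessionId, user);
  }
  profile(sessionId) {
    return this.sessions.get(sessionId) ?? null;
  }
}

// Each instance reads and writes sessions through a store it does not own.
// The Map passed in is a stand-in for an external store such as Redis.
class StatelessServer {
  constructor(sharedStore) {
    this.store = sharedStore;
  }
  login(sessionId, user) {
    this.store.set(sessionId, user);
  }
  profile(sessionId) {
    return this.store.get(sessionId) ?? null;
  }
}

// Stateful: log in on server A, next request lands on server B.
const a = new StatefulServer();
const b = new StatefulServer();
a.login('abc', { name: 'Ada' });
const statefulResult = b.profile('abc');

// Stateless: same flow, but both servers use the same external store.
const store = new Map();
const c = new StatelessServer(store);
const d = new StatelessServer(store);
c.login('abc', { name: 'Ada' });
const sharedResult = d.profile('abc');

console.log({ statefulResult, sharedResult });
```

Server B has never seen the session, so the stateful lookup comes back empty, while the shared-store lookup finds the user regardless of which server handled the login.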

Shared-nothing architecture

The gold standard for horizontal scaling is shared-nothing architecture: each node has its own CPU and memory, shares no disk or state, and communicates only through the network.

Shared-nothing:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ App + ∅ │  │ App + ∅ │  │ App + ∅ │   (no shared state)
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
     └────────────┼────────────┘
                  │
            ┌─────┴─────┐
            │ Database  │  (shared data layer, not shared memory)
            └───────────┘

You can add or remove nodes without affecting the others. If one crashes, the rest keep serving traffic.
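Because no node depends on another node's memory, routing around a failure only requires knowing which nodes are healthy. The sketch below is a hypothetical `NodePool` that rotates over healthy nodes only; how a node gets marked down (health checks, timeouts) is assumed to happen elsewhere.

```javascript
// Tracks which nodes are healthy and rotates requests across them.
class NodePool {
  constructor(names) {
    this.healthy = new Set(names);
    this.cursor = 0;
  }
  markDown(name) {
    this.healthy.delete(name);
  }
  markUp(name) {
    this.healthy.add(name);
  }
  route() {
    const nodes = [...this.healthy];
    if (nodes.length === 0) {
      throw new Error('no healthy nodes');
    }
    const node = nodes[this.cursor % nodes.length];
    this.cursor += 1;
    return node;
  }
}

const pool = new NodePool(['node-1', 'node-2', 'node-3']);
pool.markDown('node-2'); // simulate a crash

// Traffic keeps flowing to the remaining nodes.
const routed = Array.from({ length: 4 }, () => pool.route());
console.log(routed);
```

Nothing has to be migrated off the crashed node, because the only durable state lives in the shared data layer.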

Capacity math

Before you pick a scaling strategy, you need numbers.

Metric	How to measure	What it tells you
Requests per second (RPS)	Load balancer metrics, APM tools	Current demand on your system
Concurrent connections	`netstat`, server metrics	How many users are active simultaneously
Median response time (p50)	APM, application logs	Typical user experience
Tail latency (p99)	APM tools	Worst-case user experience
CPU utilization	System metrics (top, htop, CloudWatch)	How close you are to compute limits
Memory utilization	System metrics	How close you are to RAM limits
Disk I/O	iostat, CloudWatch	Whether storage is the bottleneck
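If your tooling only reports raw latencies, you can derive p50 and p99 yourself. The sketch below uses the nearest-rank method on a made-up sample of ten response times; the `percentile` and `mean` helpers and the sample values are illustrative assumptions, and APM tools may use a slightly different interpolation.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error('no samples');
  }
  const sorted = [...samples].sort((x, y) => x - y);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function mean(samples) {
  return samples.reduce((sum, value) => sum + value, 0) / samples.length;
}

// Hypothetical response times in milliseconds, with one slow outlier.
const latenciesMs = [12, 15, 11, 14, 13, 250, 16, 12, 13, 14];

const p50 = percentile(latenciesMs, 50);
const p99 = percentile(latenciesMs, 99);
const average = mean(latenciesMs);
console.log({ p50, p99, average });
```

One slow request drags the mean far above the median, which is why the table tracks p50 for the typical case and p99 for the tail.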

Current: 500 RPS, 1 server, 60% CPU
Growth:  10x in 12 months (projected)

Vertical path: need a server handling 5,000 RPS
  → Probably a 16-core machine, ~$560/month
  → Single point of failure

Horizontal path: need 10 servers handling 500 RPS each
  → 10x t3.medium at $30/month = $300/month
  → Built-in redundancy (one server down = 10% capacity loss)
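The horizontal path above is simple enough to turn into a small calculator. This is a sketch under the same assumptions as the example: every server sustains the same request rate, and cost scales with server count. `planHorizontal` and its parameter names are hypothetical.

```javascript
// Estimates server count, monthly cost, and failure impact for a target load.
function planHorizontal({ currentRps, growthFactor, perServerRps, monthlyUsdPerServer }) {
  const targetRps = currentRps * growthFactor;
  const servers = Math.ceil(targetRps / perServerRps);
  return {
    targetRps,
    servers,
    monthlyUsd: servers * monthlyUsdPerServer,
    capacityLostIfOneFails: 1 / servers, // fraction of total capacity
  };
}

// Numbers from the example above: 500 RPS today, 10x growth,
// 500 RPS per server, $30 per server per month.
const plan = planHorizontal({
  currentRps: 500,
  growthFactor: 10,
  perServerRps: 500,
  monthlyUsdPerServer: 30,
});
console.log(plan);
```

Treat `perServerRps` as something you measure under load rather than assume. The example's 500 RPS was observed at 60% CPU on one unspecified server, and a smaller instance may sustain less.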

Vertical vs horizontal: the full comparison

Dimension	Vertical scaling	Horizontal scaling
Complexity	Low, no code changes	High, need load balancer, stateless design, health checks
Cost curve	Superlinear (diminishing returns)	Linear (pay per node)
Ceiling	Hard limit (max instance size)	No theoretical limit
Downtime risk	Single point of failure	Redundancy built in
Data consistency	Simple (one database)	Complex (distributed state)
Migration effort	Minutes (resize instance)	Days to weeks (redesign for statelessness)
Best for	Databases, early-stage apps, small teams	Web servers, APIs, microservices
When to choose	Traffic < 10K RPS, team < 5, no redundancy requirement	Traffic > 10K RPS, high availability required, cost-sensitive at scale
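The "When to choose" row can be read as a rough decision rule. The sketch below is one possible reading, not a formula from the lesson: it treats the 10K RPS threshold and a high-availability requirement as the deciding factors and leaves out team size and cost sensitivity. `suggestStrategy` is a hypothetical name.

```javascript
// One reading of the "When to choose" row; the threshold comes from the table.
const RPS_THRESHOLD = 10000;

function suggestStrategy({ peakRps, needsHighAvailability }) {
  if (needsHighAvailability) {
    return 'horizontal'; // a single machine is a single point of failure
  }
  if (peakRps > RPS_THRESHOLD) {
    return 'horizontal';
  }
  return 'vertical';
}

const earlyStage = suggestStrategy({ peakRps: 500, needsHighAvailability: false });
const highTraffic = suggestStrategy({ peakRps: 25000, needsHighAvailability: false });
const mustStayUp = suggestStrategy({ peakRps: 500, needsHighAvailability: true });
console.log({ earlyStage, highTraffic, mustStayUp });
```

The thresholds are starting points, so adjust them to your own workload and budget.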

The realistic path

Almost nobody starts with horizontal scaling on day one. The typical journey:

1. Start small: one server, vertical scaling as needed
2. Separate concerns: move the database to its own server
3. Go stateless: externalize sessions (Redis, JWT), remove server-side state
4. Add a load balancer: run 2+ app servers for redundancy
5. Scale out: add servers as traffic grows
6. Specialize: separate read-heavy and write-heavy workloads

You do not need to design for Google-scale on day one. You need to design so that scaling later does not require rewriting everything.

AI pitfall

Ask AI "how should I scale my app?" and it will jump straight to horizontal scaling with load balancers and auto-scaling groups. What AI gets wrong: for most applications, simply upgrading to the next server size buys you months of headroom for a few dollars more per month. Always exhaust vertical scaling before adding the complexity of multiple servers.

Good to know

The single most important thing you can do for future scalability is make your application stateless. If your server keeps session data in memory, file uploads on local disk, or background job state in a process variable, a request that lands on a different server will not find that state, and horizontal scaling breaks. Move state to external stores (Redis, S3, a database) and scaling becomes a matter of adding more servers behind a load balancer.

Edge case

Auto-scaling has a cold-start problem. New instances need time to boot, load code, warm caches, and establish database connections. A traffic spike that doubles in 30 seconds can overwhelm your system before the new instances are ready. Keep enough baseline capacity running to absorb a spike for as long as new instances take to come online.
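To size that baseline, compare the servers you need for steady traffic with the servers you need if a spike arrives before any new instance is ready. This sketch assumes the spike reaches its peak faster than an instance can boot, so only already-running servers count. `warmServersNeeded` is a hypothetical helper, and the figures reuse the 1,000 req/s and 200 req/s per server from the earlier example.

```javascript
// Servers that must already be running to survive a spike with no help from auto-scaling.
function warmServersNeeded({ steadyRps, spikeMultiplier, perServerRps }) {
  const peakRps = steadyRps * spikeMultiplier;
  const forSteadyTraffic = Math.ceil(steadyRps / perServerRps);
  const toAbsorbSpike = Math.ceil(peakRps / perServerRps);
  return {
    forSteadyTraffic,
    toAbsorbSpike,
    extraWarmServers: toAbsorbSpike - forSteadyTraffic,
  };
}

// Traffic doubles before new instances finish booting.
const baseline = warmServersNeeded({
  steadyRps: 1000,
  spikeMultiplier: 2,
  perServerRps: 200,
});
console.log(baseline);
```

Keeping those extra servers warm costs money while traffic is quiet. That is the trade-off against dropping requests during the cold-start window.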
