
There are exactly two directions to handle growing traffic: make the machine bigger, or add more machines. The complexity lives in choosing which approach, when, and how to handle the consequences of each.

Vertical scaling (scaling up)

Vertical scaling means upgrading the hardware of your existing server: more CPU cores, more RAM, faster SSDs. Your application code does not change; you just move it to a beefier machine.

Before:  1 server, 4 CPU cores, 8 GB RAM
After:   1 server, 32 CPU cores, 128 GB RAM

Code changes required: zero
Downtime: yes (typically minutes for a migration)

This is the default first move. It is fast, simple, and requires no architectural changes. A single PostgreSQL database on a 64 GB machine with NVMe storage can handle a surprising amount of traffic.

Vertical scaling is ideal when your application is single-threaded, your team is small, or traffic is growing but not explosively.

The ceiling

Every cloud provider has a maximum instance size, and the price curve is not linear: doubling resources often more than doubles the cost.

AWS EC2 pricing (approximate, us-east-1; multipliers are relative to t3.medium, measured by RAM):
t3.medium    (2 vCPU,  4 GB):    ~$30/month
m6i.xlarge   (4 vCPU,  16 GB):   ~$140/month    (4x RAM, ~4.7x cost)
m6i.4xlarge  (16 vCPU, 64 GB):   ~$560/month    (16x RAM, ~18.7x cost)
m6i.16xlarge (64 vCPU, 256 GB):  ~$2,240/month  (64x RAM, ~74.7x cost)

At some point, one giant machine costs more than running ten smaller ones.
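
The break-even point can be checked against the approximate prices listed above. The following is a minimal sketch, not a pricing tool: the `instances` array restates three rows of the table, and `fleetCost` is a hypothetical helper that matches a large instance's RAM with copies of a small one. It ignores the cost of a load balancer and the fact that instance families differ in CPU performance.

```javascript
// Approximate monthly prices restated from the table above (not live AWS data).
const instances = [
  { name: "t3.medium", vcpu: 2, ramGb: 4, monthlyUsd: 30 },
  { name: "m6i.4xlarge", vcpu: 16, ramGb: 64, monthlyUsd: 560 },
  { name: "m6i.16xlarge", vcpu: 64, ramGb: 256, monthlyUsd: 2240 },
];

// Hypothetical helper: cost of matching `big`'s RAM with copies of `small`.
function fleetCost(small, big) {
  const count = Math.ceil(big.ramGb / small.ramGb);
  return { count, monthlyUsd: count * small.monthlyUsd };
}

const [small, , big] = instances;
const fleet = fleetCost(small, big);
console.log(`${big.name}: $${big.monthlyUsd}/month`); // m6i.16xlarge: $2240/month
console.log(`${fleet.count} x ${small.name}: $${fleet.monthlyUsd}/month`); // 64 x t3.medium: $1920/month
```

With these figures, the fleet of small instances is already cheaper than the largest machine for the same total RAM.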


Horizontal scaling (scaling out)

Horizontal scaling means running multiple copies of your application behind a load balancer, a server that distributes incoming traffic across the copies so that no single one is overwhelmed.

Before:  1 server handling 1,000 req/s
After:   5 servers, each handling 200 req/s

                 ┌──── Server 1
                 ├──── Server 2
Load Balancer ───├──── Server 3
                 ├──── Server 4
                 └──── Server 5

No theoretical ceiling. Need more capacity? Add another server.
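
As an illustration of how a load balancer might spread requests, here is a minimal round-robin sketch. It is one of several possible strategies, not a description of any particular product; `createRoundRobin` and the server names are hypothetical.

```javascript
// Round-robin selection: each call returns the next server in the list.
// Server names are placeholders; a real balancer would also track health.
function createRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length;
    return server;
  };
}

const pick = createRoundRobin(["server-1", "server-2", "server-3"]);
for (let i = 0; i < 5; i++) {
  console.log(`request ${i} -> ${pick()}`);
}
```

Adding capacity in this model means appending one more entry to the list.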

The catch: state

Horizontal scaling is easy if your servers are stateless: they store no user-specific data between requests.

// Stateless: any server can handle any request
app.get('/api/profile', async (req, res) => {
  const userId = verifyJWT(req.headers.authorization); // info is in the token
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  res.json(user);
});

// Stateful: only the server that stored the session can handle this
app.get('/api/profile', (req, res) => {
  const user = req.session.user; // session lives in this server's memory
  res.json(user);
});

The stateful version breaks the moment a different server receives the next request, because session data is trapped in one server's memory.
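
One common fix is to move sessions into a store that every server can reach. The sketch below simulates that without any framework: a `Map` stands in for an external store such as Redis, and `createServer`, `login`, and `getProfile` are hypothetical names chosen for illustration.

```javascript
// A Map stands in for an external store such as Redis (an assumption for this sketch).
const sharedStore = new Map();

// Hypothetical app server. With `store` omitted, sessions live in private memory.
function createServer(name, store = new Map()) {
  return {
    login(sessionId, user) {
      store.set(sessionId, user);
    },
    getProfile(sessionId) {
      const user = store.get(sessionId);
      return user ? { servedBy: name, user } : { servedBy: name, error: "no session" };
    },
  };
}

// Stateful: each server keeps its own Map, so server B cannot see A's session.
const a1 = createServer("A");
const b1 = createServer("B");
a1.login("sid-1", { id: 42 });
console.log(b1.getProfile("sid-1")); // B has no record of this session

// Externalized: both servers read and write the same store.
const a2 = createServer("A", sharedStore);
const b2 = createServer("B", sharedStore);
a2.login("sid-1", { id: 42 });
console.log(b2.getProfile("sid-1")); // B finds the session that A created
```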

Shared-nothing architecture

The gold standard for horizontal scaling is shared-nothing architecture: each node has its own CPU and memory, shares no disk or state, and communicates only through the network.

Shared-nothing:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ App + ∅ │  │ App + ∅ │  │ App + ∅ │  (no shared state)
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
     └────────────┼────────────┘
                  │
            ┌─────┴─────┐
            │ Database  │  (shared data layer, not shared memory)
            └───────────┘

You can add or remove nodes without affecting the others. If one crashes, the rest keep serving traffic.
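
To illustrate how the remaining nodes keep serving when one crashes, here is a minimal sketch of a pool that skips nodes marked unhealthy. `createPool`, `markDown`, and the node names are hypothetical; in practice the load balancer's health checks would decide which nodes are down.

```javascript
// Hypothetical node pool: `healthy` would normally be set by periodic health checks.
function createPool(names) {
  const nodes = names.map((name) => ({ name, healthy: true }));
  let cursor = 0;
  return {
    markDown(name) {
      const node = nodes.find((n) => n.name === name);
      if (node) node.healthy = false;
    },
    // Return the next healthy node, or null if every node is down.
    pick() {
      for (let tried = 0; tried < nodes.length; tried++) {
        const node = nodes[cursor];
        cursor = (cursor + 1) % nodes.length;
        if (node.healthy) return node.name;
      }
      return null;
    },
  };
}

const pool = createPool(["node-1", "node-2", "node-3"]);
pool.markDown("node-2"); // simulate a crash
console.log([pool.pick(), pool.pick(), pool.pick()]); // node-2 never appears
```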


Capacity math

Before you pick a scaling strategy, you need numbers.

| Metric | How to measure | What it tells you |
|---|---|---|
| Requests per second (RPS) | Load balancer metrics, APM tools | Current demand on your system |
| Concurrent connections | netstat, server metrics | How many users are active simultaneously |
| Median response time (p50) | APM, application logs | Typical user experience |
| Tail latency (p99) | APM tools | Experience of the slowest 1% of requests |
| CPU utilization | System metrics (top, htop, CloudWatch) | How close you are to compute limits |
| Memory utilization | System metrics | How close you are to RAM limits |
| Disk I/O | iostat, CloudWatch | Whether storage is the bottleneck |

Current: 500 RPS, 1 server, 60% CPU
Growth:  10x in 12 months (projected)

Vertical path: need a server handling 5,000 RPS
  → Probably a 16-core machine, ~$560/month
  → Single point of failure

Horizontal path: need 10 servers handling 500 RPS each
  → 10x t3.medium at $30/month = $300/month
  → Built-in redundancy (one server down = 10% capacity loss)
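
The horizontal-path arithmetic can be written as a small function. This is a sketch under a strong assumption, namely that CPU usage grows linearly with request rate; `serversNeeded` and its parameter names are illustrative, not taken from any library. Percentages are passed as integers to keep the arithmetic exact.

```javascript
// Servers needed so each one stays at or below a target CPU utilization.
// Assumes CPU scales linearly with request rate, which is a simplification.
function serversNeeded({ targetRps, measuredRps, measuredCpuPct, maxCpuPct }) {
  return Math.ceil((targetRps * measuredCpuPct) / (measuredRps * maxCpuPct));
}

// Figures from the example above: 500 RPS at 60% CPU, growing 10x.
const sameUtilization = serversNeeded({
  targetRps: 5000,
  measuredRps: 500,
  measuredCpuPct: 60,
  maxCpuPct: 60, // keep each server at today's utilization
});
console.log(sameUtilization); // 10

// Allowing each server to run hotter reduces the count.
const hotter = serversNeeded({
  targetRps: 5000,
  measuredRps: 500,
  measuredCpuPct: 60,
  maxCpuPct: 80,
});
console.log(hotter); // 8
```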

Vertical vs horizontal: the full comparison

| Dimension | Vertical scaling | Horizontal scaling |
|---|---|---|
| Complexity | Low, no code changes | High, need load balancer, stateless design, health checks |
| Cost curve | Superlinear (diminishing returns) | Linear (pay per node) |
| Ceiling | Hard limit (max instance size) | No theoretical limit |
| Downtime risk | Single point of failure | Redundancy built in |
| Data consistency | Simple (one database) | Complex (distributed state) |
| Migration effort | Minutes (resize instance) | Days to weeks (redesign for statelessness) |
| Best for | Databases, early-stage apps, small teams | Web servers, APIs, microservices |
| When to choose | Traffic < 10K RPS, team < 5, no redundancy requirement | Traffic > 10K RPS, high availability required, cost-sensitive at scale |

The realistic path

Almost nobody starts with horizontal scaling on day one. The typical journey:

  1. Start small: one server, vertical scaling as needed
  2. Separate concerns: move the database to its own server
  3. Go stateless: externalize sessions (Redis, JWT), remove server-side state
  4. Add a load balancer: run 2+ app servers for redundancy
  5. Scale out: add servers as traffic grows
  6. Specialize: separate read-heavy and write-heavy workloads
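
The last step can take several forms. One common form, assumed here because the list does not name a technique, is to send reads to database replicas and writes to a primary. The sketch below only decides where a query should go; `routeQuery` and the connection names are hypothetical, and it ignores replication lag.

```javascript
// Hypothetical connection names; a real setup would hold database clients here.
const primary = "primary-db";
const replicas = ["replica-1", "replica-2"];
let nextReplica = 0;

// Send reads to replicas in rotation and everything else to the primary.
// Treating every SELECT as a read is a simplification (e.g. SELECT ... FOR UPDATE).
function routeQuery(sql) {
  const isRead = /^\s*select\b/i.test(sql);
  if (!isRead) return primary;
  const target = replicas[nextReplica];
  nextReplica = (nextReplica + 1) % replicas.length;
  return target;
}

console.log(routeQuery("SELECT * FROM users WHERE id = 1")); // replica-1
console.log(routeQuery("UPDATE users SET name = 'x' WHERE id = 1")); // primary-db
```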

You do not need to design for Google-scale on day one. You need to design so that scaling later does not require rewriting everything.

AI pitfall
Ask AI "how should I scale my app?" and it will jump straight to horizontal scaling with load balancers and auto-scaling groups. What it gets wrong: for most applications, upgrading to the next server size buys months of headroom for a modest increase in the monthly bill. Exhaust vertical scaling before taking on the complexity of multiple servers.

Good to know
The single most important thing you can do for future scalability is make your application stateless. If your server keeps session data in memory, file uploads on local disk, or background job state in a process variable, horizontal scaling breaks, because a request that lands on a different server cannot see that state. Move state to external stores (Redis, S3, a database) and scaling becomes a matter of adding more servers behind a load balancer.

Edge case
Auto-scaling has a cold-start problem. New instances need time to boot, load code, warm caches, and establish database connections. Traffic that doubles in 30 seconds can overwhelm your system before auto-scaling kicks in. Keep enough baseline servers running to absorb a spike while new instances come online.
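
A rough way to size that baseline is to ask how much traffic could arrive before new instances are ready. The sketch below is an estimate under stated assumptions, not an auto-scaling policy: `baselineServers` is a hypothetical helper, and `spikeMultiplier` (how far traffic can grow during the warm-up window) is something you would measure for your own system.

```javascript
// Baseline servers needed to survive a spike until new instances are ready.
// All inputs are assumptions you would measure for your own system.
function baselineServers({ currentRps, spikeMultiplier, rpsPerServer }) {
  const rpsDuringWarmup = currentRps * spikeMultiplier;
  return Math.ceil(rpsDuringWarmup / rpsPerServer);
}

// Example: 1,000 RPS today, traffic can double before new instances boot,
// and each server handles about 200 RPS.
const baseline = baselineServers({ currentRps: 1000, spikeMultiplier: 2, rpsPerServer: 200 });
console.log(baseline); // 10
```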