
Your app works with 100 users. What happens at 1 million? Scaling is how systems handle growth without breaking. It is not just about bigger computers, but about smarter architecture.

Vertical vs horizontal scaling

Vertical scaling: upgrade your computer

Adding more power (CPU, RAM, storage) to a single server. Simple and usually requires no code changes, but it hits hardware limits, gets disproportionately expensive at the high end, and leaves a single point of failure.

Best for: Small applications, databases that can't be easily distributed.

Horizontal scaling: add more computers

Adding more servers and distributing work among them. Virtually unlimited scale, cost-effective, and resilient, but more complex, requiring load balancing and data sync across servers.

Best for: High-traffic applications, variable traffic patterns (Black Friday, viral content).

The ceiling problem
Vertical scaling hits a ceiling: you can't buy infinite RAM. That's why large-scale systems at major tech companies rely on horizontal scaling.

Caching: saving work for later

Caching stores copies of expensive-to-compute data so you don't recompute it.

Browser cache

Your browser downloads files on first visit, then reuses them. Cache headers control this: Cache-Control: max-age=3600 keeps files for 1 hour. Tradeoff: cached files might be outdated.
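To make the max-age rule concrete, here is a minimal sketch of the freshness check a cache performs. The function names `parse_max_age` and `is_fresh` are hypothetical, and real browsers apply many more rules (no-store, revalidation, heuristics) than this shows.

```python
import re
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """A cached copy counts as fresh while its age is below max-age."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and age_seconds < max_age
```

With `Cache-Control: max-age=3600`, a file fetched 30 minutes ago is reused from the cache, while one fetched an hour ago or more is treated as expired.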

CDN cache

A Content Delivery Network (CDN) is a network of servers spread across the globe that caches your files and serves them from the location closest to the user. Instead of one server in New York serving everyone, you have edge servers in each region. A Tokyo user gets content from Tokyo, not New York, reducing latency (the delay between sending a request and receiving the first byte of the response), load on the origin server, and bandwidth costs. Common CDNs: Cloudflare, AWS CloudFront, Fastly, Akamai.

Edge computing
Modern CDNs can also run code at the edge. Cloudflare Workers and AWS Lambda@Edge execute logic (personalization, auth, A/B testing) without hitting your main servers.

Application cache

Store frequently accessed data in server memory instead of querying the database every time.

Without cache:
User requests profile → Query database (50ms) → Return data

With cache:
User requests profile → Check cache (1ms) → Return data

| Strategy | How it works | Best for |
| --- | --- | --- |
| Cache-aside | Check cache first, fetch from DB if missing | Read-heavy workloads |
| Write-through | Write to cache and DB simultaneously | Data consistency critical |
| Write-behind | Write to cache, async write to DB | High write throughput |
| TTL (Time To Live) | Auto-expire cache after set time | Most common approach |

Cache invalidation is notoriously hard. Update a user's profile but forget to clear their cache? They see stale data. This is one of the "two hard things in computer science" (along with naming things and off-by-one errors).
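The cache-aside and TTL strategies, plus the invalidation step that is so easy to forget, can be sketched in a few lines. This is an illustrative in-memory version, not a production cache: the class name `TTLCache` and its methods are made up for this example, and a real deployment would more likely use a shared store such as Redis or Memcached.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache-aside helper with time-based expiry (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, object]] = {}

    def get_or_load(self, key: str, load: Callable[[], object]) -> object:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # cache hit: the slow lookup is skipped
        value = load()  # cache miss or expired entry: go to the database
        self._store[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so readers do not keep seeing stale data."""
        self._store.pop(key, None)
```

The profile-update bug described above corresponds to writing to the database without calling `invalidate`: until the TTL runs out, `get_or_load` keeps returning the old value.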

Load balancing: distributing the work

With multiple servers, a load balancer distributes incoming traffic across them and decides which one handles each request, so no single server gets overwhelmed.

Round Robin: Take turns across servers.
Least Connections: Send to the server with the fewest active connections.
IP Hash: Hash the client's IP address so the same client always hits the same server.
Geographic: Route to the nearest server.
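Three of these strategies are simple enough to sketch directly. The code below is a toy model with hypothetical names (`RoundRobin`, `least_connections`, `ip_hash`); real load balancers such as NGINX or HAProxy implement these ideas with health checks, weights, and much more.

```python
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    """Hand out servers in a repeating cycle."""

    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server that currently has the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server deterministically, so repeat visits land on the same one."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

One design caveat: with plain modulo hashing as used in `ip_hash`, adding or removing a server remaps most clients to a different server. Consistent hashing is the usual way to limit that reshuffling.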

Most companies use software load balancers (NGINX, HAProxy) or cloud-managed ones (AWS ELB). Good load balancers also run health checks: if a server stops responding, traffic routes around it automatically.


Database scaling

Databases are usually the bottleneck because data needs to stay consistent.

Read replicas

Most apps read far more than they write. Create copies of the database for reading: writes go to the primary, reads go to the replicas.

Write: App → Primary DB → Replicates to → Replica 1, Replica 2, Replica 3
Read:   App → Replica 1 or Replica 2 or Replica 3

Tradeoff: Replication lag (often around 100ms-1s). A comment you just posted might not appear immediately on refresh, because the read may hit a replica that hasn't received the write yet.
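The routing rule, and one common way to soften replication lag, can be sketched as follows. The `ReplicaRouter` class and its `read_your_writes` flag are hypothetical names for illustration. The flag simply sends a read to the primary when the user has just written, which is one possible mitigation rather than the only one.

```python
import random
from typing import List


class ReplicaRouter:
    """Route writes to the primary and reads to a replica (illustrative sketch)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        self.replicas = replicas

    def route(self, sql: str, read_your_writes: bool = False) -> str:
        is_write = sql.lstrip().upper().startswith(self.WRITE_VERBS)
        # Writes always go to the primary. Reads go there too when the caller
        # needs to see its own recent write, or when no replica is available.
        if is_write or read_your_writes or not self.replicas:
            return self.primary
        return random.choice(self.replicas)
```

For example, right after a user posts a comment, the next page load could call `route(query, read_your_writes=True)` so the fresh comment is read from the primary instead of a lagging replica.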

Sharding

Split data across multiple databases by distributing rows based on a key, so each server handles only a portion of the total data: Users A-M go to Database 1, Users N-Z to Database 2. Each handles less load, but cross-shard queries become complicated and rebalancing is hard.
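As a rough sketch, the A-M / N-Z split is a range-based shard function. A hash-based variant is shown next to it because it is a common alternative that spreads keys more evenly. Both function names are hypothetical, and the hash variant is an addition for comparison, not something the lesson prescribes.

```python
import hashlib


def shard_by_range(username: str) -> str:
    """Range sharding as described: A-M -> Database 1, N-Z -> Database 2.

    Names that start with a digit or symbol sorting before 'M' also land in Database 1.
    """
    first = username[:1].upper()
    return "Database 1" if first <= "M" else "Database 2"


def shard_by_hash(user_id: str, shard_count: int) -> int:
    """Hash sharding: returns a shard number from 0 to shard_count - 1."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Range sharding keeps neighbouring keys together but can produce uneven shards if many users share the same initial letters. Hash sharding evens out the load, but changing `shard_count` moves most keys, which is one reason rebalancing is hard.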

Optimize before you scale

Database indexing: like a book's index, jump directly to the right data instead of scanning everything.
Query optimization: don't fetch 10,000 rows when you need 10.
Connection pooling: reuse database connections instead of opening new ones.
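The effect of an index can be observed with SQLite, which ships with Python. The sketch below assumes a made-up `users` table and index name `idx_users_email`. It asks SQLite for its query plan before and after the index is created, then shows a `LIMIT` query as a small example of fetching only what is needed. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan description for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

lookup = "SELECT id, name FROM users WHERE email = ?"
before = query_plan(conn, lookup, ("user500@example.com",))  # scans the whole table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, ("user500@example.com",))  # searches via the index

# Query optimization: fetch only the 10 rows needed, not the whole table.
first_page = conn.execute("SELECT id, name FROM users ORDER BY id LIMIT 10").fetchall()
```

Printing `before` and `after` shows the plan changing from a table scan to a search that uses `idx_users_email`.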

The 80/20 rule
80% of database load often comes from 20% of queries. Fix the hot queries before scaling infrastructure.

Auto-scaling

Auto-scaling automatically adds or removes servers based on demand. It monitors metrics (CPU, memory, request count), triggers at thresholds ("if CPU > 70% for 5 minutes, add a server"), and scales back down when quiet to save money.
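The threshold rule quoted above ("if CPU > 70% for 5 minutes, add a server") can be sketched as a small decision function. Everything except the 70% and 5-minute figures is an assumption made for illustration: the function name `desired_servers`, the 30% scale-down threshold, the one-sample-per-minute cadence, and the server limits.

```python
from typing import Sequence


def desired_servers(
    cpu_samples: Sequence[float],  # one CPU % reading per minute, newest last
    current: int,
    scale_up_at: float = 70.0,
    scale_down_at: float = 30.0,  # assumed value, not from the lesson
    window: int = 5,
    min_servers: int = 1,
    max_servers: int = 10,
) -> int:
    """Return the server count a simple reactive auto-scaler would aim for."""
    recent = cpu_samples[-window:]
    if len(recent) < window:
        return current  # not enough data yet to decide
    if all(sample > scale_up_at for sample in recent):
        return min(current + 1, max_servers)
    if all(sample < scale_down_at for sample in recent):
        return max(current - 1, min_servers)
    return current
```

Requiring every sample in the window to cross the threshold keeps a single short spike from triggering a scale-up, at the cost of reacting a few minutes late.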

Types: Reactive (respond to current load), Predictive (ML-based anticipation), Scheduled (pre-planned for known events).

Scaling lag
Adding a server typically takes 2-5 minutes. If traffic spikes instantly, auto-scaling might not keep up. That's why "pre-warming" before big events matters.
AI pitfall
AI assistants often recommend caching and horizontal scaling before profiling the actual bottleneck. In practice, a single unindexed query or N+1 fetch loop is often the real cause of a slowdown, and no amount of infrastructure fixes bad queries.
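For readers who have not met the N+1 pattern, here is a small sketch using SQLite. The schema (`authors`, `posts`) and the helper `run` are invented for this example. The first function issues one query for the list plus one more per author, while the second gets the same result with a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 2, 'Compilers'), (3, 1, 'Notes');
""")

executed = []  # every SQL string sent through run(), to count round trips


def run(sql: str, params: tuple = ()) -> list:
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one() -> dict:
    """1 query for the authors, then 1 extra query per author."""
    result = {}
    for author_id, name in run("SELECT id, name FROM authors"):
        rows = run("SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,))
        result[name] = [title for (title,) in rows]
    return result


def titles_single_join() -> dict:
    """Same data in a single query."""
    result = {}
    rows = run(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY p.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result
```

With 2 authors the loop version runs 3 queries. With 10,000 authors it would run 10,001, which is the kind of slowdown that adding servers does not fix.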

Quick reference: Scaling strategies

| Strategy | What it is | Best for | Tradeoff |
| --- | --- | --- | --- |
| Vertical scaling | Bigger server | Simplicity, databases | Hits hardware limits |
| Horizontal scaling | More servers | High traffic, resilience | Complexity |
| Read replicas | Copy database for reads | Read-heavy apps | Replication lag |
| Sharding | Split data across databases | Massive datasets | Query complexity |
| Caching | Store frequently used data | Almost everything | Stale data risk |
| CDN | Global edge servers | Static content, global users | Cost at scale |
| Load balancing | Distribute traffic | Multiple servers | Single point of failure if not redundant |
| Auto-scaling | Automatic server adjustment | Variable traffic | Scaling lag |