System Design/
Lesson

Your application servers are easy to scale horizontally, make them statelessWhat is stateless?A design where each request contains all the information the server needs, so any server can handle any request without remembering previous ones., add more behind a load balancerWhat is load balancer?A server that distributes incoming traffic across multiple backend servers so no single server gets overwhelmed., and you are done. But the database is where scaling gets genuinely hard. Every request eventually hits the database, and unlike stateless app servers, databases hold state that must remain consistent.

This lesson walks through database scaling strategies in the order you should consider them: simple and cheap first, complex and expensive last.

Read replicas

Most web applications are read-heavy. A typical ratio is 90% reads to 10% writes. Read replicas exploit this imbalance: you keep one primary database that handles all writes, and create one or more replicas that handle reads.

Application
  │
  ├── WRITE (INSERT, UPDATE, DELETE) ──▶ Primary DB
  │
  └── READ (SELECT) ──▶ Replica 1
                     ──▶ Replica 2
                     ──▶ Replica 3

Primary replicates data to all replicas asynchronously

Setting it up

In PostgreSQL, you configure streaming replication. The replica connects to the primary and continuously receives write-ahead logWhat is wal?Write-Ahead Logging - a database journal mode that allows concurrent reads during writes and provides better crash safety. (WAL) entries.

-- On the primary: check replication status
SELECT client_addr, state, sent_lsn, replay_lsn,
       sent_lsn - replay_lsn AS replication_lag
FROM pg_stat_replication;

In your application code, you route queries to the appropriate database:

import { Pool } from 'pg';

const primaryPool = new Pool({
  host: process.env.DB_PRIMARY_HOST,
  database: 'myapp',
});

const replicaPool = new Pool({
  host: process.env.DB_REPLICA_HOST,
  database: 'myapp',
});

// Write operations always go to primary
async function createUser(name: string, email: string) {
  return primaryPool.query(
    'INSERT INTO users (name, email) VALUES (CODE_BLOCK, $2) RETURNING *',
    [name, email]
  );
}

// Read operations go to replica
async function getUsers() {
  return replicaPool.query('SELECT * FROM users ORDER BY created_at DESC');
}

// Critical reads (right after a write) go to primary
async function getUserAfterUpdate(id: string) {
  return primaryPool.query('SELECT * FROM users WHERE id = CODE_BLOCK', [id]);
}

Replication lag

Replicas are not instant. There is always some delay between a write hitting the primary and appearing on the replica, typically milliseconds, but it can spike to seconds under heavy load.

This means a user who updates their profile and immediately refreshes the page might see the old data if the read hits a replica that has not caught up yet. You handle this with read-your-writes consistency: route reads that follow a recent write to the primary instead of a replica.

02

Connection pooling

Every database connection consumes memory, typically 5-10 MB per connection in PostgreSQL. If you have 10 app servers, each opening 20 connections, that is 200 connections consuming over 1 GB of RAM just for connection overhead.

Connection poolers like PgBouncer sit between your application and the database, maintaining a pool of reusable connections.

Without pooling:
  App Server 1 (20 conns) ──┐
  App Server 2 (20 conns) ──┼──▶ PostgreSQL (60 connections, ~600 MB)
  App Server 3 (20 conns) ──┘

With PgBouncer:
  App Server 1 (20 conns) ──┐
  App Server 2 (20 conns) ──┼──▶ PgBouncer (10 actual conns) ──▶ PostgreSQL (~100 MB)
  App Server 3 (20 conns) ──┘

PgBouncer multiplexes: 60 application connections share 10 real database connections. Queries are queued and dispatched as connections become available. The database sees far fewer connections, uses less memory, and performs better.

ini
; pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
pool_mode = transaction          ; release connection after each transaction
max_client_conn = 500            ; accept up to 500 app connections
default_pool_size = 20           ; only 20 real DB connections
reserve_pool_size = 5            ; extra connections for spikes
Use pool_mode = transaction for web applications. It releases the real database connection after each transaction completes, maximizing connection reuse. The session mode holds a connection for the entire client session, which defeats the purpose for short-lived web requests.
03

Partitioning

When a single table grows to hundreds of millions of rows, queries slow down even with proper indexes. Partitioning splits one large table into smaller physical pieces while keeping them logically unified.

Partition strategies

-- Range partitioning: split by date ranges
CREATE TABLE orders (
    id BIGINT,
    user_id BIGINT,
    amount DECIMAL(10,2),
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE orders_2026 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
-- Hash partitioning: distribute evenly by hash of a column
CREATE TABLE sessions (
    id UUID,
    user_id BIGINT,
    data JSONB
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_0 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_2 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_3 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
StrategyHow it splits dataBest forWatch out for
RangeBy value ranges (dates, IDs)Time-series data, logs, archiving old dataHot partitions if most queries hit recent data
HashBy hash of a column valueEven distribution across partitionsCannot do efficient range queries
ListBy explicit value lists (country, category)Multi-region data, enumerated categoriesUneven partition sizes
04

ShardingWhat is sharding?Splitting a database across multiple servers by distributing rows based on a key, so each server handles only a portion of the total data.

Sharding is partitioning taken to the next level: instead of splitting data within one database server, you distribute it across multiple independent database servers.

Partitioning:                      Sharding:
┌─────────────────────┐            ┌──────────┐  ┌──────────┐  ┌──────────┐
│  One DB server       │            │ DB Shard 1│  │ DB Shard 2│  │ DB Shard 3│
│ ┌───────┬───────┐   │            │ users A-H │  │ users I-P │  │ users Q-Z │
│ │Part 1 │Part 2 │   │            └──────────┘  └──────────┘  └──────────┘
│ └───────┴───────┘   │              3 separate database servers
└─────────────────────┘

Shard key selection

The shard key determines which server holds each piece of data. This is the most critical decision in a sharding strategy because changing it later requires migrating all your data.

// Simple shard routing by user ID
function getShardForUser(userId: number): string {
  const shardCount = 3;
  const shardIndex = userId % shardCount;
  return `shard-${shardIndex}`;
}

// Route query to the correct shard
async function getUser(userId: number) {
  const shardName = getShardForUser(userId);
  const db = shardConnections[shardName];
  return db.query('SELECT * FROM users WHERE id = CODE_BLOCK', [userId]);
}

A good shard key has three properties:

  1. High cardinality: many distinct values (user IDs: good; country codes: bad, you only have ~200)
  2. Even distribution: data spreads roughly equally across shards (random UUIDsWhat is uuid?Universally Unique Identifier - a 128-bit random string used as a globally unique ID for database records, events, or request tracking.: great; auto-increment IDs for a SaaS with one huge tenant: terrible)
  3. Query alignment: most queries include the shard key, so you know which shard to hit (queries without the shard key must fan out to all shards, called a scatter-gather, and it is slow)

What sharding breaks

Sharding introduces real complexity. Cross-shard queries (JOINs across two users on different shards) become impossible or extremely expensive. Auto-incrementing IDs no longer work (each shard has its own sequence). Transactions spanning multiple shards require distributed transactionWhat is transaction?A group of database operations that either all succeed together or all fail together, preventing partial updates. protocols like two-phase commitWhat is commit?A permanent snapshot of your staged changes saved in Git's history, identified by a unique hash and accompanied by a message describing what changed., which is slow and fragile.

05

The scaling ladder

StrategyComplexityWhen to useTypical scale
Vertical scaling (bigger machine)TrivialFirst move, alwaysUp to ~10K RPS
Connection pooling (PgBouncer)LowWhen connection count > 10010x connection efficiency
Read replicasLow-mediumRead-heavy workloads (>80% reads)2-5x read throughput
Table partitioningMediumTables > 100M rowsBetter query performance
ShardingVery highSingle DB server maxed outVirtually unlimited
The golden rule
exhaust each simpler strategy before moving to the next. Sharding is powerful but adds enormous operational complexity. Many successful companies (Basecamp, Stack Overflow) run on a single well-tuned database for years.
AI pitfall
AI will suggest sharding as a solution for "large databases." What AI gets wrong: most applications never need sharding. A single PostgreSQL instance with proper indexes can handle tables with hundreds of millions of rows. Sharding means every query needs to know which shard to hit, cross-shard joins become impossible, and rebalancing shards is a multi-week project. Exhaust vertical scaling, read replicas, and partitioning first.
Good to know
Read replicas are the lowest-effort database scaling technique and solve the most common problem (too many reads). Set up a replica in your cloud provider's dashboard, point read-heavy queries to it, and your primary database has more capacity for writes. Most ORMs support read/write splitting with minimal code changes.
Edge case
Replication lag between the primary database and read replicas creates subtle bugs. A user creates an order (write to primary), then immediately views their orders page (read from replica). If the replica is 200ms behind, the new order is missing. Use read-your-writes consistency for user-facing read-after-write scenarios, route those reads to the primary.