Your application servers are easy to scale horizontally, make them statelessWhat is stateless?A design where each request contains all the information the server needs, so any server can handle any request without remembering previous ones., add more behind a load balancerWhat is load balancer?A server that distributes incoming traffic across multiple backend servers so no single server gets overwhelmed., and you are done. But the database is where scaling gets genuinely hard. Every request eventually hits the database, and unlike stateless app servers, databases hold state that must remain consistent.
This lesson walks through database scaling strategies in the order you should consider them: simple and cheap first, complex and expensive last.
Read replicas
Most web applications are read-heavy. A typical ratio is 90% reads to 10% writes. Read replicas exploit this imbalance: you keep one primary database that handles all writes, and create one or more replicas that handle reads.
Application
│
├── WRITE (INSERT, UPDATE, DELETE) ──▶ Primary DB
│
└── READ (SELECT) ──▶ Replica 1
──▶ Replica 2
──▶ Replica 3
Primary replicates data to all replicas asynchronouslySetting it up
In PostgreSQL, you configure streaming replication. The replica connects to the primary and continuously receives write-ahead logWhat is wal?Write-Ahead Logging - a database journal mode that allows concurrent reads during writes and provides better crash safety. (WAL) entries.
-- On the primary: check replication status
SELECT client_addr, state, sent_lsn, replay_lsn,
sent_lsn - replay_lsn AS replication_lag
FROM pg_stat_replication;In your application code, you route queries to the appropriate database:
import { Pool } from 'pg';
const primaryPool = new Pool({
host: process.env.DB_PRIMARY_HOST,
database: 'myapp',
});
const replicaPool = new Pool({
host: process.env.DB_REPLICA_HOST,
database: 'myapp',
});
// Write operations always go to primary
async function createUser(name: string, email: string) {
return primaryPool.query(
'INSERT INTO users (name, email) VALUES (CODE_BLOCK, $2) RETURNING *',
[name, email]
);
}
// Read operations go to replica
async function getUsers() {
return replicaPool.query('SELECT * FROM users ORDER BY created_at DESC');
}
// Critical reads (right after a write) go to primary
async function getUserAfterUpdate(id: string) {
return primaryPool.query('SELECT * FROM users WHERE id = CODE_BLOCK', [id]);
}Replication lag
Replicas are not instant. There is always some delay between a write hitting the primary and appearing on the replica, typically milliseconds, but it can spike to seconds under heavy load.
This means a user who updates their profile and immediately refreshes the page might see the old data if the read hits a replica that has not caught up yet. You handle this with read-your-writes consistency: route reads that follow a recent write to the primary instead of a replica.
Connection pooling
Every database connection consumes memory, typically 5-10 MB per connection in PostgreSQL. If you have 10 app servers, each opening 20 connections, that is 200 connections consuming over 1 GB of RAM just for connection overhead.
Connection poolers like PgBouncer sit between your application and the database, maintaining a pool of reusable connections.
Without pooling:
App Server 1 (20 conns) ──┐
App Server 2 (20 conns) ──┼──▶ PostgreSQL (60 connections, ~600 MB)
App Server 3 (20 conns) ──┘
With PgBouncer:
App Server 1 (20 conns) ──┐
App Server 2 (20 conns) ──┼──▶ PgBouncer (10 actual conns) ──▶ PostgreSQL (~100 MB)
App Server 3 (20 conns) ──┘PgBouncer multiplexes: 60 application connections share 10 real database connections. Queries are queued and dispatched as connections become available. The database sees far fewer connections, uses less memory, and performs better.
; pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_port = 6432
pool_mode = transaction ; release connection after each transaction
max_client_conn = 500 ; accept up to 500 app connections
default_pool_size = 20 ; only 20 real DB connections
reserve_pool_size = 5 ; extra connections for spikespool_mode = transaction for web applications. It releases the real database connection after each transaction completes, maximizing connection reuse. The session mode holds a connection for the entire client session, which defeats the purpose for short-lived web requests.Partitioning
When a single table grows to hundreds of millions of rows, queries slow down even with proper indexes. Partitioning splits one large table into smaller physical pieces while keeping them logically unified.
Partition strategies
-- Range partitioning: split by date ranges
CREATE TABLE orders (
id BIGINT,
user_id BIGINT,
amount DECIMAL(10,2),
created_at TIMESTAMP
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE orders_2026 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');-- Hash partitioning: distribute evenly by hash of a column
CREATE TABLE sessions (
id UUID,
user_id BIGINT,
data JSONB
) PARTITION BY HASH (user_id);
CREATE TABLE sessions_0 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_2 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_3 PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 3);| Strategy | How it splits data | Best for | Watch out for |
|---|---|---|---|
| Range | By value ranges (dates, IDs) | Time-series data, logs, archiving old data | Hot partitions if most queries hit recent data |
| Hash | By hash of a column value | Even distribution across partitions | Cannot do efficient range queries |
| List | By explicit value lists (country, category) | Multi-region data, enumerated categories | Uneven partition sizes |
ShardingWhat is sharding?Splitting a database across multiple servers by distributing rows based on a key, so each server handles only a portion of the total data.
Sharding is partitioning taken to the next level: instead of splitting data within one database server, you distribute it across multiple independent database servers.
Partitioning: Sharding:
┌─────────────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ One DB server │ │ DB Shard 1│ │ DB Shard 2│ │ DB Shard 3│
│ ┌───────┬───────┐ │ │ users A-H │ │ users I-P │ │ users Q-Z │
│ │Part 1 │Part 2 │ │ └──────────┘ └──────────┘ └──────────┘
│ └───────┴───────┘ │ 3 separate database servers
└─────────────────────┘Shard key selection
The shard key determines which server holds each piece of data. This is the most critical decision in a sharding strategy because changing it later requires migrating all your data.
// Simple shard routing by user ID
function getShardForUser(userId: number): string {
const shardCount = 3;
const shardIndex = userId % shardCount;
return `shard-${shardIndex}`;
}
// Route query to the correct shard
async function getUser(userId: number) {
const shardName = getShardForUser(userId);
const db = shardConnections[shardName];
return db.query('SELECT * FROM users WHERE id = CODE_BLOCK', [userId]);
}A good shard key has three properties:
- High cardinality: many distinct values (user IDs: good; country codes: bad, you only have ~200)
- Even distribution: data spreads roughly equally across shards (random UUIDsWhat is uuid?Universally Unique Identifier - a 128-bit random string used as a globally unique ID for database records, events, or request tracking.: great; auto-increment IDs for a SaaS with one huge tenant: terrible)
- Query alignment: most queries include the shard key, so you know which shard to hit (queries without the shard key must fan out to all shards, called a scatter-gather, and it is slow)
What sharding breaks
Sharding introduces real complexity. Cross-shard queries (JOINs across two users on different shards) become impossible or extremely expensive. Auto-incrementing IDs no longer work (each shard has its own sequence). Transactions spanning multiple shards require distributed transactionWhat is transaction?A group of database operations that either all succeed together or all fail together, preventing partial updates. protocols like two-phase commitWhat is commit?A permanent snapshot of your staged changes saved in Git's history, identified by a unique hash and accompanied by a message describing what changed., which is slow and fragile.
The scaling ladder
| Strategy | Complexity | When to use | Typical scale |
|---|---|---|---|
| Vertical scaling (bigger machine) | Trivial | First move, always | Up to ~10K RPS |
| Connection pooling (PgBouncer) | Low | When connection count > 100 | 10x connection efficiency |
| Read replicas | Low-medium | Read-heavy workloads (>80% reads) | 2-5x read throughput |
| Table partitioning | Medium | Tables > 100M rows | Better query performance |
| Sharding | Very high | Single DB server maxed out | Virtually unlimited |