System Design - Database Scaling

Create a free account to save your progress

Earn XP, track streaks, and sync your dashboard across devices.

Lesson

Your application servers are easy to scale horizontally, make them statelessWhat is stateless?A design where each request contains all the information the server needs, so any server can handle any request without remembering previous ones., add more behind a load balancerWhat is load balancer?A server that distributes incoming traffic across multiple backend servers so no single server gets overwhelmed., and you are done. But the database is where scaling gets genuinely hard. Every request eventually hits the database, and unlike stateless app servers, databases hold state that must remain consistent.

This lesson walks through database scaling strategies in the order you should consider them: simple and cheap first, complex and expensive last.

Read replicas

Most web applications are read-heavy. A typical ratio is 90% reads to 10% writes. Read replicas exploit this imbalance: you keep one primary database that handles all writes, and create one or more replicas that handle reads.

Application
  │
  ├── WRITE (INSERT, UPDATE, DELETE) ──▶ Primary DB
  │
  └── READ (SELECT) ──▶ Replica 1
                     ──▶ Replica 2
                     ──▶ Replica 3

Primary replicates data to all replicas asynchronously

Setting it up

In PostgreSQL, you configure streaming replication. The replica connects to the primary and continuously receives write-ahead logWhat is wal?Write-Ahead Logging - a database journal mode that allows concurrent reads during writes and provides better crash safety. (WAL) entries.

-- On the primary: check replication status
SELECT client_addr, state, sent_lsn, replay_lsn,
       sent_lsn - replay_lsn AS replication_lag
FROM pg_stat_replication;

In your application code, you route queries to the appropriate database:

import { Pool } from 'pg';

const primaryPool = new Pool({
  host: process.env.DB_PRIMARY_HOST,
  database: 'myapp',
});

const replicaPool = new Pool({
  host: process.env.DB_REPLICA_HOST,
  database: 'myapp',
});

// Write operations always go to primary
async function createUser(name: string, email: string) {
  return primaryPool.query(
    'INSERT INTO users (name, email) VALUES (CODE_BLOCK, $2) RETURNING *',
    [name, email]
  );
}

// Read operations go to replica
async function getUsers() {
  return replicaPool.query('SELECT * FROM users ORDER BY created_at DESC');
}

// Critical reads (right after a write) go to primary
async function getUserAfterUpdate(id: string) {
  return primaryPool.query('SELECT * FROM users WHERE id = CODE_BLOCK', [id]);
}

Replication lag

Replicas are not instant. There is always some delay between a write hitting the primary and appearing on the replica, typically milliseconds, but it can spike to seconds under heavy load.

This means a user who updates their profile and immediately refreshes the page might see the old data if the read hits a replica that has not caught up yet. You handle this with read-your-writes consistency: route reads that follow a recent write to the primary instead of a replica.

Connection pooling

Every database connection consumes memory, typically 5-10 MB per connection in PostgreSQL. If you have 10 app servers, each opening 20 connections, that is 200 connections consuming over 1 GB of RAM just for connection overhead.

Connection poolers like PgBouncer sit between your application and the database, maintaining a pool of reusable connections.

Without pooling:
  App Server 1 (20 conns) ──┐
  App Server 2 (20 conns) ──┼──▶ PostgreSQL (60 connections, ~600 MB)
  App Server 3 (20 conns) ──┘

With PgBouncer:
  App Server 1 (20 conns) ──┐
  App Server 2 (20 conns) ──┼──▶ PgBouncer (10 actual conns) ──▶ PostgreSQL (~100 MB)
  App Server 3 (20 conns) ──┘

PgBouncer multiplexes: 60 application connections share 10 real database connections. Queries are queued and dispatched as connections become available. The database sees far fewer connections, uses less memory, and performs better.

ini

; pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
pool_mode = transaction          ; release connection after each transaction
max_client_conn = 500            ; accept up to 500 app connections
default_pool_size = 20           ; only 20 real DB connections
reserve_pool_size = 5            ; extra connections for spikes

Use pool_mode = transaction for web applications. It releases the real database connection after each transaction completes, maximizing connection reuse. The session mode holds a connection for the entire client session, which defeats the purpose for short-lived web requests.

Partitioning

When a single table grows to hundreds of millions of rows, queries slow down even with proper indexes. Partitioning splits one large table into smaller physical pieces while keeping them logically unified.

Partition strategies

-- Range partitioning: split by date ranges
CREATE TABLE orders (
    id BIGINT,
    user_id BIGINT,
    amount DECIMAL(10,2),
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE orders_2026 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

-- Hash partitioning: distribute evenly by hash of a column
CREATE TABLE sessions (
    id UUID,
    user_id BIGINT,
    data JSONB
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_0 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_2 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_3 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Strategy	How it splits data	Best for	Watch out for
Range	By value ranges (dates, IDs)	Time-series data, logs, archiving old data	Hot partitions if most queries hit recent data
Hash	By hash of a column value	Even distribution across partitions	Cannot do efficient range queries
List	By explicit value lists (country, category)	Multi-region data, enumerated categories	Uneven partition sizes

ShardingWhat is sharding?Splitting a database across multiple servers by distributing rows based on a key, so each server handles only a portion of the total data.

Sharding is partitioning taken to the next level: instead of splitting data within one database server, you distribute it across multiple independent database servers.

Partitioning:                      Sharding:
┌─────────────────────┐            ┌──────────┐  ┌──────────┐  ┌──────────┐
│  One DB server       │            │ DB Shard 1│  │ DB Shard 2│  │ DB Shard 3│
│ ┌───────┬───────┐   │            │ users A-H │  │ users I-P │  │ users Q-Z │
│ │Part 1 │Part 2 │   │            └──────────┘  └──────────┘  └──────────┘
│ └───────┴───────┘   │              3 separate database servers
└─────────────────────┘

Shard key selection

The shard key determines which server holds each piece of data. This is the most critical decision in a sharding strategy because changing it later requires migrating all your data.

// Simple shard routing by user ID
function getShardForUser(userId: number): string {
  const shardCount = 3;
  const shardIndex = userId % shardCount;
  return `shard-${shardIndex}`;
}

// Route query to the correct shard
async function getUser(userId: number) {
  const shardName = getShardForUser(userId);
  const db = shardConnections[shardName];
  return db.query('SELECT * FROM users WHERE id = CODE_BLOCK', [userId]);
}

A good shard key has three properties:

High cardinality: many distinct values (user IDs: good; country codes: bad, you only have ~200)
Even distribution: data spreads roughly equally across shards (random UUIDsWhat is uuid?Universally Unique Identifier - a 128-bit random string used as a globally unique ID for database records, events, or request tracking.: great; auto-increment IDs for a SaaS with one huge tenant: terrible)
Query alignment: most queries include the shard key, so you know which shard to hit (queries without the shard key must fan out to all shards, called a scatter-gather, and it is slow)

What sharding breaks

Sharding introduces real complexity. Cross-shard queries (JOINs across two users on different shards) become impossible or extremely expensive. Auto-incrementing IDs no longer work (each shard has its own sequence). Transactions spanning multiple shards require distributed transactionWhat is transaction?A group of database operations that either all succeed together or all fail together, preventing partial updates. protocols like two-phase commitWhat is commit?A permanent snapshot of your staged changes saved in Git's history, identified by a unique hash and accompanied by a message describing what changed., which is slow and fragile.

The scaling ladder

Strategy	Complexity	When to use	Typical scale
Vertical scaling (bigger machine)	Trivial	First move, always	Up to ~10K RPS
Connection pooling (PgBouncer)	Low	When connection count > 100	10x connection efficiency
Read replicas	Low-medium	Read-heavy workloads (>80% reads)	2-5x read throughput
Table partitioning	Medium	Tables > 100M rows	Better query performance
Sharding	Very high	Single DB server maxed out	Virtually unlimited

The golden rule

exhaust each simpler strategy before moving to the next. Sharding is powerful but adds enormous operational complexity. Many successful companies (Basecamp, Stack Overflow) run on a single well-tuned database for years.

AI pitfall

AI will suggest sharding as a solution for "large databases." What AI gets wrong: most applications never need sharding. A single PostgreSQL instance with proper indexes can handle tables with hundreds of millions of rows. Sharding means every query needs to know which shard to hit, cross-shard joins become impossible, and rebalancing shards is a multi-week project. Exhaust vertical scaling, read replicas, and partitioning first.

Good to know

Read replicas are the lowest-effort database scaling technique and solve the most common problem (too many reads). Set up a replica in your cloud provider's dashboard, point read-heavy queries to it, and your primary database has more capacity for writes. Most ORMs support read/write splitting with minimal code changes.

Edge case

Replication lag between the primary database and read replicas creates subtle bugs. A user creates an order (write to primary), then immediately views their orders page (read from replica). If the replica is 200ms behind, the new order is missing. Use read-your-writes consistency for user-facing read-after-write scenarios, route those reads to the primary.

Done

Complete & Next