System Design/
Lesson

By now you know the difference between SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables. and NoSQLWhat is nosql?A category of databases that store data without fixed table schemas, using documents, key-value pairs, or graphs instead of rows and columns., and you understand normalizationWhat is normalization?Organizing database tables to reduce duplicated data by splitting information into related tables connected by foreign keys. trade-offs. The next question is concrete: given a real application, which storage technology do you pick for each piece of data? This lesson gives you a decision matrix and teaches you polyglot persistence, using multiple storage technologies in the same system.

The five storage options

Let's profile five storage technologies you'll encounter in most production systems.

PostgreSQL

The swiss army knife of databases. Relational, ACIDWhat is acid?Four guarantees a database makes about transactions: changes are all-or-nothing, data stays valid, concurrent users don't interfere, and saved data survives crashes.-compliant, with excellent support for JSONWhat is json?A text format for exchanging data between systems. It uses key-value pairs and arrays, and every programming language can read and write it., full-text search, and extensions. If you're unsure, start here.

-- PostgreSQL handles structured data beautifully
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT UNIQUE NOT NULL,
  name TEXT NOT NULL,
  preferences JSONB DEFAULT '{}',  -- flexible JSON inside SQL
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Full-text search built in
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || body)
  @@ to_tsquery('english', 'serverless & deployment');

Use when: You need transactions, complex queries, JOINs, strong consistency, or your data is relational.

MongoDB

A document database that stores JSON-like objects (BSON). Each document can have a different shape. ShardingWhat is sharding?Splitting a database across multiple servers by distributing rows based on a key, so each server handles only a portion of the total data. (splitting data across machines) is built in.

// MongoDB excels at flexible, nested documents
await db.collection("products").insertOne({
  name: "Wireless Keyboard",
  price: 79.99,
  specs: {
    layout: "ANSI",
    connectivity: ["Bluetooth", "USB-C"],
    battery_hours: 200
  },
  variants: [
    { color: "black", sku: "KB-BLK-001", stock: 142 },
    { color: "white", sku: "KB-WHT-001", stock: 89 }
  ]
});

// Query nested fields directly
const result = await db.collection("products").find({
  "specs.connectivity": "Bluetooth",
  "variants.stock": { $gt: 0 }
});

Use when: SchemaWhat is schema?A formal definition of the structure your data must follow - which fields exist, what types they have, and which are required. varies across records, you need horizontal scalingWhat is horizontal scaling?Adding more machines to handle increased load, rather than upgrading a single machine to be more powerful., or your data is naturally document-shaped (CMS content, product catalogs, user profiles with variable fields).

DynamoDB

A fully managed key-value and document database from AWS. You define a partition key (and optional sort key), and DynamoDB handles scaling, replication, and availability. You pay per request or per provisioned capacity.

// DynamoDB: design your table around access patterns
const params = {
  TableName: "Orders",
  Item: {
    PK: { S: "USER#alice" },           // partition key
    SK: { S: "ORDER#2024-01-15#001" },  // sort key
    total: { N: "89.99" },
    status: { S: "shipped" },
    items: { L: [
      { M: { name: { S: "Keyboard" }, qty: { N: "1" } } }
    ]}
  }
};
await dynamo.putItem(params).promise();

// Efficient query: all orders for a user
const orders = await dynamo.query({
  TableName: "Orders",
  KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
  ExpressionAttributeValues: {
    ":pk": { S: "USER#alice" },
    ":prefix": { S: "ORDER#" }
  }
}).promise();

Use when: You're on AWS, need predictable performance at any scale, and your access patterns are well-defined (primarily key-based lookups). Avoid for ad-hoc queries or complex reporting.

Redis

An in-memory data structure store. Not a replacement for your primary database, it's a speed layer. Data in Redis should be data you can afford to lose (or reconstruct from your primary database).

// Redis: multiple data structures, all blazing fast
// Strings (caching)
await redis.setex("user:42:profile", 3600, JSON.stringify(profile));

// Sorted sets (leaderboards)
await redis.zadd("leaderboard:weekly", score, `user:${userId}`);
const top10 = await redis.zrevrange("leaderboard:weekly", 0, 9, "WITHSCORES");

// Hash maps (structured cache)
await redis.hset("session:abc", { userId: "42", role: "admin", expiresAt: "..." });

// Pub/Sub (real-time messaging)
redis.publish("notifications", JSON.stringify({ userId: 42, msg: "New order" }));

Use when: You need sub-millisecond reads (caching, sessions, rate limitingWhat is rate limiting?Restricting how many requests a client can make within a time window. Prevents brute-force attacks and protects your API from being overwhelmed., real-time leaderboards, pub/subWhat is pub/sub?A messaging pattern where senders publish events to a channel and any number of listeners receive them in real time. messaging). Not for durable storage.

S3 (Object Storage)

Amazon S3 (and equivalents like Google Cloud Storage, Cloudflare R2) stores files, images, PDFs, videos, backups. It's cheap, durable (99.999999999%, eleven nines), and scales infinitely.

// Upload a file to S3
await s3.putObject({
  Bucket: "myapp-uploads",
  Key: `avatars/${userId}.jpg`,
  Body: fileBuffer,
  ContentType: "image/jpeg"
}).promise();

// Generate a pre-signed URL (expires in 1 hour)
const url = s3.getSignedUrl("getObject", {
  Bucket: "myapp-uploads",
  Key: `avatars/${userId}.jpg`,
  Expires: 3600
});

Use when: You need to store files of any size. Never store binaryWhat is binary?A ready-to-run file produced by the compiler. You can send it to any computer and it just works - no install needed. files (images, videos, documents) in your database, use object storage and store only the URL/key in the database.

02

Decision matrix

CriteriaPostgreSQLMongoDBDynamoDBRedisS3
Data modelRelational tablesJSON documentsKey-value / documentKey-value + structuresBinary objects
SchemaStrictFlexibleFlexible with key designSchema-lessN/A (files)
Query powerVery high (SQL)Medium (aggregation)Low (key-based)Low (key-based)None (get/put by key)
ScalingVertical (read replicas help)Horizontal (sharding)Horizontal (automatic)Horizontal (cluster)Infinite (managed)
ConsistencyStrongConfigurableConfigurableStrong (single-node)Eventual (strong available)
Latency1-10ms1-10ms1-10ms<1ms50-200ms
DurabilityExcellentExcellentExcellentVolatile by default99.999999999%
Cost modelPer server/instancePer server/instancePer request or capacityPer server/instancePer GB stored + transfer
Best forCore business dataFlexible contentServerless, high scaleCaching, real-timeFiles, backups, static assets
Avoid forUnstructured blobsComplex transactionsAd-hoc queriesDurable primary storageStructured data queries
03

Polyglot persistence in practice

Polyglot persistence means using different storage technologies for different types of data within the same application. Here's how a realistic e-commerce system might look:

┌──────────────────────────────────────────────────┐
│                 Application Layer                  │
└──────────┬──────┬──────┬──────┬──────┬───────────┘
           │      │      │      │      │
     ┌─────▼──┐ ┌─▼────┐ ┌▼─────┐ ┌──▼──┐ ┌──▼───┐
     │Postgres│ │Mongo │ │Redis │ │ S3  │ │Dynamo│
     │        │ │      │ │      │ │     │ │      │
     │Users   │ │Product│ │Sessions│ │Images│ │Event │
     │Orders  │ │Catalog│ │Cache  │ │PDFs │ │Logs  │
     │Payments│ │CMS   │ │Rate   │ │Videos│ │Clicks│
     │        │ │      │ │Limits │ │     │ │      │
     └────────┘ └──────┘ └──────┘ └─────┘ └──────┘

PostgreSQL handles users, orders, and payments, data with strict relationships and transactional requirements.

MongoDB handles the product catalog (products have wildly different attributes across categories) and CMS content (pages, blog posts, marketing blocks).

Redis handles sessions (fast auth checks on every request), caches (pre-computed product summaries), and rate limitingWhat is rate limiting?Restricting how many requests a client can make within a time window. Prevents brute-force attacks and protects your API from being overwhelmed. (count APIWhat is api?A set of rules that lets one program talk to another, usually over the internet, by sending requests and getting responses. calls per user).

S3 stores all uploaded files, product images, user avatars, invoice PDFs, video content.

DynamoDB handles high-volume event data, click tracking, page views, audit logs, where write throughputWhat is throughput?The number of requests or operations a system can handle per unit of time, like requests per second. matters more than query flexibility.

04

Mapping access patterns to storage

The most important skill in choosing storage is identifying your access patterns first. Here's a systematic approach:

// Step 1: List your access patterns
const accessPatterns = {
  "Get user by email":       { type: "lookup-by-key",    frequency: "high" },
  "Get user's orders":       { type: "one-to-many-join", frequency: "high" },
  "Search products by text": { type: "full-text-search", frequency: "high" },
  "Store user avatar":       { type: "binary-blob",      frequency: "low" },
  "Check rate limit":        { type: "increment-counter", frequency: "very-high" },
  "Get monthly revenue":     { type: "aggregation",      frequency: "low" },
  "Track page view":         { type: "append-only-write", frequency: "very-high" }
};

// Step 2: Map patterns to storage
// lookup-by-key, one-to-many-join, aggregation → PostgreSQL
// full-text-search → PostgreSQL (or Elasticsearch for advanced)
// binary-blob → S3
// increment-counter → Redis
// append-only-write at very high frequency → DynamoDB (or Kafka)
05

Common architecture mistakes

Using one database for everything. Storing images as BLOBs in PostgreSQL, using PostgreSQL for sessionWhat is session?A server-side record that tracks a logged-in user. The browser holds only a session ID in a cookie, and the server looks up the full data on each request. storage, running analytics queries on your production database. Each of these has a better tool.

Using too many databases too early. A startup with 100 users doesn't need PostgreSQL + MongoDB + Redis + DynamoDB + Elasticsearch. Start with PostgreSQL + S3. Add specialized stores when you hit specific, measured bottlenecks.

Ignoring operational complexity. Every database you add is another system to monitor, back up, upgrade, and debug at 3 AM. The cost of polyglot persistence isn't just the software, it's the operational burden.

The sweet spot for most applications: PostgreSQL + Redis + S3. That covers 90% of use cases. Add more only when you have data-driven reasons.

AI pitfall
AI tends to recommend polyglot persistence too early. Ask it to design storage for a new app and it will suggest PostgreSQL + MongoDB + Redis + Elasticsearch + DynamoDB, five databases for an app with 100 users. What AI gets wrong: it optimizes for technical correctness without considering operational complexity. Every database you add is another thing to monitor, back up, and debug at 3 AM.
Good to know
Cloudflare R2 is S3-compatible but charges zero egress fees. If data transfer costs are a significant part of your budget (image-heavy apps, video streaming), R2 can cut your storage costs dramatically compared to S3.
Edge case
Redis persistence (RDB snapshots and AOF logs) does not make Redis a durable database. If Redis runs out of memory, it starts evicting keys based on your eviction policy, potentially dropping data you thought was saved. Always treat Redis as a cache layer, not a primary data store, even with persistence enabled.