The previous lesson taught you to normalize: split data into clean, non-redundant tables. This lesson teaches you when to deliberately break those rules. DenormalizationWhat is denormalization?Intentionally duplicating data across tables to speed up read queries by avoiding expensive joins at query time. is the art of trading write complexity for read speed. Done right, it makes your application noticeably faster. Done wrong, it creates data consistency nightmares.
Why denormalize at all?
Imagine your e-commerce product page. To render it, you need data from five tables:
-- Normalized query: 4 JOINs
SELECT p.name, p.description, p.price,
c.name AS category,
b.name AS brand,
AVG(r.rating) AS avg_rating,
COUNT(r.id) AS review_count
FROM products p
JOIN categories c ON c.id = p.category_id
JOIN brands b ON b.id = p.brand_id
LEFT JOIN reviews r ON r.product_id = p.id
WHERE p.id = 42
GROUP BY p.id, c.name, b.name;This query is correct, clean, and normalized. It's also running on every single product page view. With 10 million page views per day, those JOINs add up. Each one requires the database to match rows across tables, sort, and aggregate.
DenormalizationWhat is denormalization?Intentionally duplicating data across tables to speed up read queries by avoiding expensive joins at query time. stores pre-computed results so you can skip the JOINs:
-- Denormalized: one table, no JOINs
SELECT name, description, price, category_name,
brand_name, avg_rating, review_count
FROM products_denormalized
WHERE id = 42;Same result, but instead of combining five tables at query time, the data is already combined at write time. The read goes from ~5ms to ~0.5ms. That matters at scale.
The read/write trade-off
DenormalizationWhat is denormalization?Intentionally duplicating data across tables to speed up read queries by avoiding expensive joins at query time. shifts work from read time to write time. Here's the fundamental trade-off:
Normalized:
Read: Slow (JOINs at query time)
Write: Fast (update one row in one table)
Risk: Low (single source of truth)
Denormalized:
Read: Fast (pre-computed, no JOINs)
Write: Slow (must update multiple places)
Risk: Higher (data can go out of sync)This is only worth it when your application is read-heavy: meaning reads outnumber writes by 10:1, 100:1, or more. Most web applications are read-heavy: for every review someone posts, thousands of people read the product page.
DenormalizationWhat is denormalization?Intentionally duplicating data across tables to speed up read queries by avoiding expensive joins at query time. strategies
Strategy 1: Duplicate columns
Copy frequently-accessed data from related tables into the main table.
-- Instead of JOINing to get the category name every time:
ALTER TABLE products ADD COLUMN category_name TEXT;
-- When a product is created or updated:
UPDATE products
SET category_name = (SELECT name FROM categories WHERE id = products.category_id)
WHERE id = 42;Danger: If you rename a category, you must update every product in that category. Forget to do it, and your data is inconsistent.
Strategy 2: Pre-computed aggregates
Store calculated values (counts, averages, sums) instead of computing them on every read.
-- Add aggregate columns to the products table
ALTER TABLE products
ADD COLUMN review_count INTEGER DEFAULT 0,
ADD COLUMN avg_rating NUMERIC(3,2) DEFAULT 0;
-- Update aggregates when a review is added
CREATE OR REPLACE FUNCTION update_product_stats()
RETURNS TRIGGER AS $
BEGIN
UPDATE products
SET review_count = (SELECT COUNT(*) FROM reviews WHERE product_id = NEW.product_id),
avg_rating = (SELECT AVG(rating) FROM reviews WHERE product_id = NEW.product_id)
WHERE id = NEW.product_id;
RETURN NEW;
END;
$ LANGUAGE plpgsql;
CREATE TRIGGER after_review_insert
AFTER INSERT ON reviews
FOR EACH ROW EXECUTE FUNCTION update_product_stats();Strategy 3: Materialized views
A materialized viewWhat is materialized view?A database object that stores the pre-computed result of a query on disk - only recomputes when you explicitly refresh it. is a query result that the database stores as a physical table. You get denormalized reads without manually maintaining the duplicated data.
-- Create a materialized view
CREATE MATERIALIZED VIEW product_summary AS
SELECT p.id, p.name, p.price,
c.name AS category_name,
b.name AS brand_name,
COALESCE(AVG(r.rating), 0) AS avg_rating,
COUNT(r.id) AS review_count
FROM products p
JOIN categories c ON c.id = p.category_id
JOIN brands b ON b.id = p.brand_id
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY p.id, c.name, b.name;
-- Create an index on the materialized view
CREATE UNIQUE INDEX ON product_summary(id);
-- Reads are fast - it's just a table scan
SELECT * FROM product_summary WHERE id = 42;
-- Refresh when data changes (can be scheduled)
REFRESH MATERIALIZED VIEW CONCURRENTLY product_summary;The trade-off is staleness: materialized views are only as fresh as the last refresh. For product pages, refreshing every 5 minutes is usually fine. For a live stock ticker, it's not.
Strategy 4: Read replicas
Instead of denormalizing your primary database, create a separate read-only copy with a different schemaWhat is schema?A formal definition of the structure your data must follow - which fields exist, what types they have, and which are required. optimized for queries.
Primary DB (normalized) Read Replica (denormalized)
┌──────────────┐ ┌──────────────────────┐
│ products │ replication │ product_search_view │
│ categories │ ─────────────→ │ (flat, indexed) │
│ brands │ │ │
│ reviews │ │ analytics_summary │
└──────────────┘ │ (pre-aggregated) │
↑ └──────────────────────┘
writes ↑
readsThis gives you the best of both worlds: writes go to the clean, normalized primary, and reads come from the optimized replica. The replica can be seconds behind, which is acceptable for most use cases.
Strategy 5: Application-level caching
Instead of denormalizing in the database, cache the assembled result in Redis or an in-memory cache.
async function getProduct(id: number) {
// Check cache first
const cached = await redis.get(`product:${id}`);
if (cached) return JSON.parse(cached);
// Cache miss: run the full normalized query
const product = await db.query(`
SELECT p.*, c.name AS category_name, b.name AS brand_name,
AVG(r.rating) AS avg_rating, COUNT(r.id) AS review_count
FROM products p
JOIN categories c ON c.id = p.category_id
JOIN brands b ON b.id = p.brand_id
LEFT JOIN reviews r ON r.product_id = p.id
WHERE p.id = CODE_BLOCK
GROUP BY p.id, c.name, b.name
`, [id]);
// Cache for 5 minutes
await redis.setex(`product:${id}`, 300, JSON.stringify(product));
return product;
}This is often the safest starting point for denormalization. Your database stays clean, and cache invalidationWhat is cache invalidation?Removing or updating cached data when the original data changes, so users never see outdated information. is explicit.
Strategy comparison
| Strategy | Consistency risk | Implementation effort | Best for |
|---|---|---|---|
| Duplicate columns | High, must update all copies | Low | Rarely-changing reference data |
| Pre-computed aggregates | Medium, triggers can fail | Medium | Counts, averages, sums |
| Materialized views | Low, database manages it | Low | Reporting, dashboards |
| Read replicas | Low, replication lag only | High (ops overhead) | Large-scale read offloading |
| Application cache | Medium, stale cache risk | Medium | Hot data, API responses |
When to denormalize
DenormalizationWhat is denormalization?Intentionally duplicating data across tables to speed up read queries by avoiding expensive joins at query time. is medicine, not vitamins. Take it for a specific symptom, not as a general health practice.
Denormalize when:
- Query profiling shows JOINs are the bottleneck (not missing indexes, not bad queries)
- The read-to-write ratio is heavily skewed toward reads (100:1 or more)
- The data changes infrequently (product catalog, user profiles)
- You can accept slight staleness (a review count that's 5 minutes behind is fine)
Don't denormalize when:
- You haven't measured the actual performance problem
- The data changes frequently and consistency matters (financial transactions)
- You're still in the early stages of your product (premature optimization)
- A proper indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything. would solve the same problem with zero trade-offs
-- Before denormalizing, try adding an index first
CREATE INDEX idx_reviews_product_id ON reviews(product_id);
-- This might make your JOIN fast enough without any denormalizationThe mantra: normalize by default, index second, denormalize last.