There is an old joke in computer science: the two hardest problems are cache invalidation, naming things, and off-by-one errors. The joke is funny because cache invalidation really is that hard. Storing data in a cache is easy. The challenge is knowing when to remove or update it without serving stale data, without killing your database, and without adding so much complexity that you wish you had never cached anything.
Every invalidation strategy makes a tradeoff between freshness, performance, and complexity. There is no single correct answer. The right strategy depends on how stale your data can be, how often it changes, and how painful a stale read is for your users.
TTL-based invalidation
TTL (Time to Live) is the simplest invalidation strategy: every cached entry has an expiration time, and when it expires, it is automatically removed. The next request triggers a cache miss and fetches fresh data from the origin.
// TTL-based: set it and forget it
await redis.setex('product:42', 300, JSON.stringify(product)); // Expires in 5 minutes
// After 5 minutes, redis.get('product:42') returns null → cache miss → fresh fetch
TTL is the right default when you can tolerate bounded staleness. If your product catalog updates a few times a day and users can live with data that is up to 5 minutes old, a 5-minute TTL is simple, reliable, and requires zero extra infrastructure.
The problem with TTL-only invalidation is the window of staleness. If a product's price changes from $10 to $15, users might see $10 for up to 5 minutes. For some data (blog posts, product descriptions), this is fine. For other data (inventory counts, prices during a flash sale), it is not.
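To make the miss-then-refetch cycle concrete, here is a minimal sketch of a TTL-backed read path. The `TtlCache` class is a hypothetical in-memory stand-in for Redis (so the example runs without a server), and `getWithTtl` and `loadFromOrigin` are illustrative names, not part of any library.

```javascript
// Minimal in-memory stand-in for a TTL cache (hypothetical; swap in a Redis client).
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock, so expiry can be exercised without waiting
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: behave like a miss
      return null;
    }
    return entry.value;
  }
}

// Cache-aside read: try the cache first, fall back to the origin on a miss.
async function getWithTtl(cache, key, ttlSeconds, loadFromOrigin) {
  const cached = cache.get(key);
  if (cached !== null) return JSON.parse(cached);
  const fresh = await loadFromOrigin();
  cache.set(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}
```

With a 300-second TTL, a price change made just after the entry is written stays invisible until the entry expires, which is exactly the staleness window described above.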
Event-based invalidation
Event-based invalidation removes or updates the cache immediately when the underlying data changes. Instead of waiting for the TTL to expire, you actively tell the cache "this data is no longer valid."
// When a product is updated, invalidate its cache
async function updateProduct(productId, updates) {
  await db.query('UPDATE products SET price = ? WHERE id = ?',
    [updates.price, productId]);
  // Immediately invalidate cache
  await redis.del(`product:${productId}`);
  // Also invalidate any list that contains this product
  await redis.del(`category:${updates.categoryId}:products`);
  await redis.del('featured-products');
}
This gives you much better freshness than TTL alone, but it introduces a coordination problem: every code path that modifies data must also invalidate the corresponding cache keys. Miss one, and you have stale data. As your system grows, tracking which cache keys depend on which data becomes increasingly difficult.
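One way to reduce the risk of a write path forgetting a key is to derive the dependent keys in a single place. The sketch below assumes the same three keys as the example above. `cacheKeysForProduct` and `invalidateProduct` are hypothetical helpers, and `cache` is assumed to be any client whose `del` accepts several keys as separate arguments.

```javascript
// Hypothetical registry: the one place that knows which cache keys depend on a product.
function cacheKeysForProduct(product) {
  return [
    `product:${product.id}`,
    `category:${product.categoryId}:products`,
    'featured-products',
  ];
}

// Every write path calls this instead of hand-listing keys.
// `cache` is assumed to expose an async del(...keys) method.
async function invalidateProduct(cache, product) {
  const keys = cacheKeysForProduct(product);
  await cache.del(...keys);
  return keys; // returned so callers can log what was invalidated
}
```

This does not remove the coordination problem, but a new dependent key then only has to be added to one function rather than to every update path.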
A more scalable approach is to use a publish/subscribe mechanism:
// Publisher: broadcast that a product changed
async function updateProduct(productId, updates) {
  await db.query('UPDATE products SET price = ? WHERE id = ?',
    [updates.price, productId]);
  // Publish event - all subscribers will clear their caches
  await redis.publish('product:updated', JSON.stringify({ productId }));
}
// Subscriber: listen for changes and invalidate
const subscriber = new Redis();
subscriber.subscribe('product:updated');
subscriber.on('message', async (channel, message) => {
  const { productId } = JSON.parse(message);
  await redis.del(`product:${productId}`);
  // DEL does not expand glob patterns, so find the list keys with SCAN first
  // (scanStream assumes an ioredis-style client)
  for await (const keys of redis.scanStream({ match: 'product-list:*' })) {
    if (keys.length > 0) await redis.del(...keys); // Clear product list caches
  }
});
Versioned keys
Versioned keys solve the problem of invalidating groups of related cache entries. Instead of tracking and deleting every individual key, you embed a version number in the key. To invalidate, you increment the version: all old keys become unreachable and eventually expire via TTL.
// Store version number in Redis
// Current version: 7
await redis.set('catalog:version', '7');
// Cache keys include the version
async function getProduct(productId) {
  const version = await redis.get('catalog:version');
  const cacheKey = `catalog:v${version}:product:${productId}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);
  const product = await db.query('SELECT * FROM products WHERE id = ?', [productId]);
  await redis.setex(cacheKey, 3600, JSON.stringify(product)); // 1 hour TTL
  return product;
}
// To invalidate the entire catalog: just bump the version
async function invalidateCatalog() {
  await redis.incr('catalog:version');
  // All old v7 keys are now orphaned and will expire via TTL
  // New requests will create v8 keys with fresh data
}
This is elegant for bulk invalidation, since one version bump invalidates everything, but it is wasteful if only a single item changed, because the entire cache is effectively cold after a version bump.
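One possible way to soften that cost is to keep a version per group of related keys instead of one for the whole catalog, so that a bump only turns one group cold. The sketch below is an assumption layered on the example above, not something the catalog code does: the per-category scope, the in-memory `versions` map (standing in for Redis counters), and all function names are hypothetical.

```javascript
// In-memory stand-in for the version counters (hypothetical; in Redis these would be INCR keys).
const versions = new Map();

// A scope that has never been bumped is treated as version 1.
function currentVersion(scope) {
  return versions.get(scope) ?? 1;
}

// Bumping a scope makes every key built from the old version unreachable.
function bumpVersion(scope) {
  const next = currentVersion(scope) + 1;
  versions.set(scope, next);
  return next;
}

// The key embeds the version of the product's category, not of the whole catalog.
function productCacheKey(categoryId, productId) {
  const scope = `category:${categoryId}`;
  return `${scope}:v${currentVersion(scope)}:product:${productId}`;
}
```

Calling `bumpVersion('category:3')` changes the keys for products in category 3 only; keys for other categories keep resolving to their existing entries.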
Cache stampede (thundering herd)
Cache stampede is what happens when a popular cache key expires and many concurrent requests all experience a cache miss at the same time. All of them query the database simultaneously, potentially overwhelming it.
Popular key expires at T=300s
T=300.001s: Request A → cache miss → query DB
T=300.002s: Request B → cache miss → query DB
T=300.003s: Request C → cache miss → query DB
... 500 more requests → 500 DB queries for the same data
This is not a theoretical problem. A single popular key expiring during a traffic spike can bring down your database. There are three main solutions.
Solution 1: Mutex lock (lock and wait)
Only one request is allowed to repopulate the cache. All other requests wait for that one to finish.
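Within a single Node.js process, the same rule can be sketched without Redis by letting concurrent callers share one in-flight promise. This is an in-process variation (often called request coalescing), not the distributed lock that follows, and `singleFlight` and `inFlight` are hypothetical names.

```javascript
// In-flight loads, keyed by cache key (one entry per key being fetched right now).
const inFlight = new Map();

// Run `load` at most once per key at a time; concurrent callers share the same promise.
function singleFlight(key, load) {
  const existing = inFlight.get(key);
  if (existing) return existing;

  const promise = Promise.resolve()
    .then(load)
    .finally(() => inFlight.delete(key)); // allow a new load once this one settles

  inFlight.set(key, promise);
  return promise;
}
```

This only deduplicates work inside one process. With several application servers, each server can still issue its own query, which is where a shared lock in Redis comes in.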
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function getProductWithLock(productId) {
  const cacheKey = `product:${productId}`;
  const lockKey = `lock:${cacheKey}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);
  // Try to acquire lock (NX = set only if not exists, EX = expire in 5s)
  const acquired = await redis.set(lockKey, '1', 'EX', 5, 'NX');
  if (acquired) {
    try {
      // This request won the lock - fetch from DB and populate cache
      const product = await db.query('SELECT * FROM products WHERE id = ?', [productId]);
      await redis.setex(cacheKey, 300, JSON.stringify(product));
      return product;
    } finally {
      // Release the lock even if the query or the cache write fails
      await redis.del(lockKey);
    }
  } else {
    // Another request is already fetching - wait and retry
    await sleep(50);
    return getProductWithLock(productId); // Retry (add max retries in production)
  }
}
Solution 2: Stale-while-revalidate
Serve the stale data immediately while refreshing the cache in the background. The user gets a fast response (possibly slightly stale), and the cache is updated for the next request.
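The pattern relies on each entry being in one of three states. As a small sketch, the decision can be isolated in a pure function; `entryState` is a hypothetical helper, and `staleAt` and `expiresAt` are assumed to be epoch-millisecond timestamps stored alongside the cached data, as in the example that follows.

```javascript
// Classify a cache entry by age. `staleAt` and `expiresAt` are epoch milliseconds.
function entryState(entry, now = Date.now()) {
  if (!entry || now >= entry.expiresAt) return 'expired'; // must fetch synchronously
  if (now >= entry.staleAt) return 'stale'; // serve it, then refresh in the background
  return 'fresh'; // serve as-is
}
```

Keeping the classification separate from the Redis calls makes the boundaries easy to unit test with a fixed `now`.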
async function getProductSWR(productId) {
  const cacheKey = `product:${productId}`;
  const cached = await redis.get(cacheKey);
  if (cached) {
    const { data, expiresAt, staleAt } = JSON.parse(cached);
    if (Date.now() < staleAt) {
      return data; // Fresh - serve as-is
    }
    if (Date.now() < expiresAt) {
      // Stale but not expired - serve stale, refresh in background
      refreshCache(productId, cacheKey).catch(console.error); // fire-and-forget, but log failures
      return data;
    }
  }
  // Fully expired or missing - must fetch synchronously
  return fetchAndCache(productId, cacheKey);
}
// Fetch from the DB and store the entry with its freshness timestamps
async function refreshCache(productId, cacheKey) {
  const product = await db.query('SELECT * FROM products WHERE id = ?', [productId]);
  const entry = {
    data: product,
    staleAt: Date.now() + 240_000, // Fresh for 4 minutes
    expiresAt: Date.now() + 300_000, // Stale-but-servable for 1 more minute
  };
  await redis.setex(cacheKey, 600, JSON.stringify(entry));
  return product;
}
// Synchronous path: the same work as a background refresh, but the caller awaits the result
async function fetchAndCache(productId, cacheKey) {
  return refreshCache(productId, cacheKey);
}
Solution 3: Probabilistic early expiration
Each request has a small random chance of refreshing the cache before it expires. The closer to expiration, the higher the probability. This spreads the refresh load over time instead of concentrating it at one moment.
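To get a feel for how quickly the probability rises, the snippet below evaluates the same formula used in the function that follows, `exp(-remaining / (ttl * 0.1))`, at a few points. The 5-minute TTL and the sample points are chosen purely for illustration, and `refreshProbability` is a hypothetical helper name.

```javascript
// Same formula as the example that follows: the result rises toward 1 as remainingMs approaches 0.
function refreshProbability(remainingMs, ttlMs) {
  return Math.exp(-remainingMs / (ttlMs * 0.1));
}

const ttlMs = 300_000; // 5-minute TTL, chosen for illustration
for (const remainingMs of [300_000, 60_000, 30_000, 3_000, 0]) {
  const p = refreshProbability(remainingMs, ttlMs);
  console.log(`${remainingMs / 1000}s left -> p = ${p.toFixed(4)}`);
}
```

With these numbers the per-request probability is about 0.1353 with 60 seconds left, 0.3679 with 30 seconds left, and 0.9048 with 3 seconds left. On a key that receives many requests per second, a refresh is therefore very likely to happen well before expiry; the `0.1` factor controls how early that window opens.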
async function getProductProbabilistic(productId) {
  const cacheKey = `product:${productId}`;
  const cached = await redis.get(cacheKey);
  if (cached) {
    // The entry must store its ttl (in ms) alongside expiresAt; without it the probability is NaN
    const { data, expiresAt, ttl } = JSON.parse(cached);
    const remaining = expiresAt - Date.now();
    const probability = Math.exp(-remaining / (ttl * 0.1)); // Higher as expiry approaches
    if (Math.random() < probability) {
      // "Won the lottery" - proactively refresh in the background
      refreshCache(productId, cacheKey).catch(console.error);
    }
    if (Date.now() < expiresAt) return data;
  }
  return fetchAndCache(productId, cacheKey);
}
Invalidation strategies comparison
| Strategy | Freshness | Complexity | Stampede protection | Best for |
|---|---|---|---|---|
| TTL only | Bounded staleness (up to TTL) | Very low | None | Data where brief staleness is acceptable |
| Event-based (delete on write) | Near-immediate | Medium | None (still need stampede protection) | Data that changes infrequently but must be fresh |
| Event-based (pub/sub) | Near-immediate | High | None | Distributed systems with multiple cache instances |
| Versioned keys | Immediate (bulk) | Medium | None | Bulk invalidation of related data |
| Mutex lock | Fresh (single fetcher) | Medium | Yes | Popular keys with expensive DB queries |
| Stale-while-revalidate | Slightly stale during refresh | Medium | Yes (serves stale) | High-traffic endpoints where some staleness is okay |
| Probabilistic early expiry | Mostly fresh | High | Yes (spreads load) | Very high traffic, keys with predictable access patterns |
Practical advice
Start with TTL-based invalidation. It covers most cases and adds zero complexity. Layer event-based invalidation on top for data where staleness causes real user-facing problems (prices, inventory). Add stampede protection only for keys that receive enough concurrent traffic to actually cause a stampede. Most keys in most systems will never have this problem.
The biggest mistake teams make with invalidation is over-engineering it from day one. A simple 5-minute TTL with delete-on-write for critical keys is more reliable than a complex pub/sub invalidation system that nobody fully understands.
If you acquire the mutex lock using SET NX with a 5-second timeout, and the database query takes 6 seconds, the lock expires while the query is still running. A second request acquires the lock and runs the same query. Use a lock timeout that is at least 2x your worst-case query time.
The HTTP header Cache-Control: stale-while-revalidate=60 enables the stale-while-revalidate pattern at the browser and CDN level with zero application code.