1. Why Caching Matters

Without caching, every request hits the database. A typical MySQL read takes 1–10ms. A Redis read takes 0.1–0.5ms — a 10–100x speedup. At 100,000 requests per second, the database would need to handle 100K queries/second; with a 95% cache hit rate, only 5,000 queries/second reach the DB. This is the difference between $1,000/month in database costs and $100,000/month.

2. Caching Strategies Compared

StrategyRead PathWrite PathConsistencyBest For
Cache-Aside (Lazy Loading)Check cache → miss → read DB → populate cacheWrite DB → invalidate cacheEventual (stale until invalidated)General purpose, read-heavy
Write-ThroughAlways cache hit after first writeWrite cache + DB synchronouslyStrong (always consistent)Read-after-write consistency needed
Write-Back (Write-Behind)Same as write-throughWrite cache only; async flush to DBEventual (risk of data loss)Write-heavy, tolerate some loss
Read-ThroughCache populates itself from DB on missWrite DB → invalidate cacheEventualCache as a proxy (Redis Modules)
Refresh-AheadPre-fetch before expiryWrite DB → update cacheStrong (proactive)Predictable access patterns

Cache-Aside Pattern — Code Flow

Cache-aside read and write def get_user(user_id: int) -> dict: cache_key = f"user:{user_id}" # 1. Check cache cached = redis.get(cache_key) if cached: return json.loads(cached) # cache HIT # 2. Cache MISS — fetch from DB user = db.query("SELECT * FROM users WHERE id = ?", user_id) if not user: return None # 3. Populate cache (TTL 1 hour) redis.setex(cache_key, 3600, json.dumps(user)) return user def update_user(user_id: int, data: dict): # 1. Update DB db.execute("UPDATE users SET ... WHERE id = ?", user_id) # 2. Invalidate cache (delete, not update — avoids race conditions) redis.delete(f"user:{user_id}") # Next read will repopulate with fresh data

Why Delete Instead of Update on Write?

Updating the cache on write introduces a race condition: two concurrent writes could result in stale data being written to the cache after the DB update. Deletion is safe because the next read will always re-fetch from the DB. The cost is one extra cache miss per write — acceptable for most write-ratio workloads.

3. Eviction Policies

When the cache is full and a new item needs to be added, an eviction policy determines which existing item is removed.

LRU (Least Recently Used)
  Cache: [A (used 5s ago), B (1min ago), C (10min ago), D (1hr ago)]
  On new insert: evict D (accessed 1 hour ago — least recently used)
  Good for: temporal locality (recent = future access)

LFU (Least Frequently Used)
  Cache: [A: 1000 hits, B: 500 hits, C: 50 hits, D: 2 hits]
  On new insert: evict D (accessed only 2 times — least frequent)
  Good for: stable popularity distribution (bestsellers stay cached)

FIFO (First In, First Out)
  Cache: [A (inserted first), B, C, D (inserted last)]
  On new insert: evict A (oldest insertion, regardless of access)
  Simple but ignores access patterns — rarely used in production

TTL-based (Time-To-Live)
  Every item expires automatically after N seconds
  Passive eviction: item removed on next access attempt after expiry
  Combined with LRU: expired items considered least recently used

4. Cache Stampede (Thundering Herd)

One of the most dangerous failure modes in caching. When a hot cached item expires, all threads simultaneously try to rebuild it, overwhelming the database.

Real-World Impact

A cache stampede on a single hot key can bring down a production database in seconds. In 2010, Instagram suffered an outage when a celebrity's profile cache expired — 10,000 simultaneous DB queries for the same user_id. Facebook, Twitter, and Reddit have all published post-mortems about stampedes.

Prevention Strategy 1: Mutex Lock

Cache stampede prevention — mutex pattern def get_with_mutex(key: str) -> dict: cached = redis.get(key) if cached: return json.loads(cached) # Acquire distributed lock (only one thread rebuilds) lock_key = f"lock:{key}" acquired = redis.set(lock_key, "1", nx=True, ex=10) # 10s timeout if acquired: # We hold the lock — rebuild and cache data = db.fetch(key) redis.setex(key, 3600, json.dumps(data)) redis.delete(lock_key) return data else: # Another thread is rebuilding — wait and retry time.sleep(0.1) return get_with_mutex(key) # retry (will hit cache next time)

Prevention Strategy 2: Probabilistic Early Expiry (XFetch)

Before the TTL expires, proactively recompute with increasing probability. Items are refreshed while still "alive" in cache — no stampede window.

5. Consistent Hashing for Cache Distribution

When you have multiple cache nodes, you need to determine which node stores each key. Simple modulo hashing node = hash(key) % N means adding or removing one node remaps all keys — a full cache flush.

Consistent Hashing Ring (0 to 2^32)

          0
          │
   Node C ● ─────────── ● Node A
          │              │
          │   Ring       │
          │              │
   Node B ● ─────────────┘

Key "user:123" hashes to position 1,200,000,000
→ next clockwise node is Node A
→ if Node A is removed, key remaps only to Node B (adjacent)
→ only ~1/N keys are remapped (not all keys)

With virtual nodes (vnodes):
Each physical node gets 100–150 positions on the ring
→ even distribution despite unequal key hashing

6. Redis Cluster Architecture

Redis Cluster is the production standard for distributed caching at scale. It provides automatic sharding, replication, and failover.

  • Hash slots: 16,384 total slots; CRC16(key) mod 16384 maps every key to a slot
  • Node ownership: Each node owns a range of slots. A 6-node cluster: 3 primaries own slots 0–5460, 5461–10922, 10923–16383; 3 replicas mirror each primary
  • Gossip protocol: Nodes communicate node health via gossip every 100ms; failure detected within 500ms
  • Automatic failover: If a primary fails, its replica is elected primary within 1–2 seconds (majority vote)
  • Client routing: Smart clients (redis-py-cluster, Jedis cluster) cache the slot map and route directly; dumb clients get a MOVED redirect

7. Cache Invalidation Strategies

Cache invalidation is notoriously hard. There are three main approaches:

StrategyMechanismConsistencyComplexity
TTL-based expiryCache auto-expires after N secondsEventual (up to TTL)Very low
Event-driven invalidationDB write publishes event; cache deletes keyStrong (near real-time)Medium
Write-throughWrite always updates both cache and DBStrong (synchronous)Medium
Cache versioningAppend version to key; old version naturally expiresStrong (new key)Low
Bulk invalidation (tags)Tag related keys; invalidate all by tagStrongHigh

8. Monitoring Cache Health

A cache that isn't monitored will degrade silently. Track these metrics:

  • Hit rate: hits / (hits + misses). Target: >90% for read-heavy systems. A drop indicates over-eviction or incorrect TTLs.
  • Eviction rate: Items evicted per second. High eviction = cache too small; increase memory or reduce TTL.
  • Memory usage: Track used_memory vs maxmemory. Alert at 80% to prevent OOM eviction bursts.
  • Latency: Redis p99 latency should be <1ms. Spikes indicate slow commands (e.g. KEYS, unindexed SCAN).
  • Connection count: Too many connections cause overhead. Use connection pooling.

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

  • The workflow or formula is tested directly in the tool and compared against independent reference examples.
  • Examples are kept practical so readers can verify the result without hidden assumptions.
  • Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — Distributed Cache System Design