Day 2 Medium Caching LRU/LFU/ARC Invalidation

Caching — The Art of Passing Through LayersCaching: Layers, Eviction Policies, Write Strategies, Invalidation

The Problem

Your home page has to return in under 200ms. A DB query takes 80ms and the page needs seven of them — without a cache, you're dead. But caching is never "just throw Redis at it." Facebook lost 2.5 hours of global availability in 2010 to a Memcached invalidation storm. Discord found LRU ruining their hot-channel message list in 2017 and switched to LFU, lifting hit rate from 78% to 96%. GitHub's PR page uses Russian-doll caching at 99.3% hit rate.

This issue covers three things: where caches live, how they evict, and how they handle writes. The hardest of these is invalidation — Phil Karlton famously called it one of the two hard problems in CS (the other being naming).

Requirements and Constraints (the interview opener)

High-Level Architecture (Multi-Layer Cache)

graph LR
    B["Browser cache
localStorage / SW"] CDN["CDN Edge
Cloudflare / Akamai"] RP["Reverse Proxy
Varnish / Nginx"] APP["App + L1
in-proc · Caffeine"] L2["Redis / Memcached
shared L2"] DB[("DB + buffer pool")] B -->|①| CDN -->|②| RP -->|③| APP -->|④| DB APP <-.->|miss → backfill| L2 L2 -.miss.-> DB classDef client fill:#1a2530,stroke:#64c8ff,color:#e8eef5 classDef edge fill:#0e2030,stroke:#5eead4,color:#e8eef5 classDef cache fill:#1a1a30,stroke:#ffb450,color:#e8eef5 classDef origin fill:#2a1530,stroke:#ff7ab6,color:#e8eef5 class B client class CDN,RP edge class APP,L2 cache class DB origin

Hit rate increases left to right; on return, each layer backfills so the next request short-circuits earlier

Key Technical Points

1. Cache Layers: Browser / CDN / App / DB Buffer Pool

Principle: each layer solves a different latency. Browser cache saves the network round trip (0ms vs 50ms). CDN puts assets within 50 km of users at PoPs, p50 around 10ms. In-process cache (Guava, Caffeine, Python's functools.lru_cache) saves a Redis RTT (0.1ms vs 1ms). Redis is shared across processes; on hit, it skips the DB (1ms vs 50ms). DB buffer pool (Postgres shared_buffers, InnoDB buffer pool) saves disk I/O.

Trade-off:
Real cases:

2. Eviction Policies: LRU vs LFU vs ARC vs TinyLFU

Principle: cache capacity is bounded, so you need a rule for "who dies first."

PolicyWhat gets evictedBest forHit rate
LRUleast recently accessedstrong temporal localitymedium
LFUleast frequently accessedfrequency-locality, stable hot sethigh (steady state)
ARCadaptive LRU+LFUshifting access patterns, no tuninghigh
W-TinyLFULFU with admission controlbursty traffic, scan-resistantvery high
FIFO/Randomfirst-in / randomO(1) minimalism, pure streaminglow
Trade-off:
# Minimal Python LRU (OrderedDict)
from collections import OrderedDict
class LRU:
    def __init__(self, cap): self.cap, self.d = cap, OrderedDict()
    def get(self, k):
        if k not in self.d: return None
        self.d.move_to_end(k)        # recent access → tail
        return self.d[k]
    def put(self, k, v):
        if k in self.d: self.d.move_to_end(k)
        self.d[k] = v
        if len(self.d) > self.cap:
            self.d.popitem(last=False)  # evict head (least-recent)

# Redis uses *approximate* LRU: sample 5 random keys, evict the oldest
# maxmemory-policy allkeys-lru / allkeys-lfu / volatile-ttl
Real cases:

3. Write Strategies: Cache-aside vs Write-through vs Write-back

Principle: on a write, the order in which you touch cache and DB — and who owns the write — determines consistency and complexity.

Cache-aside (lazy)Write-throughWrite-back
Readmiss → DB → backfill cachealways cache hitcache hit
Writewrite DB → delete cachewrite cache → sync to DBwrite cache only → async to DB
Consistencyeventual (small window)strong (cache = DB)weak (crash loses unflushed)
Write latencylow (DB only)high (two writes)very low (memory only)
FitsRedis + DB, the defaultfinance, strong consistency readsCPU cache, SSD FTL
# Cache-aside canonical form (Python)
def get_user(uid):
    user = redis.get(f"u:{uid}")
    if user: return json.loads(user)
    user = db.query("SELECT * FROM users WHERE id=%s", uid)
    redis.setex(f"u:{uid}", 3600, json.dumps(user))  # TTL safety net
    return user

def update_user(uid, data):
    db.update("UPDATE users SET ... WHERE id=%s", uid, data)
    redis.delete(f"u:{uid}")   # ⚠️ delete, not update — avoids concurrent-write disorder
    # ⚠️ another concern: a read right after delete may backfill the old value → double-delete / delayed double-delete
Trade-off:
Real cases:

4. The Hardest Part: TTL, Thundering Herd, Avalanche, Penetration

Principle: four classic invalidation traps, every one of which has burned production somewhere.

# Single-flight to prevent stampede (Go)
var g singleflight.Group
func GetHot(key string) (any, error) {
    v, err, _ := g.Do(key, func() (any, error) {
        if v := redis.Get(key); v != nil { return v, nil }
        v := db.Query(key)
        redis.SetEX(key, v, ttlWithJitter(3600))
        return v, nil
    })
    return v, err
}
Real cases:

Scaling Up: What to Do After Growth

Common Pitfalls (Even Senior Engineers Slip Here)

1. Update cache, or delete cache? Always delete. Updates create write ordering hazards (A writes v1 → B writes v2 → B updates cache → A updates cache; cache stays at v1 forever).
2. Write DB first, or delete cache first? Standard is "DB first, then delete cache." Deleting first leaves a window where a read can backfill the old value. Worst case use "delayed double delete": delete → write DB → sleep 500ms → delete again.
3. The cache-DB consistency window is never zero. Acknowledge that; define the staleness budget at the business level.
4. Don't treat Redis as a DB: Redis persistence (RDB / AOF) isn't built for it as the design goal — crash RPO is at least a few seconds. For account balances, Redis is a cache, not the source of truth.
5. Ignoring cold start: a restart or cluster swap drops hit rate to 0% and the DB takes the full load. Pre-warm or ramp traffic in slowly.

Sample Interview Questions

  1. Design Instagram's home feed cache: 1B users, p99 < 200ms.
  2. One Redis hot key (a celebrity's follower list) at 1M QPS — how do you survive?
  3. What's your cache consistency model? Discuss the failure cases for "DB first then delete cache" both ways.
  4. Implementation details for LRU vs LFU, time complexity, and which fits which scenario?
  5. How do you stop cache penetration (malicious lookups for non-existent keys)? Compute a Bloom filter's false positive rate.

Key Resources

English Summary

Caching is a multi-layer art spanning browser, CDN, reverse proxy, in-process, distributed (Redis/Memcached), and DB buffer pools. Choose eviction policy by access pattern: LRU for recency, LFU for frequency, W-TinyLFU (Caffeine) when in doubt. Write strategies trade consistency for performance: cache-aside is the default, write-through ensures correctness, write-back maximizes throughput at the cost of durability. The hardest part is invalidation — beware thundering herd, avalanche (use TTL jitter), penetration (bloom filter), and the eternal question of delete vs update cache (always delete).

Going Deeper (click to expand)

1. Hit rate drops from 95% to 90%. How many times more load does the DB see — and why isn't it simply 2×?

Naive math: miss rate 5% → 10%, origin QPS doubles, DB load = 2×.

Reality is worse, at least three amplifiers:

  • Misses also cause writes: every miss is followed by a cache SET; under high concurrency you also get a thundering herd — 1000 concurrent requests for the same key all hit the DB once each, not collectively once.
  • The DB's own caches collapse in turn: Postgres shared_buffers / MySQL buffer pool hit rate sinks too, disk I/O spikes, p99 isn't just 2× anymore.
  • Connection-pool contention: the pool is sized for 95% hit rate; doubling the QPS exhausts it, new requests queue, p99 avalanches.

Real numbers: at 90% hit rate, DB load is usually 3–5× nominal. Hence the rule of thumb: a 5-point drop in hit rate is a P2.

2. Cache-aside "write DB then delete cache" still occasionally returns stale data in production. List three plausible causes and the matching fix.
  • Concurrent read interleaved with write: A reads, cache miss → A reads DB (gets v1) → B writes DB (v2) → B deletes cache → A backfills with v1. Cache stuck at v1. Fix: delayed double-delete (B deletes → sleep 500ms → delete again); or single-flight to limit concurrent backfills.
  • Replication lag: write goes to primary, cache deleted, but a follow-up read routes to a not-yet-replicated replica and returns stale data. Fix: read from primary for a window after delete, or "write primary → wait for replica → then delete cache."
  • Delete failed: DB write succeeded but Redis delete dropped due to network jitter, with no retry. Fix: outbox / CDC pattern — durably emit the invalidation event (Debezium → Kafka → consumer deletes cache), at-least-once delivery.
  • Multi-region desync: US-East deleted, US-West didn't. Fix: cross-region invalidation broadcast (Facebook's mcsqueal).
  • Client-side local cache: in-process Caffeine still serves the old value even after Redis is invalidated. Fix: shorter local TTL or Redis pub/sub to push invalidation.
3. Redis Cluster doesn't support MULTI / transactions across slots. How does that affect key design? What does the hash tag ({user:123}) solve, and what does it introduce?

The constraint: user:123:profile and user:123:posts can hash to different slots on different nodes; a MULTI across them gets rejected (CROSSSLOT). Same goes for cross-slot MGET/MSET.

Hash tag usage: write keys as {user:123}:profile and {user:123}:posts; the content inside {} is the hash input, forcing both keys into the same slot.

What it introduces:

  • Skew: all of a user's keys land on one node; a celebrity user becomes a single-node hot spot with pegged CPU.
  • Big key risk: binding multiple collections together means one heavy user is a memory blowup on one node.
  • Painful resharding: cluster resize has to migrate the whole group as a unit; writes to those keys stall during migration.
  • Anti-pattern temptation: developers tag every "might be read together" key, until the cluster degenerates into a single-node Redis.

Field rule: only use hash tags when atomicity is truly required (cart + inventory). For multi-key display pages, do concurrent GETs from the app rather than rely on transactions.

4. A hot key sees 1M QPS at a single Redis node, which can't handle it. List three solutions across different axes and their costs.
  • ① Key sharding (split on write, merge on read): turn celebrity:123:followers_count into celebrity:123:followers_count:{0..15}; on write, increment a random shard; on read, SUM 16 shards. Cost: 16× read amplification, weak consistency (transient sum may dip), only works for aggregable data.
  • ② Multi-replica reads (local replica): replicate the hot key to several Redis nodes; clients pick one at random. Cost: write must fan out to N replicas, weak consistency from async replication. Redis Cluster's READONLY mode does this.
  • ③ Multi-layer cache: in-process L1: each app instance keeps the hot key in local Caffeine with a 1s TTL. 100 app servers × 1 origin call/s = 100 QPS to Redis. Cost: up to 1s of staleness, memory waste (100 copies), invalidation pain.
  • ④ CDN / edge cache: for public data (celebrity follower count), put it on the CDN; edges absorb 1M QPS. Cost: only for publicly cacheable content, slow invalidation (30s+).
  • ⑤ Pre-compute + push: don't let users query at all — pre-generate into each user's timeline cache (Twitter fanout-on-write). Cost: huge write amplification, unrealistic for 100M-follower celebrities.

Production combo: usually ① + ③ together — shards take writes, local L1 takes reads.

5. On restart, hit rate is 0% and the DB melts. How do you pre-warm? And how do you keep pre-warming itself from killing the DB?

When to pre-warm:

  • Before a new release (blue-green: warm green, then swing);
  • During a Redis cluster swap (old dump → new import);
  • The night before a big sale (load predictable hot data ahead of time).

How to pre-warm:

  • Offline analysis + bulk load: from the last 7 days' access logs, compute top 100k keys, dump KV pairs offline, write them to Redis via RESTORE or a pipeline (not one SET at a time).
  • Slow-start in production: a new instance gets 1% traffic on launch, cache fills gradually, ramp to 100% over 5 minutes. Envoy's slow_start_config does this out of the box.
  • Dual-cache switchover: old Redis still serves; mirror a shadow stream to the new Redis to warm it; cut over once hit rate is high.

Don't melt the DB while pre-warming:

  • Rate-limit: the pre-warm script uses a token bucket (e.g., 5k QPS cap) so DB connections don't peg.
  • Read replica: warm from replicas to spare the primary's transaction capacity.
  • Off-peak: warm during the trough (3–5am); or piggyback on natural traffic ("first user triggers origin, everyone else benefits").
  • Single-flight: pre-warming needs herd protection too — one worker per key.

Real story: Pinterest melted the DB for 30 minutes during a 2014 Memcached cluster swap with no pre-warm; the fix was to run old and new in parallel for an hour.