Day 28 · Hard · CDN / Edge · Anycast · Tiered Cache · Edge Compute

CDN & Edge: Pushing Compute and Cache Within 50 km of the User
Anycast, Tiered Cache, Edge Invalidation, Edge Compute

Scenario + Requirements

You're designing the CDN and edge layer for a global e-commerce/media site with 50M DAU across five continents. The content mixes three classes: static assets (images, JS, video segments; strongly cacheable), semi-dynamic pages (product pages; tolerate a few seconds of staleness), and dynamic APIs (cart; uncacheable).

The core tension: the more content you cache close to the user (and the longer you keep it), the more requests the edge can answer and the lower the latency, but the harder invalidation becomes and the weaker consistency gets. CDN design is about finding the balance point along this axis.

High-Level Architecture

graph LR
    U["Global users<br/>same Anycast IP"]
    subgraph PoP["Nearest PoP (lower tier)"]
        EDGE["Edge node<br/>TLS term · WAF · Edge Worker"]
        L1["Edge cache L1"]
    end
    UPPER["Regional upper-tier cache<br/>upper tier"]
    SHIELD["Origin Shield<br/>single fetch funnel"]
    ORG[("Origin<br/>app + object store")]
    U -->|BGP picks nearest| EDGE --> L1
    L1 -.miss.-> UPPER
    UPPER -.miss.-> SHIELD
    SHIELD -.miss.-> ORG
    classDef client fill:#1a2530,stroke:#64c8ff,color:#e8eef5
    classDef edge fill:#0e2030,stroke:#5eead4,color:#e8eef5
    classDef cache fill:#1a1a30,stroke:#ffb450,color:#e8eef5
    classDef origin fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
    class U client
    class EDGE,L1 edge
    class UPPER,SHIELD cache
    class ORG origin

Anycast steers users to the nearest PoP, and tiering funnels misses toward origin: the further right a node sits, the fewer such nodes exist and the more expensive a request that reaches it.

Component roles: the edge node catches traffic via Anycast, does the TLS handshake, WAF filtering, and runs Edge Workers; the edge cache returns directly on a hit (the vast majority of requests); on a miss it asks the regional upper-tier cache, then the Origin Shield (one designated PoP dedicated to fetching, collapsing misses from thousands of edges into a few origin requests), and only then hits origin. This "lower → upper → shield → origin" funnel is the key to hit ratio and origin protection.
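The funnel can be sketched as a chain of cache tiers, each falling through to its parent on a miss. This is a toy model under stated assumptions: in-memory dicts stand in for caches, objects never expire, and the tier counts (30 edges, 3 upper tiers, 1 shield) are made-up numbers chosen only to show how misses collapse on the way to origin.

```python
from typing import Callable, Dict, List, Optional


class Tier:
    """One cache layer. On a miss it asks `parent` (the next tier toward origin),
    or calls `origin` if it is the last tier. Toy model: no TTL, no eviction."""

    def __init__(self, name: str, parent: Optional["Tier"] = None,
                 origin: Optional[Callable[[str], str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store: Dict[str, str] = {}
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self.store:
            return self.store[key]              # hit: answered at this tier
        self.misses += 1
        if self.parent is not None:
            value = self.parent.get(key)        # fall through to the next tier
        else:
            assert self.origin is not None      # last tier must know the origin
            value = self.origin(key)            # shield miss: real origin fetch
        self.store[key] = value                 # fill on the way back to the user
        return value


origin_calls: List[str] = []

def origin(key: str) -> str:
    origin_calls.append(key)
    return f"body of {key}"

shield = Tier("shield", origin=origin)
uppers = [Tier(f"upper-{i}", parent=shield) for i in range(3)]
edges = [Tier(f"edge-{i}", parent=uppers[i % 3]) for i in range(30)]

for edge in edges:                              # every edge sees the object cold
    edge.get("/logo.png")

print(sum(e.misses for e in edges), sum(u.misses for u in uppers),
      shield.misses, len(origin_calls))         # 30 3 1 1
```

Thirty cold edges produce three upper-tier misses and a single origin fetch; with no upper tier and no shield, the same cold start would cost thirty origin requests.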

Key Techniques

1. Anycast + BGP: One IP, Nearest Ingress Worldwide

Core trade-off: use the routing layer for load balancing — you get zero-config failover, but give up precise control over which node a user lands on.

Principle: the same IP prefix is announced via BGP from hundreds of PoPs at once. Internet routers forward a user's packets to the "network-nearest" PoP according to BGP path selection (AS-path length plus each network's own routing policy), which is usually, though not always, the lowest-latency one. This gives three wins for free: nearest ingress (no GeoDNS guessing based on the resolver's location), DDoS dilution (attack traffic is spread across all PoPs, so the effective absorption capacity approaches the whole network's capacity rather than one site's), and automatic failover (when a PoP fails it withdraws its route announcement and BGP converges traffic onto the next-nearest PoP, typically within seconds, with no DNS change and no TTL wait).
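The failover behavior can be illustrated with a heavily simplified route-selection model. Assumptions: real BGP best-path selection compares LOCAL_PREF, AS-path length, MED and more; this sketch uses AS-path length alone with a PoP-name tie-break, and the PoP names and private AS numbers (64500 standing in for the CDN's own AS) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    pop: str                      # PoP that originated this announcement
    as_path: Tuple[int, ...]      # ASes the announcement crossed to reach this router


def best_route(routes: Iterable[Route],
               withdrawn: FrozenSet[str] = frozenset()) -> Optional[Route]:
    """Pick the shortest AS path among routes that are still announced.
    Simplified: AS-path length only, ties broken by PoP name."""
    live = [r for r in routes if r.pop not in withdrawn]
    if not live:
        return None
    return min(live, key=lambda r: (len(r.as_path), r.pop))


# What one router in Europe might see for the same anycast prefix (hypothetical).
routes = [
    Route("FRA", (64500,)),
    Route("AMS", (64501, 64500)),
    Route("SIN", (64502, 64503, 64500)),
]

print(best_route(routes).pop)                         # FRA: shortest path wins
print(best_route(routes, frozenset({"FRA"})).pop)     # AMS: FRA withdrew its route
```

No DNS record changes in the failover case: the router simply stops seeing the FRA announcement and the next-best path takes over.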

Trade-off (three nearest-ingress approaches):
Real-world cases:

2. Edge Cache Key + Tiered Cache

Core trade-off: more edge nodes dilute the hit ratio; tiering funnels misses and protects origin, at the cost of an extra hop.

Principle: a CDN indexes objects by cache key (default: host + path + part of the query). But when hundreds of PoPs cache independently, each must fetch from origin on its first request — the hit ratio gets "diluted" and origin still gets hit by misses amplified hundreds-fold. Tiered Cache splits PoPs into lower tier (near users) and upper tier (a few large nodes): a lower-tier miss asks the upper tier first, and only an upper-tier miss is allowed to fetch from origin. So origin sees requests from a few upper-tier nodes, not all edges. Combined with request coalescing (concurrent misses for the same key within a node collapse into one fetch, the rest waiting on the result), this crushes the thundering herd.
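Cache-key normalization decides how well those replicas are shared. A sketch using only the standard library, assuming a hypothetical allowlist of query parameters that actually change the response; everything else (tracking params and the like) is dropped, and the survivors are sorted so that parameter order does not split the cache.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical allowlist: the only query params that change the response body.
KEY_PARAMS = {"w", "h", "fmt", "v"}


def cache_key(url: str) -> str:
    """Build a cache key from host + path + allowlisted, sorted query params."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in KEY_PARAMS
    )
    query = urlencode(kept)
    host = parts.hostname or ""                 # .hostname is returned lowercased
    return f"{host}{parts.path}" + (f"?{query}" if query else "")


a = cache_key("https://img.example.com/a.jpg?utm_source=ad&w=200&fmt=webp")
b = cache_key("https://IMG.example.com/a.jpg?fmt=webp&fbclid=xyz&w=200")
print(a)        # img.example.com/a.jpg?fmt=webp&w=200
print(a == b)   # True
```

An allowlist is the safer default here: a denylist of tracking params silently lets any new high-cardinality param fragment the cache.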

Trade-off:
# Pseudo-code: edge node handles a request (with coalescing)
def handle(req):
    key = cache_key(req.host, req.path, normalize(req.query))
    obj = cache.get(key)
    if obj and not obj.expired():
        return obj                        # hit — the vast majority go here
    # miss: only one fetch per key, the rest wait (single-flight)
    with coalesce(key):
        obj = cache.get(key)              # double-check: someone may have just filled it
        if obj and not obj.expired():
            return obj
        obj = fetch_from_upper_or_origin(key)   # tiered: ask upper tier first
        cache.set(key, obj, ttl=obj.cache_control_ttl())
        return obj
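To make the coalescing step concrete, here is a runnable single-flight helper using Python threads. Assumptions: one process stands in for one edge node, and `slow_fetch` is a hypothetical stand-in for the upper-tier/origin fetch. Concurrent callers for the same key share one in-flight fetch; this sketch deliberately does not cache the result afterwards, since that is the cache layer's job in the pseudo-code above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


class _Call:
    """One in-flight fetch that followers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution of `fn`."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if call is None:
                call = _Call()
                self._inflight[key] = call
        if not leader:
            call.done.wait()                    # follower: wait for the leader's fetch
            if call.error is not None:
                raise call.error
            return call.result
        try:
            call.result = fn()                  # leader: the only real fetch for this key
        except Exception as exc:
            call.error = exc
            raise
        finally:
            with self._lock:
                del self._inflight[key]         # the next miss starts a fresh fetch
            call.done.set()                     # wake every follower
        return call.result


fetches: List[int] = []

def slow_fetch() -> str:                        # hypothetical upper-tier/origin fetch
    fetches.append(1)
    time.sleep(0.2)                             # simulate a 200 ms round-trip
    return "body"

sf = SingleFlight()
barrier = threading.Barrier(20)

def request(_: int) -> str:
    barrier.wait()                              # release 20 concurrent misses at once
    return sf.do("/hot.js", slow_fetch)

with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(request, range(20)))

print(len(fetches), len(results))
```

The per-key entry is removed as soon as the fetch finishes, so a long-lived hot key never pins stale results in the coalescing map.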
Real-world cases:

3. Edge Invalidation: Purge Propagation + Stale Fallback

Core trade-off: active purge gets you close to fresh data but suffers propagation delay and origin amplification; TTL alone is simple but forces a bad choice between serving stale data (long TTL) and hammering origin (short TTL).

Principle: an edge cache has hundreds of replicas, so "invalidate globally at once" is essentially a distributed broadcast problem. Three mechanisms: ① TTL expiry (passive, simplest, but short TTL = more fetches, long TTL = staler data); ② explicit purge (by URL / by tag / purge everything — actively pushes invalidation to all PoPs, seconds-fast but with propagation delay; purge-everything empties the cache instantly → origin gets hammered); ③ stale-while-revalidate / soft purge (marked stale but still serves the old value first, refreshing asynchronously in the background) — trading "brief staleness" for "users never wait on a fetch, origin never avalanches". In production it's usually a combo of long TTL + tag-based purge + SWR: tags for precise invalidation, TTL as backstop, SWR for experience.
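The "long TTL + tag purge + SWR" combination can be sketched for a single node. Assumptions: this is a toy in-memory cache with an injectable clock; `pending` plus `refresh_pending()` stand in for the asynchronous background refresh a real edge would run; the tag name and the 300 s / 60 s windows are illustrative. The tag purge here is a soft purge: entries are marked stale rather than deleted, so users keep getting an answer without waiting on origin.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class Entry:
    value: str
    fresh_until: float                          # end of the TTL
    stale_until: float                          # end of the stale-while-revalidate window
    tags: Set[str] = field(default_factory=set)


class EdgeCache:
    """Toy single-node cache: TTL + tag-based soft purge + stale-while-revalidate."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic,
                 ttl: float = 300.0, swr: float = 60.0) -> None:
        self.fetch, self.clock, self.ttl, self.swr = fetch, clock, ttl, swr
        self.entries: Dict[str, Entry] = {}
        self.pending: Set[str] = set()          # keys queued for background refresh

    def _store(self, key: str, value: str, tags: Iterable[str]) -> None:
        now = self.clock()
        self.entries[key] = Entry(value, now + self.ttl, now + self.ttl + self.swr, set(tags))

    def get(self, key: str, tags: Iterable[str] = ()) -> Tuple[str, str]:
        now = self.clock()
        e = self.entries.get(key)
        if e is not None and now < e.fresh_until:
            return e.value, "HIT"
        if e is not None and now < e.stale_until:
            self.pending.add(key)               # serve the old value now, refresh later
            return e.value, "STALE"
        value = self.fetch(key)                 # hard miss: this user waits on the fetch
        self._store(key, value, tags)
        return value, "MISS"

    def refresh_pending(self) -> None:
        """Stand-in for the async background refresh."""
        for key in list(self.pending):
            self._store(key, self.fetch(key), self.entries[key].tags)
        self.pending.clear()

    def soft_purge_tag(self, tag: str) -> None:
        """Mark every fresh entry carrying `tag` as stale instead of deleting it."""
        now = self.clock()
        for e in self.entries.values():
            if tag in e.tags and e.fresh_until > now:
                e.fresh_until, e.stale_until = now, now + self.swr


now = [0.0]
calls: List[str] = []

def origin_fetch(key: str) -> str:
    calls.append(key)
    return f"{key}@v{len(calls)}"

cache = EdgeCache(origin_fetch, clock=lambda: now[0])
print(cache.get("/p/1", tags=["product:1"]))    # ('/p/1@v1', 'MISS')
now[0] = 10.0
print(cache.get("/p/1"))                        # ('/p/1@v1', 'HIT')
cache.soft_purge_tag("product:1")               # content changed: purge by tag
print(cache.get("/p/1"))                        # ('/p/1@v1', 'STALE')
cache.refresh_pending()
print(cache.get("/p/1"))                        # ('/p/1@v2', 'HIT')
```

Swapping `soft_purge_tag` for a hard delete would turn the third request into a blocking origin fetch, which is exactly the avalanche a purge-everything causes at scale.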

Trade-off:
Real-world cases:

4. Edge Compute: Running Code at the PoP

Core trade-off: isolates have near-zero startup and high density but per-request limits; containers/VMs are more capable but cold-start slowly and pack less densely.

Principle: pushing auth, A/B routing, personalization, and API aggregation from origin down to the edge saves a full origin round-trip. But "running user code at hundreds of PoPs" demands extreme density and ultra-low cold start. Two routes: V8 isolate (Cloudflare Workers) — thousands of lightweight sandboxes in a single process, with near-zero cold start and tiny memory overhead, but hard per-request CPU/memory caps and no arbitrary native binaries; container / micro-VM (Lambda@Edge etc.) — close to a full runtime, but cold start in the hundreds of milliseconds and low per-node density. The other hard part at the edge is state: the edge is inherently shared-nothing, so strongly consistent data must either go back to origin or use dedicated edge storage (KV is eventually consistent; Durable Objects serialize at a single point).
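A/B routing is a typical job to push to the edge because it needs no shared state: a stable hash of the user ID picks the variant, so every PoP computes the same answer independently. Sketched in Python for consistency with the rest of this page (a real Worker would be JS or WASM); the experiment name, the 0 to 99 bucketing scheme, the 10% split, and the `/checkout-a` / `/checkout-b` rewrite targets are hypothetical.

```python
import hashlib


def ab_variant(user_id: str, experiment: str, percent_b: int) -> str:
    """Deterministically assign a user to variant 'A' or 'B' with no shared state.

    Hashing experiment + user ID means every PoP computes the same answer for
    the same user, so no origin round-trip or cross-PoP storage is needed.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100    # 0..99, near-uniform
    return "B" if bucket < percent_b else "A"


def route(path: str, user_id: str) -> str:
    """Hypothetical edge rule: rewrite /checkout to a variant-specific origin path."""
    if path == "/checkout":
        return f"/checkout-{ab_variant(user_id, 'checkout-redesign', 10).lower()}"
    return path
```

Because the assignment is a pure function of its inputs, it fits the isolate model well: no local state survives between requests, and none is needed.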

Trade-off:
| Option | Cold start | Density / cost | Limits |
|---|---|---|---|
| V8 isolate (Workers) | ~0 (no process boot) | very high | JS/WASM only; hard CPU/memory caps |
| Container / micro-VM (Lambda@Edge) | hundreds of ms | low | full runtime, but costly; fewer PoPs |
| Pure CDN rules (no code) | n/a | highest | declarative rewrite/routing only |
Real-world cases:

Scaling & Optimization

Pitfalls + Interview Follow-ups

Further Reading

Going Deeper

1. Hit ratio is 90% and origin takes 200K RPS. If hit ratio drops to 80%, how much does origin pressure grow? Why is this a "leading" alert?

Origin takes the miss traffic: origin RPS = total RPS × (1 − hit ratio). At 2M RPS, a 90% hit ratio means 200K misses and an 80% hit ratio means 400K, double. The effect is non-linear: a drop of Δ points multiplies origin load by (1 − h + Δ)/(1 − h), so the higher the starting hit ratio, the bigger the multiplier. From 99% to 98%, misses double outright (1% → 2%).
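The arithmetic is easy to check numerically. A small sketch, with hit ratios given in whole percent so integer math avoids floating-point noise; the 60% → 50% case is an extra illustrative data point showing that the multiplier shrinks at low hit ratios.

```python
def origin_rps(total_rps: int, hit_pct: int) -> int:
    """Origin load = total traffic x miss rate (hit ratio in whole percent)."""
    return total_rps * (100 - hit_pct) // 100


def load_multiplier(hit_before: int, hit_after: int) -> float:
    """How many times origin load grows when the hit ratio drops."""
    return (100 - hit_after) / (100 - hit_before)


print(origin_rps(2_000_000, 90), origin_rps(2_000_000, 80))   # 200000 400000
print(load_multiplier(90, 80), load_multiplier(99, 98))       # 2.0 2.0
print(load_multiplier(60, 50))                                 # 1.25
```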

So hit ratio is a leading indicator of an origin avalanche: by the time origin CPU alarms fire, the cache layer has often been degrading for a while. Common causes: a cache key that accidentally includes high-cardinality params (tracking IDs, timestamps), mass purges, a release that changes Cache-Control, or hot content whose TTLs all expire at the same moment (add TTL jitter). Alert on the hit ratio itself, not just on origin.
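TTL jitter just spreads expiry times so objects cached in the same burst do not all expire in the same second. A minimal sketch, assuming a symmetric ±10% spread (an illustrative choice, not a standard value):

```python
import random
from typing import Optional


def jittered_ttl(base_ttl: int, jitter_frac: float = 0.1,
                 rng: Optional[random.Random] = None) -> int:
    """Return base_ttl +/- up to jitter_frac of it, so co-cached objects expire apart."""
    rng = rng or random.Random()
    spread = int(base_ttl * jitter_frac)
    return base_ttl + rng.randint(-spread, spread)


rng = random.Random(7)
ttls = [jittered_ttl(300, rng=rng) for _ in range(1000)]
print(min(ttls) >= 270, max(ttls) <= 330)   # True True
```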

2. A price change must be "globally consistent within seconds", but CDN purge has propagation delay. How do you design so users never see a wrong price?

Key insight: don't bake the volatile price into strongly cacheable HTML. Separation strategy:

  • Separate skeleton from price: the product page skeleton (images, description) is strongly cached with long TTL; the price comes from a separate uncacheable API (or very short TTL) fetched live by edge/frontend. Then a price change needs no page purge.
  • Versioned URLs: if the price must be embedded, put a version/hash in the URL (/p/123?v=789); a price change swaps the URL — the old URL's cache naturally invalidates, no purge propagation wait.
  • Disable SWR for the price field: better to let this one request go to origin and wait tens of ms than to return a stale price.
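The separation can be sketched as a per-component caching policy plus a content-hashed URL. Assumptions: the `/p/123?v=...` shape comes from the text, but deriving `v` from a SHA-256 of the price record and the specific Cache-Control values are illustrative choices, not a prescribed scheme.

```python
import hashlib
import json
from typing import Dict

# Example Cache-Control values per component (assumed numbers; tune per site).
CACHE_POLICY: Dict[str, str] = {
    "page_skeleton": "public, max-age=86400, stale-while-revalidate=60",
    "versioned_asset": "public, max-age=31536000, immutable",
    "price_api": "no-store",                    # never cached, never served stale
}


def versioned_url(product_id: int, price_record: dict) -> str:
    """Put a content hash in the URL so a price change yields a new cache key."""
    payload = json.dumps(price_record, sort_keys=True).encode()
    version = hashlib.sha256(payload).hexdigest()[:8]
    return f"/p/{product_id}?v={version}"


old = versioned_url(123, {"sku": "123", "price": "19.99", "currency": "USD"})
new = versioned_url(123, {"sku": "123", "price": "17.99", "currency": "USD"})
print(old != new)   # True: pages now reference the new URL; the old one just ages out
```

`sort_keys=True` makes the hash independent of dict ordering, so the same price always maps to the same URL on every server that renders the page.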

The essence: replace "cache everything then scramble to invalidate fast" with "cache what's cacheable, fetch what's volatile live". Purge always has a window; eliminating the dependency architecturally is what's robust.

3. Why does Anycast "naturally resist DDoS" while a single data center can't? How do the capacity models differ?

A single data center's attack-resistance capacity = the bandwidth/scrubbing of that one data center. An attacker just has to exceed it to overwhelm you — forcing you to provision for the "single-point peak", which is extremely expensive.

With Anycast, the same IP is announced from hundreds of PoPs worldwide, and BGP disperses attack traffic across PoPs according to where it enters the internet: attacks sourced in Europe mostly land on European PoPs, those from Asia on Asian PoPs. So the effective attack-absorption capacity approaches the whole network's total rather than a single site's. To take the whole service down, an attacker must overwhelm every PoP at the same time. Add per-PoP scrubbing, and a saturated PoP only degrades its own region (its traffic can shift elsewhere once BGP withdraws the route). This is why major CDNs make "Anycast + many PoPs" the foundation of DDoS defense.
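The two capacity models can be compared with a back-of-the-envelope check. All numbers are hypothetical: a 3 Tbps attack split across three source regions, a 1 Tbps single data center, and 1.5 Tbps per anycast PoP; the model also assumes each region's attack traffic lands entirely on its own nearest PoP.

```python
from typing import Dict, List


def overwhelmed(attack_gbps: Dict[str, float], capacity_gbps: Dict[str, float]) -> List[str]:
    """Sites whose share of the attack exceeds their capacity."""
    return sorted(site for site, gbps in attack_gbps.items() if gbps > capacity_gbps[site])


# Hypothetical 3 Tbps attack, split by where the botnet sits.
attack_by_region = {"eu": 1200.0, "na": 1000.0, "apac": 800.0}

# Single data center: every attack packet converges on one site.
single_dc = overwhelmed({"dc": sum(attack_by_region.values())}, {"dc": 1000.0})

# Anycast: each region's traffic lands on its nearest PoP (1500 Gbps each, assumed).
anycast = overwhelmed(attack_by_region, {r: 1500.0 for r in attack_by_region})

print(single_dc, anycast)   # ['dc'] []
```

The single site must be sized for the whole attack; the anycast PoPs each only need to absorb their regional share.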

The cost: you lose precise control over where traffic lands, and you need the operational ability to run global BGP and a vast PoP fleet — which is exactly why CDNs exist as a dedicated business.

4. Edge Workers use V8 isolates rather than containers. Looking at "cold start" and "multi-tenant security" together, what does this choice sacrifice?

An isolate runs thousands of V8 sandboxes within a single process, eliminating the process and kernel isolation overhead of containers/VMs — so cold start is near-zero, per-machine density is very high, and cost is low. The costs:

  • Weaker isolation (same process): containers rely on kernel namespaces/cgroups and micro-VMs on hardware virtualization, whereas isolates rely only on V8's language-level sandboxing inside one shared process, so a V8 sandbox-escape bug has a larger blast radius. The platform therefore needs extra defenses, such as Spectre-class side-channel mitigations (e.g. restricting high-resolution timers) and moving suspicious tenants into separate processes.
  • Limited capabilities: only JS/WASM, no arbitrary native binaries, no local threads/filesystem, hard per-request CPU/memory caps — heavy compute / long tasks don't fit.
  • No persistent local state: isolates can be reclaimed anytime, so state must be externalized (KV eventually consistent / Durable Object single-point serialization).

So the choice is "massive lightweight short requests → isolate; heavy logic / full runtime needed → container/micro-VM". Cloudflare bets that the vast majority of edge cases are the former.

5. One CDN config pushes globally at once — both an advantage and a "global blast radius". How do you make edge deploys as safe as backend ones?

Both the Cloudflare 2019 and Fastly 2021 outages were network-wide incidents in which one change took effect globally and instantly. Govern the edge as your highest-risk deploy surface:

  • Stage config too: roll out rules/Workers/cache config by PoP or percentage (1% PoPs first → monitor → ramp), not all at once. Cloudflare's post-mortem added exactly this.
  • Kill switch / fast rollback: every edge feature has a global toggle, killable in one click, with a rollback path that doesn't depend on the failing component.
  • Resource-exhaustion guards: cap CPU/memory/regex-backtracking per rule/Worker so a single piece of logic can't drag down a node (2019 was exactly a regex pegging CPU).
  • Pre-deploy static checks: run performance and security checks on regexes and configs before release, so patterns prone to catastrophic backtracking are blocked before they ship.
  • Multi-CDN backstop: be able to steer away when a single CDN fails, avoiding a vendor-level single point.
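Staging plus a kill switch can be sketched as a wave-by-wave ramp over PoPs. Assumptions: `healthy` and `kill_switch` are hypothetical callbacks that real monitoring and an operator toggle would supply, the 1/10/50/100% waves and PoP names are illustrative, and the regex/CPU guards from the list are not modeled. PoPs are bucketed by a stable hash per change, so a PoP that took the 1% wave stays enabled in every later wave.

```python
import hashlib
from typing import Callable, Iterable, Sequence, Set, Tuple


def in_wave(pop: str, change_id: str, percent: int) -> bool:
    """Stable per-change bucketing: PoPs in an early wave stay in every later wave."""
    h = int.from_bytes(hashlib.sha256(f"{change_id}:{pop}".encode()).digest()[:4], "big")
    return h % 100 < percent


def staged_rollout(pops: Iterable[str], change_id: str, waves: Sequence[int],
                   healthy: Callable[[Set[str]], bool],
                   kill_switch: Callable[[], bool]) -> Tuple[Set[str], str]:
    """Ramp a config change wave by wave; return (PoPs running it, outcome)."""
    pop_list = list(pops)
    enabled: Set[str] = set()
    for pct in waves:
        if kill_switch():
            return set(), "killed"              # operator hit the global toggle
        enabled = {p for p in pop_list if in_wave(p, change_id, pct)}
        if not healthy(enabled):                # e.g. error rate / CPU on enabled PoPs
            return set(), f"rolled back at {pct}%"
    return enabled, "complete"


pops = [f"pop-{i:03d}" for i in range(200)]
waves = [1, 10, 50, 100]

print(staged_rollout(pops, "waf-rule-42", waves, lambda en: True, lambda: False)[1])  # complete

def bad_on_pop_007(enabled: Set[str]) -> bool:  # simulate a regression on one PoP
    return "pop-007" not in enabled

result = staged_rollout(pops, "waf-rule-42", waves, bad_on_pop_007, lambda: False)
print(result[1].startswith("rolled back"))      # True
```

Rolling back returns an empty set on purpose: the change is pulled from every PoP, not just the one that tripped the gate.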

In a sentence: the edge's convenience comes from "global instant effect", and its safety comes from wrapping that global effect in staging, quotas, and switches.