Day 28 · Hard · CDN / Edge · Anycast · Tiered Cache · Edge Compute

CDN & Edge: Pushing Compute and Cache Within 50km of the User
Anycast, Tiered Cache, Edge Invalidation, Edge Compute

Scenario + Requirements

You're designing the CDN + Edge layer for a global e-commerce / media site with 50M DAU across five continents. The content mixes three classes: static assets (images, JS, video segments; strongly cacheable), semi-dynamic pages (product pages; tolerate a few seconds of staleness), and dynamic APIs (cart; uncacheable).

Latency SLO: static assets p95 < 50ms (must hit the edge); dynamic APIs do TLS termination + routing at the edge, then ride an optimized backbone to origin.
Peak traffic: 2M RPS globally during promotions, ~90% of it cacheable, meaning origin should see only ~200K RPS. The cache layer must cut origin load by an order of magnitude.
DDoS resilience: the edge must soak up Tbps-scale attack traffic, and a single PoP going down must not take the whole site offline.
Invalidation latency: after a price change, global edge caches must reflect the new price within seconds (a wrong price = real money lost).

The core tension: the closer you push content to the user, the higher the hit ratio and lower the latency — but the harder invalidation becomes and the weaker consistency gets. CDN design is finding the balance on this axis.

High-Level Architecture

```mermaid
graph LR
    U["Global users<br/>same Anycast IP"]
    subgraph PoP["Nearest PoP (lower tier)"]
      EDGE["Edge node<br/>TLS term · WAF · Edge Worker"]
      L1["Edge cache L1"]
    end
    UPPER["Regional cache<br/>upper tier"]
    SHIELD["Origin Shield<br/>single fetch funnel"]
    ORG[("Origin<br/>app + object store")]

    U -->|BGP picks nearest| EDGE --> L1
    L1 -.miss.-> UPPER
    UPPER -.miss.-> SHIELD
    SHIELD -.miss.-> ORG

    classDef client fill:#1a2530,stroke:#64c8ff,color:#e8eef5
    classDef edge fill:#0e2030,stroke:#5eead4,color:#e8eef5
    classDef cache fill:#1a1a30,stroke:#ffb450,color:#e8eef5
    classDef origin fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
    class U client
    class EDGE,L1 edge
    class UPPER,SHIELD cache
    class ORG origin
```

Anycast steers users to the nearest PoP; tiering funnels misses toward origin. Nodes further right are fewer, and each request that reaches them costs more.

Component roles: the edge node catches traffic via Anycast, does the TLS handshake, WAF filtering, and runs Edge Workers; the edge cache returns directly on a hit (the vast majority of requests); on a miss it asks the regional upper-tier cache, then the Origin Shield (one designated PoP dedicated to fetching, collapsing misses from thousands of edges into a few origin requests), and only then hits origin. This "lower → upper → shield → origin" funnel is the key to hit ratio and origin protection.

Key Techniques

1. Anycast + BGP: One IP, Nearest Ingress Worldwide

Core trade-off: use the routing layer for load balancing — you get zero-config failover, but give up precise control over which node a user lands on.

Principle: the same IP prefix is announced via BGP simultaneously from hundreds of PoPs worldwide. Internet routers deliver a user's packets to the "network-nearest" PoP according to BGP path selection (e.g. AS-path length), which is usually, though not always, the lowest-latency one. This gives three free wins: nearest ingress (no GeoDNS guessing), DDoS dilution (attack traffic is spread across all PoPs, so the capacity absorbing an attack is the whole network's rather than one site's), and automatic failover (when a PoP dies it withdraws its route announcement and BGP re-converges traffic to the next-nearest PoP, typically within seconds, with no DNS change and no TTL wait).
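A toy model of the nearest-ingress and failover behavior described above, assuming route choice reduces to the shortest AS path as seen from one user's ISP (real BGP best-path selection first weighs local preference and other attributes, which are omitted here). The PoP names, path lengths, and the documentation prefix 203.0.113.0/24 are made up for illustration.

```python
# Toy Anycast model: every PoP announces the same prefix, and the user's
# network picks the route with the shortest AS path. Real BGP also weighs
# local preference, MED, IGP cost, etc.; those are deliberately omitted.

def best_pop(announcements):
    """announcements: {pop_name: as_path_length} as seen from one user's ISP."""
    if not announcements:
        raise RuntimeError("prefix unreachable: no PoP is announcing it")
    # min() over (path length, name) gives a deterministic tie-break
    return min(announcements.items(), key=lambda kv: (kv[1], kv[0]))[0]

def withdraw(announcements, pop):
    """A failing PoP withdraws its announcement; traffic re-converges."""
    return {p: hops for p, hops in announcements.items() if p != pop}

# Routes to 203.0.113.0/24 as seen from a hypothetical ISP in Frankfurt
routes = {"fra": 2, "ams": 3, "lhr": 3, "sin": 6}
print(best_pop(routes))                    # fra
print(best_pop(withdraw(routes, "fra")))   # ams (tie with lhr broken by name)
```

Failover here is purely a routing-table change: nothing on the client side changes when `fra` disappears, which is why no DNS TTL is involved.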

Trade-off (three nearest-ingress approaches):

Anycast: ✅ fast failover, DDoS-resilient, transparent to clients; ❌ routing is decided by carriers, you can't control exact assignment; BGP re-routing can move a long-lived connection (TCP/WebSocket) mid-flight to a different PoP and break it (short/stateless requests are fine).
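GeoDNS (DNS returns different IPs by geography): ✅ fine-grained control; ❌ the answer reflects DNS caching and the resolver's location, so a user on a distant public resolver (e.g. 8.8.8.8) can be mapped to the wrong region unless EDNS Client Subnet is in play; failover is dragged out by TTL (minutes).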
Unicast + GSLB: ✅ most controllable; ❌ heavy ops, poor scalability.

Real-world cases:

Cloudflare: almost entirely Anycast, one IP announced from all PoPs; it has even evolved to "servers no longer own IPs", with the Anycast edge as the unified ingress.
Google Public DNS / major authoritative DNS: 8.8.8.8 is Anycast, responding from the nearest location worldwide.
Major CDNs (Cloudflare/Fastly): use Anycast so any PoP can absorb DDoS, turning "single-point defense" into "whole-network defense".

2. Edge Cache Key + Tiered Cache

Core trade-off: more edge nodes dilute the hit ratio; tiering funnels misses and protects origin, at the cost of an extra hop.

Principle: a CDN indexes objects by cache key (commonly host + path + query string, adjustable per rule). But when hundreds of PoPs cache independently, each one must fetch from origin on its first request for an object: the hit ratio gets "diluted", and origin still receives misses amplified hundreds-fold. Tiered Cache splits PoPs into a lower tier (near users) and an upper tier (a few large nodes): a lower-tier miss asks the upper tier first, and only an upper-tier miss is allowed to fetch from origin. Origin therefore sees requests from a few upper-tier nodes, not from every edge. Combined with request coalescing (concurrent misses for the same key within a node collapse into one fetch while the rest wait on the result), this crushes the thundering herd.
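A minimal sketch of why tiering collapses origin traffic, assuming a cold cache, no eviction or TTL expiry, and a fixed edge-to-upper-tier assignment (the topology and request mix are hypothetical). It counts how many requests reach origin when 300 edges each ask for the same object once.

```python
# Count origin fetches for a cold cache: flat (each edge fetches origin on its
# own miss) vs tiered (edges ask their assigned upper tier; only upper-tier
# misses reach origin). No TTLs, eviction, or concurrency are modeled.

def origin_fetches(requests, edge_to_upper, tiered):
    edge_cache, upper_cache = set(), set()
    origin = 0
    for edge, key in requests:
        if (edge, key) in edge_cache:
            continue                        # edge hit
        edge_cache.add((edge, key))         # edge stores whatever it fetches
        if tiered:
            upper = edge_to_upper[edge]
            if (upper, key) in upper_cache:
                continue                    # upper-tier hit, origin untouched
            upper_cache.add((upper, key))
        origin += 1                         # this miss reaches origin
    return origin

# 300 edge PoPs spread over 3 upper tiers; each edge requests one object once
edges = [f"edge{i}" for i in range(300)]
edge_to_upper = {e: f"upper{i % 3}" for i, e in enumerate(edges)}
requests = [(e, "/img/hero.jpg") for e in edges]

print(origin_fetches(requests, edge_to_upper, tiered=False))  # 300
print(origin_fetches(requests, edge_to_upper, tiered=True))   # 3
```

The price of the 100x reduction is the extra lower-to-upper hop on every lower-tier miss, which is exactly the trade-off listed below.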

Trade-off:

Flat cache (each PoP fetches origin): ✅ shortest path, simplest; ❌ origin amplified by edge count, low hit ratio for long-tail content.
Tiered Cache: ✅ origin requests collapse by an order of magnitude, long-tail hit ratio improves; ❌ an extra "lower→upper" RTT on a lower-tier miss, needs good topology selection (which upper tier).
Cache key design: too fine (all query/cookies) → hit ratio collapses; too coarse (ignoring key params) → wrong content returned. Must normalize explicitly (sort query, strip tracking params, Vary as needed).
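A sketch of explicit cache-key normalization, assuming the key is built from host, path, and query only (cookies never enter it) and that `utm_*`, `gclid`, and `fbclid` are the tracking parameters to strip; both choices are assumptions to adapt per site. Sorting the remaining parameters makes the key independent of their order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumed tracking parameters; tune per site. Scheme and port are ignored here.
TRACKING_PARAMS = {"gclid", "fbclid"}
TRACKING_PREFIXES = ("utm_",)

def cache_key(url):
    parts = urlsplit(url)
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS and not k.startswith(TRACKING_PREFIXES)
    ]
    params.sort()                      # order-insensitive: ?a=1&b=2 == ?b=2&a=1
    host = parts.hostname or ""        # .hostname is already lowercased
    query = urlencode(params)
    return f"{host}{parts.path or '/'}" + (f"?{query}" if query else "")

print(cache_key("https://Shop.example.com/p/123?utm_source=ad&color=red&size=9"))
print(cache_key("https://shop.example.com/p/123?size=9&color=red&fbclid=xyz"))
# both: shop.example.com/p/123?color=red&size=9
```

Anything the origin legitimately varies on (for example `Accept-Encoding`) would be added to the key explicitly, rather than by letting every header or cookie in.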

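```python
import threading
import time

# Edge node handles a request with single-flight coalescing (runnable sketch).
cache = {}       # key -> (body, expires_at)
key_locks = {}   # key -> Lock; one lock per key gives single-flight

def fetch_from_upper_or_origin(key):   # hypothetical stand-in: upper tier first, then origin
    return f"body for {key}", 60       # (body, TTL taken from Cache-Control)

def lookup(key):
    entry = cache.get(key)
    return entry[0] if entry and entry[1] > time.monotonic() else None

def handle(host, path, query):
    key = (host, path, tuple(sorted(query.items())))   # normalized cache key
    obj = lookup(key)
    if obj is not None:
        return obj                      # hit: the vast majority go here
    lock = key_locks.setdefault(key, threading.Lock())  # atomic in CPython
    with lock:                          # miss: one fetch per key, the rest wait
        obj = lookup(key)               # double-check: someone may have just filled it
        if obj is not None:
            return obj
        obj, ttl = fetch_from_upper_or_origin(key)   # tiered: ask upper tier first
        cache[key] = (obj, time.monotonic() + ttl)
        return obj
```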

Real-world cases:

Cloudflare Tiered Cache / Argo: splits data centers into tiers where only the upper tier fetches origin, and Smart Topology auto-picks the nearest upper tier per origin using latency data.
Netflix Open Connect: embeds cache appliances (OCAs) directly inside ISP facilities, pre-filling popular content off-peak so playback hits almost entirely within the ISP's network.
Fastly: known for "instant global purge + high hit ratio", well suited to frequently changing semi-dynamic content.

3. Edge Invalidation: Purge Propagation + Stale Fallback

Core trade-off: active purge gives strong consistency but has propagation delay and amplification; TTL is simple but is either too stale or hammers origin.

Principle: an edge cache has hundreds of replicas, so "invalidate globally at once" is essentially a distributed broadcast problem. Three mechanisms: ① TTL expiry (passive and simplest, but a short TTL means more fetches and a long TTL means staler data); ② explicit purge (by URL / by tag / purge everything: actively pushes invalidation to all PoPs, seconds-fast but with propagation delay; purge-everything empties the cache instantly, so origin gets hammered); ③ stale-while-revalidate / soft purge (the entry is marked stale but the old value is still served first while a refresh runs in the background), trading "brief staleness" for "users never wait on a fetch, origin never avalanches". In production it's usually a combination of long TTL + tag-based purge + SWR: tags for precise invalidation, TTL as backstop, SWR for user experience.
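A sketch of the three freshness states an edge entry moves through under `Cache-Control: max-age=60, stale-while-revalidate=30` (the directive is standardized in RFC 5861). The `Entry` type and the `schedule_refresh` callback are hypothetical; a real edge would launch the refresh asynchronously. Setting `swr=0` models the "bypass SWR for prices" rule: such an entry goes straight from fresh to expired.

```python
from dataclasses import dataclass

def parse_cache_control(header):
    """Parse e.g. 'max-age=60, stale-while-revalidate=30' into a dict."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(value) if value.isdigit() else True
    return directives

@dataclass
class Entry:
    body: str
    stored_at: float   # seconds
    max_age: int
    swr: int           # stale-while-revalidate window; 0 for price-sensitive objects

def serve(entry, now, schedule_refresh):
    age = now - entry.stored_at
    if age <= entry.max_age:
        return entry.body, "fresh"
    if age <= entry.max_age + entry.swr:
        schedule_refresh()             # refresh off the request path
        return entry.body, "stale"     # user still gets an instant answer
    return None, "expired"             # caller must fetch synchronously

cc = parse_cache_control("max-age=60, stale-while-revalidate=30")
page = Entry("v1", 0.0, cc["max-age"], cc["stale-while-revalidate"])
print(serve(page, 75, lambda: print("refresh scheduled")))
```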

Trade-off:

Pure short TTL: ✅ no purge infrastructure needed; ❌ origin continuously re-fetched, and still an "up to one TTL" window of inconsistency.
Purge by tag: ✅ invalidate a related group at once (e.g. all pages for a product); ❌ requires maintaining object→tag mapping, propagation is not instant.
SWR / soft purge: ✅ users always get a fast response, origin refreshes smoothly; ❌ users may briefly see stale values — be careful with prices (bypass SWR for sensitive fields).
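A single-PoP sketch of tag-based purge, assuming the origin labels responses with tags (CDNs expose this through response headers such as Fastly's `Surrogate-Key` or Cloudflare's `Cache-Tag`). The class below is hypothetical. The harder production problem is not this index but broadcasting `purge_tag` to every PoP, which is where propagation delay comes from.

```python
from collections import defaultdict

class TaggedCache:
    """One PoP's cache with a tag -> URLs index (hypothetical class)."""

    def __init__(self):
        self.objects = {}                   # url -> body
        self.tags_of = {}                   # url -> set of tags
        self.tag_index = defaultdict(set)   # tag -> set of urls

    def put(self, url, body, tags):
        self._evict(url)                    # drop old tag links on overwrite
        self.objects[url] = body
        self.tags_of[url] = set(tags)
        for tag in tags:
            self.tag_index[tag].add(url)

    def _evict(self, url):
        self.objects.pop(url, None)
        for tag in self.tags_of.pop(url, ()):
            urls = self.tag_index.get(tag)
            if urls is not None:
                urls.discard(url)           # keep the index consistent

    def purge_tag(self, tag):
        """Invalidate every object carrying `tag`; returns how many were removed."""
        urls = self.tag_index.pop(tag, set())
        for url in urls:
            self._evict(url)
        return len(urls)

c = TaggedCache()
c.put("/p/123", "<html>", ["product:123", "category:shoes"])
c.put("/c/shoes", "<html>", ["category:shoes"])
print(c.purge_tag("product:123"))   # 1: only the product page goes
```

The "object to tag mapping" cost from the list above is visible here: every `put` must carry its tags, and the index has to stay consistent as objects are overwritten or evicted.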

Real-world cases:

Fastly: known for millisecond-level instant purge and surrogate-key (tag) invalidation; e-commerce often uses its key-based bulk invalidation.
Cloudflare: supports purge by URL / tag / everything, and treats stale-while-revalidate as a standard directive to smooth origin spikes.

4. Edge Compute: Running Code at the PoP

Core trade-off: isolates have near-zero startup and high density but per-request limits; containers/VMs are more capable but cold-start slowly and pack less densely.

Principle: pushing auth, A/B routing, personalization, and API aggregation from origin down to the edge saves a full origin round-trip. But "running user code at hundreds of PoPs" demands extreme density and ultra-low cold start. Two routes: V8 isolate (Cloudflare Workers) — thousands of lightweight sandboxes in a single process, with near-zero cold start and tiny memory overhead, but hard per-request CPU/memory caps and no arbitrary native binaries; container / micro-VM (Lambda@Edge etc.) — close to a full runtime, but cold start in the hundreds of milliseconds and low per-node density. The other hard part at the edge is state: the edge is inherently shared-nothing, so strongly consistent data must either go back to origin or use dedicated edge storage (KV is eventually consistent; Durable Objects serialize at a single point).
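One edge-friendly pattern that respects the shared-nothing constraint is deterministic A/B bucketing: hashing the experiment name plus a user ID gives every PoP the same answer without shared state or an origin call. The helper names, the experiment, and the `/checkout-v2` rewrite below are hypothetical; in a Worker the same logic would be written in JS/WASM.

```python
import hashlib

def ab_variant(user_id, experiment, treatment_percent):
    """Deterministically map a user to 'control' or 'treatment'.

    Every PoP computes the same answer from the same inputs, so no shared
    state or origin round-trip is needed (hypothetical helper).
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100    # 0..99
    return "treatment" if bucket < treatment_percent else "control"

def route(path, user_id):
    # Hypothetical edge rewrite: send 10% of users to an alternate origin path
    if path == "/checkout" and ab_variant(user_id, "checkout-v2", 10) == "treatment":
        return "/checkout-v2"
    return path

print(route("/checkout", "user-42"))
```

Because the decision is a pure function of its inputs, it fits within isolate CPU limits and needs neither KV nor Durable Objects; state only becomes necessary when assignments must be changed per user after the fact.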

Trade-off:

| Option | Cold start | Density (higher = cheaper) | Limits |
|---|---|---|---|
| V8 Isolate (Workers) | ~0 (no process boot) | very high | JS/WASM only, hard CPU/memory caps |
| Container/micro-VM (Lambda@Edge) | hundreds of ms | low | full runtime but costly; runs at fewer PoPs |
| Pure CDN rules (no code) | none (no user code to start) | highest | declarative rewrite/routing only |

Real-world cases:

Cloudflare Workers: uses V8 isolates instead of containers to achieve near-zero cold start while running user code at the edge at scale; KV / Durable Objects provide edge state.
AWS Lambda@Edge / CloudFront Functions: the former runs full Lambda (heavy logic), the latter runs lightweight JS (header rewrites and other high-frequency, low-latency cases).
Fastly Compute: uses a WebAssembly sandbox, emphasizing secure isolation and fast instantiation.

Scaling & Optimization

Dynamic content acceleration: even uncacheable APIs benefit from a CDN — after the edge terminates TLS, traffic rides the CDN's optimized backbone (persistent connections + private routes) to origin, more stable and faster than a direct public-internet hop (the Argo Smart Routing idea).
Origin Shield: as traffic grows, designate one funnel PoP per origin so the remaining upper-tier misses collapse into a single fetch path, further cutting origin RPS and smoothing origin scaling pressure.
Multi-CDN: use GSLB/RUM data to steer between multiple CDNs by performance and availability, so a single CDN outage (see the Fastly incident below) doesn't take the whole site down.
Edge observability: instrument hit ratio, origin RPS, per-PoP p95, and purge propagation latency; a dropping hit ratio is the leading indicator of an impending origin avalanche.

Pitfalls + Interview Follow-ups

"CDN = static asset caching only": wrong. Modern CDNs also do TLS termination, DDoS scrubbing, edge compute, and dynamic acceleration; you should distinguish "cache" from "edge platform".
Including cookies/tracking params in the cache key: the hit ratio drops to zero instantly and origin gets hammered — a classic production incident. Normalize explicitly and control Vary.
Using purge-everything as a routine tool: emptying the whole network at once = origin instantly absorbs all misses = cache avalanche. Prefer tag/URL-level purge with SWR as backstop.
Ignoring that the edge is a "global blast radius": one line of CDN config/rule pushes to the whole world at once. Cloudflare 2019-07-02: a WAF regex triggered catastrophic backtracking, pegging CPU to 100% and taking the network down for ~27 minutes with traffic down ~82% (official post-mortem); Fastly 2021-06-08: a dormant software bug triggered by one customer's config knocked out ~85% of the network for nearly an hour (official summary). The interview answer: config needs staged rollout and a kill switch too.
Anycast and long-lived connections: the follow-up "why can't some cases use pure Anycast" — BGP re-routing can move a TCP/WebSocket mid-flight to another PoP and break it, needing session affinity or edge forwarding.

Going Deeper

1. Hit ratio is 90% and origin takes 200K RPS. If hit ratio drops to 80%, how much does origin pressure grow? Why is this a "leading" alert?

Origin takes the miss traffic: total RPS × (1 − hit ratio). At 2M RPS, a 90% hit ratio means 200K misses and 80% means 400K, double the load. Because origin load is proportional to the miss ratio, the relative jump depends on how small the miss ratio already was: a 10-point drop from 90% doubles origin load, and even a 1-point drop from 99% to 98% doubles it (misses go from 1% to 2%).
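The same arithmetic as a tiny script, using integer percentages to avoid floating-point noise; the 2M RPS total comes from the requirements above.

```python
def origin_rps(total_rps, hit_pct):
    """Origin load = total x miss ratio (integer percent avoids float noise)."""
    return total_rps * (100 - hit_pct) // 100

for before, after in [(90, 80), (99, 98)]:
    b = origin_rps(2_000_000, before)
    a = origin_rps(2_000_000, after)
    print(f"hit {before}% -> {after}%: origin {b:,} -> {a:,} RPS ({a / b:.1f}x)")
# hit 90% -> 80%: origin 200,000 -> 400,000 RPS (2.0x)
# hit 99% -> 98%: origin 20,000 -> 40,000 RPS (2.0x)
```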

So hit ratio is a leading indicator of an origin avalanche: by the time origin CPU alarms fire, the cache layer has often been failing for a while. Common causes: cache key accidentally including params, mass purges, a new release changing Cache-Control, or hot content's TTLs all expiring together (add TTL jitter). Alert on the hit ratio itself, not just origin.
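A minimal sketch of TTL jitter, assuming a ±10% spread (the fraction is an arbitrary choice): objects cached at the same moment with the same base TTL expire at scattered times instead of all missing at once.

```python
import random

def jittered_ttl(base_ttl, jitter_fraction=0.1, rng=random):
    """Draw a TTL uniformly from base_ttl * (1 ± jitter_fraction) seconds."""
    low = base_ttl * (1 - jitter_fraction)
    high = base_ttl * (1 + jitter_fraction)
    return rng.uniform(low, high)

# 1000 hot objects filled at the same instant with a 300s base TTL
rng = random.Random(7)   # seeded only to make the demo reproducible
ttls = [jittered_ttl(300, 0.1, rng) for _ in range(1000)]
print(f"expiries spread over {min(ttls):.0f}s to {max(ttls):.0f}s")
```

Instead of 1000 misses landing in the same second, they spread across a roughly 60-second window, which origin can absorb smoothly.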

2. A price change must be "globally consistent within seconds", but CDN purge has propagation delay. How do you design so users never see a wrong price?

Key insight: don't bake the volatile price into strongly cacheable HTML. Separation strategy:

Separate skeleton from price: the product page skeleton (images, description) is strongly cached with long TTL; the price comes from a separate uncacheable API (or very short TTL) fetched live by edge/frontend. Then a price change needs no page purge.
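Versioned URLs: if the price must be embedded, put a version/hash in the URL (/p/123?v=789). A price change produces a new URL, so requests move to it and the old cached copy is simply never requested again; there is no purge propagation to wait for. (Whatever hands out the URL, such as the skeleton page or a listing API, must itself serve the current version.)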
Disable SWR for the price field: better to let this one request go to origin and wait tens of ms than to return a stale price.
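A sketch of deriving the URL version from the price data itself, assuming the version is a truncated SHA-256 of a canonical JSON encoding of the price record (the `v=789` above is just an example value; the hashing scheme and record fields are assumptions). Identical records always map to the same URL, so unchanged prices keep their cache hits.

```python
import hashlib
import json

def versioned_url(product_id, price_record):
    """Build /p/<id>?v=<hash>; any change to the price record changes the URL."""
    # sort_keys + compact separators give one canonical encoding per record
    canonical = json.dumps(price_record, sort_keys=True, separators=(",", ":"))
    version = hashlib.sha256(canonical.encode()).hexdigest()[:12]
    return f"/p/{product_id}?v={version}"

print(versioned_url(123, {"price_cents": 1999, "currency": "USD"}))
print(versioned_url(123, {"price_cents": 1899, "currency": "USD"}))  # new URL
```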

The essence: replace "cache everything then scramble to invalidate fast" with "cache what's cacheable, fetch what's volatile live". Purge always has a window; eliminating the dependency architecturally is what's robust.

3. Why does Anycast "naturally resist DDoS" while a single data center can't? How do the capacity models differ?

A single data center's attack-resistance capacity = the bandwidth/scrubbing of that one data center. An attacker just has to exceed it to overwhelm you — forcing you to provision for the "single-point peak", which is extremely expensive.

With Anycast, the same IP is announced from hundreds of PoPs worldwide, and attack traffic is automatically dispersed by source geography via BGP across PoPs: attacks from Europe land on European PoPs, those from Asia on Asian PoPs. So effective attack-resistance capacity ≈ whole-network total, not a single point. To take you down, an attacker must overwhelm every PoP worldwide simultaneously. Add per-PoP scrubbing, and a single failure only loses one region (traffic shifts after BGP withdraws the route). This is why major CDNs make "Anycast + many PoPs" the foundation of DDoS defense.
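A back-of-the-envelope comparison of the two capacity models, with made-up numbers: each Anycast PoP only has to absorb the share of the attack that BGP delivers to it, while a unicast data center absorbs all of it. It assumes the regional split of the attack is fixed by where the attacking hosts sit, which is the premise of the argument above.

```python
def overwhelmed(attack_gbps_by_pop, capacity_gbps_by_pop):
    """Return the PoPs whose share of the attack exceeds their own capacity."""
    return sorted(
        pop for pop, attack in attack_gbps_by_pop.items()
        if attack > capacity_gbps_by_pop[pop]
    )

# Illustrative 3 Tbps attack, split by where the attacking hosts are
attack = {"eu": 1200, "na": 1000, "asia": 800}
anycast_caps = {"eu": 1500, "na": 1500, "asia": 1500}   # 4.5 Tbps in total
print(overwhelmed(attack, anycast_caps))                # []

# The same 3 Tbps aimed at one unicast data center with 2 Tbps of capacity
single_dc = {"dc": sum(attack.values())}
print(overwhelmed(single_dc, {"dc": 2000}))             # ['dc']

# Even a regionally concentrated attack only takes out that region
print(overwhelmed({"eu": 3000, "na": 0, "asia": 0}, anycast_caps))  # ['eu']
```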

The cost: you lose precise control over where traffic lands, and you need the operational ability to run global BGP and a vast PoP fleet — which is exactly why CDNs exist as a dedicated business.

4. Edge Workers use V8 isolates rather than containers. Looking at "cold start" and "multi-tenant security" together, what does this choice sacrifice?

The isolate model runs thousands of V8 isolates (lightweight sandboxes) inside a single process, eliminating the per-tenant process and kernel-isolation overhead of containers/VMs, so cold start is near-zero, per-machine density is very high, and cost is low. The costs:

Weaker isolation (same process): containers rely on kernel namespaces/cgroups and micro-VMs on hardware virtualization, while isolates rely only on V8's language-level isolation inside one process, so a V8 sandbox-escape bug has a larger blast radius. This calls for extra defenses such as Spectre-class side-channel mitigations and regrouping tenants across processes.
Limited capabilities: only JS/WASM, no arbitrary native binaries, no local threads/filesystem, hard per-request CPU/memory caps — heavy compute / long tasks don't fit.
No persistent local state: isolates can be reclaimed anytime, so state must be externalized (KV eventually consistent / Durable Object single-point serialization).

So the choice is "massive lightweight short requests → isolate; heavy logic / full runtime needed → container/micro-VM". Cloudflare bets that the vast majority of edge cases are the former.

5. One CDN config pushes globally at once — both an advantage and a "global blast radius". How do you make edge deploys as safe as backend ones?

Both Cloudflare 2019 and Fastly 2021 were "one change took effect globally and instantly" network-wide incidents. Govern the edge as your highest-risk deploy surface:

Stage config too: roll out rules/Workers/cache config by PoP or percentage (1% PoPs first → monitor → ramp), not all at once. Cloudflare's post-mortem added exactly this.
Kill switch / fast rollback: every edge feature has a global toggle, killable in one click, with a rollback path that doesn't depend on the failing component.
Resource-exhaustion guards: cap CPU/memory/regex-backtracking per rule/Worker so a single piece of logic can't drag down a node (2019 was exactly a regex pegging CPU).
Pre-deploy static checks: run performance/security checks on regexes and configs before release to block "catastrophic backtracking" pre-deploy.
Multi-CDN backstop: be able to steer away when a single CDN fails, avoiding a vendor-level single point.
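A minimal sketch of the staged-rollout-plus-rollback control loop; the stage percentages and the three callables are hypothetical hooks into a deploy system. A production version would also wait a bake period at each stage and keep the rollback path independent of the component being changed.

```python
def staged_rollout(stages, apply_to, healthy, rollback):
    """Ramp a config change through increasing PoP percentages.

    stages:   e.g. [1, 5, 25, 100] (percent of PoPs)
    apply_to: callable(percent) that pushes the change to that share of PoPs
    healthy:  callable(percent) -> bool, checked after each step
    rollback: callable() that reverts everywhere via an independent path
    """
    for percent in stages:
        apply_to(percent)
        if not healthy(percent):
            rollback()                       # kill switch: stop the blast radius here
            return f"rolled back at {percent}%"
    return "fully deployed"

# Simulated bad rule that only shows CPU trouble once it reaches 25% of PoPs
applied = []
result = staged_rollout([1, 5, 25, 100], applied.append,
                        lambda p: p < 25, lambda: print("reverted"))
print(result, applied)   # rolled back at 25% [1, 5, 25]
```

The 100% stage is never reached, so a faulty rule hurts a quarter of the fleet at most instead of the whole world at once.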

In a sentence: the edge's convenience comes from "global instant effect", and its safety comes from wrapping that global effect in staging, quotas, and switches.
