You're designing the CDN + Edge layer for a global e-commerce / media site: 50M DAU across five continents. The content is a mix of static assets (images/JS/video segments, strongly cacheable) + semi-dynamic pages (product pages, stale-tolerant for seconds) + dynamic APIs (cart, uncacheable).
The core tension: the closer you push content to the user, the higher the hit ratio and lower the latency — but the harder invalidation becomes and the weaker consistency gets. CDN design is finding the balance on this axis.
graph LR
  U["Global users<br/>same Anycast IP"]
  subgraph PoP["Nearest PoP (lower tier)"]
    EDGE["Edge node<br/>TLS term · WAF · Edge Worker"]
    L1["Edge cache L1"]
  end
  UPPER["Regional upper-tier cache<br/>upper tier"]
  SHIELD["Origin Shield<br/>single fetch funnel"]
  ORG[("Origin<br/>app + object store")]
  U -->|BGP picks nearest| EDGE --> L1
  L1 -.miss.-> UPPER
  UPPER -.miss.-> SHIELD
  SHIELD -.miss.-> ORG
  classDef client fill:#1a2530,stroke:#64c8ff,color:#e8eef5
  classDef edge fill:#0e2030,stroke:#5eead4,color:#e8eef5
  classDef cache fill:#1a1a30,stroke:#ffb450,color:#e8eef5
  classDef origin fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
  class U client
  class EDGE,L1 edge
  class UPPER,SHIELD cache
  class ORG origin
Anycast steers users to the nearest PoP; tiering funnels misses toward origin — rightward nodes are fewer and hits more expensive
Component roles: the edge node catches traffic via Anycast, does the TLS handshake, WAF filtering, and runs Edge Workers; the edge cache returns directly on a hit (the vast majority of requests); on a miss it asks the regional upper-tier cache, then the Origin Shield (one designated PoP dedicated to fetching, collapsing misses from thousands of edges into a few origin requests), and only then hits origin. This "lower → upper → shield → origin" funnel is the key to hit ratio and origin protection.
Core trade-off: use the routing layer for load balancing — you get zero-config failover, but give up precise control over which node a user lands on.
Principle: the same IP prefix is announced via BGP simultaneously from hundreds of PoPs worldwide. Internet routers deliver a user's packets to the "network-nearest" PoP (per BGP path selection, for example the shortest AS path), which is usually, though not always, the lowest-latency one. This gives three free wins: nearest ingress (no GeoDNS guessing), DDoS dilution (attack traffic is spread across all PoPs, so effective absorption capacity ≈ whole-network capacity rather than one PoP's), and automatic failover (when a PoP dies it withdraws its route announcement and BGP converges traffic to the next-nearest PoP, typically within seconds; no DNS change, no TTL wait).
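The selection and failover behaviour can be illustrated with a toy model. This is only a sketch: the PoP names and AS-path lengths are made up, and real BGP best-path selection weighs more attributes (local preference, MED, and so on) than path length alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnycastPrefix:
    """Toy model of one prefix announced from many PoPs (all names hypothetical)."""

    # PoP name -> AS-path length as seen from one particular user network
    announcements: dict[str, int] = field(default_factory=dict)

    def announce(self, pop: str, as_path_len: int) -> None:
        self.announcements[pop] = as_path_len

    def withdraw(self, pop: str) -> None:
        # A failed PoP stops announcing; routers converge on whatever is left.
        self.announcements.pop(pop, None)

    def best_pop(self) -> str | None:
        if not self.announcements:
            return None
        # Shortest AS path wins; ties are broken by name to stay deterministic.
        return min(self.announcements, key=lambda p: (self.announcements[p], p))


prefix = AnycastPrefix()
prefix.announce("fra", 2)
prefix.announce("ams", 3)
prefix.announce("sin", 6)

before = prefix.best_pop()  # "fra": shortest path
prefix.withdraw("fra")      # the PoP fails and withdraws its route
after = prefix.best_pop()   # "ams": next-nearest, with no DNS change involved
```

The user keeps the same destination IP throughout; only the routing table changes, which is why no TTL wait is involved.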
Core trade-off: more edge nodes dilute the hit ratio; tiering funnels misses and protects origin, at the cost of an extra hop.
Principle: a CDN indexes objects by cache key (default: host + path + part of the query). But when hundreds of PoPs cache independently, each must fetch from origin on its first request — the hit ratio gets "diluted" and origin still gets hit by misses amplified hundreds-fold. Tiered Cache splits PoPs into lower tier (near users) and upper tier (a few large nodes): a lower-tier miss asks the upper tier first, and only an upper-tier miss is allowed to fetch from origin. So origin sees requests from a few upper-tier nodes, not all edges. Combined with request coalescing (concurrent misses for the same key within a node collapse into one fetch, the rest waiting on the result), this crushes the thundering herd.
# Pseudo-code: edge node handles a request (with coalescing)
def handle(req):
    key = cache_key(req.host, req.path, normalize(req.query))
    obj = cache.get(key)
    if obj and not obj.expired():
        return obj  # hit: the vast majority go here
    # miss: only one fetch per key, the rest wait (single-flight)
    with coalesce(key):
        obj = cache.get(key)  # double-check: someone may have just filled it
        if obj and not obj.expired():
            return obj
        obj = fetch_from_upper_or_origin(key)  # tiered: ask upper tier first
        cache.set(key, obj, ttl=obj.cache_control_ttl())
        return obj
Core trade-off: active purge gives strong consistency but has propagation delay and amplification; TTL is simple but is either too stale or hammers origin.
Principle: an edge cache has hundreds of replicas, so "invalidate globally at once" is essentially a distributed broadcast problem. Three mechanisms: ① TTL expiry (passive, simplest, but short TTL = more fetches, long TTL = staler data); ② explicit purge (by URL / by tag / purge everything — actively pushes invalidation to all PoPs, seconds-fast but with propagation delay; purge-everything empties the cache instantly → origin gets hammered); ③ stale-while-revalidate / soft purge (marked stale but still serves the old value first, refreshing asynchronously in the background) — trading "brief staleness" for "users never wait on a fetch, origin never avalanches". In production it's usually a combo of long TTL + tag-based purge + SWR: tags for precise invalidation, TTL as backstop, SWR for experience.
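The combination of TTL, tag-based soft purge, and stale-while-revalidate can be sketched on a single node as follows. Everything here is illustrative: `EdgeCache`, the `origin` function, and the tag name are assumptions, and a real CDN would broadcast the purge to every PoP and run revalidation asynchronously rather than through an explicit queue.

```python
import time


class EdgeCache:
    """Toy single-node cache: TTL, tag-based soft purge, stale-while-revalidate."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch       # hypothetical origin fetch: key -> (value, tags, ttl, swr)
        self._clock = clock
        self._entries = {}        # key -> dict(value, tags, fresh_until, stale_until)
        self.refresh_queue = []   # keys awaiting background revalidation

    def _fill(self, key):
        value, tags, ttl, swr = self._fetch(key)
        now = self._clock()
        self._entries[key] = {
            "value": value,
            "tags": set(tags),
            "fresh_until": now + ttl,
            "stale_until": now + ttl + swr,
        }
        return value

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry and now < entry["fresh_until"]:
            return entry["value"], "HIT"
        if entry and now < entry["stale_until"]:
            if key not in self.refresh_queue:
                self.refresh_queue.append(key)  # refresh off the request path
            return entry["value"], "STALE"
        return self._fill(key), "MISS"          # nothing usable: the user waits

    def soft_purge_tag(self, tag):
        """Mark every entry carrying `tag` as stale without evicting it."""
        now = self._clock()
        for entry in self._entries.values():
            if tag in entry["tags"]:
                entry["fresh_until"] = now

    def run_background_refresh(self):
        while self.refresh_queue:
            self._fill(self.refresh_queue.pop(0))


prices = {"/p/123": "$10"}


def origin(key):
    # value, tags, TTL of one hour, SWR window of one minute (all assumed values)
    return prices[key], ["product-123"], 3600, 60


cache = EdgeCache(origin)
first = cache.get("/p/123")          # MISS: fetched from origin
second = cache.get("/p/123")         # HIT
prices["/p/123"] = "$12"
cache.soft_purge_tag("product-123")  # precise invalidation by tag
third = cache.get("/p/123")          # STALE: old value served, refresh queued
cache.run_background_refresh()
fourth = cache.get("/p/123")         # HIT with the new value
```

Note that the request after the purge still returns the old price: that is the "brief staleness" being traded for never making a user wait on origin.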
stale-while-revalidate is a standard Cache-Control directive for smoothing origin spikes.
Core trade-off: isolates have near-zero startup and high density but per-request limits; containers/VMs are more capable but cold-start slowly and pack less densely.
Principle: pushing auth, A/B routing, personalization, and API aggregation from origin down to the edge saves a full origin round-trip. But "running user code at hundreds of PoPs" demands extreme density and ultra-low cold start. Two routes: V8 isolate (Cloudflare Workers) — thousands of lightweight sandboxes in a single process, with near-zero cold start and tiny memory overhead, but hard per-request CPU/memory caps and no arbitrary native binaries; container / micro-VM (Lambda@Edge etc.) — close to a full runtime, but cold start in the hundreds of milliseconds and low per-node density. The other hard part at the edge is state: the edge is inherently shared-nothing, so strongly consistent data must either go back to origin or use dedicated edge storage (KV is eventually consistent; Durable Objects serialize at a single point).
| Option | Cold start | Density/cost | Limits |
|---|---|---|---|
| V8 Isolate (Workers) | ~0 (no process boot) | very high | JS/WASM, hard CPU/mem caps |
| Container/micro-VM (Lambda@Edge) | hundreds of ms | low | full power but costly, fewer PoPs |
| Pure CDN rules (no code) | — | highest | only declarative rewrite/routing |
Origin takes the miss traffic: origin RPS = total RPS × (1 − hit ratio). At 2M RPS: 90% hit → 200K misses per second; 80% hit → 400K misses per second, doubled. The relationship is non-linear in relative terms: a 10-point drop in hit ratio raises origin load by far more than 10%, and from 99% to 98%, misses double outright (1% → 2%).
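The arithmetic is easy to check with a few lines. This is a sketch that treats "hit ratio" as the overall ratio across all cache tiers; `origin_rps` is a helper name introduced here for illustration.

```python
def origin_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that reach origin for a given overall hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


TOTAL_RPS = 2_000_000
for ratio in (0.99, 0.98, 0.90, 0.80):
    print(f"hit ratio {ratio:.0%}: origin sees {origin_rps(TOTAL_RPS, ratio):,.0f} RPS")

# Relative increase in origin load when the hit ratio slips by one point.
growth_99_to_98 = origin_rps(TOTAL_RPS, 0.98) / origin_rps(TOTAL_RPS, 0.99)
```

The last line makes the non-linearity concrete: a one-point slip near the top of the range doubles origin load, while the same slip from 50% to 49% would barely register.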
So hit ratio is a leading indicator of an origin avalanche: by the time origin CPU alarms fire, the cache layer has often been failing for a while. Common causes: cache key accidentally including params, mass purges, a new release changing Cache-Control, or hot content's TTLs all expiring together (add TTL jitter). Alert on the hit ratio itself, not just origin.
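TTL jitter is simple to add at the point where the cache stores an object. A minimal sketch, assuming a uniform spread of ±10% around the base TTL (the function name and the default fraction are illustrative choices, not a standard):

```python
from __future__ import annotations

import random


def jittered_ttl(
    base_ttl: float,
    jitter_fraction: float = 0.1,
    rng: random.Random | None = None,
) -> float:
    """Return base_ttl spread uniformly within +/- jitter_fraction of itself."""
    if base_ttl <= 0:
        raise ValueError("base_ttl must be positive")
    if not 0.0 <= jitter_fraction < 1.0:
        raise ValueError("jitter_fraction must be in [0, 1)")
    source = rng if rng is not None else random
    return base_ttl * (1.0 + source.uniform(-jitter_fraction, jitter_fraction))


# Objects cached in the same instant with a 300 s base TTL now expire
# spread over roughly 270 s to 330 s instead of all at once.
rng = random.Random(0)
ttls = [jittered_ttl(300, 0.1, rng) for _ in range(1000)]
```

Spreading expiry this way turns one synchronized burst of misses into a trickle, at the price of some objects living slightly longer than the nominal TTL.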
Key insight: don't bake the volatile price into strongly cacheable HTML. Separation strategy:
/p/123?v=789); a price change swaps the URL, so the old URL's cached copy is simply never requested again and ages out, with no purge propagation wait.
The essence: replace "cache everything then scramble to invalidate fast" with "cache what's cacheable, fetch what's volatile live". Purge always has a window; eliminating the dependency architecturally is what's robust.
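The versioned-URL idea can be sketched in a few lines. The helper name and the `content_version` parameter are assumptions for illustration (for example, a counter bumped whenever the price changes); only the `/p/123?v=789` URL shape comes from the text above.

```python
from urllib.parse import urlencode


def versioned_product_url(product_id: int, content_version: int) -> str:
    """Build a URL whose query string changes whenever the content does."""
    return f"/p/{product_id}?{urlencode({'v': content_version})}"


old_url = versioned_product_url(123, 789)  # "/p/123?v=789"
new_url = versioned_product_url(123, 790)  # after a price change bumps the version
# Different URLs mean different cache keys, so the new version is a clean miss
# and the old entry is never asked for again.
```

This only works if whatever links to the versioned URL is itself fresh, so the referencing document needs a short TTL or live generation.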
A single data center's attack-resistance capacity = the bandwidth/scrubbing of that one data center. An attacker just has to exceed it to overwhelm you — forcing you to provision for the "single-point peak", which is extremely expensive.
With Anycast, the same IP is announced from hundreds of PoPs worldwide, and attack traffic is automatically dispersed by source geography via BGP across PoPs: attacks from Europe land on European PoPs, those from Asia on Asian PoPs. So effective attack-resistance capacity ≈ whole-network total, not a single point. To take you down, an attacker must overwhelm every PoP worldwide simultaneously. Add per-PoP scrubbing, and a single failure only loses one region (traffic shifts after BGP withdraws the route). This is why major CDNs make "Anycast + many PoPs" the foundation of DDoS defense.
The cost: you lose precise control over where traffic lands, and you need the operational ability to run global BGP and a vast PoP fleet — which is exactly why CDNs exist as a dedicated business.
An isolate runs thousands of V8 sandboxes within a single process, eliminating the process and kernel isolation overhead of containers/VMs, so cold start is near-zero, per-machine density is very high, and cost is low. The costs, as noted earlier: hard per-request CPU/memory caps, a JS/WASM-only runtime, and no arbitrary native binaries.
So the choice is "massive lightweight short requests → isolate; heavy logic / full runtime needed → container/micro-VM". Cloudflare bets that the vast majority of edge cases are the former.
Both the Cloudflare 2019 and Fastly 2021 outages were network-wide incidents of the "one change took effect globally and instantly" kind. Govern the edge as your highest-risk deploy surface.
In a sentence: the edge's convenience comes from "global instant effect", and its safety comes from wrapping that global effect in staging, quotas, and switches.
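One way to picture "staging plus switches" is a rollout loop that widens in stages and undoes everything on the first bad signal. This is a sketch under assumptions: `staged_rollout`, its callbacks, the stage fractions, and the PoP names are all hypothetical, and a production system would add soak time between stages and per-change quotas.

```python
from typing import Callable, Iterable, Sequence


def staged_rollout(
    pops: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    stage_fractions: Iterable[float] = (0.01, 0.1, 0.5, 1.0),
    kill_switch: Callable[[], bool] = lambda: False,
) -> bool:
    """Apply a change to growing fractions of PoPs; undo all of it on failure."""
    done = []
    for fraction in stage_fractions:
        target = max(1, int(len(pops) * fraction))
        for pop in pops[len(done):target]:
            apply_change(pop)
            done.append(pop)
        # Stop widening if an operator pulled the switch or any touched PoP is unhealthy.
        if kill_switch() or not all(is_healthy(p) for p in done):
            for pop in reversed(done):
                rollback(pop)
            return False
    return True


applied, rolled_back = [], []
bad_pops = {"pop-3"}  # pretend the change breaks this PoP

ok = staged_rollout(
    pops=[f"pop-{i}" for i in range(10)],
    apply_change=applied.append,
    is_healthy=lambda p: p not in bad_pops,
    rollback=rolled_back.append,
    stage_fractions=(0.1, 0.5, 1.0),
)
# The failure surfaces in the second stage, so pop-5 through pop-9 are never touched.
```

The blast radius of a bad change is bounded by the stage it fails in, which is the opposite of the "global and instant" behaviour behind the incidents above.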