Day 26 Medium Capacity Estimation Back-of-Envelope QPS / Storage / Bandwidth

Capacity Estimation — Proving on One Napkin Whether a Design HoldsCapacity Estimation: QPS, Storage, Bandwidth & the Discipline of Assumptions

Scenario + Constraints

The interviewer says: "Design a photo-sharing feed (Instagram-like) for 100M DAU." Before you draw a single box, answer one question: is this a one-machine job or a thousand-machine job? The answer dictates the architecture — single DB vs sharding, synchronous fanout vs async, how large the cache needs to be. Capacity estimation (back-of-the-envelope, BOTE) uses a few assumptions and order-of-magnitude constants to land QPS, storage, bandwidth and memory within 10× of the truth in two minutes, collapsing countless candidate architectures down to one or two.

Its purpose is not precision — precision is the job of load tests and real metrics. Its purpose is to eliminate the obviously infeasible: if you compute 200 GB/s of write bandwidth, the "single machine + local disk" plan is dead on the spot. Fix the input assumptions for this issue:

Scale: 100M DAU; 2 posts/user/day (write), 50 feed views/user/day (read).
Read:write ratio: ~50:2 = 25:1 — read-heavy, as nearly all feed/social systems are.
Object: a post's metadata (text + author + timestamp + counts) ≈ 1 KB; images go to object storage/CDN, estimated separately.
Latency SLO: feed first paint p99 < 200 ms.

High-Level: the Derivation Chain

Capacity estimation is not a pile of numbers — it is a directed derivation chain: every quantity flows from two root assumptions, DAU and "per-user behavior." Each arrow below is one multiply/divide. Memorize the chain and you never miss a term.

graph TD
    DAU["DAU = 100M
+ per-user behavior"]
    W["writes/day
200M"]
    R["reads/day
5B"]
    AQ["avg QPS
÷ 1e5 s/day"]
    PQ["peak QPS
× peak factor 2~3"]
    ST["storage
writes × obj size × retention"]
    BW["bandwidth
QPS × payload"]
    MEM["cache memory
hot data × 80/20"]

    DAU --> W --> AQ
    DAU --> R --> AQ
    AQ --> PQ
    W --> ST
    R --> BW
    R --> MEM

    classDef root fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
    classDef mid fill:#1a2530,stroke:#64c8ff,color:#e8eef5
    classDef out fill:#0e2030,stroke:#5eead4,color:#e8eef5
    class DAU root
    class W,R,AQ mid
    class PQ,ST,BW,MEM out

Two root assumptions (scale + per-user behavior) decide everything; QPS sets compute, storage/bandwidth/memory set the shape

Key Techniques

1. Order-of-Magnitude Intuition: Powers of Two, Time Constants, Latency Numbers

Core trade-off: memorize a tiny set of constants → mental-math speed, at the cost of ±2× error. The bedrock is three groups of "free" numbers. First, powers of two and data units: 2^10≈10^3=KB, 2^20=MB, 2^30=GB, 2^40=TB, 2^50=PB. Second, time constants: a day ≈ 86400 ≈ 10^5 seconds (this approximation turns "N times per day" into "÷100k = QPS," the single most-used step in estimation). Third, Jeff Dean's latency numbers: they tell you which step is the bottleneck.

Operation	Magnitude	Meaning
L1 / main-memory reference	~1 ns / ~100 ns	CPU-local, essentially free
Sequential memory read 1 MB	~microseconds (µs)	memory bandwidth is huge
SSD random read	~16 µs	100× slower than memory
Same-datacenter round trip (RTT)	~0.5 ms	floor for one RPC
Disk seek / SSD seq read 1 MB	~ms	batch beats random
Cross-continent RTT (CA↔NL)	~150 ms	speed of light, can't optimize

Why it matters: in a feed with p99 < 200 ms, a single cross-continent RTT eats 150 ms — so you must deploy close to users and never let a request circle the globe. Latency numbers let you see at a glance which designs are physically impossible.

Real-world: Jeff Dean made these the baseline of Google's engineering culture (Latency Numbers Every Programmer Should Know); Colin Scott built an interactive version showing how they shift by year. ByteByteGo (Alex Xu, System Design Interview, Ch. 2) compiles "powers of two + availability nines + QPS derivation" into an interview cheat sheet.

2. From DAU to QPS: Average, Peak, and Read/Write Split

Core trade-off: estimating with average QPS is easy, but a system provisioned for the average dies at peak. Three steps: ① requests/day = DAU × times-per-user; ② average QPS = requests/day ÷ 10^5; ③ peak QPS = average × peak factor. The peak factor comes from intraday unevenness — evening peaks, lunch, viral spikes; social apps typically use 2~3×, while flash sales/livestream openings can exceed 10×.

# This issue's numbers (100M DAU)
write: 1e8 × 2  = 2e8 /day → avg 2,000 QPS → peak ~5k  write QPS
read:  1e8 × 50 = 5e9 /day → avg 50k QPS   → peak ~100k read QPS
read:write ≈ 25:1  →  the read path is the design center (cache, replicas, fanout)

Which number drives which decision:

Peak read QPS (100k) → sizes the cache layer and read replicas; one Redis node handles ~100k QPS, so you must shard.
Peak write QPS (5k) → a single Postgres can do thousands of TPS, so the write primary need not shard yet — the classic case of estimation preventing over-engineering.
Read:write 25:1 → justifies caching and fanout-on-write (do more work at write time to make reads cheap). At 1:1 the cache payoff is doubtful.

Real-world: Twitter has publicly described its timeline as extreme read-heavy (~hundreds of thousands of read QPS vs thousands of write QPS) — exactly the ratio that birthed precomputed, fanout-on-write timelines (see Day 14). In interviews, stating a peak factor and saying out loud "I assume the evening peak is 2× the mean" scores more than quoting a bare number.

3. Storage, Bandwidth, Memory: Compute All Three at Once

Core trade-off: estimate "metadata" and "large objects/media" separately, or one order-of-magnitude slip wrecks the whole table. Each quantity has a formula; the key is picking the right multiplier.

# storage = writes/day × object size × retention
metadata: 2e8/day × 1KB × 365 × 5yr ≈ 365 TB  → shard + hot/cold tiering
media:    assume 30% of posts carry an image, 200KB each
          2e8 × 0.3 × 200KB ≈ 12 TB/day → object store (S3) + CDN, not in DB

# bandwidth = peak QPS × bytes per response
read egress: 100k QPS × (20 posts × 1KB) = 2 GB/s ≈ 16 Gbps (metadata only)
             media bandwidth rides the CDN, absorbed at the edge, not origin

# cache memory = hot data × hit target
hot timelines: 20% DAU = 20M active × 200 ids/user × ~50B
             ≈ 200 GB → fits a mid-size Redis cluster

Repeatedly botched multipliers: ① retention — 5 years vs forever differs by tens of times; ② replica count — storage ×3 (three replicas), never quote raw capacity; ③ media not split out — counting a 200 KB image as 1 KB throws storage and bandwidth off by an order of magnitude.

Real-world: offloading large bytes to object storage/CDN is industry consensus — Instagram, YouTube and Netflix push all "big bytes" to blob + CDN, while the origin stores and serves only metadata and indexes. Splitting these two classes during estimation is what separates you from the candidate who only recites components.

4. The Discipline of Assumptions: State Them, Source Them, Sensitivity-Check

Core trade-off: an estimate's credibility lies not in decimals but in whether the assumptions are stated and survive "what if this number were off by 2×?" A good estimate writes every input explicitly with its source: which are given (DAU), which are industry common sense (read:write), which you simply guessed (peak factor). Then run a sensitivity check — which assumption, if off by 2×, flips the conclusion? That one is the one to confirm with the interviewer, or to monitor closely in production.

# Assumption list (the real deliverable of estimation)
DAU          = 1e8     [given]
posts/user/d = 2       [assumed · IG-like median user]
views/user/d = 50      [assumed · on the high side, a sensitive item] ← 2× off → read QPS doubles
peak factor  = 2.5     [assumed · evening-peak heuristic]
object size  = 1 KB    [metadata; media counted separately]
retention    = 5 yr    [product decision, to confirm]

sensitivity: read QPS is linear in "views/user"  → verify first
             storage   is linear in "retention"  → confirm with product

Interview vs production — two different kinds of "computing":

Interview: BOTE + justify out loud. The goal is to show judgment — give assumptions, derive cleanly, land in the right order of magnitude; nobody checks whether it's 2,000 or 2,300 QPS.
Production: real metrics + load tests + observability regression (see Day 21 / Day 27). BOTE is only the starting point of capacity planning — used to set load-test targets and an initial machine count, after which dashboards, load tests and autoscaling converge the error to reality.

Real-world: in SRE practice, BOTE estimates are used to "set an expected order of magnitude for load tests" and to draft capacity plans; the actual scaling decisions are driven by SLO monitoring and load-test data. Estimation's job is to propose assumptions; production's job is to falsify them. They are not substitutes but a relay.

Scaling & Optimization (What Comes After the Estimate)

Translate numbers into machine counts: peak read 100k QPS ÷ per-node capacity (e.g. 10k QPS) ≈ 10 nodes + redundancy. This turns the estimate into a procurement/scaling list.
Find the binding quantity: which of the four outputs hits the wall first? Here it's read QPS and cache memory — so the next design focus is cache sharding and read replicas, not write-side sharding.
Leave growth headroom: after estimating at current DAU, multiply by a growth factor (e.g. doubling in 18 months) before sizing, to avoid scaling on launch day.
Use the estimate to choose architecture: if write QPS also reaches hundreds of thousands (log/IoT ingestion), the conclusion flips from "single write primary" to "Kafka + LSM storage + sharding." The ultimate value of estimation is to cut the option set down to one or two.

Pitfalls + Interview Questions

1. Quoting peak but forgetting to divide for it, or quoting average but forgetting to multiply. Always state whether the number is average or peak: provision for peak, cost for average.

2. Omitting replicas and overhead factors. Raw data ×3 replicas, ×(index + WAL overhead), plus headroom. When quoting "365 TB," say whether it's logical volume or physical footprint.

3. Lumping media in with metadata. A 200 KB image vs 1 KB of text differ 200×; split them, and media rides the CDN, not the origin.

4. Chasing precision, agonizing over 86400 vs 10^5. Estimation wants orders of magnitude; 10^5 is plenty. Spend the time on "which assumption is most sensitive."

5. Quoting numbers without stating assumptions. The interviewer can't judge you and can't discuss — the deliverable of estimation is the assumption chain, not a lone number.

Common follow-ups: ① Roughly how many seconds in a day, and why approximate to 10^5? ② Can this run on one machine — how do you tell at a glance? ③ If DAU grows 10×, which quantity breaks first and how does the architecture change? ④ Where did your peak factor come from, and how does the conclusion change if it's off by 2×? ⑤ Should storage include replicas and retention, and how do you estimate hot/cold tiering?

Further Resources

Designing Data-Intensive Applications (Martin Kleppmann): Ch. 1 reasons about Twitter timeline read/write volumes — a model of "numbers driving architecture choice."
Jeff Dean — Latency Numbers Every Programmer Should Know: the latency-magnitude baseline; Colin Scott's interactive version shows how they evolve by year.
ByteByteGo / Alex Xu, System Design Interview, Ch. 2 "Back-of-the-envelope Estimation": cheat sheet and worked examples for powers of two, availability nines, and QPS/storage derivation.

Going Deeper (click to expand)

1. Why does "a day ≈ 10^5 seconds" almost never cause trouble in estimation? When does it bite?

The truth is 86400 ≈ 8.64×10^4; using 10^5 overstates by ~16%. In estimation this is benignly wrong: dividing by 10^5 makes average QPS ~16% low, but we immediately multiply by a peak factor of 2~3, and that bias is utterly drowned by the peak factor's own uncertainty — your grip on the peak factor isn't anywhere near 16% precision. So at the order-of-magnitude level, zero impact.

Where it bites: when traffic is not spread across the day but highly concentrated — e.g. a flash sale that runs 5 minutes at 8am. Then "÷10^5" average QPS is meaningless; the real instantaneous QPS = day's total ÷ 300 seconds, possibly hundreds of times the "average." Lesson: amortizing over a day only holds for roughly uniform traffic; concentrated loads must use the actual active window as the denominator.

2. Same 100M DAU — why is write QPS only a few thousand (no sharding) while read QPS hits 100k (must shard)? Where does the asymmetry come from?

The root is read:write ratio compounded by amplification. First, views/user (50) is naturally an order of magnitude higher than writes/user (2). Second, one "read feed" on the backend is usually a fanout — one request → pull N posts + counts + follow graph — so read amplification means even more backend read ops. Third, writes have a natural ceiling (a human can't post hundreds of times a day), while reads can be scrolled endlessly — the product shape makes reads effectively uncapped.

So social/content systems are almost always "easy write DB, strained read path." This is exactly why their core weapons are caching, read replicas, precomputed timelines (fanout-on-write) — not write sharding. The counterexample is IoT/log ingestion — writes far exceed reads, and the architecture's center flips entirely to Kafka + LSM sharded writes. In one line: the read:write ratio decides which side you pile machines on.

3. You estimated metadata at 365 TB over 5 years. The interviewer asks "so how many machines" — how do you get from volume to machine count without stepping on a rake?

You can't just divide 365 TB by per-disk capacity. First inflate logical volume into physical footprint: ① ×3 replicas (HA) → ~1.1 PB; ② add indexes, WAL/redo, fragmentation and reserved space, often another ×1.3~2 in practice; ③ disks can't be filled — leave 30% headroom. Physical need can approach the 2 PB range.

Then divide by per-node effective capacity — and the bottleneck may not be capacity: a single DB node usually hits IOPS/connections/memory limits before disk space. Stuffing 10 TB on one node that can't sustain the QPS is routine. So the right answer is "size node count by whichever of capacity and QPS is tighter, then ×replicas." Quoting only "365 TB ÷ 16 TB ≈ 23 nodes" is a classic point-loser — it drops replicas, overhead and the QPS constraint all at once.

4. Estimation is only good to "within 10×." If it's that crude, how can it guide a multi-million-dollar architecture decision?

Because architecture options often differ by more than an order of magnitude, and the decision only needs to tell magnitudes apart. "One machine vs a thousand," "single write primary suffices vs needs distributed storage," "data fits in memory vs must spill to tiered disk" — these forks are 10×, 100× gulfs, and estimation's ±3× error can't cross them, so it's plenty to push you to the right side.

Conversely, estimation can't answer fine-tuning within a magnitude: 8 nodes or 12, cache at 50 GB or 80 GB — those differences are indistinguishable within estimation's precision and must go to load tests and production metrics. So the correct use is: use estimation to pick the architecture shape (coarse decision), use load tests and monitoring to fix the exact capacity (fine decision). Treating estimation as a precise budget, or refusing it for being imprecise, are both misuses.

5. If the interviewer swaps the constraint from "100M DAU social" to "10k enterprise tenants, 500 users each, B2B SaaS," how does your method change?

Total users = 10k × 500 = 5M, two orders smaller than consumer — but the center of gravity shifts. First, B2B "per-user behavior" is wholly different: high-frequency during work hours, near-zero at night, so the peak factor is more extreme (concentrated 9–18 on weekdays), yet spread across time zones smooths the peak — correct by tenant time-zone distribution. Second, the binding constraint moves from "total volume" to "per-tenant ceiling and tenant isolation": the largest tenant may be 100× the median (power-law), so estimating by the average lets a giant tenant crush shared resources — estimate the "largest tenant," not the "average tenant." Third, multi-tenant storage must count per-tenant overhead (the fixed cost of a dedicated schema/index × 10k); for small tenants, the sum of fixed overheads can exceed the data itself. In one line: consumer estimation looks at total volume and peak; B2B estimation looks at the tail of the tenant distribution and the cost of isolation (echoing Day 27, multi-tenancy and cost engineering).