Design the backend for a 100M-DAU following timeline (Twitter/X Home, Instagram Feed, Weibo following stream): you follow a few hundred accounts, and pull-to-refresh must return their newest, sorted posts within p99 < 200ms. The hard part isn't storing posts — it's the read/write amplification asymmetry: one post must reach millions of followers, while one refresh must aggregate and sort across hundreds of followees. The first question: do you do the merge work at write time or read time?
graph TD
POST["Post / Write API"] --> TW[("Tweet Store
source of truth")]
POST --> FO["Fanout service
look up social graph"]
FO -->|normal: push| TLC[("Home Timeline Cache
Redis · one bounded list/user")]
FO -.celeb: skip fanout.-> CEL[("Celebrity posts
pulled at read time")]
READ["Refresh / Read API"] --> MIX["Timeline mixing service"]
MIX -->|① read pre-materialized| TLC
MIX -->|② pull celeb latest| CEL
MIX --> RANK["Rank + diversity"]
RANK --> HYDRATE["Hydrate
fetch post bodies by id"]
HYDRATE --> TW
HYDRATE --> OUT["Return 20-post feed"]
classDef w fill:#1a2530,stroke:#64c8ff,color:#e8eef5
classDef cache fill:#1a1a30,stroke:#ffb450,color:#e8eef5
classDef store fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
class POST,READ,FO,MIX,RANK,HYDRATE,OUT w
class TLC,CEL cache
class TW store
Write path pre-materializes normal users' timelines; read path merges "pre-materialized" + "celebrity real-time pull" then ranks
Component roles: Tweet Store is the source of truth for posts (a KV sharded by tweet_id). The Fanout service, on each post, looks up the social graph and pushes the tweet_id into every follower's Home Timeline Cache — that's "do more on write". Celebrities are the exception: fanout cost is too high, so their posts aren't pre-materialized and are pulled at read time. The mixing service is the crux: at read time it merges the pre-materialized timeline with celebrities' latest posts, ranks them, then hydrates only the top-N (timelines store only ids; bodies are fetched from Tweet Store to save memory).
One-line trade-off: trade "write amplification at post time" for "read amplification at refresh time" — you can only save on one end, and celebrities force you to manage both.
Principle: Push (fanout-on-write) writes the post id into every follower's timeline at post time, read takes it out in one shot — read O(1), write O(followers). Pull (fanout-on-read) precomputes nothing; on refresh it queries all your followees' recent posts and merge-sorts them — write O(1), read O(followees × per-query). The choice depends on read:write ratio and fanout distribution: read-heavy (100:1) with most users having few followers → push amortizes cost onto the sparse writes, a win; but a celebrity with tens of millions of followers triggers tens of millions of writes per post, and push collapses. So industry is almost always Hybrid: push for normal users, pull for celebrities, merge at read time.
| Push (write fanout) | Pull (read fanout) | Hybrid | |
|---|---|---|---|
| Post cost | high (O followers) | low (one write) | normal high / celeb low |
| Refresh cost | low (read one list) | high (merge N sources) | read 1 list + pull few celebs |
| New-post delay | after fanout done | real-time | normal lag / celeb real-time |
| Failure mode | celebrity write blowup | active-user read blowup | merge logic complexity |
One-line trade-off: push converts read cost into write cost, and write cost explodes linearly with follower count — unsustainable at celebrity scale.
Principle: the fanout write count for one post = the author's follower count. A normal person triggers a few hundred cache writes, easily absorbed; but an account with 50M followers triggers 50M Redis writes per post — even at 100µs each, serial is hours, and it instantly saturates cache-cluster bandwidth. Worse is thundering-herd overlap: celebrities have many online followers at once, so fanout storm and read storm collide. The fix is a fanout threshold: accounts above it join a "celebrity list", their posts skip fanout and are pulled and merged into each reader's timeline at read time. The cost is one extra "pull celeb latest + merge" on the read path, but it avoids the write-side avalanche.
# Hybrid fanout: split by threshold at post time (pseudo-code)
FANOUT_THRESHOLD = 100_000 # followers above this -> read-time pull
def on_post(author_id, tweet_id, ts):
tweet_store.put(tweet_id, ...) # always write source of truth first
n = social_graph.follower_count(author_id)
if n > FANOUT_THRESHOLD:
celeb_recent.zadd(author_id, ts, tweet_id) # celeb latest, pulled at read
return # no fanout
# normal user: async fanout, batched pipeline, don't block the post
for batch in chunks(social_graph.followers(author_id), 1000):
pipe = redis.pipeline()
for fid in batch:
pipe.zadd(f"tl:{fid}", ts, tweet_id)
pipe.zremrangebyrank(f"tl:{fid}", 0, -801) # bounded: keep newest 800
pipe.execute()
One-line trade-off: trade "one extra hydrate to fetch bodies at read time" for "shrinking the timeline cache memory by one or two orders of magnitude".
Principle: the home timeline is an ordered structure in Redis (a sorted set / list scored by time), one per user, storing only tweet_ids, not bodies. Why: ① a body is a few KB; storing 100M users × 800 entries of full bodies is a TB-scale memory disaster, while 8-byte ids shrink it 100×; ② posts can be edited/deleted, and a body redundantly copied into every timeline would need to be re-fanned-out on update. At read time you take the top-N ids and batch-hydrate bodies from Tweet Store (one MGET), filtering out deleted/blocked ones. The timeline must also be bounded — keep only the newest few hundred, truncate older, or an active big account's followers' timelines grow unboundedly. Cold data (scrolling far back) falls back to pull.
# Read path: merge + hydrate (pseudo-code)
def get_home_timeline(uid, limit=20):
ids = redis.zrevrange(f"tl:{uid}", 0, 400) # ① pre-materialized part
for cid in following_celebs(uid): # ② pull celeb live
ids += celeb_recent.zrevrange(cid, 0, 50)
ids = dedup(ids)
ranked = rank(uid, ids)[:limit] # ③ rank, take top-N only
tweets = tweet_store.mget(ranked) # ④ hydrate only top-N
return [t for t in tweets if visible(uid, t)] # ⑤ filter deleted/blocked
One-line trade-off: trade "engagement uplift" for "real-time, explainability, and compute budget" — the ranking model can only run on the small candidate set after aggregation.
Principle: early feeds were pure reverse-chronological — simple, real-time, predictable. After information overload, the shift was to ML ranking: after aggregating candidates (push-materialized + pull-celebs, a few hundred), a model predicts multiple engagement signals per post (like/comment/reshare/dwell probability), weighted into a score. This echoes Day 13's multi-stage funnel — the difference is feed candidates come from the follow graph not full-corpus retrieval, so the candidate set is far smaller (hundreds vs billions), allowing a fairly heavy ranking model directly. After ranking comes re-ranking to inject diversity (spread same-author, avoid a single-topic streak) and business rules (ad insertion, recency boosts).
Likely follow-ups: ① A celebrity with tens of millions of followers posts — how do you avoid blowing up the system? ② How do you set the push/pull threshold, on what metric? ③ When I newly follow someone, how do their historical posts enter my timeline (backfill)? ④ How do you budget the end-to-end post-to-visible latency? ⑤ Chronological vs ranked feed — what does each sacrifice, and which product should pick which?
Modeling: for a single account, push is worthwhile roughly when "the number of times it's read ≫ the number of fanout writes its posts trigger" — the crossover is essentially the marginal benefit of a fanout write: when one fanout write saves less expected read cost than the write itself costs, switch to pull.
Intuition: a normal person posts rarely and is refreshed repeatedly → reads ≫ writes → push (write once, save countless reads). A celebrity's follower count both raises write cost and, because their posts get buried by ranking with low per-person exposure, lowers write benefit — squeezed from both ends, they land in the pull region. This is the theoretical basis for the hybrid threshold.
Crux: B is a celebrity served by pull and isn't pre-materialized — so "at follow time" you actually don't need to write historical posts into A's timeline; A's next refresh will auto-merge B's latest. This is a hidden benefit of hybrid: following a celebrity needs no backfill.
If B is a normal account (push): B's historical posts aren't in A's pre-materialized timeline. Options: pull B's recent posts and merge at read time on first refresh (simple, common), or async zadd them into A's timeline in the background. Pitfall: don't backfill inside the synchronous "follow" request — if A follows a batch at once (contacts import), synchronous backfill amplifies into a mini fanout storm; it must be async, rate-limited, and respect the bounded truncation.
Key insight: the home timeline is derived data, not the source of truth — the real posts are in Tweet Store. Losing the timeline cache isn't "data loss", it's a "materialized view invalidation" that can be rebuilt.
Recovery: ① short-term degrade to pull — that shard's users refresh by live-merging their followees' latest, slow but usable, buying time for rebuild; ② background re-fanout rebuild, at the cost of a degraded read path + a write storm (rate-limit it). This also explains why timelines must be bounded — rebuild only needs to backfill the recent few hundred, not full history. Counter-lesson: if you'd made the timeline the only store (no Tweet Store), this would be true data loss — a derived layer must never be authoritative.
How it arises: under chronological, "follow = guaranteed delivery", users have a stable mental model. After switching to ranking, the model filters/downranks by predicted engagement, and low-interaction friends' posts get buried; users don't know ranking exists and just feel "I followed them but didn't see it". This is the inherent cost of a ranked feed: trading individual predictability and trust for aggregate engagement uplift.
Mitigation: ① give strong-tie/explicit subscriptions a guaranteed exposure slot; ② offer a "Latest" chronological view as a switchable tab (Twitter/Instagram both did this); ③ add diversity and coverage constraints in ranking so top-interaction accounts don't dominate; ④ clearly label "For You vs Latest". Fundamentally it's admitting offline engagement metrics ≠ user trust, needing ecosystem-side metrics as a backstop (echoes Day 13's echo-chamber discussion).
Naive estimate: 1e8 users × 800 entries × 8 bytes ≈ 640 GB of raw id data alone. But a Redis sorted set element also stores a score (8-byte double) plus skiplist/dict pointer overhead, so realistically 50~100 bytes per element, bloating the total to several TB.
Architectural implications: ① one machine can't hold it → the timeline cache must be sharded (hash by uid, echoes Day 4); ② this is just ids — thankfully no bodies, or 800 × few KB × 1e8 = PB-scale, the hard constraint behind "store ids + hydrate"; ③ memory is too expensive → activity-aware materialization and tiering (hot in memory, cold to Cassandra) are necessities not optimizations; ④ it also explains why the bounded 800 matters — it directly sets the cache tier's total memory budget; double the cap, double the cost.