Design the backend for an instant messenger with 1 billion users and 50 million concurrent connections (WhatsApp, Discord, WeChat). Feed (Day 14) is a read-amplification problem — one post read by millions; chat is the inverse, a connection-amplification problem — tens of millions of never-closing long connections that must be pushed to at any moment. The hard part isn't "storing messages" but maintaining a sea of long connections and guaranteeing every message is delivered exactly once, in order, without loss.
The core idea is to decouple "connection" from "logic": the Gateway layer only maintains long connections and shuttles bytes; the business layer is stateless and scales horizontally. The key component is the Session Registry — a routing table of "user → which Gateway node they're connected to" — the addressing core that lets a message find its recipient. The sender's message first hits durable storage (so it's never lost), then the Registry is queried to route it to the recipient's Gateway for push; if the recipient is offline, it stays in storage to be pulled on reconnect.
One-line trade-off: pay the "complexity of holding tens of millions of stateful connections" to get "true bidirectional, low-latency, low-idle-overhead real-time push".
Principle: chat requires the server to push proactively. Three options: short polling (periodic HTTP pull) has high latency and floods the server with empty requests; long polling (hold the request until data arrives) saves some but still pays a full HTTP round-trip per message with frequent connection rebuilds; WebSocket upgrades one handshake into a full-duplex long connection, after which bidirectional push costs almost nothing — the modern default. The cost: the Gateway becomes stateful — each connection holds memory + fds, and node restarts/scaling drop huge numbers of connections. So the connection layer needs a lightweight high-concurrency runtime — WhatsApp uses Erlang/BEAM, Discord uses Elixir, both based on the actor model: each connection is an independent lightweight process (not an OS thread), scheduled by the VM, letting a single box hold millions of connections.
| Approach | Real-time | Idle overhead | Cost |
|---|---|---|---|
| Short polling | Poor (interval-bound) | Very high (empty floods) | Simple but unscalable |
| Long polling | Medium | Medium (frequent rebuilds) | HTTP-friendly stopgap |
| WebSocket | Great (<200ms) | Low (connection reuse) | Stateful gateway, hard to scale |
Another key piece is presence and heartbeats: the server detects liveness via periodic heartbeats, and a dropped connection must be promptly removed from the Registry — otherwise messages get routed to dead connections. But presence broadcast is an amplification bomb: a single user going online/offline must notify all friends/group members, so it must be aggregated and rate-limited.
One-line trade-off: pay a "partition key bucketed by channel + time" to "avoid a huge group hammering a single partition into a hot spot".
Principle: messages must be persisted before delivery (never just in memory, or a node crash loses them). The read pattern is "pull the latest N messages / messages after a timestamp for a conversation", a natural fit for wide-column stores (Cassandra/ScyllaDB): partition key channel_id, clustering key a time-ordered message ID (Snowflake, echoing Day 11), so a conversation's messages are physically adjacent and time-sorted. The trap is hot partitions: a huge channel (a million people in one channel) crams every message into a single channel_id partition, pinning all read/write to one replica set and spiking p99. The fix is bucketing — change the key to (channel_id, time_bucket), splitting by time window (e.g. every N days) to spread a big partition horizontally. 1:1 and group use the same model: a 1:1 is just a two-person channel.
# Bucketed message table (CQL-style pseudo-code)
CREATE TABLE messages (
channel_id bigint,
bucket int, # time bucket, avoids hot mega-partitions
message_id bigint, # Snowflake: time-ordered, doubles as sort key
author_id bigint,
payload blob, # ciphertext under E2E
PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
# Read latest: locate current bucket, take N by message_id DESC
(channel_id, bucket) composite partition key precisely to stop active channels forming hot partitions; later migrated from Cassandra to ScyllaDB (C++, no GC), shrinking 177 nodes to 72 and bringing historical-read p99 from 40–125ms down to a steady ~15ms (Discord eng blog, "How Discord Stores Trillions of Messages").One-line trade-off: pay "at-least-once + client-side idempotent dedup" to get "something far simpler than exactly-once that still loses nothing and shows nothing twice".
Principle: networks drop packets and clients reconnect-and-resend; true exactly-once is brutally expensive in distributed systems (echoing Day 8). The standard answer is at-least-once delivery + endpoint idempotency: the sender attaches a client-generated idempotency ID (client message id) to each message, and the server dedups on it — retries never create two. Delivery rides an ACK chain: each hop (server received, peer received, peer read) returns an ACK; no ACK means resend. Ordering rides a per-conversation monotonic seq: the client sorts by seq and detects gaps (got 5 but not 4 → proactively fetch 4), offloading the "server guarantees total order" burden to per-conversation client-side ordering and avoiding expensive global ordering.
# Send + idempotent dedup + ACK retry (pseudo-code)
def send(msg):
msg.client_id = uuid() # client idempotency key
while not acked(msg.client_id):
conn.push(msg)
if wait_ack(timeout=5s): break # only durable after server ACK
# on reconnect dedup by client_id; server: INSERT IF NOT EXISTS
def on_receive(msg): # receiver
if msg.seq <= last_seen: return # duplicate, drop
if msg.seq > last_seen + 1:
fetch_gap(last_seen+1, msg.seq) # detect gap, backfill
deliver(msg); send_ack(msg.seq); last_seen = msg.seq
One-line trade-off: pay "the server can't read plaintext at all, plus complex key management" to get "forward secrecy + post-compromise recovery strong privacy".
Principle: under E2E the server is just a ciphertext courier. The industry standard is the Signal Protocol: X3DH negotiates an initial shared key between two parties (one may be offline), then the Double Ratchet continually evolves it — each message uses a one-time message key, discarded after use. Two ratchets: a DH ratchet (new DH public key exchanged each round) + a KDF ratchet (chained derivation). This yields two core security properties: forward secrecy (a key leak today can't decrypt yesterday's messages) + post-compromise security (after a leak, a fresh DH restores safety). Groups can't pair everyone N², so they use Sender Keys: each member has a sending-chain key distributed to the group, encrypting once and reusing within the group, cutting fanout from N² to N.
channel_id alone as the partition key lets active channels hammer a single partition. Use (channel_id, bucket).Common follow-ups: ① How do you hold 50M connections in memory, and what's the bottleneck? ② A sends to an offline B — what's the message's full lifecycle? ③ With multi-device login, how are messages synced across devices without dup or loss? ④ A million-member group — how does one message fan out without blowing up? ⑤ With E2E on, how do server-side search and moderation still work?
The bottleneck is memory and file descriptors, not CPU. Idle connections burn no compute, but each one holds: kernel socket buffers (a few KB to tens of KB each way) + application session state + an fd. 2M connections of kernel buffers alone could be tens of GB. Estimate: at ~10KB combined per connection, 2M × 10KB ≈ 20GB — before any business state. So you ① tune the kernel (fd limits, socket buffers, ephemeral ports); ② pick a lightweight connection runtime (BEAM processes, not OS threads); ③ shrink per-connection app state. This is exactly why WhatsApp tuned FreeBSD and used Erlang — squeezing the "fixed cost per connection" to a minimum is what lets one box hold millions.
Flow: ① A's message reaches the Chat Service, is persisted (with the client_id idempotency key) → an ACK returns to A (A shows "delivered to server" ✓); ② the Registry shows B has no active connection → the message stays in storage / B's pending queue; ③ B comes online, opens a WebSocket, the Registry records B→GatewayX; ④ B pulls offline messages "after my last seq" → delivery → B ACKs → the server marks delivered. Failure points: if step ① "ACKs before persisting", a crash loses it while A thinks it succeeded; if B doesn't dedup by seq after pulling, a reconnect-repull shows duplicates; if the Registry doesn't promptly record B online, new messages may briefly fail to route. Every step must be retryable + idempotent as a backstop.
Structurally it's the same push/pull choice as Feed. Writing a copy to each of a million inboxes = write-fanout explosion, infeasible. Big groups use a shared conversation log (one stored copy) + pull on read: members each pull deltas "after my last-read seq"; online members are found via the Registry and pushed a notification (not a full copy of the message). The difference: Feed is often async, seconds-tolerant, while chat wants real-time push, so "notify online members in real time + each pulls" is the compromise. Further optimization: aggregate online members of the same group by Gateway, so a Gateway receives one notification then locally fans to that node's members, cutting cross-node traffic — Discord's guild process is exactly this aggregation point.
Search: the server can't build an inverted index (ciphertext is meaningless) → only the client can decrypt, build a local index, and search locally; switch devices and old history isn't searchable. Multi-device sync: either each device independently joins key negotiation (its own session), or session keys/history are securely transferred between devices — both far more complex than plaintext sync. Discord's choice: its core value is server-side capability — full-text search, content moderation, spam/minor-safety protection, bot ecosystem, AI features — all of which need the server to read plaintext. E2E fundamentally conflicts with these. So whether to do E2E is a product positioning decision: privacy-first (Signal/WhatsApp) vs server-capability-first (Discord/Slack). There's no purely technical "more correct".
The risk is a thundering herd on reconnect: millions of clients drop → reconnect simultaneously → handshakes, auth, Registry writes, and offline-message pulls all flood in at once, potentially knocking over the just-recovered Gateway and backends again, oscillating in a "connect then get crushed" loop. Mitigations: ① client exponential backoff with jitter on reconnect (echoing Day 23's retry+jitter), spreading reconnects across the timeline; ② rolling restart of Gateways instead of all-at-once (Day 22), dropping only a small batch each time; ③ reconnect auth and offline pulls go through rate-limiting/queuing to protect backends; ④ use connection migration / graceful shutdown, telling clients ahead of time to move to another node rather than hard-dropping. Core idea: never let a huge client population's behavior synchronize — jitter is the key tool for breaking up that synchrony.