Design a "nearby drivers" service for a 50M-DAU ride-hailing app: when a rider opens the app, return the 20 nearest available drivers within a 3km radius in p99 < 100ms, then dispatch. The hard part is that both reads and writes are heavy — this is not "design Yelp's static venue search," which is read-dominated.
Three core questions: ① how to turn lat/lon into an indexable key (geo-encoding); ② how to query points within a radius efficiently (nearby search); ③ how to shard millions of moving points without overloading one machine (dispatch architecture).
graph TD
R["Rider App
fires nearby request"]
D["Driver App
GPS report every 4s"]
GW["API Gateway
WebSocket long-lived"]
DISP["Dispatch service (DISCO)
supply/demand matching"]
RING["Consistent hash ring
shard by geo cell"]
N1["Geo Shard A
cell set → in-mem grid"]
N2["Geo Shard B"]
N3["Geo Shard C"]
DB[("Location store
Redis GEO / memory")]
D -->|① location update| GW --> DISP
R -->|② nearby+dispatch| GW --> DISP
DISP -->|③ route by cell| RING
RING --> N1 & N2 & N3
N1 & N2 & N3 -.->|④ durable backup| DB
classDef client fill:#1a2530,stroke:#64c8ff,color:#e8eef5
classDef svc fill:#0e2030,stroke:#5eead4,color:#e8eef5
classDef shard fill:#1a1a30,stroke:#ffb450,color:#e8eef5
classDef store fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
class R,D client
class GW,DISP,RING svc
class N1,N2,N3 shard
class DB store
A driver's location is geo-encoded into a cell ID, then the cell ID is the shard key routing to the owning shard; a query touches only "target cell + one ring of neighbors" and the few shards owning them
Component roles: the gateway keeps driver long-lived connections and ingests GPS; Dispatch (Uber calls it DISCO) encodes lat/lon into a geo cell and uses a consistent hash ring (Uber uses Ringpop) to shard supply/demand to in-memory nodes by cell; each shard maintains a "cell → driver set" grid in memory, so a query only reads "center cell + one ring of neighbors." The source of truth is memory; the location store (Redis GEO or async flush) is only a crash-recovery backup.
Principle: indexing 2D lat/lon directly in a B-Tree doesn't work — range queries must constrain both dimensions at once. The core idea is a space-filling curve that reduces 2D to 1D: encode lat/lon into a locality-preserving integer/string where spatially adjacent ≈ encoding-adjacent, so "nearby" degenerates into "shared prefix" or "adjacent ID range," queryable by an ordinary ordered index (B-Tree / sorted set). The three mainstream schemes differ in "what cell shape, what curve."
| Geohash | S2 (Google) | H3 (Uber) | |
|---|---|---|---|
| Cell shape | rectangle | spherical quad | hexagon (+12 pentagons) |
| Curve | Z-order (interleave bits) | Hilbert (on cube faces) | Hilbert-like (icosahedron) |
| Encoding | base32 string | 64-bit cell ID | 64-bit cell ID |
| Neighbor distance | uneven (diag≠edge) | uneven | uniform (6 equidistant) |
| Hierarchy | per-char refine | 30 levels, strict quad-nest | 16 levels, approx-nest |
| Pain point | boundary jump + polar distortion | quad adjacency uneven | imperfect parent/child nesting |
kRing (k rings of neighbors), traffic aggregation, and ML features; ❌ mathematically hexagons can't perfectly tile a sphere, so 12 pentagons must be inserted (at icosahedron vertices), and a parent cell can't split into exactly 7 children (only approximately), giving cross-level aggregation error.# Geohash core: binary-split lat/lon + bit interleave (pseudo-code)
def geohash(lat, lon, precision_bits=50):
lat_rng, lon_rng = [-90, 90], [-180, 180]
bits, is_lon = [], True # alternate lon/lat bits
while len(bits) < precision_bits:
rng = lon_rng if is_lon else lat_rng
mid = (rng[0] + rng[1]) / 2
val = lon if is_lon else lat
if val >= mid: bits.append(1); rng[0] = mid # upper half → 1
else: bits.append(0); rng[1] = mid # lower half → 0
is_lon = not is_lon
return bits # longer = finer; longer shared prefix = closer
# (but NOT the converse!)
GEOADD/GEOSEARCH pack a geohash encoding into a sorted-set score and range-query via ZRANGEBYSCORE.Principle: the naive approach computes the rider's distance to every driver then sorts — O(N) is fatal at millions of moving points. Two mainstream routes: ① fixed grid (bucketing) — cut space into equal cells, hang each driver in one cell bucket, and at query time look only at "center cell + 8 surrounding neighbor" buckets, then compute exact distance; ② tree index (quadtree / R-Tree) — adaptive partitioning by density, big cells in sparse areas, fine cells in dense ones. For high-frequency moving points, the fixed grid wins, because an update is an O(1) "remove from old bucket, add to new bucket," while a tree must rebalance, split, and merge at high cost.
# Fixed grid + neighbor-ring query (pseudo-code)
def nearby(lat, lon, radius_m, k=20):
center = cell_of(lat, lon) # encode into a cell ID
# radius → how many rings to scan: radius / cell_edge, round up
ring = ceil(radius_m / CELL_EDGE_M)
candidates = []
for c in k_ring(center, ring): # center + ring of neighbor cells
candidates += grid[c] # drivers in bucket, O(candidates)
# exact haversine only on candidates, filter radius, take nearest k
in_range = [(haversine(lat,lon,d.lat,d.lon), d)
for d in candidates if haversine(...) <= radius_m]
return heapq.nsmallest(k, in_range) # top-k, O(M log k)
kRing / S2's cell neighbors). This is exactly why geohash alone is unreliable for nearby queries.
ST_DWithin radius queries — the standard for static geo data (venues, parcels, Yelp-style search).Principle: cells only coarse-filter candidates; final ranking needs true spherical distance. The haversine formula gives the great-circle (geodesic) distance between two lat/lon points, more correct than planar Euclidean (Earth is a sphere). But haversine has trig functions and isn't free per candidate, so in practice you coarse-rank with something cheap (squared lat/lon delta, or cell distance) and compute exact haversine only on the top candidates. Another oft-ignored point: dispatch wants road ETA, not straight-line distance — the straight-line nearest driver might be across a river, with the longest actual arrival time.
sin/cos/atan2 cost, can't run over the full set at scale.# haversine: great-circle distance between two points (meters)
import math
def haversine(lat1, lon1, lat2, lon2):
R = 6371000 # Earth radius (m)
p1, p2 = math.radians(lat1), math.radians(lat2)
dp = math.radians(lat2 - lat1)
dl = math.radians(lon2 - lon1)
a = math.sin(dp/2)**2 + math.cos(p1)*math.cos(p2)*math.sin(dl/2)**2
return 2 * R * math.asin(math.sqrt(a))
# Optimization: coarse-rank by (dlat²+dlon²), haversine only on top-N,
# then re-rank by road ETA for dispatch (straight-line near ≠ fast arrival)
GEODIST internally uses haversine, with selectable units (m/km).Principle: one machine can neither hold nor serve the reads/writes of all global drivers, so you must shard. Key decision: shard by geo cell, not by user ID — because a nearby query is inherently geographically local, sharding by cell lets "one query hit only a few adjacent shards" instead of fanning out to all nodes. Uber's DISCO uses Ringpop (consistent hashing + gossip membership) to map geo cells to nodes: supply (drivers) and demand (riders) in the same area land on the same/adjacent nodes, and matching happens locally. Adding/removing a node migrates only an adjacent arc of cells on the ring, not a full rehash.
# Route by geo cell + merge across neighbor shards (pseudo-code)
def dispatch_nearby(lat, lon, radius):
cells = k_ring(cell_of(lat, lon), ring_for(radius))
# neighbor cells may live on different shards: query in parallel, merge
shards = {ring.lookup(c) for c in cells} # consistent-hash lookup
results = parallel_map(shards, lambda s: s.query(cells, lat, lon))
return rank_by_eta(flatten(results))[:20]
# Hot-cell taming: monitor driver count & QPS per cell,
# above threshold drop the cell one resolution (split into children)
# and spread across nodes
Reason one: equidistant neighbors. A square cell has 4 edge neighbors and 4 diagonal neighbors, and the diagonal center-distance is √2× the edge one — so "take one ring of neighbors" covers different directions at different radii. A hexagon's 6 neighbors are all equidistant and edge-sharing, so kRing(k) approximates a true circle — more uniform for radius queries, traffic diffusion, and ML spatial features.
Reason two: uniform quantization error. When you quantize a continuous point to a cell center, the hexagon is the "closest to a circle" tileable polygon in the plane, giving smaller average quantization error than a square and weaker edge effects.
Cost: hexagons can't perfectly tile a sphere — Euler's theorem forces exactly 12 pentagons (at the icosahedron's 12 vertices). And a parent cell can't hold an integer number of children (splitting into 7 borrows from adjacent cells at the edges), so H3's hierarchy is approximate nesting, with boundary error in cross-resolution aggregation. S2's quads do nest strictly as a quadtree — exactly why S2 is preferred where precise hierarchical range scans matter (database geo keys).
Cell too small (500m): a 3km radius needs about 3000/500 = 6 rings of neighbors, roughly 3·6·(6+1)+1 ≈ 127 hexagonal cells, one lookup each — many buckets, high merge cost, but few candidates per bucket so distance math is cheap.
Cell too big (5km): only center + 1 ring = 7 cells, but each covers 5km, so buckets hold many drivers well beyond 3km, the haversine candidate count explodes, and boundary waste is severe.
Principle for choosing cell size: make cell edge ≈ typical query radius, so a query hits "center + 1–2 rings," keeping both bucket count and per-bucket candidates bounded. But a single resolution can't serve "3km find-a-ride vs 100km intercity carpool" — so production systems use multi-resolution: pick resolution by query radius dynamically (both H3/S2 are hierarchical), coarse cells for large radii, fine for small. Uber also tunes resolution by regional density, using finer cells in dense CBDs.
Core insight: location data is "high-frequency, ephemeral, low-value per point" — the vast majority of intermediate positions are never read, so there's no need to persist them.
Contrast with an account balance — "low-frequency, must persist, strongly consistent," never to be lost. Location data is the opposite: accept it's ephemeral, self-heal from the latest report — that's the root of cheaply absorbing massive writes.
This is the inherent Achilles' heel of geo-sharding: shard by cell, and a hot cell necessarily overloads one node. Several measures in concert:
Note this is isomorphic to the hot key of Day 2 and the sharding hot spot of Day 4 — the fix is always the combination of "split finer + replicas + local cache + demand-side throttling."
PostGIS is the Swiss army knife of geo data, perfect for static/low-update scenarios (venues, parcels, administrative boundaries, Yelp-style search), with ST_DWithin radius queries, polygon intersection, and nearest-neighbor out of the box, plus transactions and persistence. But dispatch hits walls in three places:
Conclusion: the choice is driven by update frequency — static geo data on PostGIS; millions of high-frequency moving points on an in-memory grid + consistent-hash sharding (the Uber route). Forcing moving points into an R-Tree is the most common over-engineering.