Day 19 Hard Geospatial S2 / H3 / Geohash Uber Dispatch

Geospatial Systems — Slice the Sphere into Cells, Then Find the Nearest PersonGeohash vs S2 vs H3, Nearby Queries, Uber Dispatch

Problem & Constraints

Design a "nearby drivers" service for a 50M-DAU ride-hailing app: when a rider opens the app, return the 20 nearest available drivers within a 3km radius in p99 < 100ms, then dispatch. The hard part is that both reads and writes are heavy — this is not "design Yelp's static venue search," which is read-dominated.

Three core questions: ① how to turn lat/lon into an indexable key (geo-encoding); ② how to query points within a radius efficiently (nearby search); ③ how to shard millions of moving points without overloading one machine (dispatch architecture).

High-Level Architecture

graph TD
    R["Rider App
fires nearby request"] D["Driver App
GPS report every 4s"] GW["API Gateway
WebSocket long-lived"] DISP["Dispatch service (DISCO)
supply/demand matching"] RING["Consistent hash ring
shard by geo cell"] N1["Geo Shard A
cell set → in-mem grid"] N2["Geo Shard B"] N3["Geo Shard C"] DB[("Location store
Redis GEO / memory")] D -->|① location update| GW --> DISP R -->|② nearby+dispatch| GW --> DISP DISP -->|③ route by cell| RING RING --> N1 & N2 & N3 N1 & N2 & N3 -.->|④ durable backup| DB classDef client fill:#1a2530,stroke:#64c8ff,color:#e8eef5 classDef svc fill:#0e2030,stroke:#5eead4,color:#e8eef5 classDef shard fill:#1a1a30,stroke:#ffb450,color:#e8eef5 classDef store fill:#2a1530,stroke:#ff7ab6,color:#e8eef5 class R,D client class GW,DISP,RING svc class N1,N2,N3 shard class DB store

A driver's location is geo-encoded into a cell ID, then the cell ID is the shard key routing to the owning shard; a query touches only "target cell + one ring of neighbors" and the few shards owning them

Component roles: the gateway keeps driver long-lived connections and ingests GPS; Dispatch (Uber calls it DISCO) encodes lat/lon into a geo cell and uses a consistent hash ring (Uber uses Ringpop) to shard supply/demand to in-memory nodes by cell; each shard maintains a "cell → driver set" grid in memory, so a query only reads "center cell + one ring of neighbors." The source of truth is memory; the location store (Redis GEO or async flush) is only a crash-recovery backup.

Key Technical Points

1. Geo-encoding: Geohash vs S2 vs H3 — slicing the sphere into indexable cells

Principle: indexing 2D lat/lon directly in a B-Tree doesn't work — range queries must constrain both dimensions at once. The core idea is a space-filling curve that reduces 2D to 1D: encode lat/lon into a locality-preserving integer/string where spatially adjacent ≈ encoding-adjacent, so "nearby" degenerates into "shared prefix" or "adjacent ID range," queryable by an ordinary ordered index (B-Tree / sorted set). The three mainstream schemes differ in "what cell shape, what curve."

GeohashS2 (Google)H3 (Uber)
Cell shaperectanglespherical quadhexagon (+12 pentagons)
CurveZ-order (interleave bits)Hilbert (on cube faces)Hilbert-like (icosahedron)
Encodingbase32 string64-bit cell ID64-bit cell ID
Neighbor distanceuneven (diag≠edge)unevenuniform (6 equidistant)
Hierarchyper-char refine30 levels, strict quad-nest16 levels, approx-nest
Pain pointboundary jump + polar distortionquad adjacency unevenimperfect parent/child nesting
Trade-off:
# Geohash core: binary-split lat/lon + bit interleave (pseudo-code)
def geohash(lat, lon, precision_bits=50):
    lat_rng, lon_rng = [-90, 90], [-180, 180]
    bits, is_lon = [], True            # alternate lon/lat bits
    while len(bits) < precision_bits:
        rng = lon_rng if is_lon else lat_rng
        mid = (rng[0] + rng[1]) / 2
        val = lon if is_lon else lat
        if val >= mid: bits.append(1); rng[0] = mid   # upper half → 1
        else:          bits.append(0); rng[1] = mid   # lower half → 0
        is_lon = not is_lon
    return bits   # longer = finer; longer shared prefix = closer
                  # (but NOT the converse!)
Real-world:

2. Nearby queries: grid buckets vs quadtree/R-Tree — avoiding the O(N) scan

Principle: the naive approach computes the rider's distance to every driver then sorts — O(N) is fatal at millions of moving points. Two mainstream routes: ① fixed grid (bucketing) — cut space into equal cells, hang each driver in one cell bucket, and at query time look only at "center cell + 8 surrounding neighbor" buckets, then compute exact distance; ② tree index (quadtree / R-Tree) — adaptive partitioning by density, big cells in sparse areas, fine cells in dense ones. For high-frequency moving points, the fixed grid wins, because an update is an O(1) "remove from old bucket, add to new bucket," while a tree must rebalance, split, and merge at high cost.

Trade-off:
# Fixed grid + neighbor-ring query (pseudo-code)
def nearby(lat, lon, radius_m, k=20):
    center = cell_of(lat, lon)             # encode into a cell ID
    # radius → how many rings to scan: radius / cell_edge, round up
    ring = ceil(radius_m / CELL_EDGE_M)
    candidates = []
    for c in k_ring(center, ring):         # center + ring of neighbor cells
        candidates += grid[c]              # drivers in bucket, O(candidates)
    # exact haversine only on candidates, filter radius, take nearest k
    in_range = [(haversine(lat,lon,d.lat,d.lon), d)
                for d in candidates if haversine(...) <= radius_m]
    return heapq.nsmallest(k, in_range)    # top-k, O(M log k)
Boundary trap: querying only the center cell misses a close neighbor "just outside the cell edge" — physically very near but landing in a neighbor cell. So you must query center + one ring of neighbors (H3's kRing / S2's cell neighbors). This is exactly why geohash alone is unreliable for nearby queries.

Real-world:

3. Distance & correctness: haversine, approximation, and the traps

Principle: cells only coarse-filter candidates; final ranking needs true spherical distance. The haversine formula gives the great-circle (geodesic) distance between two lat/lon points, more correct than planar Euclidean (Earth is a sphere). But haversine has trig functions and isn't free per candidate, so in practice you coarse-rank with something cheap (squared lat/lon delta, or cell distance) and compute exact haversine only on the top candidates. Another oft-ignored point: dispatch wants road ETA, not straight-line distance — the straight-line nearest driver might be across a river, with the longest actual arrival time.

Trade-off:
# haversine: great-circle distance between two points (meters)
import math
def haversine(lat1, lon1, lat2, lon2):
    R = 6371000                              # Earth radius (m)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp/2)**2 + math.cos(p1)*math.cos(p2)*math.sin(dl/2)**2
    return 2 * R * math.asin(math.sqrt(a))
# Optimization: coarse-rank by (dlat²+dlon²), haversine only on top-N,
# then re-rank by road ETA for dispatch (straight-line near ≠ fast arrival)
Real-world:

4. Dispatch architecture: geo-sharding millions of moving points & taming hot spots

Principle: one machine can neither hold nor serve the reads/writes of all global drivers, so you must shard. Key decision: shard by geo cell, not by user ID — because a nearby query is inherently geographically local, sharding by cell lets "one query hit only a few adjacent shards" instead of fanning out to all nodes. Uber's DISCO uses Ringpop (consistent hashing + gossip membership) to map geo cells to nodes: supply (drivers) and demand (riders) in the same area land on the same/adjacent nodes, and matching happens locally. Adding/removing a node migrates only an adjacent arc of cells on the ring, not a full rehash.

Trade-off: choosing the shard key
# Route by geo cell + merge across neighbor shards (pseudo-code)
def dispatch_nearby(lat, lon, radius):
    cells = k_ring(cell_of(lat, lon), ring_for(radius))
    # neighbor cells may live on different shards: query in parallel, merge
    shards = {ring.lookup(c) for c in cells}      # consistent-hash lookup
    results = parallel_map(shards, lambda s: s.query(cells, lat, lon))
    return rank_by_eta(flatten(results))[:20]

# Hot-cell taming: monitor driver count & QPS per cell,
# above threshold drop the cell one resolution (split into children)
# and spread across nodes
Real-world:

Scaling & Optimization

Common Pitfalls + Interview Follow-ups

1. Querying only the center cell misses boundary neighbors. A long shared geohash prefix = physically close, but not the converse (two points across a boundary are physically close yet have totally different prefixes). You must query "center + one ring of neighbors" — the interviewer's favorite trap.
2. Dispatching by straight-line distance. Haversine-nearest ≠ fastest arrival. A driver across a river / one-way street is straight-line near but ETA far. Interviewers probe "how do you define nearest" — the answer is road ETA.
3. The fixed grid degenerates to O(N) at hot spots. One airport cell with tens of thousands of drivers makes a single-bucket radius query collapse. Either go adaptive-resolution or re-split the hot cell.
4. Hash-sharding geo data by user ID. Looks even, but every nearby query must fan out to the whole cluster. Geo data must be sharded by cell to preserve locality.
5. Building an R-Tree for moving points as if they were static venues. 500K updates/s will crush the tree's rebalancing. Use an in-memory grid (O(1) bucket swap) for moving points; leave R-Trees for static data.

Deeper Resources

Going Deeper (click to expand)

1. Why are hexagons "better" than squares for a geo grid? Give at least two mathematical reasons, and the cost.

Reason one: equidistant neighbors. A square cell has 4 edge neighbors and 4 diagonal neighbors, and the diagonal center-distance is √2× the edge one — so "take one ring of neighbors" covers different directions at different radii. A hexagon's 6 neighbors are all equidistant and edge-sharing, so kRing(k) approximates a true circle — more uniform for radius queries, traffic diffusion, and ML spatial features.

Reason two: uniform quantization error. When you quantize a continuous point to a cell center, the hexagon is the "closest to a circle" tileable polygon in the plane, giving smaller average quantization error than a square and weaker edge effects.

Cost: hexagons can't perfectly tile a sphere — Euler's theorem forces exactly 12 pentagons (at the icosahedron's 12 vertices). And a parent cell can't hold an integer number of children (splitting into 7 borrows from adjacent cells at the edges), so H3's hierarchy is approximate nesting, with boundary error in cross-resolution aggregation. S2's quads do nest strictly as a quadtree — exactly why S2 is preferred where precise hierarchical range scans matter (database geo keys).

2. For a 3km-radius query, should the cell edge be 500m or 5km? What happens each way, and how do you choose?

Cell too small (500m): a 3km radius needs about 3000/500 = 6 rings of neighbors, roughly 3·6·(6+1)+1 ≈ 127 hexagonal cells, one lookup each — many buckets, high merge cost, but few candidates per bucket so distance math is cheap.

Cell too big (5km): only center + 1 ring = 7 cells, but each covers 5km, so buckets hold many drivers well beyond 3km, the haversine candidate count explodes, and boundary waste is severe.

Principle for choosing cell size: make cell edge ≈ typical query radius, so a query hits "center + 1–2 rings," keeping both bucket count and per-bucket candidates bounded. But a single resolution can't serve "3km find-a-ride vs 100km intercity carpool" — so production systems use multi-resolution: pick resolution by query radius dynamically (both H3/S2 are hierarchical), coarse cells for large radii, fine for small. Uber also tunes resolution by regional density, using finer cells in dense CBDs.

3. Drivers report every 4s = 500K writes/s. Do all these writes really hit a database? How do you absorb them?

Core insight: location data is "high-frequency, ephemeral, low-value per point" — the vast majority of intermediate positions are never read, so there's no need to persist them.

  • The truth lives in memory. The current position sits in an in-memory grid; a write is just an O(1) "remove from old cell bucket, add to new bucket," never touching disk. 500K writes/s is nothing for an in-memory hash structure.
  • Persistence is only for recovery. Async snapshot + WAL; losing a few seconds of positions is fine — drivers report fresh positions 4s later and self-heal, an RPO of a few seconds is acceptable.
  • Tame write amplification. Stationary drivers adaptively lower report frequency; gateway-side coalescing keeps only the latest of multiple reports within 4s for the same driver.
  • Spread by sharding. 500K/s spread by cell across hundreds of shards = a few thousand writes/s each.

Contrast with an account balance — "low-frequency, must persist, strongly consistent," never to be lost. Location data is the opposite: accept it's ephemeral, self-heal from the latest report — that's the root of cheaply absorbing massive writes.

4. A concert lets out and 30K people hail a ride from one cell at once. How does this hot cell avoid crushing a single node?

This is the inherent Achilles' heel of geo-sharding: shard by cell, and a hot cell necessarily overloads one node. Several measures in concert:

  • Dynamic re-sharding (drop resolution): monitor per-cell supply/demand and QPS; above threshold split the hot cell into children (H3 down one level = 7) across nodes. The hard part is that writes in that area must briefly queue during migration.
  • Local L1 cache: dispatch nodes cache a driver snapshot of the hot cell locally (TTL 1–2s); most queries read local, source pressure plummets — the cost is 1–2s staleness, which dispatch tolerates.
  • Demand-side shedding: surge pricing is essentially using price to spread instantaneous demand across the time axis — product-level backpressure.
  • Batched matching: instead of greedily matching each request in realtime, do a globally optimal assignment over a 1–2s window — it raises the acceptance rate and incidentally flattens the QPS spike.

Note this is isomorphic to the hot key of Day 2 and the sharding hot spot of Day 4 — the fix is always the combination of "split finer + replicas + local cache + demand-side throttling."

5. Why not just use Postgres + PostGIS's GiST index for everything? Where's its boundary?

PostGIS is the Swiss army knife of geo data, perfect for static/low-update scenarios (venues, parcels, administrative boundaries, Yelp-style search), with ST_DWithin radius queries, polygon intersection, and nearest-neighbor out of the box, plus transactions and persistence. But dispatch hits walls in three places:

  • Write throughput: R-Tree/GiST is a tree; each update may trigger node splits/merges and index-page rewrites. 500K moves/s will crush WAL and index maintenance — trees are read-optimized, not for high-frequency writes.
  • Horizontal scale: a single Postgres instance's geo index can't hold global moving points, and sharding by cell + cross-shard nearby queries isn't natively supported — you'd have to build it on top (degenerating to a homegrown grid).
  • Latency: geo retrieval gets only ~20ms; a disk DB's query latency (even on buffer-pool hits) is hard to keep stably at that scale — only an in-memory grid will do.

Conclusion: the choice is driven by update frequency — static geo data on PostGIS; millions of high-frequency moving points on an in-memory grid + consistent-hash sharding (the Uber route). Forcing moving points into an R-Tree is the most common over-engineering.