Day 19 Hard Geospatial S2 / H3 / Geohash Uber Dispatch

Geospatial Systems — Slice the Sphere into Cells, Then Find the Nearest PersonGeohash vs S2 vs H3, Nearby Queries, Uber Dispatch

Problem & Constraints

Design a "nearby drivers" service for a 50M-DAU ride-hailing app: when a rider opens the app, return the 20 nearest available drivers within a 3km radius in p99 < 100ms, then dispatch. The hard part is that both reads and writes are heavy — this is not "design Yelp's static venue search," which is read-dominated.

Write-heavy: peak 2M online drivers, each reporting GPS every 4s → 500K location writes/s, every write invalidating the old position.
Read-heavy: peak 100K QPS of nearby queries, each filtering top-N out of millions of moving points.
Latency: the dispatch path is p99 < 100ms; geo retrieval gets only ~20ms of that.
Dynamic data: drivers move constantly; the index must support high-frequency updates, not a one-time static build like venues.
Geographic skew: airports, concerts, rush-hour CBDs — local density can be 100× the average, a natural breeding ground for hot spots.

Three core questions: ① how to turn lat/lon into an indexable key (geo-encoding); ② how to query points within a radius efficiently (nearby search); ③ how to shard millions of moving points without overloading one machine (dispatch architecture).

High-Level Architecture

graph TD
    R["Rider App
fires nearby request"]
    D["Driver App
GPS report every 4s"]
    GW["API Gateway
WebSocket long-lived"]
    DISP["Dispatch service (DISCO)
supply/demand matching"]
    RING["Consistent hash ring
shard by geo cell"]
    N1["Geo Shard A
cell set → in-mem grid"]
    N2["Geo Shard B"]
    N3["Geo Shard C"]
    DB[("Location store
Redis GEO / memory")]

    D -->|① location update| GW --> DISP
    R -->|② nearby+dispatch| GW --> DISP
    DISP -->|③ route by cell| RING
    RING --> N1 & N2 & N3
    N1 & N2 & N3 -.->|④ durable backup| DB

    classDef client fill:#1a2530,stroke:#64c8ff,color:#e8eef5
    classDef svc fill:#0e2030,stroke:#5eead4,color:#e8eef5
    classDef shard fill:#1a1a30,stroke:#ffb450,color:#e8eef5
    classDef store fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
    class R,D client
    class GW,DISP,RING svc
    class N1,N2,N3 shard
    class DB store

A driver's location is geo-encoded into a cell ID, then the cell ID is the shard key routing to the owning shard; a query touches only "target cell + one ring of neighbors" and the few shards owning them

Component roles: the gateway keeps driver long-lived connections and ingests GPS; Dispatch (Uber calls it DISCO) encodes lat/lon into a geo cell and uses a consistent hash ring (Uber uses Ringpop) to shard supply/demand to in-memory nodes by cell; each shard maintains a "cell → driver set" grid in memory, so a query only reads "center cell + one ring of neighbors." The source of truth is memory; the location store (Redis GEO or async flush) is only a crash-recovery backup.

Key Technical Points

1. Geo-encoding: Geohash vs S2 vs H3 — slicing the sphere into indexable cells

Principle: indexing 2D lat/lon directly in a B-Tree doesn't work — range queries must constrain both dimensions at once. The core idea is a space-filling curve that reduces 2D to 1D: encode lat/lon into a locality-preserving integer/string where spatially adjacent ≈ encoding-adjacent, so "nearby" degenerates into "shared prefix" or "adjacent ID range," queryable by an ordinary ordered index (B-Tree / sorted set). The three mainstream schemes differ in "what cell shape, what curve."

	Geohash	S2 (Google)	H3 (Uber)
Cell shape	rectangle	spherical quad	hexagon (+12 pentagons)
Curve	Z-order (interleave bits)	Hilbert (on cube faces)	Hilbert-like (icosahedron)
Encoding	base32 string	64-bit cell ID	64-bit cell ID
Neighbor distance	uneven (diag≠edge)	uneven	uniform (6 equidistant)
Hierarchy	per-char refine	30 levels, strict quad-nest	16 levels, approx-nest
Pain point	boundary jump + polar distortion	quad adjacency uneven	imperfect parent/child nesting

Trade-off:

Geohash: ✅ dead simple, human-readable, prefix = containment, the basis of Redis GEO; ❌ boundary jumps — two physically adjacent points can have completely different geohash prefixes (across a cell edge), causing "nearby" to miss them; and rectangular cells distort badly at high latitudes.
S2: ✅ seamless on the sphere, strict quadtree hierarchy (a cell perfectly contains its children), range scans naturally efficient; used by Google Maps/Bigtable geo keys; ❌ a quad's diagonal vs edge neighbors aren't equidistant, making cell-count estimation for ring/radius queries fiddly.
H3: ✅ hexagons are the only planar tiling where all neighbors are equidistant, naturally uniform for kRing (k rings of neighbors), traffic aggregation, and ML features; ❌ mathematically hexagons can't perfectly tile a sphere, so 12 pentagons must be inserted (at icosahedron vertices), and a parent cell can't split into exactly 7 children (only approximately), giving cross-level aggregation error.

# Geohash core: binary-split lat/lon + bit interleave (pseudo-code)
def geohash(lat, lon, precision_bits=50):
    lat_rng, lon_rng = [-90, 90], [-180, 180]
    bits, is_lon = [], True            # alternate lon/lat bits
    while len(bits) < precision_bits:
        rng = lon_rng if is_lon else lat_rng
        mid = (rng[0] + rng[1]) / 2
        val = lon if is_lon else lat
        if val >= mid: bits.append(1); rng[0] = mid   # upper half → 1
        else:          bits.append(0); rng[1] = mid   # lower half → 0
        is_lon = not is_lon
    return bits   # longer = finer; longer shared prefix = closer
                  # (but NOT the converse!)

Real-world:

Google S2: Maps, Bigtable, and Wikidata geo keys are all built on S2, slicing the sphere into order-preserving 64-bit IDs via a Hilbert curve (s2geometry.io, open-sourced 2017).
Uber H3: hexagons power pricing, dispatch, and supply/demand heatmap analytics; open-sourced 2018 (H3 blog / github.com/uber/h3), now a de-facto industry standard.
Redis GEO: GEOADD/GEOSEARCH pack a geohash encoding into a sorted-set score and range-query via ZRANGEBYSCORE.

2. Nearby queries: grid buckets vs quadtree/R-Tree — avoiding the O(N) scan

Principle: the naive approach computes the rider's distance to every driver then sorts — O(N) is fatal at millions of moving points. Two mainstream routes: ① fixed grid (bucketing) — cut space into equal cells, hang each driver in one cell bucket, and at query time look only at "center cell + 8 surrounding neighbor" buckets, then compute exact distance; ② tree index (quadtree / R-Tree) — adaptive partitioning by density, big cells in sparse areas, fine cells in dense ones. For high-frequency moving points, the fixed grid wins, because an update is an O(1) "remove from old bucket, add to new bucket," while a tree must rebalance, split, and merge at high cost.

Trade-off:

Fixed grid: ✅ O(1) update/query, simple, friendly to moving points; ❌ fixed cell size — too large and a dense bucket holds tens of thousands (degenerates to O(N)); too small and a radius query scans a huge swath of buckets. The airport is the counterexample.
Quadtree / R-Tree (PostGIS GiST): ✅ density-adaptive, exact radius/polygon queries, great for static venues (Yelp); ❌ high-frequency updates need rebalancing, severe write amplification, unfit for 500K moves/s.
Hierarchical grid (H3/S2 multi-resolution): ✅ a compromise — pick different resolutions by regional density; ❌ cross-resolution aggregation and neighbor math get complex.

# Fixed grid + neighbor-ring query (pseudo-code)
def nearby(lat, lon, radius_m, k=20):
    center = cell_of(lat, lon)             # encode into a cell ID
    # radius → how many rings to scan: radius / cell_edge, round up
    ring = ceil(radius_m / CELL_EDGE_M)
    candidates = []
    for c in k_ring(center, ring):         # center + ring of neighbor cells
        candidates += grid[c]              # drivers in bucket, O(candidates)
    # exact haversine only on candidates, filter radius, take nearest k
    in_range = [(haversine(lat,lon,d.lat,d.lon), d)
                for d in candidates if haversine(...) <= radius_m]
    return heapq.nsmallest(k, in_range)    # top-k, O(M log k)

Boundary trap: querying only the center cell misses a close neighbor "just outside the cell edge" — physically very near but landing in a neighbor cell. So you must query center + one ring of neighbors (H3's kRing / S2's cell neighbors). This is exactly why geohash alone is unreliable for nearby queries.

Real-world:

PostGIS: uses a GiST index (R-Tree variant) for ST_DWithin radius queries — the standard for static geo data (venues, parcels, Yelp-style search).
Uber: dispatch uses no tree but an in-memory grid + consistent-hash sharding to absorb high-frequency moving points (see point 4).
Tile38 / Redis GEO: geofencing and nearby queries via geohash grid + sorted set, fit for mid-scale realtime location.

3. Distance & correctness: haversine, approximation, and the traps

Principle: cells only coarse-filter candidates; final ranking needs true spherical distance. The haversine formula gives the great-circle (geodesic) distance between two lat/lon points, more correct than planar Euclidean (Earth is a sphere). But haversine has trig functions and isn't free per candidate, so in practice you coarse-rank with something cheap (squared lat/lon delta, or cell distance) and compute exact haversine only on the top candidates. Another oft-ignored point: dispatch wants road ETA, not straight-line distance — the straight-line nearest driver might be across a river, with the longest actual arrival time.

Trade-off:

Euclidean (planar): ✅ fastest, no trig; ❌ only approximate at small scale (<tens of km), large error across latitudes / wide ranges.
Haversine (great-circle): ✅ spherically correct, precise enough; ❌ has sin/cos/atan2 cost, can't run over the full set at scale.
Road-network ETA (graph shortest path / live traffic): ✅ what the business actually wants; ❌ very expensive, only for the few candidates after cell coarse-filtering, and depends on a live traffic service.

# haversine: great-circle distance between two points (meters)
import math
def haversine(lat1, lon1, lat2, lon2):
    R = 6371000                              # Earth radius (m)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp/2)**2 + math.cos(p1)*math.cos(p2)*math.sin(dl/2)**2
    return 2 * R * math.asin(math.sqrt(a))
# Optimization: coarse-rank by (dlat²+dlon²), haversine only on top-N,
# then re-rank by road ETA for dispatch (straight-line near ≠ fast arrival)

Real-world:

Uber: after cells coarse-filter candidate drivers, final dispatch ranking uses ETA (road network + live traffic), not straight-line distance — "nearest" is by arrival time.
Redis GEO: GEODIST internally uses haversine, with selectable units (m/km).
Map navigation (Google Maps): straight-line distance is only for pre-filtering; real routes use graph shortest-path algorithms like contraction hierarchies.

4. Dispatch architecture: geo-sharding millions of moving points & taming hot spots

Principle: one machine can neither hold nor serve the reads/writes of all global drivers, so you must shard. Key decision: shard by geo cell, not by user ID — because a nearby query is inherently geographically local, sharding by cell lets "one query hit only a few adjacent shards" instead of fanning out to all nodes. Uber's DISCO uses Ringpop (consistent hashing + gossip membership) to map geo cells to nodes: supply (drivers) and demand (riders) in the same area land on the same/adjacent nodes, and matching happens locally. Adding/removing a node migrates only an adjacent arc of cells on the ring, not a full rehash.

Trade-off: choosing the shard key

Shard by geo cell: ✅ great query locality, matching completes on few nodes; ❌ geographic hot spots — an airport/concert cell overloads a single node, requiring dynamic re-splitting or spreading the hot cell across nodes.
Shard by hashing user/driver ID: ✅ perfectly even load; ❌ a nearby query must fan out to all shards then merge, blowing up p99 — a non-starter.
Static range sharding (by city/district): ✅ intuitive, simple ops; ❌ load varies wildly between cities, cross-boundary queries are hard.

# Route by geo cell + merge across neighbor shards (pseudo-code)
def dispatch_nearby(lat, lon, radius):
    cells = k_ring(cell_of(lat, lon), ring_for(radius))
    # neighbor cells may live on different shards: query in parallel, merge
    shards = {ring.lookup(c) for c in cells}      # consistent-hash lookup
    results = parallel_map(shards, lambda s: s.query(cells, lat, lon))
    return rank_by_eta(flatten(results))[:20]

# Hot-cell taming: monitor driver count & QPS per cell,
# above threshold drop the cell one resolution (split into children)
# and spread across nodes

Real-world:

Uber DISCO: a Go realtime matching engine that uses Ringpop to shard geo space across nodes and do request forwarding and fault tolerance (High Scalability: How Uber Scales Their Real-time Market Platform, 2015).
Lyft: similarly uses geo grids + realtime supply/demand matching; pricing and dispatch rely on geohash/H3-style grids for regional aggregation.
DoorDash / Meituan: delivery dispatch is isomorphic — couriers are moving supply, orders are demand, sharded and matched by delivery zone (geo cell).

Scaling & Optimization

Scale reads: location data is "read-heavy and write-heavy"; give each geo shard in-memory read replicas (async sync), serve nearby queries from replicas, keep writes on the primary.
Dynamic re-sharding for hot spots: monitor per-cell density/QPS; auto-drop hot cells (airports) to finer resolution (split into children) across nodes; merge cold cells to save memory.
Multi-resolution index: maintain coarse + fine cells together — large-radius queries use coarse cells (fewer buckets), small-radius use fine cells (precise filter).
Persistence & recovery: periodic snapshot + WAL of the in-memory grid so a restarted node rebuilds quickly; expire locations via TTL (a driver silent for 30s is considered offline).
Multi-region: geography splits naturally by macro-region (NA/EU/APAC clusters); cross-region queries are rare — a natural sharding boundary.

Common Pitfalls + Interview Follow-ups

1. Querying only the center cell misses boundary neighbors. A long shared geohash prefix = physically close, but not the converse (two points across a boundary are physically close yet have totally different prefixes). You must query "center + one ring of neighbors" — the interviewer's favorite trap.

2. Dispatching by straight-line distance. Haversine-nearest ≠ fastest arrival. A driver across a river / one-way street is straight-line near but ETA far. Interviewers probe "how do you define nearest" — the answer is road ETA.

3. The fixed grid degenerates to O(N) at hot spots. One airport cell with tens of thousands of drivers makes a single-bucket radius query collapse. Either go adaptive-resolution or re-split the hot cell.

4. Hash-sharding geo data by user ID. Looks even, but every nearby query must fan out to the whole cluster. Geo data must be sharded by cell to preserve locality.

5. Building an R-Tree for moving points as if they were static venues. 500K updates/s will crush the tree's rebalancing. Use an in-memory grid (O(1) bucket swap) for moving points; leave R-Trees for static data.

Deeper Resources

《Designing Data-Intensive Applications》Ch 3 (Kleppmann): multi-dimensional indexes, R-Trees, space-filling curves, and PostGIS in essence.
Google S2 Geometry: s2geometry.io + 2017 announcement, Hilbert curve and spherical cell hierarchy.
Uber H3: H3 blog + github.com/uber/h3, the design trade-offs of a hexagonal grid.
High Scalability: How Uber Scales Their Real-time Market Platform (2015): the classic breakdown of DISCO, Ringpop, and geo-sharding.

Going Deeper (click to expand)

1. Why are hexagons "better" than squares for a geo grid? Give at least two mathematical reasons, and the cost.

Reason one: equidistant neighbors. A square cell has 4 edge neighbors and 4 diagonal neighbors, and the diagonal center-distance is √2× the edge one — so "take one ring of neighbors" covers different directions at different radii. A hexagon's 6 neighbors are all equidistant and edge-sharing, so kRing(k) approximates a true circle — more uniform for radius queries, traffic diffusion, and ML spatial features.

Reason two: uniform quantization error. When you quantize a continuous point to a cell center, the hexagon is the "closest to a circle" tileable polygon in the plane, giving smaller average quantization error than a square and weaker edge effects.

Cost: hexagons can't perfectly tile a sphere — Euler's theorem forces exactly 12 pentagons (at the icosahedron's 12 vertices). And a parent cell can't hold an integer number of children (splitting into 7 borrows from adjacent cells at the edges), so H3's hierarchy is approximate nesting, with boundary error in cross-resolution aggregation. S2's quads do nest strictly as a quadtree — exactly why S2 is preferred where precise hierarchical range scans matter (database geo keys).

2. For a 3km-radius query, should the cell edge be 500m or 5km? What happens each way, and how do you choose?

Cell too small (500m): a 3km radius needs about 3000/500 = 6 rings of neighbors, roughly 3·6·(6+1)+1 ≈ 127 hexagonal cells, one lookup each — many buckets, high merge cost, but few candidates per bucket so distance math is cheap.

Cell too big (5km): only center + 1 ring = 7 cells, but each covers 5km, so buckets hold many drivers well beyond 3km, the haversine candidate count explodes, and boundary waste is severe.

Principle for choosing cell size: make cell edge ≈ typical query radius, so a query hits "center + 1–2 rings," keeping both bucket count and per-bucket candidates bounded. But a single resolution can't serve "3km find-a-ride vs 100km intercity carpool" — so production systems use multi-resolution: pick resolution by query radius dynamically (both H3/S2 are hierarchical), coarse cells for large radii, fine for small. Uber also tunes resolution by regional density, using finer cells in dense CBDs.

3. Drivers report every 4s = 500K writes/s. Do all these writes really hit a database? How do you absorb them?

Core insight: location data is "high-frequency, ephemeral, low-value per point" — the vast majority of intermediate positions are never read, so there's no need to persist them.

The truth lives in memory. The current position sits in an in-memory grid; a write is just an O(1) "remove from old cell bucket, add to new bucket," never touching disk. 500K writes/s is nothing for an in-memory hash structure.
Persistence is only for recovery. Async snapshot + WAL; losing a few seconds of positions is fine — drivers report fresh positions 4s later and self-heal, an RPO of a few seconds is acceptable.
Tame write amplification. Stationary drivers adaptively lower report frequency; gateway-side coalescing keeps only the latest of multiple reports within 4s for the same driver.
Spread by sharding. 500K/s spread by cell across hundreds of shards = a few thousand writes/s each.

Contrast with an account balance — "low-frequency, must persist, strongly consistent," never to be lost. Location data is the opposite: accept it's ephemeral, self-heal from the latest report — that's the root of cheaply absorbing massive writes.

4. A concert lets out and 30K people hail a ride from one cell at once. How does this hot cell avoid crushing a single node?

This is the inherent Achilles' heel of geo-sharding: shard by cell, and a hot cell necessarily overloads one node. Several measures in concert:

Dynamic re-sharding (drop resolution): monitor per-cell supply/demand and QPS; above threshold split the hot cell into children (H3 down one level = 7) across nodes. The hard part is that writes in that area must briefly queue during migration.
Local L1 cache: dispatch nodes cache a driver snapshot of the hot cell locally (TTL 1–2s); most queries read local, source pressure plummets — the cost is 1–2s staleness, which dispatch tolerates.
Demand-side shedding: surge pricing is essentially using price to spread instantaneous demand across the time axis — product-level backpressure.
Batched matching: instead of greedily matching each request in realtime, do a globally optimal assignment over a 1–2s window — it raises the acceptance rate and incidentally flattens the QPS spike.

Note this is isomorphic to the hot key of Day 2 and the sharding hot spot of Day 4 — the fix is always the combination of "split finer + replicas + local cache + demand-side throttling."

5. Why not just use Postgres + PostGIS's GiST index for everything? Where's its boundary?

PostGIS is the Swiss army knife of geo data, perfect for static/low-update scenarios (venues, parcels, administrative boundaries, Yelp-style search), with ST_DWithin radius queries, polygon intersection, and nearest-neighbor out of the box, plus transactions and persistence. But dispatch hits walls in three places:

Write throughput: R-Tree/GiST is a tree; each update may trigger node splits/merges and index-page rewrites. 500K moves/s will crush WAL and index maintenance — trees are read-optimized, not for high-frequency writes.
Horizontal scale: a single Postgres instance's geo index can't hold global moving points, and sharding by cell + cross-shard nearby queries isn't natively supported — you'd have to build it on top (degenerating to a homegrown grid).
Latency: geo retrieval gets only ~20ms; a disk DB's query latency (even on buffer-pool hits) is hard to keep stably at that scale — only an in-memory grid will do.

Conclusion: the choice is driven by update frequency — static geo data on PostGIS; millions of high-frequency moving points on an in-memory grid + consistent-hash sharding (the Uber route). Forcing moving points into an R-Tree is the most common over-engineering.