Design the public API of a developer platform (think GitHub / Stripe), serving your own iOS/Android/Web apps and external integrations. The hard part isn't writing endpoints — it's that once a contract is public, breaking it costs every caller. An integration may run untouched for 5 years; delete one field and it dies.
Today: protocol choice, pagination, versioning, and the rate-limit contract — the four parts of API design most likely to bite in production and get probed in interviews.
Core idea: one Gateway centralizes auth, rate limiting, and version routing; externally expose REST + GraphQL (web-friendly, evolvable, cacheable); internally services speak gRPC (strongly-typed, high throughput). Different protocols on each side of the trust boundary is the default shape of nearly every large platform.
Core trade-off: cacheable simplicity of a resource model vs client-driven field selection vs internal strongly-typed performance.
Principle: three abstractions. REST models the world as resources + HTTP verbs — stateless, cacheable by URL via CDN/browser, but prone to over-fetching (extra fields) or under-fetching (a view needs multiple round trips). GraphQL lets the client declare exactly which fields it wants in one query — one round trip, no over/under-fetch — at the cost of harder caching and rate limiting (everything is POST /graphql, so URL is no longer a cache key). gRPC runs on HTTP/2 + Protobuf — binary, strongly-typed IDL, bidirectional streaming, best latency/throughput — but browsers can't use it directly and it's hard to debug, so it's natively a service-to-service protocol.
| Dimension | REST | GraphQL | gRPC |
|---|---|---|---|
| Fetching | fixed resources (over/under-fetch) | per-field, one shot | fixed RPC methods |
| HTTP caching | ✅ native (GET+URL) | ❌ hard (POST) | ❌ n/a |
| Typing/contract | weak (OpenAPI bolt-on) | ✅ schema-typed | ✅ Protobuf-typed |
| Performance | medium (JSON text) | medium | ✅ high (binary+stream) |
| Browser direct | ✅ | ✅ | ❌ (needs gRPC-Web) |
| Best fit | public cacheable resources | multi-client aggregation, variable fields | internal microservices, low latency |
# REST: rendering one PR page needs several round trips (under-fetch + N+1)
GET /repos/o/r/pulls/42 -> PR body (with many unused fields, over-fetch)
GET /repos/o/r/pulls/42/commits -> commit list
GET /repos/o/r/pulls/42/reviews -> reviews
GET /users/alice -> author info (one per author, N+1)
# GraphQL: one round trip, only the fields you ask for
query {
  repository(owner:"o", name:"r") {
    pullRequest(number:42) {
      title
      author { login avatarUrl }
      commits(last:5) { nodes { oid } }
      reviews(last:10) { nodes { state } }
    }
  }
}
Core trade-off: offset's random page-jumping simplicity vs cursor's stability and efficiency on large, high-write datasets.
Principle: Offset pagination (LIMIT 20 OFFSET 100000) is simple and can jump to any page, but has two flaws: (1) deep pages are slow — the DB scans and discards the first 100000 rows, O(offset); (2) data drift — inserts/deletes during paging shift the window, causing skips or duplicates. Cursor / Keyset pagination uses an ordered anchor from the last row of the previous page (WHERE id > last_seen ORDER BY id), seeks via index in O(log n), and newly inserted rows don't affect already-paged windows — stable. The cost: no page jumping, no total count.
| Dimension | Offset | Cursor (Keyset) |
|---|---|---|
| Deep-page perf | poor O(offset) | ✅ O(log n) via index |
| Consistency under writes | ❌ drift/skip/dup | ✅ stable |
| Jump to page N | ✅ yes | ❌ sequential only |
| Total count/pages | ✅ available | ❌ usually not |
| Fits | small sets, admin panels | large sets, public APIs, infinite scroll |
-- Offset: deep page scans+discards OFFSET rows; inserts during paging misalign
SELECT * FROM events ORDER BY id LIMIT 20 OFFSET 100000;
-- Keyset/cursor: anchor on previous page's last id, seek the primary-key index
SELECT * FROM events
WHERE id > :last_seen_id -- decoded from the opaque cursor
ORDER BY id LIMIT 20;
-- next_cursor = encode(last row's id); new inserts don't disturb paged windows
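The drift problem is easy to reproduce in memory. The sketch below is only an illustration: `offset_page` and `keyset_page` are hypothetical helpers over a sorted list of ids, and the list filter stands in for an index seek (so it shows the consistency behaviour, not the O(log n) cost).

```python
def offset_page(ids, offset, limit):
    """Offset pagination: skip `offset` rows, then take `limit`."""
    return ids[offset:offset + limit]


def keyset_page(ids, last_seen_id, limit):
    """Keyset pagination: take the first `limit` ids greater than the anchor."""
    return [i for i in ids if i > last_seen_id][:limit]


ids = [1, 2, 3, 4, 5, 6]              # rows ordered by id
first_offset = offset_page(ids, 0, 2)  # page 1 via offset
first_keyset = keyset_page(ids, 0, 2)  # page 1 via keyset (anchor 0 = start)

ids.remove(1)                          # a row is deleted while the client is paging

# Offset: every remaining row moved up one slot, so skipping 2 rows now skips id 3.
second_offset = offset_page(ids, 2, 2)
# Keyset: the anchor is the last id the client saw, so the window is unaffected.
second_keyset = keyset_page(ids, first_keyset[-1], 2)

print(first_offset + second_offset)    # id 3 is never returned
print(first_keyset + second_keyset)    # contiguous ids
```

With offset paging the client never sees id 3; with keyset paging the second page starts exactly where the first one ended.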
Core trade-off: explicit versions (/v2) and the freedom to break vs date-based versions + a compatibility layer: zero breakage, perpetual maintenance cost.
Principle: first define what a breaking change is. Adding a field or endpoint is backward-compatible (old clients ignore new fields); removing a field or changing a type, semantics, or default is breaking. Three strategies: (1) URL path (/v1, /v2): intuitive, but each major version means maintaining a whole parallel codebase; (2) header negotiation: clean URLs, but invisible and easily dropped by proxies/caches; (3) date-based + compatibility layer (the Stripe way): an account is pinned to its signup-time version, core logic only writes the latest, and responses pass through a backward transform chain that downgrades to the client's pinned shape. Neither new nor old clients break; the cost is a compatibility layer that accretes year over year.
# Core logic only emits the latest response; the compat layer downgrades
# step by step to the account's pinned version
def respond(account, payload_latest):
    v = account.api_version                       # e.g. "2018-02-28"
    for change in breaking_changes_after(v):      # applied newest-first
        payload_latest = change.downgrade(payload_latest)
    return payload_latest
# adding a field = backward-compatible, not in the chain
# removing a field / changing a type = breaking, needs a downgrade()
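To make the respond() loop above concrete, here is a minimal runnable sketch with two made-up breaking changes. Everything specific in it is an assumption for illustration: the `BreakingChange` type, the `CHANGES` list, the dates, and the field names are hypothetical, and `respond` takes the pinned version string directly rather than an account object.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BreakingChange:
    version: str                          # ISO date on which the change shipped
    downgrade: Callable[[dict], dict]     # latest shape -> shape before this change


def _amount_back_to_string(payload):
    # Hypothetical 2019-06-01 change: "amount" went from string to integer.
    out = dict(payload)
    out["amount"] = str(out["amount"])
    return out


def _split_name(payload):
    # Hypothetical 2020-01-01 change: first_name/last_name merged into "name".
    out = dict(payload)
    first, _, last = out.pop("name").partition(" ")
    out["first_name"], out["last_name"] = first, last
    return out


CHANGES = [
    BreakingChange("2019-06-01", _amount_back_to_string),
    BreakingChange("2020-01-01", _split_name),
]


def breaking_changes_after(version):
    # ISO dates sort correctly as strings; return the newest change first.
    later = [c for c in CHANGES if c.version > version]
    return sorted(later, key=lambda c: c.version, reverse=True)


def respond(pinned_version, payload_latest):
    payload = payload_latest
    for change in breaking_changes_after(pinned_version):
        payload = change.downgrade(payload)
    return payload


latest = {"id": "cus_1", "name": "Ada Lovelace", "amount": 500}
print(respond("2018-02-28", latest))   # both downgrades applied
print(respond("2019-12-01", latest))   # only the name split applied
print(respond("2020-06-01", latest))   # already on the latest shape
```

Each downgrade copies the payload instead of mutating it, so the latest-shape object produced by core logic can be reused for callers pinned to different versions.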
Core trade-off: silent drop / bare 429 vs cooperative limiting with Retry-After + quota headers + idempotency keys. (Algorithm internals — token bucket / sliding window / distributed limiting — are Day 10.)
Principle: at the API-design level, rate limiting is a contract, not pure defense. It should tell the client how much quota is left, when to retry, and how to retry safely: (1) on limit, return 429 + Retry-After (exactly how long to wait); (2) every response carries X-RateLimit-Remaining / Reset, so clients slow down proactively instead of hitting a wall; (3) per-key quota tiers (free/paid); (4) write requests carry an Idempotency-Key, so a rate-limited retry doesn't double-charge (see Day 7). The cardinal sin is a bare 429 — clients can only retry blindly and immediately, creating a retry storm that knocks the backend over a second time.
# Rate limiting is a contract: client reads Retry-After; idempotency key makes retry safe
resp = POST("/v1/charges", body,
            headers={"Idempotency-Key": key})         # server dedupes on same key
if resp.status == 429:
    # header values are strings; this assumes the delta-seconds form of Retry-After
    wait = float(resp.headers.get("Retry-After", 1))  # server tells you how long (seconds)
    sleep(wait + jitter())                            # add jitter to avoid a retry storm
    retry()                                           # same key -> no double charge
# also read X-RateLimit-Remaining to slow down before hitting 429
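The snippet above is the client half of the contract. The server half could look roughly like the following sketch. It uses a deliberately naive fixed-window counter as a stand-in, since the real algorithms are a separate topic; `FixedWindowLimiter`, its `check` method, and the `X-RateLimit-Limit` header are assumptions for illustration, and the clock is passed in as `now` (epoch seconds) so the behaviour is deterministic.

```python
import math


class FixedWindowLimiter:
    """Per-key fixed-window counter that reports quota through response headers."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._counts = {}  # api_key -> (window_start, requests_used)

    def check(self, api_key, now):
        """Return (status, headers) for one request arriving at time `now`."""
        window_start = now - (now % self.window)
        start, used = self._counts.get(api_key, (window_start, 0))
        if start != window_start:      # a new window has begun: reset the counter
            start, used = window_start, 0
        reset_at = start + self.window
        allowed = used < self.limit
        if allowed:
            used += 1
        self._counts[api_key] = (start, used)

        # Sent on every response so clients can slow down before hitting the wall.
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - used),
            "X-RateLimit-Reset": str(int(reset_at)),
        }
        if not allowed:
            # Never a bare 429: say exactly how long to wait.
            headers["Retry-After"] = str(math.ceil(reset_at - now))
            return 429, headers
        return 200, headers


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
for t in (120, 130, 150, 180):
    print(t, limiter.check("key_free_tier", t))
```

Per-key tiers would fit by looking up `limit` from the key's plan instead of using one value for the whole limiter.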
- GitHub's API adds X-RateLimit-Remaining / Reset to every response and returns 403/429 with the reset time on overage, the de facto standard for public-API rate-limit headers.
- ETag + If-None-Match for conditional requests (304 saves bandwidth), If-Match for optimistic concurrency (prevents lost updates); put cacheable GETs behind a CDN.
- Deeply nested GraphQL queries (posts { comments { author { posts ... } } }) can force exponential joins, a classic DoS vector.

Likely interview follow-ups:
Why caching is lost: HTTP caches key on URL + method and only cache idempotent GETs. REST's GET /users/42 is naturally a stable cache key. GraphQL sends everything as POST /graphql with the query in the body — one URL maps to infinitely many queries, so intermediaries can't cache by URL, and POST isn't cached by default. Caching responsibility moves from "free HTTP infrastructure" to "the application layer itself" (e.g., Apollo's normalized client cache).
How persisted queries help: the client pre-registers a query with the server for a hash; at runtime it sends only the hash + variables. Because the body is small and deterministic, you can switch to GET /graphql?sha256=... — the URL becomes a stable cache key again, and CDN/rate limiting work once more. Bonus: the server accepts only whitelisted queries, automatically blocking malicious arbitrary ones. The cost is an extra registration step and reduced flexibility for dynamically assembled queries.
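A rough sketch of the registration and lookup flow described above. `PersistedQueryRegistry` and `persisted_url` are hypothetical names, the `variables` query parameter is an assumption, and a real deployment would keep the registry in shared storage rather than in process memory.

```python
import hashlib
import json
from urllib.parse import urlencode


class PersistedQueryRegistry:
    """Maps sha256(query text) -> query text; only registered queries can run."""

    def __init__(self):
        self._queries = {}

    def register(self, query):
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        self._queries[digest] = query
        return digest

    def lookup(self, digest):
        # Unknown hashes are rejected, which doubles as an allowlist.
        if digest not in self._queries:
            raise KeyError("unknown persisted query")
        return self._queries[digest]


def persisted_url(digest, variables):
    # Sorted keys and compact separators keep the URL identical for identical
    # variables, so it can serve as a stable cache key.
    encoded = json.dumps(variables, sort_keys=True, separators=(",", ":"))
    return "/graphql?" + urlencode({"sha256": digest, "variables": encoded})


registry = PersistedQueryRegistry()
query = 'query($n: Int!) { repository(owner:"o", name:"r") { pullRequest(number:$n) { title } } }'
digest = registry.register(query)            # done once, at build or deploy time
print(persisted_url(digest, {"n": 42}))      # what the client sends at runtime
```

Serializing the variables canonically matters as much as the hash: two clients sending the same variables in a different key order would otherwise produce two different cache entries.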
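Date-based versioning with a compatibility layer looks perfect on the surface: zero breakage for callers, engineers write only the latest code. But the cost is hidden and compounds yearly: every breaking change adds one more downgrade step that must keep working for every older pinned version.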
So it suits a company like Stripe — "API as product, vast and uncontrollable caller base" — where paying a perpetual maintenance tax for stability pays off. For internal scenarios where you can push client upgrades, a simple /v2 + deprecation window is more economical. The essence: shift the pain of breakage from all callers onto the platform itself; whether it's worth it depends on how many and how uncontrollable the callers are.
Why opaque: if the cursor is a bare last_id=12345, clients start depending on its structure — guessing, hand-crafting, skipping around. The moment you want to change the implementation (from a single-column id to a (created_at, id) composite anchor for time-ordering, or adding shard routing), every client that relied on the raw structure breaks. Encoding it as an opaque string (base64 of an internal struct) excludes pagination internals from the public contract — exactly AIP-158's point: what can be parsed will be depended on, and what's depended on can't be changed.
New constraints: (1) cursors expire — the pinned sort key/snapshot point can't be valid forever, so clients must handle "cursor expired" and restart from the top; (2) no page jumps, no totals, clashing with "show page N of M"; (3) the cursor carries no authorization — you must re-authorize every request, or someone with another's cursor could page across their data; (4) the sort key must be unique and ordered, or boundary rows skip/duplicate (use a (timestamp, id) composite to break ties).
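The opaque cursor, the composite tie-breaker, and per-request authorization can be sketched together with the standard-library sqlite3 module. This is a minimal illustration under assumptions: the `events` schema, `fetch_page`, and the JSON-in-base64 encoding are hypothetical, cursor expiry is not modelled, and a production cursor would typically also be signed or encrypted so clients cannot forge one.

```python
import base64
import json
import sqlite3


def encode_cursor(created_at, row_id):
    """Pack the (created_at, id) anchor into an opaque, URL-safe token."""
    raw = json.dumps({"c": created_at, "i": row_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return data["c"], data["i"]


def fetch_page(conn, caller_id, cursor, limit):
    # caller_id comes from the authenticated request, never from the cursor,
    # so someone else's cursor cannot be used to read someone else's rows.
    where, params = "owner_id = ?", [caller_id]
    if cursor is not None:
        created_at, row_id = decode_cursor(cursor)
        # Composite keyset: later timestamp, or same timestamp and larger id.
        where += " AND (created_at > ? OR (created_at = ? AND id > ?))"
        params += [created_at, created_at, row_id]
    rows = conn.execute(
        f"SELECT id, created_at FROM events WHERE {where} "
        "ORDER BY created_at, id LIMIT ?",
        params + [limit],
    ).fetchall()
    full_page = bool(rows) and len(rows) == limit
    next_cursor = encode_cursor(rows[-1][1], rows[-1][0]) if full_page else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, owner_id INTEGER, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 100), (3, 1, 100), (4, 2, 100), (5, 1, 101)],
)
page1, cursor1 = fetch_page(conn, caller_id=1, cursor=None, limit=2)
page2, cursor2 = fetch_page(conn, caller_id=1, cursor=cursor1, limit=2)
print(page1, page2)
```

Three of owner 1's rows share the timestamp 100; sorting on created_at alone would make the page boundary ambiguous, while the (created_at, id) pair gives every row a unique position.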
- Binary Protobuf isn't curl-able or human-readable like JSON, raising debugging cost for outside developers; tooling support lags REST.
- Callers must obtain the .proto and generate stubs; coordinating proto upgrades is costly, since you can't force thousands of third parties to sync.

But these "drawbacks" are virtues internally: you control all services, can upgrade protos uniformly, want low latency and strong types over human readability, and east-west traffic needs no CDN. So protocol choice is fundamentally about "are the callers controllable?": controllable (internal) → optimize machine efficiency, pick gRPC; uncontrollable (public) → optimize human reach and evolvability, pick REST/GraphQL.
Why more insidious: REST's N+1 is visible to the client (you can count requests). GraphQL's N+1 hides in server-side resolvers: query posts { author { name } } and the framework calls the author resolver once per post — 100 posts means 100 SELECT author WHERE id=? queries. From outside it's "one query," yet the backend explodes into 1+100 DB hits that vary with query shape — invisible unless you watch metrics.
How DataLoader solves it: within a single request it does two things — (1) batch: instead of running each load(id) immediately, it collects all ids in the same tick and merges them into one WHERE id IN (...) at the end of the event loop, turning N into 1; (2) per-request cache: a repeated load(same id) in the same request returns from cache.
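The batching and per-request caching described above can be sketched with asyncio. This is a simplified reimplementation of the idea for illustration, not the API of any DataLoader library: the `DataLoader` class below, `batch_get_authors`, and the sample data are all hypothetical.

```python
import asyncio


class DataLoader:
    """Batches load() calls made in the same event-loop tick; create one per request."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn   # async: list of keys -> list of values, same order
        self._cache = {}            # key -> Future; lives only as long as this loader
        self._pending = []          # keys waiting for the next dispatch
        self._dispatch_task = None  # keep a reference so the task is not collected

    def load(self, key):
        if key in self._cache:      # repeated key in the same request: reuse it
            return self._cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        if not self._pending:       # first key of this tick schedules one dispatch
            self._dispatch_task = loop.create_task(self._dispatch())
        self._pending.append(key)
        return future

    async def _dispatch(self):
        keys, self._pending = self._pending, []
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:    # fail every waiter instead of hanging them
            for key in keys:
                self._cache[key].set_exception(exc)
            return
        for key, value in zip(keys, values):
            self._cache[key].set_result(value)


async def demo():
    calls = []
    authors = {1: "alice", 2: "bob"}

    async def batch_get_authors(ids):
        calls.append(list(ids))     # stands in for one WHERE id IN (...) query
        return [authors[i] for i in ids]

    loader = DataLoader(batch_get_authors)
    post_author_ids = [1, 2, 1, 2, 1]          # five posts, two distinct authors
    names = await asyncio.gather(*(loader.load(i) for i in post_author_ids))
    return names, calls
```

Running `asyncio.run(demo())` resolves five author lookups with a single batch call for the two distinct ids.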
Boundaries: the cache is valid only within the single request's lifecycle (preventing cross-request stale reads, ensuring intra-query consistency), not global; it solves "fetch amplification," not "the query being too deep/complex" — the latter needs query-cost analysis. They're two orthogonal lines of defense.
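For the second line of defense, a query-cost check can be sketched on a toy representation of the selection set. The nested-dict format, the uniform `fanout` of 10 items per nested field, and the thresholds are all assumptions for illustration; real servers run this kind of analysis on the parsed query AST with per-field weights.

```python
class QueryRejected(Exception):
    pass


def query_depth(selection):
    """Depth of a selection set given as nested dicts; leaf fields map to {}."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def query_cost(selection, fanout=10):
    """Each field costs 1; a nested field returns `fanout` objects, each paying
    for its own sub-selection, so cost grows exponentially with depth."""
    return sum(1 + fanout * query_cost(child, fanout) for child in selection.values())


def check_query(selection, max_depth=4, max_cost=1000):
    depth, cost = query_depth(selection), query_cost(selection)
    if depth > max_depth or cost > max_cost:
        raise QueryRejected(f"depth={depth}, cost={cost}")
    return depth, cost


shallow = {"posts": {"title": {}, "author": {"name": {}}}}
nested = {"posts": {"comments": {"author": {"posts": {"title": {}}}}}}

print(check_query(shallow))
try:
    check_query(nested)
except QueryRejected as err:
    print("rejected:", err)
```

The check runs before any resolver executes, so an abusive query is refused at a cost proportional to its size rather than to the data it would have touched.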