Day 9 Medium API Design REST / GraphQL / gRPC Pagination · Versioning

API Design: Once a Contract Is Public, Breaking It Costs Every Caller
REST vs GraphQL vs gRPC · Pagination · Versioning · Rate Limiting

Scenario & Constraints

Design the public API of a developer platform (think GitHub / Stripe), serving your own iOS/Android/Web apps and external integrations. The hard part isn't writing endpoints — it's that once a contract is public, breaking it costs every caller. An integration may run untouched for 5 years; delete one field and it dies.

Scale: 100k+ third-party integrations, 1B+ API calls/day, read/write ratio ~50:1.
Heterogeneous clients: mobile wants minimal bytes (selected fields), Web wants one aggregated call, internal services want strongly-typed high throughput. One protocol rarely fits all.
Never break: a published contract is assumed permanent. Breaking it = mass integration errors, support storms, front-page outages.
Stable pagination: while listing charges / issues, data keeps being written — paging must never skip, never duplicate.
Abuse protection: one key can't crush the shared backend — needs fair limiting plus a predictable, developer-friendly 429.

Today: protocol choice, pagination, versioning, and the rate-limit contract — the four parts of API design most likely to bite in production and get probed in interviews.

High-Level Architecture

graph LR
  subgraph Clients["Clients"]
    M["Mobile<br/>minimize bytes"]
    W["Web"]
    T["3rd-party<br/>may not update for 5y"]
  end
  GW["API Gateway<br/>auth · rate limit · version routing"]
  REST["REST /v1<br/>resources · HTTP-cacheable"]
  GQL["GraphQL<br/>field selection · one round trip"]
  subgraph Internal["Internal (gRPC mesh)"]
    S1["Order svc"]
    S2["User svc"]
    S3["Billing svc"]
  end
  DB[("DB / Cache")]
  M --> GW
  W --> GW
  T --> GW
  GW --> REST
  GW --> GQL
  REST -->|gRPC| S1
  GQL -->|gRPC| S2
  S1 --> S3
  S1 --> DB
  classDef gw fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
  classDef edge fill:#0e2030,stroke:#5eead4,color:#e8eef5
  class GW gw
  class REST,GQL edge

Core idea: one Gateway centralizes auth, rate limiting, and version routing; externally expose REST + GraphQL (web-friendly, evolvable, cacheable); internally services speak gRPC (strongly-typed, high throughput). Different protocols on each side of the trust boundary is the default shape of nearly every large platform.

Key Technical Points

1. Protocol Choice: REST vs GraphQL vs gRPC

Core trade-off: cacheable simplicity of a resource model vs client-driven field selection vs internal strongly-typed performance.

Principle: three abstractions. REST models the world as resources + HTTP verbs — stateless, cacheable by URL via CDN/browser, but prone to over-fetching (extra fields) or under-fetching (a view needs multiple round trips). GraphQL lets the client declare exactly which fields it wants in one query — one round trip, no over/under-fetch — at the cost of harder caching and rate limiting (everything is POST /graphql, so URL is no longer a cache key). gRPC runs on HTTP/2 + Protobuf — binary, strongly-typed IDL, bidirectional streaming, best latency/throughput — but browsers can't use it directly and it's hard to debug, so it's natively a service-to-service protocol.

Dimension	REST	GraphQL	gRPC
Fetching	fixed resources (over/under-fetch)	per-field, one shot	fixed RPC methods
HTTP caching	✅ native (GET+URL)	❌ hard (POST)	❌ n/a
Typing/contract	weak (OpenAPI bolt-on)	✅ schema-typed	✅ Protobuf-typed
Performance	medium (JSON text)	medium	✅ high (binary+stream)
Browser direct	✅	✅	❌ (needs gRPC-Web)
Best fit	public cacheable resources	multi-client aggregation, variable fields	internal microservices, low latency

# REST: rendering one PR page needs several round trips (under-fetch + N+1)
GET /repos/o/r/pulls/42          -> PR body (with many unused fields, over-fetch)
GET /repos/o/r/pulls/42/commits  -> commit list
GET /repos/o/r/pulls/42/reviews  -> reviews
GET /users/alice                 -> author info (one per author, N+1)

# GraphQL: one round trip, only the fields you ask for
query {
  repository(owner:"o", name:"r") {
    pullRequest(number:42) {
      title
      author { login avatarUrl }
      commits(last:5) { nodes { oid } }
      reviews(last:10) { nodes { state } }
    }
  }
}

How to choose:

Public, resource-shaped, strongly cacheable, wildly varied callers → REST (lowest cognitive bar, CDN-friendly).
Multi-client UIs, variable field needs, want to kill round trips → GraphQL (but you must add query-cost limits, or one nested query crushes the backend).
Internal, low-latency, strong contract, streaming → gRPC. Almost nobody ships gRPC as a public API.

Real-world cases:

GitHub: after years of REST v3, shipped GraphQL API v4. Its announcement noted that assembling a complete view of a resource over REST sometimes took two or three separate calls, and that responses seemed to send too much data while still missing data consumers needed.
Netflix: evolved from a monolithic GraphQL server to a Federated GraphQL supergraph, letting dozens of backend teams each own a subgraph and deploy independently, ending the "One Graph" deploy traffic jam.
gRPC: descended from Google's internal Stubby, widely used for service-to-service traffic at Uber, Square, Netflix and others — while still exposing REST/GraphQL externally.

2. Pagination: Offset vs Cursor (Keyset)

Core trade-off: offset's random page-jumping simplicity vs cursor's stability and efficiency on large, high-write datasets.

Principle: Offset pagination (LIMIT 20 OFFSET 100000) is simple and can jump to any page, but has two flaws: (1) deep pages are slow — the DB scans and discards the first 100000 rows, O(offset); (2) data drift — inserts/deletes during paging shift the window, causing skips or duplicates. Cursor / Keyset pagination uses an ordered anchor from the last row of the previous page (WHERE id > last_seen ORDER BY id), seeks via index in O(log n), and newly inserted rows don't affect already-paged windows — stable. The cost: no page jumping, no total count.

	Offset	Cursor (Keyset)
Deep-page perf	poor O(offset)	✅ O(log n) via index
Consistency under writes	❌ drift/skip/dup	✅ stable
Jump to page N	✅ yes	❌ sequential only
Total count/pages	✅ available	❌ usually not
Fits	small sets, admin panels	large sets, public APIs, infinite scroll

-- Offset: deep page scans+discards OFFSET rows; inserts during paging misalign
SELECT * FROM events ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset/cursor: anchor on previous page's last id, seek the primary-key index
SELECT * FROM events
WHERE id > :last_seen_id          -- decoded from the opaque cursor
ORDER BY id LIMIT 20;
-- next_cursor = encode(last row's id); new inserts don't disturb paged windows
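The drift described above can be reproduced in a few lines. The following is a minimal sketch using an in-memory SQLite table; the `events` table matches the SQL above, while the helper names `offset_page` and `keyset_page` are made up for illustration. It deletes a row from an already-read page and compares what each strategy returns next.

```python
import sqlite3

def offset_page(conn, page, size):
    # Offset: the window depends on how many rows currently sort before it.
    rows = conn.execute(
        "SELECT id FROM events ORDER BY id LIMIT ? OFFSET ?", (size, page * size))
    return [r[0] for r in rows]

def keyset_page(conn, last_seen_id, size):
    # Keyset: the window depends only on the anchor value, not on row counts.
    rows = conn.execute(
        "SELECT id FROM events WHERE id > ? ORDER BY id LIMIT ?", (last_seen_id, size))
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events (id) VALUES (?)", [(i,) for i in range(1, 11)])

offset_p1 = offset_page(conn, 0, 3)   # ids 1, 2, 3
keyset_p1 = keyset_page(conn, 0, 3)   # ids 1, 2, 3

# A write lands between page 1 and page 2: a row on page 1 is deleted.
conn.execute("DELETE FROM events WHERE id = 2")

offset_p2 = offset_page(conn, 1, 3)              # window shifted: id 4 is skipped
keyset_p2 = keyset_page(conn, keyset_p1[-1], 3)  # resumes right after id 3
```

With offset, the second page starts at id 5 because the deletion pulled id 4 back into the first three rows; with keyset, the second page starts at id 4 as expected.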

Real-world cases:

Slack: "Evolving API Pagination at Slack" documents the journey from no pagination to offset to cursor. Under high-write channel messages, new inserts shift the offset window, so messages that were already returned come back again on the next page; only cursors stay stable.
Google API standard: AIP-158 mandates an opaque (non-parseable) page_token — hiding pagination internals inside the token so the underlying implementation can change without breaking callers.

3. Versioning: Making the Contract "Never Break"

Core trade-off: explicit versions (/v2), which keep the freedom to break at the cost of parallel codebases, vs date-based versions + a compatibility layer, which give callers zero breakage at a perpetual maintenance cost.

Principle: first define what a breaking change is. Adding a field or endpoint is backward-compatible (old clients ignore new fields); removing a field, or changing a type, semantics, or a default, is breaking. Three strategies: (1) URL path (/v1, /v2): intuitive, but each major version means maintaining a whole parallel codebase; (2) header negotiation: clean URLs, but invisible and easily dropped by proxies/caches; (3) date-based + compatibility layer (the Stripe way): an account is pinned to the version that was current when it made its first API request, core logic only writes the latest, and responses pass through a backward transform chain that downgrades to the client's pinned shape. New and old clients never break; the cost is a compatibility layer that accretes year over year.
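The breaking vs backward-compatible rule can be checked mechanically for the structural cases. The sketch below is a simplified assumption: it treats a response schema as a flat mapping of field name to type name, and the function name `classify_changes` is invented for illustration. Changes in semantics or defaults do not show up in such a diff and still need human review.

```python
def classify_changes(old_fields, new_fields):
    """Compare two flat schemas (field name -> type name).

    Returns (breaking, additive) lists of human-readable change descriptions.
    """
    breaking, additive = [], []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            breaking.append(f"removed field {name}")
        elif new_fields[name] != old_type:
            breaking.append(f"changed type of {name}: {old_type} -> {new_fields[name]}")
    for name in new_fields:
        if name not in old_fields:
            additive.append(f"added field {name}")  # old clients simply ignore it
    return breaking, additive

# Hypothetical schemas for a charge object.
v_old = {"id": "string", "amount": "integer", "card": "string"}
v_new = {"id": "string", "amount": "object", "payment_method": "string"}

breaking, additive = classify_changes(v_old, v_new)
```

A check like this could run in CI against the published schema, so that a breaking diff forces a deliberate versioning decision rather than slipping out unnoticed.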

graph LR
  REQ["request<br/>API-Version: 2018-02-28<br/>account pinned to old version"]
  CORE["core logic<br/>only knows latest schema"]
  T1["transform<br/>2020 → 2019"]
  T2["transform<br/>2019 → 2018"]
  RESP["response<br/>downgraded to 2018 shape"]
  REQ --> CORE
  CORE -->|latest shape| T1 --> T2 --> RESP
  classDef c fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
  class CORE c

# Core logic only emits the latest response; the compat layer downgrades
# step by step to the account's pinned version
def respond(account, payload_latest):
    v = account.api_version                    # e.g. "2018-02-28"
    for change in breaking_changes_after(v):   # applied newest-first
        payload_latest = change.downgrade(payload_latest)
    return payload_latest
# adding a field = backward-compatible, not in the chain
# removing a field / changing a type = breaking, needs a downgrade()
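To make the chain concrete, here is a small runnable sketch of the same idea. The two breaking changes, their dates, and the field names are hypothetical, chosen only to show the mechanics; this is not Stripe's actual change list or code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class BreakingChange:
    version: str                       # ISO date the change shipped; sorts lexicographically
    downgrade: Callable[[dict], dict]  # newer shape -> shape from before this change

def _amount_back_to_integer(payload):
    # Hypothetical 2020-03-01 change: `amount` became an object; older clients expect an int.
    out = dict(payload)
    out["amount"] = payload["amount"]["value"]
    return out

def _restore_card_field(payload):
    # Hypothetical 2019-05-16 change: `card` was renamed to `payment_method`.
    out = dict(payload)
    out["card"] = out.pop("payment_method")
    return out

CHANGES = [
    BreakingChange("2019-05-16", _restore_card_field),
    BreakingChange("2020-03-01", _amount_back_to_integer),
]

def respond(pinned_version, payload_latest):
    payload = payload_latest
    # Undo every change that shipped after the pinned version, newest first.
    for change in sorted(CHANGES, key=lambda c: c.version, reverse=True):
        if change.version > pinned_version:
            payload = change.downgrade(payload)
    return payload

latest = {"id": "ch_1", "amount": {"value": 500, "currency": "usd"}, "payment_method": "pm_9"}
oldest_shape = respond("2018-02-28", latest)   # both downgrades applied
middle_shape = respond("2019-05-16", latest)   # only the 2020 change undone
```

Each downgrade copies the payload instead of mutating it, so the latest-shape object can be reused for callers pinned to other versions.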

The trade-off: URL versions are simple and visible but explode in count (one codebase per /vN, a maintenance nightmare); date-based versions mean zero breakage for callers, but every breaking change requires a permanently-living downgrade function — shifting complexity from the client to the platform. The more "developer-friendly" the platform, the heavier its own compatibility burden.

Real-world cases:

Stripe: "APIs as infrastructure: future-proofing Stripe with versioning" details date-based versions + a response compatibility layer. Engineers write only the latest code, while requests/responses are transformed both ways at the boundary per account version, so a 10-year-old integration still runs today.
Google AIP / GitHub: favor additive evolution (add, don't remove), deferring deletions to rare major versions to maximize the backward-compatible window.

4. The Rate-Limit Contract: 429 Is a Protocol, Not Just a Refusal

Core trade-off: silent drop / bare 429 vs cooperative limiting with Retry-After + quota headers + idempotency keys. (Algorithm internals — token bucket / sliding window / distributed limiting — are Day 10.)

Principle: at the API-design level, rate limiting is a contract, not pure defense. It should tell the client how much quota is left, when to retry, and how to retry safely: (1) on limit, return 429 + Retry-After (exactly how long to wait); (2) every response carries X-RateLimit-Remaining / Reset, so clients slow down proactively instead of hitting a wall; (3) per-key quota tiers (free/paid); (4) write requests carry an Idempotency-Key, so a rate-limited retry doesn't double-charge (see Day 7). The cardinal sin is a bare 429 — clients can only retry blindly and immediately, creating a retry storm that knocks the backend over a second time.

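# Rate limiting is a contract: the client honors Retry-After; the idempotency key makes the retry safe
import random, time
import requests  # third-party HTTP client; the base URL below is a placeholder

def create_charge(body, key, max_attempts=5):
    for _ in range(max_attempts):
        resp = requests.post("https://api.example.com/v1/charges", json=body,
                             headers={"Idempotency-Key": key})   # server dedupes on same key
        if resp.status_code != 429:
            return resp
        # header values are strings; this assumes the delta-seconds form of Retry-After
        wait = float(resp.headers.get("Retry-After", 1))
        time.sleep(wait + random.uniform(0, 1))   # add jitter to avoid a retry storm
    raise RuntimeError("still rate limited")      # same key on every attempt -> no double charge
# also read X-RateLimit-Remaining to slow down before hitting 429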
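The server side of the same contract might look like the sketch below. It assumes a plain fixed-window counter purely as a placeholder (the real algorithm choice is a separate topic), and the function name `rate_limit_response` plus the `X-RateLimit-Limit` header are illustrative additions, not something this text prescribes.

```python
import math
import time

def rate_limit_response(limit, used, window_start, window_seconds, now=None):
    """Return (status, headers) for one request.

    `used` counts requests in the current window, including this one.
    """
    now = time.time() if now is None else now
    reset_at = window_start + window_seconds
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(int(reset_at)),   # Unix epoch seconds
    }
    if used > limit:
        # Tell the client exactly how long to wait instead of sending a bare 429.
        headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
        return 429, headers
    return 200, headers

ok_status, ok_headers = rate_limit_response(100, 40, 1000, 60, now=1010)
limited_status, limited_headers = rate_limit_response(100, 101, 1000, 60, now=1042.5)
```

Sending the quota headers on successful responses too is what lets well-behaved clients slow down before they ever reach the limit.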

Real-world cases:

GitHub: the REST API attaches X-RateLimit-Remaining / Reset to every response and returns 403/429 with the reset time on overage — the de facto standard for public-API rate-limit headers.
Stripe: rate limiting + Idempotency-Key together — retries triggered by network blips or limiting are deduplicated by the idempotency key, guaranteeing "charge at most once."

Scaling & Optimization

GraphQL governance: add persisted queries (client sends only a query hash, can use GET → regains HTTP caching) + query cost / depth limits (weight each field, reject over threshold, blocking deep-nesting DoS).
REST caching & concurrency: ETag + If-None-Match for conditional requests (304 saves bandwidth), If-Match for optimistic concurrency (prevents lost updates); put cacheable GETs behind a CDN.
gRPC for browsers: use gRPC-Web or do gRPC↔JSON transcoding at the gateway — internal traffic enjoys gRPC, external stays JSON.
GraphQL → Federation: when the monolithic graph becomes a deploy bottleneck, split into per-team subgraphs composed by a gateway supergraph.
Push capabilities into the Gateway: auth, rate limiting, observability, version routing centralized; business services focus on logic.
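The conditional-request bullet above (ETag with If-None-Match and If-Match) can be sketched without any framework. The handler names and the hash-based tag below are assumptions for illustration; a real server would also handle multiple tags, `*`, and weak validators as the HTTP specification describes.

```python
import hashlib
import json

def etag_for(resource):
    # Derive a strong validator from a canonical serialization of the resource.
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(resource, if_none_match=None):
    tag = etag_for(resource)
    if if_none_match == tag:
        return 304, {"ETag": tag}, None       # client copy is current: no body sent
    return 200, {"ETag": tag}, resource

def handle_put(current, updated, if_match=None):
    # This sketch requires If-Match and rejects a missing or stale tag.
    if if_match != etag_for(current):
        return 412, current                   # Precondition Failed: someone else wrote first
    return 200, updated

doc = {"id": 42, "title": "draft"}
status, headers, body = handle_get(doc)
revalidated = handle_get(doc, if_none_match=headers["ETag"])

# Two writers read the same version; the second write carries a stale tag.
first = handle_put(doc, {"id": 42, "title": "v2"}, if_match=headers["ETag"])
second = handle_put(first[1], {"id": 42, "title": "v3"}, if_match=headers["ETag"])
```

The second writer receives 412 and has to re-read before retrying, which is how If-Match prevents a lost update.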

Pitfalls & Interview Questions

1. Offset pagination on a public API. Deep pages are slow and data drift causes skips/dups. Large, high-write datasets demand cursors with opaque tokens that hide the implementation.

2. Exposing internal fields / DB schema directly as the API. That turns your database structure into a public contract — any schema change now breaks the API. APIs need an independent DTO boundary.

3. GraphQL with no query cost / depth limit. A deeply nested query (posts { comments { author { posts ... } } }) can force exponential joins — a classic DoS vector.

4. Treating "adding a field" as a breaking change worthy of a new version. Adding fields is backward-compatible; old clients ignore them. Misjudging this explodes version count and maintenance cost.

5. Returning a bare 429. Without Retry-After / X-RateLimit, clients can only retry blindly → a retry storm that knocks the backend over again. Rate limiting is a contract — make it cooperative.
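For pitfall 3, a depth and cost guard can be sketched as follows. It assumes the query has already been parsed into a nested dict of selections (field name to sub-selection, or None for a leaf); the field weights and thresholds are invented, and a production cost model would usually also multiply by requested page sizes.

```python
def selection_depth(selection):
    # Depth of the deepest field chain in a parsed selection set.
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())

def selection_cost(selection, weights, default_weight=1):
    # Additive cost: each field contributes its weight plus the cost of its children.
    if not selection:
        return 0
    return sum(weights.get(name, default_weight) + selection_cost(child, weights, default_weight)
               for name, child in selection.items())

def check_query(selection, weights, max_depth, max_cost):
    depth = selection_depth(selection)
    cost = selection_cost(selection, weights)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    if cost > max_cost:
        raise ValueError(f"query cost {cost} exceeds limit {max_cost}")
    return depth, cost

# posts { comments { author { posts { title } } } }
nested = {"posts": {"comments": {"author": {"posts": {"title": None}}}}}
weights = {"posts": 10, "comments": 5}   # hypothetical per-field weights

depth = selection_depth(nested)
cost = selection_cost(nested, weights)
```

Rejecting the query before execution is the point: the check runs on the query shape alone, so an abusive query never reaches a resolver.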

Likely interview follow-ups:

What fundamentally separates REST, GraphQL, and gRPC? When is GraphQL essential, when should you reach for gRPC?
Why is GraphQL hard to cache over HTTP? How do persisted queries partially recover it?
Why use an opaque cursor token instead of a raw id? What capability does it give up?
What counts as breaking vs not? Where's the cost of date-based versions + a compat layer vs /v2?
A third party keeps hitting 429 and retrying hard — how do you make it "back off politely" at the contract level?
How does GraphQL's N+1 arise, and how does DataLoader solve it?

Deeper Resources

Designing Data-Intensive Applications, Ch. 4 "Encoding and Evolution" (Kleppmann): backward/forward compatibility and schema evolution, the theoretical root of API versioning.
Stripe — APIs as infrastructure: future-proofing Stripe with versioning: the authoritative take on date-based versions + compatibility layers.
Slack — Evolving API Pagination at Slack: the real offset → cursor evolution and its pitfalls.
Google — AIP-158: Pagination: the spec for opaque page_token and standard List methods.
GitHub — The GitHub GraphQL API / Netflix — GraphQL Federation: industrial motivations for REST→GraphQL and federated supergraphs.

Going Deeper

1. GraphQL kills over/under-fetching but throws away REST's native HTTP/CDN caching advantage. Why? How do persisted queries partially recover it?

Why caching is lost: HTTP caches key on method + URL and, in practice, store only responses to safe methods such as GET. REST's GET /users/42 is naturally a stable cache key. GraphQL sends everything as POST /graphql with the query in the body, so one URL maps to infinitely many queries; intermediaries can't cache by URL, and POST isn't cached by default. Caching responsibility moves from "free HTTP infrastructure" to "the application layer itself" (e.g., Apollo's normalized client cache).

How persisted queries help: the client pre-registers a query with the server for a hash; at runtime it sends only the hash + variables. Because the body is small and deterministic, you can switch to GET /graphql?sha256=... — the URL becomes a stable cache key again, and CDN/rate limiting work once more. Bonus: the server accepts only whitelisted queries, automatically blocking malicious arbitrary ones. The cost is an extra registration step and reduced flexibility for dynamically assembled queries.
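A rough sketch of the register-then-hash flow follows. The class name, the `sha256` and `variables` query parameters, and the in-memory registry are assumptions for illustration; real implementations (Apollo's automatic persisted queries, for example) use their own wire format.

```python
import hashlib
import json
from urllib.parse import urlencode

class PersistedQueryStore:
    """In-memory allowlist of pre-registered queries, keyed by SHA-256 hex digest."""

    def __init__(self):
        self._by_hash = {}

    def register(self, query):
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        self._by_hash[digest] = query
        return digest

    def resolve(self, digest):
        # Unknown hashes are rejected, so arbitrary ad hoc queries never execute.
        if digest not in self._by_hash:
            raise LookupError("unknown persisted query")
        return self._by_hash[digest]

def cacheable_url(digest, variables):
    # Serialize variables canonically so equal requests produce byte-identical URLs.
    encoded = json.dumps(variables, sort_keys=True, separators=(",", ":"))
    return "/graphql?" + urlencode({"sha256": digest, "variables": encoded})

store = PersistedQueryStore()
digest = store.register("query($n:Int!){ pullRequest(number:$n){ title } }")
url_a = cacheable_url(digest, {"n": 42, "owner": "o"})
url_b = cacheable_url(digest, {"owner": "o", "n": 42})
```

Canonical variable ordering matters here: without it, two logically identical requests would produce different URLs and miss the shared cache entry.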

2. If date-based versions + a compat layer let a contract "never break," why doesn't everyone do it? What's the second-order cost?

It looks perfect on the surface: zero breakage for callers, engineers write only the latest code. But the cost is hidden and compounds yearly:

The compat layer lives forever: each breaking change needs a downgrade that can never be deleted (as long as one account is pinned to that old version). Over years that's hundreds of transforms whose interactions explode the test matrix.
Debugging gets harder: a bug may surface only when "some old version passes through some transform chain" — reproducing it means first restoring the client's pinned version.
Needs infrastructure to back it: automated tests across all versions, version replay, a transform DSL. Teams without this get dragged down by the compat layer.

So it suits a company like Stripe — "API as product, vast and uncontrollable caller base" — where paying a perpetual maintenance tax for stability pays off. For internal scenarios where you can push client upgrades, a simple /v2 + deprecation window is more economical. The essence: shift the pain of breakage from all callers onto the platform itself; whether it's worth it depends on how many and how uncontrollable the callers are.

3. Why insist on an opaque (non-parseable) cursor token instead of exposing last_id directly? What new constraints does this introduce?

Why opaque: if the cursor is a bare last_id=12345, clients start depending on its structure — guessing, hand-crafting, skipping around. The moment you want to change the implementation (from a single-column id to a (created_at, id) composite anchor for time-ordering, or adding shard routing), every client that relied on the raw structure breaks. Encoding it as an opaque string (base64 of an internal struct) excludes pagination internals from the public contract — exactly AIP-158's point: what can be parsed will be depended on, and what's depended on can't be changed.

New constraints: (1) cursors expire — the pinned sort key/snapshot point can't be valid forever, so clients must handle "cursor expired" and restart from the top; (2) no page jumps, no totals, clashing with "show page N of M"; (3) the cursor carries no authorization — you must re-authorize every request, or someone with another's cursor could page across their data; (4) the sort key must be unique and ordered, or boundary rows skip/duplicate (use a (timestamp, id) composite to break ties).
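An opaque token over a (timestamp, id) composite anchor could be sketched like this. The token layout (base64url-wrapped JSON with a version field `v`) and all function names are assumptions; the in-memory filter stands in for a `WHERE (created_at, id) > (?, ?)` style query. Note that base64 only discourages parsing; it neither hides nor protects the contents, so authorization still has to happen on every request.

```python
import base64
import json

def encode_cursor(created_at, row_id):
    raw = json.dumps({"v": 1, "t": created_at, "id": row_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii").rstrip("=")

def decode_cursor(token):
    try:
        padded = token + "=" * (-len(token) % 4)
        data = json.loads(base64.urlsafe_b64decode(padded))
        if data["v"] != 1:
            raise ValueError("unsupported cursor version")
        return data["t"], data["id"]
    except (ValueError, KeyError, TypeError) as exc:
        # Clients are expected to restart from the first page on this error.
        raise ValueError("invalid or expired cursor") from exc

def list_page(rows, token, size):
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]))
    if token:
        anchor = decode_cursor(token)
        # Tuple comparison breaks timestamp ties on id, so boundary rows neither skip nor repeat.
        ordered = [r for r in ordered if (r["created_at"], r["id"]) > anchor]
    items = ordered[:size]
    more = len(ordered) > size
    next_token = encode_cursor(items[-1]["created_at"], items[-1]["id"]) if more else None
    return items, next_token

rows = [{"created_at": 100, "id": 1}, {"created_at": 100, "id": 2},
        {"created_at": 100, "id": 3}, {"created_at": 101, "id": 4}]
page1, token1 = list_page(rows, None, 2)
page2, token2 = list_page(rows, token1, 2)
```

Because the version field lives inside the token, the server can later switch anchors or add shard routing and simply reject older tokens as expired.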

4. Public APIs are almost universally REST/GraphQL; gRPC stays internal. What would you pay to force gRPC as a public API?

Browsers can't connect directly: gRPC relies on low-level HTTP/2 features (trailers) that browser fetch can't access, requiring a gRPC-Web + proxy layer — extra infra friction for third parties.
Hard to debug: binary Protobuf can't be inspected with plain curl or read by eye the way JSON can, raising debugging cost for outside developers; tooling support lags REST.
Tight IDL coupling: callers must obtain the .proto and generate stubs; coordinating proto upgrades is costly — you can't force thousands of third parties to sync.
No caching/CDN: with no URL semantics, HTTP caching infrastructure is unusable.

But these "drawbacks" are virtues internally: you control all services, can upgrade protos uniformly, want low latency and strong types over human readability, and east-west traffic needs no CDN. So protocol choice is fundamentally about "are the callers controllable?": controllable (internal) → optimize machine efficiency, pick gRPC; uncontrollable (public) → optimize human reach and evolvability, pick REST/GraphQL.

5. GraphQL's N+1 is more insidious than REST's — why? How does DataLoader solve it, and where are its "batch + cache" boundaries?

Why more insidious: REST's N+1 is visible to the client (you can count requests). GraphQL's N+1 hides in server-side resolvers: query posts { author { name } } and the framework calls the author resolver once per post — 100 posts means 100 SELECT author WHERE id=? queries. From outside it's "one query," yet the backend explodes into 1+100 DB hits that vary with query shape — invisible unless you watch metrics.

How DataLoader solves it: within a single request it does two things: (1) batch: instead of running each load(id) immediately, it collects all ids requested in the same tick and merges them into one WHERE id IN (...) once that tick ends, turning N into 1; (2) per-request cache: a repeated load(same id) in the same request returns from cache.

Boundaries: the cache is valid only within the single request's lifecycle (preventing cross-request stale reads, ensuring intra-query consistency), not global; it solves "fetch amplification," not "the query being too deep/complex" — the latter needs query-cost analysis. They're two orthogonal lines of defense.
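The batch + per-request cache mechanism can be approximated in Python with asyncio. This is a simplified analogue of the idea, not the DataLoader library itself; the class below and the `batch_get_authors` function are illustrative, and the batch function is assumed to return values in the same order as the keys it receives.

```python
import asyncio

class DataLoader:
    """Create one instance per request; its cache dies with the request."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn   # async: list of keys -> list of values, same order
        self._cache = {}            # key -> Future, shared by repeated load() calls
        self._queue = []            # keys collected for the next batch
        self._scheduled = False
        self._task = None

    def load(self, key):
        if key in self._cache:                  # per-request cache hit
            return self._cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        self._queue.append(key)
        if not self._scheduled:                 # first key of this tick schedules the flush
            self._scheduled = True
            loop.call_soon(self._start_dispatch)
        return future

    def _start_dispatch(self):
        self._task = asyncio.get_running_loop().create_task(self._dispatch())

    async def _dispatch(self):
        keys, self._queue, self._scheduled = self._queue, [], False
        try:
            values = await self._batch_fn(keys)  # one IN (...) style lookup for all keys
        except Exception as exc:
            for key in keys:
                self._cache[key].set_exception(exc)
            return
        for key, value in zip(keys, values):
            self._cache[key].set_result(value)

batch_calls = []

async def batch_get_authors(ids):
    batch_calls.append(list(ids))               # stands in for a single SQL query
    return [{"id": i, "name": f"user{i}"} for i in ids]

async def resolve_authors(author_ids):
    loader = DataLoader(batch_get_authors)      # fresh loader = fresh request scope
    return await asyncio.gather(*(loader.load(i) for i in author_ids))

authors = asyncio.run(resolve_authors([1, 2, 1, 3, 2]))
```

Five author lookups collapse into a single batch of three distinct ids, while each caller still gets its own result in order.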