Design the authorization system for a B2B SaaS platform (think Notion / GitHub / Figma): the org structure is Org → Team → Project → Resource, four levels with cascading inheritance — an Org admin can touch everything, a Project member only their own project. It looks like a simple "can this user read this doc?", but it's the hottest path every API request must pass through: one mistake is a privilege-escalation leak, one slow check drags down the whole site.
Treat it as a real system and size it:
This issue covers four things: what model expresses permissions, how to compute hierarchical inheritance fast, how to isolate tenants, and how to make the audit log non-repudiable.
graph TD
APP["Service
API / frontend BFF"]
PDP["Policy Decision Point
Check(user, perm, obj)?"]
CACHE["Permission cache
sharded+replicated · in-mem filter"]
STORE["Relation store
relation tuples · Postgres/Spanner"]
CDC["CDC / invalidation
binlog → Kafka"]
AUDIT["Audit log
append-only · hash chain"]
APP -->|"1. check"| PDP
PDP -->|"2. hit"| CACHE
CACHE -->|"3. miss → origin"| STORE
STORE -->|"4. mutation"| CDC
CDC -->|"5. invalidate"| CACHE
APP -->|"6. perm change / sensitive access"| AUDIT
The PDP (Policy Decision Point) is the core: business code never judges permissions itself — it asks the PDP "does this subject have this permission on this object?". The PDP centralizes both the decision logic (policy) and the decision data (who's in which group, who owns what) — authorization logic pushed down to the data layer rather than scattered across every microservice, the shared centralizing philosophy of both Airbnb Himeji and Google Zanzibar. The cache layer is sharded + replicated and filters relations in memory, hitting origin only on miss; the write path turns mutations into cache-invalidation events via CDC. The audit log is a side channel — independent, never on the check hot path.
Trade-off in one line: RBAC is simple but roles explode; ABAC is flexible but hard to audit and hard to answer "who can access X"; ReBAC (Zanzibar-style) naturally expresses hierarchy and sharing but you must build a graph-traversal engine.
[Principle] The three models answer the same question with different data structures:
(object#relation@subject); a decision is a reachability traversal over the relation graph. "doc:readme#viewer@user:alice", "doc:readme#viewer@group:eng#member" — a viewer can be a user or "all members of group eng", so hierarchy and sharing unfold naturally. This is Google Zanzibar's core abstraction.| Model | Strength | Sacrifice / pitfall | Fits |
|---|---|---|---|
| RBAC | Simple, auditable, everyone gets it | Role explosion; can't express "only this project" | Few permission dimensions, flat org |
| ABAC | Highly flexible, context-aware (time/IP/attrs) | Hard to reverse "who can access X"; hard to reason about; accidental allows | Compliance/context-sensitive, dynamic rules |
| ReBAC | Native hierarchy/inheritance/fine-grained sharing; reversible | Must build/adopt a graph engine; deep traversal risks latency | Multi-level, collaborative sharing (docs/code/projects) |
[Pseudo-code] A ReBAC check is essentially a recursive graph traversal with "userset rewrite":
# pseudo-code — Zanzibar-style check: does obj's relation contain subject?
def check(obj, relation, subject) -> bool:
for tuple in tuples(obj, relation): # read (obj#relation@?)
u = tuple.subject
if u == subject:
return True # direct hit
if u.is_userset(): # e.g. group:eng#member
if check(u.obj, u.relation, subject):# recursively expand userset
return True
# userset rewrite: viewer inherits from editor, owner, or parent's viewer
for rule in rewrite_rules(obj.type, relation): # e.g. viewer = editor + parent.viewer
if check_rewrite(obj, rule, subject):
return True
return False
The magic is in rewrite_rules: viewer = self | editor | parent→viewer lets permissions cascade down the hierarchy — no need to copy permissions onto every resource.
Trade-off in one line: real-time recursive traversal is cheap to write but expensive to read (long latency tail on deep hierarchies); materialized expansion is fast to read but amplifies writes and adds invalidation-consistency problems.
[Principle] With Org→Team→Project→Doc four levels, "can Alice read this doc?" may have to climb from the doc up to the Org and expand the whole group tree — one check triggers dozens of tuple reads, and depth and fan-out determine the tail latency. Two optimization routes:
check(group:eng, member, alice); merge concurrent origin reads with single-flight. Simple, cheap writes, but p99 jitter on cold cache or deep graphs.Consistency pitfall: permission revocation must take effect promptly, or "someone just removed from a project can still read for 5 minutes" is a security incident. Zanzibar uses a Zookie (a token carrying a timestamp) to guarantee "a permission read is no older than a given write", binding external consistency to cache freshness — the key thing that sets it apart from an ordinary cache.
# pseudo-code — fast revocation: version + CDC invalidation, not pure TTL
def revoke(group, member):
db.delete(tuple(group, "member", member)) # 1. persist
bump_version(group) # 2. bump that object's perm version
cdc.publish(InvalidateEvent(group)) # 3. binlog→Kafka→cache nodes drop key
# read side: cache entries carry version; check validates version not stale, else origin
Trade-off in one line: a shared DB saves cost but "miss one WHERE and you leak across tenants"; a DB-per-tenant gives the strongest isolation but ops/migration cost explodes linearly with tenant count.
[Principle] Three isolation tiers map to three cost tiers:
| Approach | Isolation | Cost/ops | Failure mode |
|---|---|---|---|
Shared DB + tenant_id row-level | Logical (weak) | Lowest; one table set for all | Forget WHERE tenant_id → cross-tenant leak; noisy neighbor |
| Shared DB + schema-per-tenant | Medium | Medium; switch schema per connection | Thousands of schemas slow metadata/migrations |
| DB-per-tenant | Strong (physical) | Highest; N databases to run | Linear blow-up; small tenants waste resources |
Mainstream SaaS defaults to shared DB + row-level (cost wins) and uses Postgres Row-Level Security (RLS) to turn "forgot WHERE" from a human discipline into a database-enforced guarantee: once the policy is set and the connection sets app.tenant_id, the DB automatically appends the tenant filter to every query — no leak even if the app forgets it. Large customers (with physical-isolation compliance) then get a dedicated DB, a hybrid tiered approach.
-- Postgres RLS: push tenant isolation down to the database
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.tenant_id')::uuid);
-- each connection, when entering a tenant context:
SET app.tenant_id = '...'; -- all queries afterward are auto-filtered, no manual WHERE
RESET the tenant context before returning to the pool, or the next tenant inherits a connection still carrying the previous tenant_id. Also note table owner / BYPASSRLS roles skip policies — don't run migrations as superuser.Trade-off in one line: audit must be written inside the business transaction so it isn't lost, yet must not slow/pollute the business DB; to be trustworthy it must resist tampering, but "fully immutable" conflicts with "operable".
[Principle] The fundamental difference from ordinary logging: an audit log is evidence and must be complete (no lost events), tamper-evident (unchangeable after the fact), traceable (who/what/when/from where).
hash(prev_hash + payload); alter any middle record and every subsequent hash breaks, exposing the tampering. Periodically anchor the chain head to an external store (WORM / notarization) to even defend against deleting whole segments.# pseudo-code — hash-chained audit, tampering detectable
def append_audit(event):
prev = store.last() # take previous record
record = {
"actor": event.actor, "action": event.action,
"target": event.target, "ts": now(),
"prev_hash": prev.hash,
}
record["hash"] = sha256(prev.hash + canonical_json(record))
store.append(record) # append-only, never UPDATE/DELETE
def verify(): # patrol: recompute each hash, alert on break
for r in store.scan():
assert r["hash"] == sha256(r["prev_hash"] + canonical_json(strip_hash(r)))
if user.role==...; at scale, extract a centralized authorization service (Zanzibar/Himeji/OpenFGA) for unified model, audit, and caching.tenant_id → cross-tenant read. Don't forget BYPASSRLS roles and migrations skipping policies.A normal cache's staleness means "reading an old value"; in authorization, staleness is a security incident — a just-revoked user reading old permissions = escalation. But the subtler issue is causal consistency: a user first sets a doc to private (writes the ACL), then shares the link; when the recipient opens it, they must read the "now private" state, or they read a private doc as public. Pure TTL can't guarantee "this read is no older than that write".
A Zookie is a token carrying a version/timestamp: writing an ACL returns a Zookie, subsequent checks carry it, and the system guarantees the snapshot used for the decision is no older than the moment the Zookie names. It turns freshness from a fuzzy TTL into a provable causal lower bound — binding cache consistency to external consistency. The cost is maintaining a global version (Zanzibar leans on Spanner's TrueTime), hard for ordinary systems to replicate, so most implementations approximate it with "version + active CDC invalidation".
Never expand into 2M post#viewer@user:X tuples — posting once inserts 2M rows, changing visibility deletes 2M rows. This is the classic fanout write amplification (see Day 14).
The right move is an indirect userset reference: store one tuple post#viewer@user:celeb#follower, i.e. "viewer = celeb's follower set", then expand the follower relation at check time. But this shifts the cost to reads: each check must traverse the follower set to test membership — too slow for a 2M-member set.
This is exactly where Zanzibar's Leopard index earns its keep: precompute a flattened membership bitmap for deep/wide groups, turning "is X in celeb's followers" from graph traversal into an index lookup. The trade-off returns to point ②: maintain an extra materialized index on write for O(1) reads. At heart this is fanout-on-write vs fanout-on-read replayed in the authorization context.
Checking each one (fetch all projects, then filter) = N+1 authorization + fetching lots of unauthorized data only to discard it, and pagination breaks (fewer than 20 after filtering). This question separates juniors from seniors.
Two correct answers: ① Reverse index / materialization — maintain a user → accessible_objects list and query directly WHERE id IN (accessible set) LIMIT 20, pushing authorization into the data query, done in one shot with correct pagination. ② Authorization-aware query — compile the permission condition into SQL (RLS or a join on the permission table) so the database filters during the scan.
Zanzibar provides Expand/reverse APIs precisely for this. The key insight: "check a single object" and "list accessible objects" are two different problems; the latter can't be solved by looping the former — one is a point query, the other needs a reverse index. Many authz performance incidents come from looping check to build lists.
This is the classic tiered tenancy problem. Don't migrate everyone to dedicated DBs for one customer — cost and ops explode. Instead: keep the shared DB as the default pool (the vast majority of small/mid tenants), while supporting routing a specific tenant to a dedicated DB/schema.
The key is a routing-layer abstraction: add a tenant → shard/datasource mapping in the data-access layer; app code resolves to the right connection by tenant_id. Pooled tenants use RLS isolation, dedicated tenants get a physical DB. Onboarding a new such customer is just a routing config change, no business-code change.
The hard part is cross-tier migration: moving a tenant from shared to dedicated requires online data migration (dual-write → backfill → verify → switch reads → switch writes, see Day 22's expand-contract). And unify schema evolution — the dedicated DB must not lag the shared DB's migration version. This question tests "make an exception for your most expensive customer, but don't let the exception pollute the main path".
This is a direct clash of two compliance requirements: audit must be immutably retained, GDPR requires deletable personal data. Directly deleting an audit record breaks the hash chain and destroys non-repudiation.
Common solutions: ① Crypto-shredding — don't store personal data in cleartext; encrypt it with a per-user key; to "delete" the user, destroy the key, the ciphertext is permanently undecryptable but the record (and its hash) remains, chain intact. ② Separate PII — the audit chain stores only identifiers like actor_id / action / target_id and event structure; real PII goes in a deletable side table, the chain stores references; deleting PII doesn't touch the chain. ③ Tombstone — don't physically delete; append a "deleted" event, changing visibility rather than the data itself.
The core insight: "immutability" constrains the fact that an audit event happened; "deletability" constrains the personal-data payload within it — decouple the two layers and you satisfy both. This is why audit design must separate "event skeleton" from "PII payload" from day one.