Day 30 Hard Authorization RBAC/ABAC/ReBAC Multi-tenancy Audit Log

Authorization & Account Systems — Graph Traversal, Isolation, Non-repudiation Behind One CheckRBAC vs ABAC vs ReBAC · Multi-tenancy · Audit Log

Problem Scenario + Requirements

Design the authorization system for a B2B SaaS platform (think Notion / GitHub / Figma): the org structure is Org → Team → Project → Resource, four levels with cascading inheritance — an Org admin can touch everything, a Project member only their own project. It looks like a simple "can this user read this doc?", but it's the hottest path every API request must pass through: one mistake is a privilege-escalation leak, one slow check drags down the whole site.

Treat it as a real system and size it:

This issue covers four things: what model expresses permissions, how to compute hierarchical inheritance fast, how to isolate tenants, and how to make the audit log non-repudiable.

High-Level Architecture

graph TD
    APP["Service
API / frontend BFF"] PDP["Policy Decision Point
Check(user, perm, obj)?"] CACHE["Permission cache
sharded+replicated · in-mem filter"] STORE["Relation store
relation tuples · Postgres/Spanner"] CDC["CDC / invalidation
binlog → Kafka"] AUDIT["Audit log
append-only · hash chain"] APP -->|"1. check"| PDP PDP -->|"2. hit"| CACHE CACHE -->|"3. miss → origin"| STORE STORE -->|"4. mutation"| CDC CDC -->|"5. invalidate"| CACHE APP -->|"6. perm change / sensitive access"| AUDIT

The PDP (Policy Decision Point) is the core: business code never judges permissions itself — it asks the PDP "does this subject have this permission on this object?". The PDP centralizes both the decision logic (policy) and the decision data (who's in which group, who owns what) — authorization logic pushed down to the data layer rather than scattered across every microservice, the shared centralizing philosophy of both Airbnb Himeji and Google Zanzibar. The cache layer is sharded + replicated and filters relations in memory, hitting origin only on miss; the write path turns mutations into cache-invalidation events via CDC. The audit log is a side channel — independent, never on the check hot path.

Key Technical Points

① Authorization model: RBAC vs ABAC vs ReBAC — express permissions via "roles", "attributes", or "relationships"

Trade-off in one line: RBAC is simple but roles explode; ABAC is flexible but hard to audit and hard to answer "who can access X"; ReBAC (Zanzibar-style) naturally expresses hierarchy and sharing but you must build a graph-traversal engine.

[Principle] The three models answer the same question with different data structures:

ModelStrengthSacrifice / pitfallFits
RBACSimple, auditable, everyone gets itRole explosion; can't express "only this project"Few permission dimensions, flat org
ABACHighly flexible, context-aware (time/IP/attrs)Hard to reverse "who can access X"; hard to reason about; accidental allowsCompliance/context-sensitive, dynamic rules
ReBACNative hierarchy/inheritance/fine-grained sharing; reversibleMust build/adopt a graph engine; deep traversal risks latencyMulti-level, collaborative sharing (docs/code/projects)

[Pseudo-code] A ReBAC check is essentially a recursive graph traversal with "userset rewrite":

# pseudo-code — Zanzibar-style check: does obj's relation contain subject?
def check(obj, relation, subject) -> bool:
    for tuple in tuples(obj, relation):          # read (obj#relation@?)
        u = tuple.subject
        if u == subject:
            return True                          # direct hit
        if u.is_userset():                       # e.g. group:eng#member
            if check(u.obj, u.relation, subject):# recursively expand userset
                return True
    # userset rewrite: viewer inherits from editor, owner, or parent's viewer
    for rule in rewrite_rules(obj.type, relation):   # e.g. viewer = editor + parent.viewer
        if check_rewrite(obj, rule, subject):
            return True
    return False

The magic is in rewrite_rules: viewer = self | editor | parent→viewer lets permissions cascade down the hierarchy — no need to copy permissions onto every resource.

Real case: Google Zanzibar unified authorization across Drive / Calendar / Cloud / Photos / YouTube with relation tuples, storing trillions of ACLs and serving millions of checks/sec, holding p95 < 10ms and >99.999% availability for 3 years (Zanzibar paper, USENIX ATC 2019). OpenFGA (open-sourced by Auth0/Okta, donated to CNCF) packages this ReBAC + some ABAC capability into a ready-to-use engine (openfga.dev).

② Hierarchical inheritance & check performance — deep traversal vs materialized reverse index

Trade-off in one line: real-time recursive traversal is cheap to write but expensive to read (long latency tail on deep hierarchies); materialized expansion is fast to read but amplifies writes and adds invalidation-consistency problems.

[Principle] With Org→Team→Project→Doc four levels, "can Alice read this doc?" may have to climb from the doc up to the Org and expand the whole group tree — one check triggers dozens of tuple reads, and depth and fan-out determine the tail latency. Two optimization routes:

Consistency pitfall: permission revocation must take effect promptly, or "someone just removed from a project can still read for 5 minutes" is a security incident. Zanzibar uses a Zookie (a token carrying a timestamp) to guarantee "a permission read is no older than a given write", binding external consistency to cache freshness — the key thing that sets it apart from an ordinary cache.

# pseudo-code — fast revocation: version + CDC invalidation, not pure TTL
def revoke(group, member):
    db.delete(tuple(group, "member", member))      # 1. persist
    bump_version(group)                            # 2. bump that object's perm version
    cdc.publish(InvalidateEvent(group))           # 3. binlog→Kafka→cache nodes drop key
# read side: cache entries carry version; check validates version not stale, else origin
Real case: Airbnb Himeji is built on Zanzibar ideas: a sharded + replicated cache layer filters in memory, missing through to Aurora; data mutations are caught by SpinalTap and invalidate the cache over Kafka. It stores tens of billions of relations, serves nearly a million entities/sec at ~12ms p99 (Airbnb Tech Blog, 2021). Notion instead computes inheritance via the block tree's upward parent pointer — each block recognizes a single parent source, avoiding multi-parent ambiguity (The data model behind Notion's flexibility).

③ Multi-tenant isolation — shared DB row-level vs schema-per-tenant vs DB-per-tenant

Trade-off in one line: a shared DB saves cost but "miss one WHERE and you leak across tenants"; a DB-per-tenant gives the strongest isolation but ops/migration cost explodes linearly with tenant count.

[Principle] Three isolation tiers map to three cost tiers:

ApproachIsolationCost/opsFailure mode
Shared DB + tenant_id row-levelLogical (weak)Lowest; one table set for allForget WHERE tenant_id → cross-tenant leak; noisy neighbor
Shared DB + schema-per-tenantMediumMedium; switch schema per connectionThousands of schemas slow metadata/migrations
DB-per-tenantStrong (physical)Highest; N databases to runLinear blow-up; small tenants waste resources

Mainstream SaaS defaults to shared DB + row-level (cost wins) and uses Postgres Row-Level Security (RLS) to turn "forgot WHERE" from a human discipline into a database-enforced guarantee: once the policy is set and the connection sets app.tenant_id, the DB automatically appends the tenant filter to every query — no leak even if the app forgets it. Large customers (with physical-isolation compliance) then get a dedicated DB, a hybrid tiered approach.

-- Postgres RLS: push tenant isolation down to the database
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
-- each connection, when entering a tenant context:
SET app.tenant_id = '...';   -- all queries afterward are auto-filtered, no manual WHERE
Pitfall: RLS is dangerous with connection-pool reuse — a connection must RESET the tenant context before returning to the pool, or the next tenant inherits a connection still carrying the previous tenant_id. Also note table owner / BYPASSRLS roles skip policies — don't run migrations as superuser.
Real case: Salesforce is the classic shared-DB multi-tenant (org_id threads through the data layer); Slack, Notion and others are primarily logical isolation with stronger isolation for enterprise customers. Postgres RLS is the default multi-tenant mechanism on platforms like Supabase (PostgreSQL docs · Row Security Policies).

④ Audit log — append-only and tamper-evidence

Trade-off in one line: audit must be written inside the business transaction so it isn't lost, yet must not slow/pollute the business DB; to be trustworthy it must resist tampering, but "fully immutable" conflicts with "operable".

[Principle] The fundamental difference from ordinary logging: an audit log is evidence and must be complete (no lost events), tamper-evident (unchangeable after the fact), traceable (who/what/when/from where).

# pseudo-code — hash-chained audit, tampering detectable
def append_audit(event):
    prev = store.last()                  # take previous record
    record = {
        "actor": event.actor, "action": event.action,
        "target": event.target, "ts": now(),
        "prev_hash": prev.hash,
    }
    record["hash"] = sha256(prev.hash + canonical_json(record))
    store.append(record)                 # append-only, never UPDATE/DELETE

def verify():                            # patrol: recompute each hash, alert on break
    for r in store.scan():
        assert r["hash"] == sha256(r["prev_hash"] + canonical_json(strip_hash(r)))
Real case: AWS CloudTrail offers log file integrity validation (hashing and signing each batch); GitHub / Stripe expose audit-log APIs to enterprise customers for compliance export. SOC2 / GDPR require "permission changes and data access be traceable" — hash chain + WORM is a common implementation.

Scaling & Optimization

Common Pitfalls + Interview Follow-ups

Deeper Resources

Deeper Thinking

1. What problem does Zanzibar's Zookie solve that a plain TTL cannot? Why does authorization especially need it?

A normal cache's staleness means "reading an old value"; in authorization, staleness is a security incident — a just-revoked user reading old permissions = escalation. But the subtler issue is causal consistency: a user first sets a doc to private (writes the ACL), then shares the link; when the recipient opens it, they must read the "now private" state, or they read a private doc as public. Pure TTL can't guarantee "this read is no older than that write".

A Zookie is a token carrying a version/timestamp: writing an ACL returns a Zookie, subsequent checks carry it, and the system guarantees the snapshot used for the decision is no older than the moment the Zookie names. It turns freshness from a fuzzy TTL into a provable causal lower bound — binding cache consistency to external consistency. The cost is maintaining a global version (Zanzibar leans on Spanner's TrueTime), hard for ordinary systems to replicate, so most implementations approximate it with "version + active CDC invalidation".

2. A celebrity has 2M followers and "all followers can see my public posts." How do you store this with ReBAC tuples? What happens if you expand it directly?

Never expand into 2M post#viewer@user:X tuples — posting once inserts 2M rows, changing visibility deletes 2M rows. This is the classic fanout write amplification (see Day 14).

The right move is an indirect userset reference: store one tuple post#viewer@user:celeb#follower, i.e. "viewer = celeb's follower set", then expand the follower relation at check time. But this shifts the cost to reads: each check must traverse the follower set to test membership — too slow for a 2M-member set.

This is exactly where Zanzibar's Leopard index earns its keep: precompute a flattened membership bitmap for deep/wide groups, turning "is X in celeb's followers" from graph traversal into an index lookup. The trade-off returns to point ②: maintain an extra materialized index on write for O(1) reads. At heart this is fanout-on-write vs fanout-on-read replayed in the authorization context.

3. A list page must show "the first 20 of the 1000 projects the user can access, paginated." What happens if you check each one? What's the right architecture?

Checking each one (fetch all projects, then filter) = N+1 authorization + fetching lots of unauthorized data only to discard it, and pagination breaks (fewer than 20 after filtering). This question separates juniors from seniors.

Two correct answers: ① Reverse index / materialization — maintain a user → accessible_objects list and query directly WHERE id IN (accessible set) LIMIT 20, pushing authorization into the data query, done in one shot with correct pagination. ② Authorization-aware query — compile the permission condition into SQL (RLS or a join on the permission table) so the database filters during the scan.

Zanzibar provides Expand/reverse APIs precisely for this. The key insight: "check a single object" and "list accessible objects" are two different problems; the latter can't be solved by looping the former — one is a point query, the other needs a reverse index. Many authz performance incidents come from looping check to build lists.

4. You run shared DB + RLS for multi-tenancy, working fine. A financial customer demands "my data must never sit in the same database as others." How do you evolve without a rewrite?

This is the classic tiered tenancy problem. Don't migrate everyone to dedicated DBs for one customer — cost and ops explode. Instead: keep the shared DB as the default pool (the vast majority of small/mid tenants), while supporting routing a specific tenant to a dedicated DB/schema.

The key is a routing-layer abstraction: add a tenant → shard/datasource mapping in the data-access layer; app code resolves to the right connection by tenant_id. Pooled tenants use RLS isolation, dedicated tenants get a physical DB. Onboarding a new such customer is just a routing config change, no business-code change.

The hard part is cross-tier migration: moving a tenant from shared to dedicated requires online data migration (dual-write → backfill → verify → switch reads → switch writes, see Day 22's expand-contract). And unify schema evolution — the dedicated DB must not lag the shared DB's migration version. This question tests "make an exception for your most expensive customer, but don't let the exception pollute the main path".

5. Your audit log uses a hash chain for tamper-evidence. But ops must delete a user's data (GDPR right to erasure) — deleting breaks the chain. How do you resolve this conflict?

This is a direct clash of two compliance requirements: audit must be immutably retained, GDPR requires deletable personal data. Directly deleting an audit record breaks the hash chain and destroys non-repudiation.

Common solutions: ① Crypto-shredding — don't store personal data in cleartext; encrypt it with a per-user key; to "delete" the user, destroy the key, the ciphertext is permanently undecryptable but the record (and its hash) remains, chain intact. ② Separate PII — the audit chain stores only identifiers like actor_id / action / target_id and event structure; real PII goes in a deletable side table, the chain stores references; deleting PII doesn't touch the chain. ③ Tombstone — don't physically delete; append a "deleted" event, changing visibility rather than the data itself.

The core insight: "immutability" constrains the fact that an audit event happened; "deletability" constrains the personal-data payload within it — decouple the two layers and you satisfy both. This is why audit design must separate "event skeleton" from "PII payload" from day one.