Day 30 Hard Authorization RBAC/ABAC/ReBAC Multi-tenancy Audit Log

Authorization & Account Systems — Graph Traversal, Isolation, Non-repudiation Behind One CheckRBAC vs ABAC vs ReBAC · Multi-tenancy · Audit Log

Problem Scenario + Requirements

Design the authorization system for a B2B SaaS platform (think Notion / GitHub / Figma): the org structure is Org → Team → Project → Resource, four levels with cascading inheritance — an Org admin can touch everything, a Project member only their own project. It looks like a simple "can this user read this doc?", but it's the hottest path every API request must pass through: one mistake is a privilege-escalation leak, one slow check drags down the whole site.

Treat it as a real system and size it:

Tenants: 100K Orgs; the largest Org has 50K users and millions of resources.
QPS: one page render does tens to hundreds of checks (one per list row), so peak authorization-check QPS exceeds 500K — an order of magnitude above business QPS.
Latency SLO: check p99 < 10ms (the real numbers Zanzibar/Himeji hit), or it dominates total request latency.
Correctness: default deny; revocation must take effect fast (a stale cache must not let an ejected user keep reading).
Isolation: tenant data physically/logically isolated — one tenant must never read another's data.
Auditability: who changed whose permission, when, and what they accessed — traceable and tamper-evident (SOC2 / GDPR).

This issue covers four things: what model expresses permissions, how to compute hierarchical inheritance fast, how to isolate tenants, and how to make the audit log non-repudiable.

High-Level Architecture

graph TD
    APP["Service
API / frontend BFF"]
    PDP["Policy Decision Point
Check(user, perm, obj)?"]
    CACHE["Permission cache
sharded+replicated · in-mem filter"]
    STORE["Relation store
relation tuples · Postgres/Spanner"]
    CDC["CDC / invalidation
binlog → Kafka"]
    AUDIT["Audit log
append-only · hash chain"]
    APP -->|"1. check"| PDP
    PDP -->|"2. hit"| CACHE
    CACHE -->|"3. miss → origin"| STORE
    STORE -->|"4. mutation"| CDC
    CDC -->|"5. invalidate"| CACHE
    APP -->|"6. perm change / sensitive access"| AUDIT

The PDP (Policy Decision Point) is the core: business code never judges permissions itself — it asks the PDP "does this subject have this permission on this object?". The PDP centralizes both the decision logic (policy) and the decision data (who's in which group, who owns what) — authorization logic pushed down to the data layer rather than scattered across every microservice, the shared centralizing philosophy of both Airbnb Himeji and Google Zanzibar. The cache layer is sharded + replicated and filters relations in memory, hitting origin only on miss; the write path turns mutations into cache-invalidation events via CDC. The audit log is a side channel — independent, never on the check hot path.

Key Technical Points

① Authorization model: RBAC vs ABAC vs ReBAC — express permissions via "roles", "attributes", or "relationships"

Trade-off in one line: RBAC is simple but roles explode; ABAC is flexible but hard to audit and hard to answer "who can access X"; ReBAC (Zanzibar-style) naturally expresses hierarchy and sharing but you must build a graph-traversal engine.

[Principle] The three models answer the same question with different data structures:

RBAC: user → role → permission. "Alice is admin, admin can delete." A decision is a table lookup.
ABAC: a boolean expression over attributes. "if user.dept == resource.dept && time < 18:00 then allow." A decision is evaluating a policy (e.g. OPA/Rego, AWS IAM policy).
ReBAC: model permissions as relation tuples between objects (object#relation@subject); a decision is a reachability traversal over the relation graph. "doc:readme#viewer@user:alice", "doc:readme#viewer@group:eng#member" — a viewer can be a user or "all members of group eng", so hierarchy and sharing unfold naturally. This is Google Zanzibar's core abstraction.

Model	Strength	Sacrifice / pitfall	Fits
RBAC	Simple, auditable, everyone gets it	Role explosion; can't express "only this project"	Few permission dimensions, flat org
ABAC	Highly flexible, context-aware (time/IP/attrs)	Hard to reverse "who can access X"; hard to reason about; accidental allows	Compliance/context-sensitive, dynamic rules
ReBAC	Native hierarchy/inheritance/fine-grained sharing; reversible	Must build/adopt a graph engine; deep traversal risks latency	Multi-level, collaborative sharing (docs/code/projects)

[Pseudo-code] A ReBAC check is essentially a recursive graph traversal with "userset rewrite":

# pseudo-code — Zanzibar-style check: does obj's relation contain subject?
def check(obj, relation, subject) -> bool:
    for tuple in tuples(obj, relation):          # read (obj#relation@?)
        u = tuple.subject
        if u == subject:
            return True                          # direct hit
        if u.is_userset():                       # e.g. group:eng#member
            if check(u.obj, u.relation, subject):# recursively expand userset
                return True
    # userset rewrite: viewer inherits from editor, owner, or parent's viewer
    for rule in rewrite_rules(obj.type, relation):   # e.g. viewer = editor + parent.viewer
        if check_rewrite(obj, rule, subject):
            return True
    return False

The magic is in rewrite_rules: viewer = self | editor | parent→viewer lets permissions cascade down the hierarchy — no need to copy permissions onto every resource.

Real case: Google Zanzibar unified authorization across Drive / Calendar / Cloud / Photos / YouTube with relation tuples, storing trillions of ACLs and serving millions of checks/sec, holding p95 < 10ms and >99.999% availability for 3 years (Zanzibar paper, USENIX ATC 2019). OpenFGA (open-sourced by Auth0/Okta, donated to CNCF) packages this ReBAC + some ABAC capability into a ready-to-use engine (openfga.dev).

② Hierarchical inheritance & check performance — deep traversal vs materialized reverse index

Trade-off in one line: real-time recursive traversal is cheap to write but expensive to read (long latency tail on deep hierarchies); materialized expansion is fast to read but amplifies writes and adds invalidation-consistency problems.

[Principle] With Org→Team→Project→Doc four levels, "can Alice read this doc?" may have to climb from the doc up to the Org and expand the whole group tree — one check triggers dozens of tuple reads, and depth and fan-out determine the tail latency. Two optimization routes:

Real-time traversal + cache sub-results: cache and reuse each intermediate decision like check(group:eng, member, alice); merge concurrent origin reads with single-flight. Simple, cheap writes, but p99 jitter on cold cache or deep graphs.
Materialized reverse index: pre-expand and store "object → all users with access" or "user → all accessible objects", so a check degrades to one lookup. Zanzibar's Leopard index precomputes flattened membership sets for deep/wide nested groups. The cost is write amplification (one group change updates a huge number of materialized rows) and invalidation complexity.

Consistency pitfall: permission revocation must take effect promptly, or "someone just removed from a project can still read for 5 minutes" is a security incident. Zanzibar uses a Zookie (a token carrying a timestamp) to guarantee "a permission read is no older than a given write", binding external consistency to cache freshness — the key thing that sets it apart from an ordinary cache.

# pseudo-code — fast revocation: version + CDC invalidation, not pure TTL
def revoke(group, member):
    db.delete(tuple(group, "member", member))      # 1. persist
    bump_version(group)                            # 2. bump that object's perm version
    cdc.publish(InvalidateEvent(group))           # 3. binlog→Kafka→cache nodes drop key
# read side: cache entries carry version; check validates version not stale, else origin

Real case: Airbnb Himeji is built on Zanzibar ideas: a sharded + replicated cache layer filters in memory, missing through to Aurora; data mutations are caught by SpinalTap and invalidate the cache over Kafka. It stores tens of billions of relations, serves nearly a million entities/sec at ~12ms p99 (Airbnb Tech Blog, 2021). Notion instead computes inheritance via the block tree's upward parent pointer — each block recognizes a single parent source, avoiding multi-parent ambiguity (The data model behind Notion's flexibility).

③ Multi-tenant isolation — shared DB row-level vs schema-per-tenant vs DB-per-tenant

Trade-off in one line: a shared DB saves cost but "miss one WHERE and you leak across tenants"; a DB-per-tenant gives the strongest isolation but ops/migration cost explodes linearly with tenant count.

[Principle] Three isolation tiers map to three cost tiers:

Approach	Isolation	Cost/ops	Failure mode
Shared DB + `tenant_id` row-level	Logical (weak)	Lowest; one table set for all	Forget `WHERE tenant_id` → cross-tenant leak; noisy neighbor
Shared DB + schema-per-tenant	Medium	Medium; switch schema per connection	Thousands of schemas slow metadata/migrations
DB-per-tenant	Strong (physical)	Highest; N databases to run	Linear blow-up; small tenants waste resources

Mainstream SaaS defaults to shared DB + row-level (cost wins) and uses Postgres Row-Level Security (RLS) to turn "forgot WHERE" from a human discipline into a database-enforced guarantee: once the policy is set and the connection sets app.tenant_id, the DB automatically appends the tenant filter to every query — no leak even if the app forgets it. Large customers (with physical-isolation compliance) then get a dedicated DB, a hybrid tiered approach.

-- Postgres RLS: push tenant isolation down to the database
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
-- each connection, when entering a tenant context:
SET app.tenant_id = '...';   -- all queries afterward are auto-filtered, no manual WHERE

Pitfall: RLS is dangerous with connection-pool reuse — a connection must RESET the tenant context before returning to the pool, or the next tenant inherits a connection still carrying the previous tenant_id. Also note table owner / BYPASSRLS roles skip policies — don't run migrations as superuser.

Real case: Salesforce is the classic shared-DB multi-tenant (org_id threads through the data layer); Slack, Notion and others are primarily logical isolation with stronger isolation for enterprise customers. Postgres RLS is the default multi-tenant mechanism on platforms like Supabase (PostgreSQL docs · Row Security Policies).

④ Audit log — append-only and tamper-evidence

Trade-off in one line: audit must be written inside the business transaction so it isn't lost, yet must not slow/pollute the business DB; to be trustworthy it must resist tampering, but "fully immutable" conflicts with "operable".

[Principle] The fundamental difference from ordinary logging: an audit log is evidence and must be complete (no lost events), tamper-evident (unchangeable after the fact), traceable (who/what/when/from where).

No loss: write critical events like permission changes via the Outbox pattern (see Day 7) — the audit record and the business mutation in the same transaction, then CDC ships it to audit storage, avoiding "business succeeded but audit lost".
Tamper-evident: append-only + a hash chain — each record stores hash(prev_hash + payload); alter any middle record and every subsequent hash breaks, exposing the tampering. Periodically anchor the chain head to an external store (WORM / notarization) to even defend against deleting whole segments.
No business impact: audit goes to independent storage (append-optimized columnar/object store), separate from OLTP; queries are low-frequency analytics, off the business hot path.

# pseudo-code — hash-chained audit, tampering detectable
def append_audit(event):
    prev = store.last()                  # take previous record
    record = {
        "actor": event.actor, "action": event.action,
        "target": event.target, "ts": now(),
        "prev_hash": prev.hash,
    }
    record["hash"] = sha256(prev.hash + canonical_json(record))
    store.append(record)                 # append-only, never UPDATE/DELETE

def verify():                            # patrol: recompute each hash, alert on break
    for r in store.scan():
        assert r["hash"] == sha256(r["prev_hash"] + canonical_json(strip_hash(r)))

Real case: AWS CloudTrail offers log file integrity validation (hashing and signing each batch); GitHub / Stripe expose audit-log APIs to enterprise customers for compliance export. SOC2 / GDPR require "permission changes and data access be traceable" — hash chain + WORM is a common implementation.

Scaling & Optimization

From embedded to centralized PDP: early on, permissions live in each service as if user.role==...; at scale, extract a centralized authorization service (Zanzibar/Himeji/OpenFGA) for unified model, audit, and caching.
Read/write split + reverse index: for "which projects can I see", use a materialized user→objects reverse index instead of checking each object one by one (N+1 authorization).
Policy as code: model ABAC with OPA/Rego to version, test, and canary policies, avoiding hand-editing rules in production.
Cache warming & degradation: authorization is the hottest path; warm cold starts from historical hot keys; when the PDP is unavailable, fail-closed (default deny) not fail-open — availability yields to security.

Common Pitfalls + Interview Follow-ups

Fail-open vs fail-closed: what if the authz service dies? Fail-closed (deny) is almost always right — better to reject legitimate requests than allow escalation. But weigh it: denying all on a core read path takes the whole site down, so limited degradation may be OK for low-sensitivity operations.
N+1 authorization: 100 list rows = 100 checks, p99 amplified. Use BatchCheck or a reverse index to get the full set at once. "How do you do list-page permission filtering" is a classic.
Revocation lag: pure TTL caches delay revocation until expiry. You must invalidate actively via CDC/version — "how fast does ejection take effect" is a classic follow-up.
RLS connection-pool context leak: reusing a connection without RESET of tenant_id → cross-tenant read. Don't forget BYPASSRLS roles and migrations skipping policies.
Using the business DB as audit source: if a business table can UPDATE, it isn't trustworthy audit. Audit must be append-only + tamper-evident + independent storage.
Baking permissions into a JWT, then unable to revoke: with many permission claims inside the JWT, a permission change leaves the old token valid (see Day 24). Authorization decisions should hit the PDP live; the JWT carries identity only.

Deeper Resources

Zanzibar: Google's Consistent, Global Authorization System — USENIX ATC 2019, the foundational ReBAC paper (relation tuples, userset rewrite, Zookie, Leopard index).
Himeji: a scalable centralized system for authorization at Airbnb — an industrial Zanzibar implementation (sharded cache, CDC invalidation, Aurora storage).
OpenFGA · Authorization Concepts — RBAC / ABAC / ReBAC / Zanzibar concept comparison and a hands-on implementation (Auth0/Okta, CNCF).
The data model behind Notion's flexibility — permission inheritance via block tree + parent pointer.
PostgreSQL · Row Security Policies — native database implementation of row-level multi-tenant isolation.
Designing Data-Intensive Applications (Kleppmann) — the consistency & causality chapters, to understand the external consistency behind Zookie.

Deeper Thinking

1. What problem does Zanzibar's Zookie solve that a plain TTL cannot? Why does authorization especially need it?

A normal cache's staleness means "reading an old value"; in authorization, staleness is a security incident — a just-revoked user reading old permissions = escalation. But the subtler issue is causal consistency: a user first sets a doc to private (writes the ACL), then shares the link; when the recipient opens it, they must read the "now private" state, or they read a private doc as public. Pure TTL can't guarantee "this read is no older than that write".

A Zookie is a token carrying a version/timestamp: writing an ACL returns a Zookie, subsequent checks carry it, and the system guarantees the snapshot used for the decision is no older than the moment the Zookie names. It turns freshness from a fuzzy TTL into a provable causal lower bound — binding cache consistency to external consistency. The cost is maintaining a global version (Zanzibar leans on Spanner's TrueTime), hard for ordinary systems to replicate, so most implementations approximate it with "version + active CDC invalidation".

2. A celebrity has 2M followers and "all followers can see my public posts." How do you store this with ReBAC tuples? What happens if you expand it directly?

Never expand into 2M post#viewer@user:X tuples — posting once inserts 2M rows, changing visibility deletes 2M rows. This is the classic fanout write amplification (see Day 14).

The right move is an indirect userset reference: store one tuple post#viewer@user:celeb#follower, i.e. "viewer = celeb's follower set", then expand the follower relation at check time. But this shifts the cost to reads: each check must traverse the follower set to test membership — too slow for a 2M-member set.

This is exactly where Zanzibar's Leopard index earns its keep: precompute a flattened membership bitmap for deep/wide groups, turning "is X in celeb's followers" from graph traversal into an index lookup. The trade-off returns to point ②: maintain an extra materialized index on write for O(1) reads. At heart this is fanout-on-write vs fanout-on-read replayed in the authorization context.

3. A list page must show "the first 20 of the 1000 projects the user can access, paginated." What happens if you check each one? What's the right architecture?

Checking each one (fetch all projects, then filter) = N+1 authorization + fetching lots of unauthorized data only to discard it, and pagination breaks (fewer than 20 after filtering). This question separates juniors from seniors.

Two correct answers: ① Reverse index / materialization — maintain a user → accessible_objects list and query directly WHERE id IN (accessible set) LIMIT 20, pushing authorization into the data query, done in one shot with correct pagination. ② Authorization-aware query — compile the permission condition into SQL (RLS or a join on the permission table) so the database filters during the scan.

Zanzibar provides Expand/reverse APIs precisely for this. The key insight: "check a single object" and "list accessible objects" are two different problems; the latter can't be solved by looping the former — one is a point query, the other needs a reverse index. Many authz performance incidents come from looping check to build lists.

4. You run shared DB + RLS for multi-tenancy, working fine. A financial customer demands "my data must never sit in the same database as others." How do you evolve without a rewrite?

This is the classic tiered tenancy problem. Don't migrate everyone to dedicated DBs for one customer — cost and ops explode. Instead: keep the shared DB as the default pool (the vast majority of small/mid tenants), while supporting routing a specific tenant to a dedicated DB/schema.

The key is a routing-layer abstraction: add a tenant → shard/datasource mapping in the data-access layer; app code resolves to the right connection by tenant_id. Pooled tenants use RLS isolation, dedicated tenants get a physical DB. Onboarding a new such customer is just a routing config change, no business-code change.

The hard part is cross-tier migration: moving a tenant from shared to dedicated requires online data migration (dual-write → backfill → verify → switch reads → switch writes, see Day 22's expand-contract). And unify schema evolution — the dedicated DB must not lag the shared DB's migration version. This question tests "make an exception for your most expensive customer, but don't let the exception pollute the main path".

5. Your audit log uses a hash chain for tamper-evidence. But ops must delete a user's data (GDPR right to erasure) — deleting breaks the chain. How do you resolve this conflict?

This is a direct clash of two compliance requirements: audit must be immutably retained, GDPR requires deletable personal data. Directly deleting an audit record breaks the hash chain and destroys non-repudiation.

Common solutions: ① Crypto-shredding — don't store personal data in cleartext; encrypt it with a per-user key; to "delete" the user, destroy the key, the ciphertext is permanently undecryptable but the record (and its hash) remains, chain intact. ② Separate PII — the audit chain stores only identifiers like actor_id / action / target_id and event structure; real PII goes in a deletable side table, the chain stores references; deleting PII doesn't touch the chain. ③ Tombstone — don't physically delete; append a "deleted" event, changing visibility rather than the data itself.

The core insight: "immutability" constrains the fact that an audit event happened; "deletability" constrains the personal-data payload within it — decouple the two layers and you satisfy both. This is why audit design must separate "event skeleton" from "PII payload" from day one.

← Back to index