Day 7 Hard Distributed Systems Transactions Saga / Outbox Microservices

Distributed Transactions — Don't. And if you must, keep them tiny.2PC / 3PC · Saga (Orchestration vs Choreography) · Transactional Outbox · Idempotency · Why Microservices Avoid Distributed Transactions

Scenario & Constraints

Design a cross-border e-commerce checkout flow. A single checkout must complete 5 things at once:

Order service writes the order (own Postgres)
Inventory service decrements stock (own Postgres)
Payment service calls Stripe to charge (external API, not rollbackable)
Points service deducts loyalty points (own DynamoDB)
Notification service sends email + push (external SES / FCM)

Scale: 5k peak TPS checkout, P99 ≤ 800ms end-to-end, 0 oversells, 0 duplicate charges. The 5 services are owned by 5 teams with no shared database. Every intermediate step may fail: service crash, network partition, Stripe 504, message loss.

Question: how do these 5 steps guarantee "all succeed or none happen (or compensate observably)"? This is the distributed transactions problem. Today: why textbook 2PC/3PC is almost never used in production, how Saga actually lands, how Outbox solves dual-write, and why "the best distributed transaction is none at all."

High-level Architecture

graph TD
    Client["Client"]
    Orch["Saga Orchestrator
Temporal / Cadence
persists every step"]

    subgraph S1["Order Domain"]
        OrdSvc["Order Service"]
        OrdDB[("Postgres
orders + outbox")]
    end
    subgraph S2["Inventory Domain"]
        InvSvc["Inventory Service"]
        InvDB[("Postgres
reservations + outbox")]
    end
    subgraph S3["Payment Domain"]
        PaySvc["Payment Service"]
        Stripe["Stripe API
external, not rollbackable"]
    end
    subgraph S4["Points Domain"]
        PtsSvc["Points Service"]
        PtsDB[("DynamoDB
+ idempotency table")]
    end
    subgraph S5["Notification Domain"]
        Notify["Notify Service"]
    end

    CDC["Debezium CDC
reads outbox table"]
    Kafka["Kafka
order.events"]

    Client --> Orch
    Orch -->|1 createOrder| OrdSvc
    Orch -->|2 reserveStock| InvSvc
    Orch -->|3 charge| PaySvc
    Orch -->|4 deductPoints| PtsSvc
    Orch -.->|5 async notify| Kafka
    OrdSvc --> OrdDB
    InvSvc --> InvDB
    PaySvc --> Stripe
    PtsSvc --> PtsDB

    OrdDB -.-> CDC
    InvDB -.-> CDC
    CDC --> Kafka
    Kafka --> Notify

    classDef orch fill:#2a1530,stroke:#ff7ab6,color:#e8eef5
    classDef ext fill:#3a2010,stroke:#ffb450,color:#e8eef5
    class Orch orch
    class Stripe ext

Core idea: a Saga (Temporal/Cadence) chains the 5 steps; each is a local transaction with a reverse compensation on failure. Event fan-out goes through Outbox + CDC (DB commit ⇒ Kafka emit is guaranteed). Weak-transaction work like notifications goes eventual. The whole system has zero 2PC.

Key Techniques

1. 2PC / 3PC: textbook answer, production liability

Mechanism: Two-Phase Commit uses a coordinator to orchestrate multiple participants:

Phase 1 (prepare): coordinator asks all participants "can you commit?"; each writes a prepare log + locks resources, replies yes/no.
Phase 2 (commit/abort): all yes → commit all; any no → abort all.

Correctness gives ACID atomicity across nodes. But:

2PC's three fatal flaws

Blocking: after prepare, participants must wait for the coordinator's decision. If the coordinator crashes between phase 1 and phase 2, all participants hold locks indefinitely; affected rows/resources are inaccessible to any other transaction. 3PC adds a pre-commit phase to mitigate, but requires more round trips and assumes synchronous networks (unrealistic), so almost no one runs 3PC in production.
Performance: 2 RTTs × N participants + multiple disk fsyncs. Cross-service latency jumps from 5ms to 50-200ms; throughput halves.
Heterogeneous systems: Stripe doesn't expose prepare/commit; sent emails can't roll back. Most real-world side effects are inherently non-rollbackable.

Who still uses 2PC: legacy enterprise XA (Oracle + MQ + another Oracle, one DBA team); distributed-DB internals (Spanner uses 2PC for cross-shard, but each participant is itself a Paxos group, so no single-point block). Across microservices or external APIs → never 2PC.

# 2PC disaster — why Pat Helland called it a "Maginot Line"
# T = 0    coord -> all: PREPARE
# T = 10ms all -> coord: YES (each locks resources, writes prepare log)
# T = 11ms coord crashes ⚠
# T = ∞    participants hold locks waiting for coord to recover
#          other transactions blocked, business halts
#
# Even when coord restarts, it must replay its transaction log
# before deciding; participants stay locked. This is the worst
# failure mode in distributed systems: "one process death stalls
# the entire system."

Real cases:

Pat Helland (Microsoft / Salesforce / Amazon) — "Life Beyond Distributed Transactions" (ACM Queue 2017, originally CIDR 2007): the industry's seminal argument that large-scale systems cannot rely on distributed transactions, launching the saga / messaging / entity school of thought.
Google Spanner: uses 2PC across shards, but coordinator and participants are Paxos groups (highly available) — single-node failure doesn't block. A rare "2PC done right."
MySQL XA: supports XA, but the official docs warn about prepare-after-commit edge cases between binlog and InnoDB; large internet companies have largely abandoned it.
Java JTA / Atomikos: J2EE-era pattern for DB + JMS transactions; cloud-native era has replaced it with Saga + Outbox.

2. Saga: break one big transaction into N local transactions + N-1 compensations

Origin: Garcia-Molina & Salem introduced Saga at SIGMOD 1987, originally for "long-running transactions inside a single DB" (to avoid long-held locks). Microservices adopted it: split a cross-service transaction into a chain of local transactions T1…Tn; if any Ti fails, execute compensations C(i-1)…C1 in reverse order for completed steps.

You lose ACID isolation but gain eventual consistency + business-level reversibility. Two landing patterns:

Pattern	Coordinator	Pros	Cons	Use when
Orchestration	Central orchestrator (Temporal / Cadence / Step Functions) explicitly calls each step	Logic is centralized and readable; clear compensation path; easy to debug/visualize	Orchestrator is a new component to operate; could become a bottleneck	Many steps (≥4), complex compensations, audit needs
Choreography	No center; services subscribe to events and act	Fully decoupled; no single point	Flow is scattered across services; one change affects N services; hard to trace	Few steps (≤3), naturally event-driven within a bounded context

sequenceDiagram
    participant C as Client
    participant O as Orchestrator (Temporal)
    participant Ord as Order Svc
    participant Inv as Inventory Svc
    participant Pay as Payment (Stripe)
    participant Pts as Points Svc

    C->>O: placeOrder()
    O->>Ord: 1 createOrder (PENDING)
    Ord-->>O: orderId=42
    O->>Inv: 2 reserveStock(sku, qty)
    Inv-->>O: reserved
    O->>Pay: 3 charge($100, idempKey=42)
    Pay-->>O: paid (charge_xyz)
    O->>Pts: 4 deductPoints(user, 50, idempKey=42)
    Pts--xO: fail! insufficient points
    Note over O: Saga reverse compensations
    O->>Pay: C3 refund(charge_xyz, idempKey=42-refund)
    O->>Inv: C2 releaseStock(sku, qty, idempKey=42-rel)
    O->>Ord: C1 markFailed(42, reason="points")
    O-->>C: failed: insufficient points, fully rolled back

# Temporal Workflow-style pseudocode (Python SDK)
@workflow.defn
class PlaceOrderSaga:
    @workflow.run
    async def run(self, req: OrderReq) -> Result:
        compensations = []          # stack of comp actions for completed steps
        try:
            order = await activity.execute(create_order, req)
            compensations.append(lambda: mark_failed(order.id))

            await activity.execute(reserve_stock, req.sku, req.qty)
            compensations.append(lambda: release_stock(req.sku, req.qty))

            charge = await activity.execute(
                stripe_charge,
                amount=req.amount,
                idempotency_key=f"order-{order.id}"   # key: idempotency
            )
            compensations.append(
                lambda: stripe_refund(charge.id, key=f"refund-{order.id}")
            )

            await activity.execute(deduct_points, req.user, req.points,
                                   key=f"pts-{order.id}")
            return Result.ok(order.id)

        except ActivityError as e:
            # Run compensations in reverse. Temporal guarantees at-least-once + retry.
            for comp in reversed(compensations):
                await activity.execute(comp, retry_policy=infinite_backoff)
            return Result.fail(reason=str(e))

# Key properties:
#  - Temporal persists workflow state to an event log; on crash it replays and resumes.
#  - Every activity must be idempotent (because it may retry).
#  - Compensations must be idempotent AND not allowed to fail forever
#    (a failed compensation needs human intervention).

Compensation is not rollback. Rollback pretends nothing happened (DB undo log). Compensation is "do the reverse operation to reach an equivalent outcome." A charge can be refunded, but a sent email is sent — compensation is "send a 'sorry, your order failed' follow-up." This forces the business to accept "visible intermediate states" and is Saga's biggest semantic gap from ACID.

Real cases:

Uber Cadence: Cadence ships a built-in Saga primitive; 1000+ services inside Uber use it, with Cadence 1.0 announced in 2024. Order fulfillment and driver dispatch run as saga workflows.
DoorDash: uses Cadence as a fallback for event-driven processing (2024) — primary path is Kafka events, with Cadence saga as the recovery path.
Airbnb / Netflix / Coinbase / HashiCorp: use Temporal (the OSS evolution of Cadence) for payments, onboarding, content moderation, and other multi-step flows.
AWS Step Functions: managed orchestration with native saga support (see AWS Compute Blog "Implementing the saga pattern" 2021).
Chris Richardson — microservices.io/patterns/data/saga.html: the industry-grade authoritative doc on saga, including orchestration vs choreography and examples.

3. Outbox Pattern: kill "dual write," demote distributed transactions to local ones

Problem: a service must "write DB" + "publish Kafka event." The two are not in one transaction, so there is always an inconsistency window:

DB succeeds, Kafka fails → downstream never receives → data loss
Kafka succeeds, DB fails (power loss) → downstream receives an event for an order that doesn't exist → ghost orders

This is the dual write problem, the most common transactional pitfall in microservices.

Outbox solution:

The business table and the outbox table are written in the same local DB transaction. Local = ACID; both succeed or both fail.
A separate relay process (Debezium-style CDC reading the WAL/binlog) asynchronously ships new outbox rows to Kafka.
The relay tracks its own offset, delivers at-least-once; consumers dedupe on outbox row ID.

graph LR
    App["Order Service"]
    DB[("Postgres
orders + outbox
same local txn")]
    Debez["Debezium
reads WAL"]
    K["Kafka
order.events"]
    Cons1["Inventory Consumer"]
    Cons2["Analytics Consumer"]
    Cons3["Notification Consumer"]

    App -- "BEGIN
INSERT orders
INSERT outbox
COMMIT" --> DB
    DB -. WAL .-> Debez
    Debez --> K
    K --> Cons1
    K --> Cons2
    K --> Cons3

# Outbox in practice — Postgres + Debezium
CREATE TABLE orders (id BIGSERIAL PK, user_id ..., amount ..., status ...);
CREATE TABLE outbox (
    id BIGSERIAL PK,
    aggregate_id BIGINT,        -- order id (used as partition key)
    event_type TEXT,            -- "order.created" / "order.cancelled"
    payload JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Business code: write orders and outbox event in the same txn
BEGIN;
  INSERT INTO orders (...) VALUES (...) RETURNING id;
  INSERT INTO outbox (aggregate_id, event_type, payload)
       VALUES (42, 'order.created', '{"user":7,"amount":100}');
COMMIT;        -- atomic: both succeed or neither does

-- Debezium attaches to the Postgres replication slot,
-- tails the WAL → watches outbox INSERTs → transforms → pushes Kafka.
-- The business code doesn't need to know Kafka exists — fully decoupled.

# Why not CDC on the business table directly?
#  - Business schema changes often; downstreams become tightly coupled.
#  - A single row change may correspond to 0 or many domain events
#    (an order INSERT may emit "order.created" + "loyalty.points.eligible").
#    Outbox lets business code control event granularity.

# Why not just call Kafka producer in the same function?
#  - Producer.send() is not in the DB transaction — dual write problem.
#  - Even with Kafka transactions, you cannot span Postgres + Kafka.

Outbox trade-offs:

✅ Demotes "distributed (DB + MQ) transaction" to "local (DB only) transaction" — a major engineering win.
✅ Business code is non-invasive; the relay is independently scalable.
⚠ Adds ~second-level latency (CDC is async). Strict-real-time needs synchronous outbox with embedded relay at the cost of throughput.
⚠ The outbox table grows; you must archive/delete after X days.
⚠ At-least-once delivery ⇒ consumers must be idempotent (combine with point 4 below).

Real cases:

Debezium blog — "Reliable Microservices Data Exchange With the Outbox Pattern" (2019): the industry-grade reference for outbox + CDC, with a Quarkus extension.
Shopify: their order system heavily uses outbox + Kafka to push cross-service consistency down into CDC.
LinkedIn Brooklin / Databus: early CDC pioneer; Brooklin pipes outbox-style streams to Kafka / Espresso as the company-wide event bus.
Eventuate Tram (Chris Richardson): an out-of-the-box Java framework for outbox + saga.

4. Idempotency: the last line of defense for distributed transactions

Mechanism: in distributed systems every RPC, message, and callback is at-least-once — a network timeout can't distinguish "request lost" from "response lost," so retries are inevitable. Idempotency guarantees "executing the same operation N times has the same effect as executing it once" — it is the safety net for saga retries, outbox redelivery, and compensations.

Strategy	Mechanism	Best for
Naturally idempotent ops	SET x=5, DELETE id=42 — same result if repeated	Express as set when possible
Idempotency key + dedup table	Client generates a UUID; server stores (key → result); retries return cached result	Payments, orders, external callbacks (Stripe is the canonical example)
Optimistic locking / version	`UPDATE … WHERE version=X`, discard conflicts	Concurrent updates on the same resource
Consumer-side dedup	When consuming Kafka, write event_id to dedup table; skip on duplicate	Outbox / event sourcing consumers

# Stripe-style idempotency key, server-side (Postgres)
# Reference: https://stripe.com/blog/idempotency

CREATE TABLE idempotency (
    key TEXT PRIMARY KEY,
    request_hash TEXT NOT NULL,     -- detect key reuse with different bodies
    response_code INT,
    response_body JSONB,
    status TEXT,                    -- 'in_progress' | 'completed'
    locked_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT now()
);

def handle(key, req):
    h = hash(req.body)
    row = db.fetch_for_update("SELECT * FROM idempotency WHERE key=%s", key)

    if row and row.request_hash != h:
        raise Conflict("idempotency key reused with different body")

    if row and row.status == 'completed':
        return row.response_body                # return cached response

    if row and row.status == 'in_progress':
        if now() - row.locked_at > LEASE:
            pass                                # lease expired, retry ok
        else:
            raise Conflict("request in flight")

    # claim
    db.upsert("idempotency", key=key, request_hash=h,
              status='in_progress', locked_at=now())

    try:
        resp = do_business(req)
        db.update("idempotency", key=key,
                  status='completed', response_code=200,
                  response_body=resp)
        return resp
    except RetryableError:
        # don't update; let client retry
        raise
    except FatalError as e:
        db.update("idempotency", key=key,
                  status='completed', response_code=500,
                  response_body={"error": str(e)})
        raise

# Expiry policy: Stripe keeps keys for 24h, then prunes. Clients should
# not assume keys live forever.

Real cases:

Stripe: "Designing robust and predictable APIs with idempotency" (2017) + Brandur Leach's "Implementing Stripe-like idempotency keys in Postgres", the most detailed public engineering writeup including lease, expiry, and integration with retry backoff.
AWS SQS FIFO + content-based deduplication: 5-minute dedup window; beyond that you must dedupe yourself. Classic "infra handles 90%, you handle the last 10%."
Kafka transactions + idempotent producer: producer is idempotent per partition; cross-partition transactions (read-process-write) supported; consumers still need their own dedup.
PayPal / Square / Adyen: every charge / refund API requires an idempotency key, otherwise rejected. This is the payments industry's "default contract."

Extension & Optimization

Saga + workflow engine is the de facto standard: for cross-service flows of more than 3 steps, do not hand-roll state machines. Adopt Temporal / Cadence / AWS Step Functions / Azure Durable Functions — you get persistence, retry, visualization, and versioning for free.
Outbox alternatives: CockroachDB / TiDB changefeeds stream directly to Kafka, skipping Debezium; MongoDB / DynamoDB change streams do the same. For new projects, prefer databases with built-in CDC.
Semantic Locking: during a saga, data may be visible to other transactions (isolation violated). Chris Richardson proposes semantic locks — mark data with a business-level 'pending' flag; other transactions see pending and decide what to do (reject / wait / force).
Inbox pattern (outbox dual): the consumer keeps an inbox table of processed event IDs; "write business table + write inbox" is one local transaction. Consumer dedup becomes a local-transaction problem.
Transaction ID / trace propagation: propagate saga_id / step_id across services; combined with OpenTelemetry it gives full traceability — when a saga is stuck you locate the step in seconds.
The fundamental weapon against distributed transactions: domain boundaries. If 90% of flows cross the same two services, they probably should be one service. Wrong microservice boundaries are what force you into distributed transactions in the first place.

Common Pitfalls & Interview Questions

1. Treating Saga as ACID — forgetting isolation is gone. During a saga, an order may sit in a "created-but-not-paid" intermediate state visible to readers, who may make decisions based on incorrect state. Either use semantic locks to mark pending, or have the business accept "visible intermediate states" with UX mitigations.

2. Non-idempotent / failure-prone compensations. A failed compensation is worse than a failed forward step — compensation exists to save you. Compensations must be idempotent, and failures must escalate to DLQ / human action — never an infinite retry loop.

3. Treating "send Kafka + write DB" as atomic. The most common junior trap. The interviewer expects you to immediately name "dual write problem" and "Outbox."

4. Idempotency key as timestamp / auto-increment. Client retries must use the same key. If the client generates a new timestamp on each retry, dedup is broken — the key must be fixed at request generation time and reused on retries.

5. Using 2PC across external APIs (Stripe / email / WeChat Pay). External APIs don't support XA. Such scenarios require the trio: saga + compensation + idempotency.

Interview follow-ups:

Walk through 2PC blocking: coordinator dies after phase-1 ACK — what do participants do? How does 3PC improve and why is it still rare?
How does Saga differ from ACID in isolation? Give a bug caused by Saga's visible intermediate state.
Outbox vs Kafka transactions — why is Outbox preferred in production?
Design "cross-service transfer": service A debits 100, service B credits 100. Give the saga + compensation flow and failure-mode analysis.
What's the expiry policy for idempotency keys? What happens if a client reuses the same key after expiry? How do you avoid double charges?
"Should microservices avoid distributed transactions?" — how do you answer? Give an example of "converting a distributed transaction into a domain redesign."

Deep Resources

"Designing Data-Intensive Applications," Ch 7 §Transactions + Ch 9 §Atomic Commit and Two-Phase Commit (Kleppmann): the most complete industry-grade treatment of 2PC and the limits of distributed transactions.
Pat Helland — "Life Beyond Distributed Transactions: An Apostate's Opinion" (ACM Queue 2017 / CIDR 2007): the consensus-shaping paper. Established the saga / messaging / entity school.
Garcia-Molina & Salem — "Sagas" (SIGMOD 1987): the original saga paper, 14 pages. Reading it gives you a deeper grasp than any modern framework's docs.
Chris Richardson — microservices.io: Saga Pattern + "Microservices Patterns" (Manning): includes orchestration / choreography comparison, semantic lock, saga versioning, and operational lessons.
Debezium — "Reliable Microservices Data Exchange With the Outbox Pattern" (2019): the canonical outbox + CDC reference.
Stripe — "Designing robust and predictable APIs with idempotency" (2017) + Brandur Leach's idempotency keys in Postgres: best engineering references.
Uber Cadence — "Announcing Cadence 1.0" (2024); DoorDash — "Building Reliable Workflows with Cadence" (2024): saga orchestration at hyperscale.
Temporal — "The Definitive Guide to Durable Execution": theory and practice for modern saga + durable workflow platforms.

Deep Reflection

1. Cross-service transfer: A debits 100, B credits 100. Design saga + failure-mode analysis. If B credits succeed but the saga orchestrator itself crashes, what happens?

The classic saga interview question. The core issue is that the orchestrator's own failure must be recoverable — otherwise you've just moved the single point from 2PC's coordinator to your saga orchestrator, without solving anything.

Forward flow:

T1: A.debit(100, idempKey=tx-42)
T2: B.credit(100, idempKey=tx-42)
completed

Compensation: T2 fails → C1: A.credit(100, idempKey=tx-42-comp) (not a refund, a reverse credit).

Critical failure modes:

Orch crashes after T1 completes: on recovery, replay from event log; see T1 done, resume T2. The prerequisite is that orch state is persisted at every step (Temporal's event-sourcing model).
Orch crashes after sending T2, not knowing whether B succeeded: on recovery, retry T2; B's idempotency key dedupes — if last time succeeded, return same result; if last time the request didn't arrive, process normally. This is exactly what idempotency keys are for.
T2 fails forever (B permanently unreachable): compensate with C1. But beware "compensation also fails" — what if A is down too? Industry practice: a failed compensation goes to a DLQ with alerting, and an on-call engineer intervenes; do not let sagas retry forever. Temporal has a default max-attempts policy.
T1 succeeds but outbox not written: impossible. If you use outbox, T1 = (UPDATE accounts; INSERT outbox) is one atomic transaction.

Deeper reflection: this example is often used to "prove" saga is weak. But in real banking systems, transfers don't rely on distributed transactions at all — they use double-entry bookkeeping: the ledger isn't "modify A, modify B," but "record a debit, record a credit"; the sum of both must be zero. Any imbalance is caught by end-of-day reconciliation. This is centuries-old financial engineering wisdom, long predating distributed databases. Lesson: many "distributed-transaction needs" disappear when you change the data model.

2. Saga drops isolation — give a real example where reading an intermediate state causes a bug. How do you solve it?

Scenario: an e-commerce saga has steps "deduct inventory → deduct balance → ship." User A triggers a saga; inventory deduction succeeds (stock -1), balance deduction not yet started. Another query endpoint is hit and reads stock as -1; the recommender concludes "low stock" and sends user B a "limited-time offer" push. Then A's saga fails balance deduction and compensates (stock +1).

Result: user B got a marketing push based on an inventory reduction that never actually happened, and may immediately order only to find stock is fine — business logic disordered. This is the isolation violation caused by saga's "intermediate state leak." Under ACID, uncommitted transactions are invisible to others, so this cannot happen.

Solutions, from cheap to expensive:

Semantic Lock (Chris Richardson): add a pending column on inventory. During the saga, mark pending=true; other readers know "this is a saga in-flight state" and decide accordingly (the recommender can ignore pending rows).
Commutative operations: change "deduct inventory" to "reserve +1, ship -1," making order irrelevant. Compensation is "reserve -1," leaving no ghost state.
Reorder steps: push irreversible / high-side-effect steps to the end. Reversible steps (DB writes) first; non-compensable steps (sending emails, calling external APIs) last.
Versioning: each saga write carries saga_id + version; readers decide visibility based on saga state (MVCC-style).

Underlying principle: Saga trades "eventual consistency + visible intermediate state" for "cross-service feasibility + high throughput." If the business cannot tolerate visible intermediate state, that data should not cross services — it should stay in one service with local transactions. Back to "draw the right domain boundary."

3. Outbox introduces "second-level lag from DB to Kafka." If a business needs "the order is searchable immediately after writing," what do you do? Give 3 approaches.

Real scenario: after placing an order, the user must immediately see it in "My Orders." If outbox + Debezium has 2s lag, the user wonders "where's my order?" This is the conflict between synchronous requirements and async outbox.

Approach 1: Dual-read (recommended, cheap)

"My Orders" reads the search index, plus an extra query against the primary DB for the last N minutes of orders, merged in the UI.
Client / API gateway aggregates both sources.
Cost: one more DB query, but limited to current user's recent orders (highly indexed, < 5ms).
UX: perfect read-your-writes; downstream search system has no "must be instant" pressure.

Approach 2: Dual-write + tolerate inconsistency (for "occasionally lose a bit" cases)

Business code writes DB and ES at the same time. Accept some inconsistency, fix via batch reconciliation.
Cost: the dual-write problem returns — but if the business tolerates "search occasionally misses one row," acceptable.
Never for financial scenarios.

Approach 3: Synchronous outbox (heavy, edge cases only)

After commit, synchronously call Kafka producer and wait for ACK before returning to the client. Outbox is the backup.
Cost: write latency rises from 5ms to 30-100ms (Kafka cross-broker sync); Kafka failures stall the write path.
Use when "writes must be readable within seconds" is mandatory (financial reconciliation, regulatory reports).

Deeper reflection: "instant consistency" is almost always over-engineering. Google's and Amazon's search both have index lag. Ask the business: "really, not even 1 second?" — 80% of the time the answer is "5 seconds is fine." For the remaining 20%, Approach 1 (dual-read) usually suffices. Treating "instant consistency" as default is a hidden tax on performance and availability.

4. Cross-chapter: Day 6 (Consistency) says "pick consistency per data type"; today says "avoid distributed transactions." Combine both for the same e-commerce system: draw a "consistency + transaction strategy" map.

This is the layered-decision skill required of architects. Each subdomain in one system should pick both consistency and transaction strategy independently.

Subdomain	Consistency (Day 6)	Transaction strategy (Day 7)	Implementation
Inventory deduction	Linearizable	Single-DB local txn + row lock / CAS	Postgres SELECT FOR UPDATE or Spanner
Checkout (across 5 services)	Eventual + compensation	Saga (Temporal orchestration)	Per-step local txn + reverse compensation + idempotency
Payments (Stripe)	External, at-most-once charge	Idempotency key + retry + reconciliation	Stripe key = saga_id + end-of-day reconciliation
Order status fanout (search, recs, msgs)	Eventual	Outbox + Kafka + at-least-once consumers	Debezium → Kafka → multiple consumers
"My Orders" display	Read-your-writes	Primary DB + secondary-index dual read	Session token carries LSN + dual read
User analytics / telemetry	Eventual + lossy ok	Fire-and-forget + batch	Local buffer → async Kafka push

Overall architecture:

"Hard correctness" data (inventory, balance, order state) → local ACID inside one service.
"Cross-service flow" → Saga + Outbox + Idempotency trio.
"Downstream fanout" → Eventual via Kafka.
"Display-layer consistency" → client tokens + dual read.

Anti-patterns:

System-wide strong consistency ("to be safe"): Spanner everywhere; checkout P99 spikes; conversion drops.
System-wide eventual ("to keep it simple"): oversells, double charges, support inferno.
Distributed transactions everywhere: tight coupling, fault propagation; one service down brings everything down.

The architect's core skill: be able to draw this table in 5 minutes for a new business, with every row backed by numbers (QPS, SLO, cost of violation). That's the real boundary between "senior" and "architect."

5. Counterintuitive: "the best distributed transaction is none at all" — give a real example of "killing a distributed transaction via domain redesign."

This is the essence of Pat Helland's paper and the mark of microservice design maturity. When you find 90% of flows cross the same two services and saga compensations grow ever more complex, your boundaries are wrong.

Real case 1: e-commerce "order + inventory" used to be two services

Initial: order service, inventory service split. Every order requires a saga.
Pain: 90% of requests cross both services; saga compensation accounts for 40% of code; failure rate climbs; ops cost soars.
Refactor: merge into "order fulfillment service" (one DB, one team); order and inventory in the same local transaction — SELECT FOR UPDATE on the inventory row + INSERT order, done in 5ms. The saga disappears entirely.
Lesson: "more microservices = better" is a myth. Boundaries should follow "independent evolution + independent scaling + independent failure," not code module boundaries.

Real case 2: banking transfers replace distributed transactions with double-entry bookkeeping

Common-wrong design: account-A service, account-B service; transfers require distributed transactions.
Better design: all accounts in one "ledger service"; the ledger table is INSERT-only; every transaction writes two rows (debit / credit); sum == 0 means correct.
Reading balance = SUM(credits) - SUM(debits) (with snapshots for speed).
"Cross-account transfer" reduces to two INSERTs in one service's local transaction — distributed transaction eliminated. This is the core design of Stripe Ledger, Square, and real-world ledger systems.

Real case 3: event sourcing collapses multi-service state into a single source of truth

Each service has its own state → cross-service consistency issues sprout endlessly.
Switch to event sourcing: every service writes events to a central event store (Kafka / EventStoreDB); each service materializes its own view from the stream.
"Cross-service transaction" becomes "append an event" — a local op. View inconsistency is handled by consumers.
Examples: parts of Netflix, parts of LinkedIn, most modern fintech.

Philosophical takeaway: complexity in distributed systems never disappears, only moves. Saga moves "coordinator complexity" to "compensation complexity"; double-entry moves "cross-service transaction" to "data-model design"; event sourcing moves "cross-service consistency" to "event schema evolution."

Mark of a mature architect: identify where complexity hides and put it where it is easiest to understand and evolve. Not eliminate complexity, but place it well. This principle applies to all design decisions — fundamentally the same thinking as "push inconsistency to the UX layer, keep consistency at the DB layer."