DAY 34 / PHASE 4 · HUMAN-IN-THE-LOOP

Human-in-the-Loop Engineering

Confidence Routing · Async Approval · Interruptible Resume · Approval Audit

2026-06-15 · BigCat

Real HITL isn't "pop a dialog and wait for you to click yes" — it's teaching the agent to pause gracefully, hand the decision back to you, and pick up seamlessly afterward.

// WHY THIS MATTERS

Day 31 covered "put a human gate before irreversible actions" — but that was one line of input(). Once the agent runs in the background, runs hundreds of turns, and you're not watching the terminal, synchronous blocking breaks: the agent freezes waiting on a terminal nobody's at; or you get annoyed and just set everything to auto. Production HITL isn't a dialog box, it's an orchestration discipline: when is it worth interrupting a human (confidence routing), how do you ask without blocking (async approval + state persistence), how does a human take over and hand back (interruptible checkpoints), and how does all of it stay auditable (approval audit). Today we upgrade require_human from one line of input() into a system that can suspend while you sleep and resume from a tap on your phone. The hard part of HITL was never "should we ask a human" — it's "what does the agent do while it asks."

// 01

Confidence Routing: deciding when it's worth interrupting a human

Claim: ask on every action and approval fatigue sets in until you rubber-stamp blindly; ask on nothing and you've automated what shouldn't be automated. The first piece of HITL engineering is routing, not a gate.

Background & principle

Day 31 sorted by reversibility (reversible runs free, irreversible gets a gate). Add a second axis here: confidence × impact. The cross yields three tiers — auto (high confidence + low impact, run free), ask (low confidence or medium-high impact, ask a human), deny (over-budget / unauthorized, just refuse — it never enters the human queue). The key trap: confidence can't be self-reported by the model — it systematically overestimates. Use verifiable proxy signals instead: retrieval hit count, whether a tool errored, self-consistency across samples, whether output matches the schema. "I'm confident" doesn't count; the evidence does.

In practice

def confidence(ctx):                  # verifiable signals, not "how sure are you?"
    sigs = [ctx.retrieval_hits >= 2,
            not ctx.tool_errored,
            ctx.self_consistency >= 0.8,    # agreement across samples
            ctx.schema_ok]
    return sum(sigs) / len(sigs)

def route(action, ctx):
    if action.over_budget or action.unauthorized:
        return "deny"                     # unauthorized isn't "ask" — it's a bug
    c, impact = confidence(ctx), action.impact
    if c >= 0.8 and impact == "low":  return "auto"
    return "ask"                          # low confidence, or medium/high impact → ask

Core mental model: HITL's cost is human attention, and attention is a scarce, depletable resource. Routing's goal isn't "safer" — it's "spend each human intervention where it counts."

Failure modes: (1) Confidence self-reported by the model — under pressure to finish, it overestimates; you have no routing. (2) Thresholds picked by gut and frozen forever (see §4). (3) Dumping deny into the human queue too — unauthorized actions don't belong there; mixing them in just buries the ask items that genuinely need judgment.
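
A runnable version of the routing above, as a minimal sketch. The Ctx and Action dataclasses and the impact labels are assumptions added so the snippet runs on its own; the signals and thresholds are the ones from the text. One thing it makes visible: with four equally weighted boolean signals the score can only be 0, 0.25, 0.5, 0.75 or 1.0, so a 0.8 cutoff really means "all four signals must pass."

```python
from dataclasses import dataclass


@dataclass
class Ctx:  # hypothetical container for the proxy signals
    retrieval_hits: int
    tool_errored: bool
    self_consistency: float  # fraction of samples that agree, 0.0 to 1.0
    schema_ok: bool


@dataclass
class Action:  # hypothetical action record
    impact: str  # "low" | "medium" | "high"
    over_budget: bool = False
    unauthorized: bool = False


def confidence(ctx: Ctx) -> float:
    """Share of verifiable signals that pass; never the model's own claim."""
    signals = [
        ctx.retrieval_hits >= 2,
        not ctx.tool_errored,
        ctx.self_consistency >= 0.8,
        ctx.schema_ok,
    ]
    return sum(signals) / len(signals)


def route(action: Action, ctx: Ctx) -> str:
    if action.over_budget or action.unauthorized:
        return "deny"  # refused outright, never queued for a human
    if confidence(ctx) >= 0.8 and action.impact == "low":
        return "auto"
    return "ask"


clean = Ctx(retrieval_hits=3, tool_errored=False, self_consistency=0.9, schema_ok=True)
shaky = Ctx(retrieval_hits=1, tool_errored=False, self_consistency=0.9, schema_ok=True)

print(route(Action("low"), clean))                     # auto
print(route(Action("low"), shaky))                     # ask (confidence is 0.75)
print(route(Action("high"), clean))                    # ask
print(route(Action("low", unauthorized=True), clean))  # deny
```

If "three of four signals" should be enough for auto, the cutoff has to drop to 0.75; a threshold between two reachable scores behaves the same as the next score up.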

Going deeper · Anthropic Building Effective Agents (human oversight & the agent-computer interface), anthropic.com/engineering/building-effective-agents

// 02

Async Approval: let the agent pause, notify, and resume

Claim: input() is synchronous blocking — a background agent that uses it will die. The core of production HITL is turning "wait for a human" from blocking into suspend + callback.

Background & principle

When the agent hits the ask tier, input() freezes the whole worker, idling on a terminal that may have nobody at it for hours. The fix follows durable execution: checkpoint state outside the process (not in memory) → send a request to an approval channel (Slack / phone push / queue) → yield resources and exit → once the human decides, resume from the checkpoint via callback or polling. Two iron rules: state must live outside the process (a crash mustn't lose the approval); resume must be idempotent (replaying the same approval can't execute the action twice). Attach an SLA timeout to each approval: hung too long → default-deny or escalate, never wait forever.

┌──────────────── async approval loop (non-blocking) ─────────────┐
│  agent loop ──▶ route()==ask                                    │
│       │                                                         │
│       ▼ ① checkpoint state OUT of process (DB / file)          │
│       ▼ ② send request to Slack / phone + approval_id          │
│       ▼ ③ yield the worker, exit ── no idle CPU spin           │
│       · · ·  (human taps on their phone)                        │
│       ▼ ④ callback wakes up with approval_id                    │
│       ▼ ⑤ idempotency check: seen this id? → ignore             │
│       ▼ ⑥ resume_from(checkpoint), continue                     │
│  unresolved ─▶ SLA fires: default-deny / escalate to backup     │
└─────────────────────────────────────────────────────────────────┘

In practice

async def step(ctx):
    action = plan(ctx)
    if route(action, ctx) == "ask":
        aid = save_checkpoint(ctx)              # ① state out of process, with approval_id
        notify("slack", action, aid)         # ② fire request, don't await
        raise Suspend(aid)                      # ③ yield worker, exit
    return execute(action)

def on_approval(aid, decision):               # triggered by callback after the human acts
    if not claim_once(aid): return           # ⑤ idempotent: one atomic check-and-mark (hypothetical helper),
                                             #    so a replay or double-tap racing the first call can't also pass
    ctx = load_checkpoint(aid)
    if decision == "approve": resume_from(ctx)  # ⑥ continue from checkpoint
    else: resume_from(ctx, denied=True)        # feed denial back to the agent too

Failure modes: (1) Synchronous input() blocks the whole worker — one pending approval drags down every concurrent task. (2) State only in memory — a restart / crash loses the approval and all progress. (3) No idempotency — a replayed callback or a double-tap executes the transfer twice. (4) No timeout — one unattended ask suspends the task forever.
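
The two iron rules (state outside the process, idempotent resume) can be sketched with nothing but the standard library. This is one possible shape, not the reference design: the table layout, the function names and the resume callback are all assumptions, and SQLite stands in for whatever store you actually use. The idempotency check is a single UPDATE guarded by status = 'pending', so only the first callback for a given approval_id wins.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    # Use a file path in real use; ":memory:" is only for this demo.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS approvals ("
        " id TEXT PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    db.commit()
    return db


def save_checkpoint(db, state: dict) -> str:
    """Persist agent state and return the approval_id that identifies it."""
    aid = uuid.uuid4().hex
    db.execute("INSERT INTO approvals (id, state) VALUES (?, ?)", (aid, json.dumps(state)))
    db.commit()
    return aid


def claim(db, aid: str, decision: str) -> bool:
    """Atomically move pending -> decision. True only for the first caller."""
    cur = db.execute(
        "UPDATE approvals SET status = ? WHERE id = ? AND status = 'pending'",
        (decision, aid),
    )
    db.commit()
    return cur.rowcount == 1


def on_approval(db, aid: str, decision: str, resume) -> bool:
    if not claim(db, aid, decision):
        return False  # replay, double-tap, or unknown id: do nothing
    row = db.execute("SELECT state FROM approvals WHERE id = ?", (aid,)).fetchone()
    resume(json.loads(row[0]), denied=(decision != "approve"))
    return True


db = open_store()
executed = []


def resume(state, denied):  # stand-in for the real resume_from()
    executed.append((state["action"], denied))


aid = save_checkpoint(db, {"step": 7, "action": "send_invoice"})
on_approval(db, aid, "approve", resume)
on_approval(db, aid, "approve", resume)  # replayed callback is ignored
print(executed)  # [('send_invoice', False)]
```

Claiming before resuming gives at-most-once behavior: if the process dies between the claim and the end of resume, the action does not run twice, but it may not run at all, so the resume path itself still needs to be restartable.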

Going deeper · Temporal Durable Execution (state persistence & replay for long workflows), temporal.io · LangGraph Human-in-the-loop (interrupt / resume), langchain-ai.github.io

// 03

Interruptible & Takeover: not just "approve / reject," but "edit"

Claim: HITL's high value isn't the binary approval — it's a human stepping in to modify the agent's intermediate work, then letting it continue.

Background & principle

Treating human intervention as only approve / reject wastes it. Real workflows have two more valuable moves: edit (fix its plan / draft / params and continue) and takeover (a human does a few steps, then hands back). To support both, the agent loop needs three properties: it is interruptible (not a while True that runs until it finishes or dies), its state is serializable, and it can resume from any checkpoint. The commonly missed part: after a human edits, you must feed the change back into the agent's context, otherwise its next step overwrites your edit with the old plan. So "plan" and "draft" must be explicit, human-overwritable state, not buried in prompt history.

In practice

def loop(state):
    while not state.done:
        if stop_requested(): return snapshot(state)  # interruptible: save & exit anytime
        proposal = plan(state)
        decision = await_human(proposal)        # approve / edit / takeover
        if   decision.kind == "approve":  state = execute(proposal, state)
        elif decision.kind == "edit":
            state.plan = decision.edited         # human-edited plan
            state.ctx += f"[human edited plan: {decision.note}]"  # feed back! or it's overwritten
        elif decision.kind == "takeover":
            state = apply_steps(state, decision.human_steps)  # fold the human's steps into state (hypothetical helper), then resume
    return state

A counterintuitive but important point: edit is cheaper than reject. Reject sends the agent back to square one (it may repeat the same mistake); edit nudges it straight onto the right track — one correction beats ten rejections.

Failure modes: (1) A non-interruptible while True — to step in you can only kill it, losing all progress and restarting from scratch. (2) The human edits the draft but it isn't fed back into context — the agent ignores your change next turn. (3) Incomplete checkpoint state — on resume a key variable is missing and the agent runs "amnesiac."
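
Failure modes (2) and (3) both come down to what the checkpoint contains. A minimal sketch of "plan as explicit, human-overwritable state," assuming a hypothetical AgentState dataclass and JSON as the serialization format (requires Python 3.9+ for the list[str] annotations). The edit lands in the stored plan and in the notes the agent reads, and both survive a snapshot / restore round trip.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentState:  # hypothetical: everything needed to resume lives here
    goal: str
    plan: list[str] = field(default_factory=list)
    step: int = 0
    notes: list[str] = field(default_factory=list)  # appended to the agent's context


def snapshot(state: AgentState) -> str:
    return json.dumps(asdict(state))


def restore(blob: str) -> AgentState:
    return AgentState(**json.loads(blob))


def apply_edit(state: AgentState, edited_plan: list[str], note: str) -> AgentState:
    state.plan = list(edited_plan)  # overwrite the plan itself
    state.notes.append(f"[human edited plan: {note}]")  # and tell the agent why
    return state


def next_step(state: AgentState):
    # Stand-in planner: follows the stored plan instead of regenerating it.
    return state.plan[state.step] if state.step < len(state.plan) else None


state = AgentState(goal="ship release", plan=["run tests", "deploy to prod"], step=1)
apply_edit(state, ["run tests", "deploy to staging", "deploy to prod"], "staging first")

resumed = restore(snapshot(state))  # simulate suspend, then resume in a new process
print(next_step(resumed))  # deploy to staging
print(resumed.notes)       # ['[human edited plan: staging first]']
```

A quick completeness check for any checkpoint format is the same round trip: restore(snapshot(state)) should compare equal to state, and anything the agent needs that is not a field will show up as "amnesia" on resume.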

Going deeper · LangGraph Breakpoints & state editing, langchain-ai.github.io · Microsoft Guidelines for Human-AI Interaction (G9–G11: correctable / takeover), microsoft.com

// 04

Approval Observability & Audit: who approved what, and was it right

Claim: approvals are first-class data. Without a trail you can't learn "which asks were redundant," nor trace "who approved this action."

Background & principle

Each approval should record at least: the action, the agent's reason, the confidence, who approved, how long it took, the outcome. This log has two uses. First, retune routing thresholds — an ask class you approve 95% of the time → demote to auto; an auto class that caused an incident → promote to ask. The §1 thresholds aren't set once; they grow out of audit data. Second, escalation & delegation — high-impact actions require more than one approver / a specific role (lightweight RBAC); an approval hung too long auto-escalates to a backup. Finally, the single best move against approval fatigue: batch similar low-risk actions (approve 10 on one screen) rather than popping 10 dialogs.

In practice

def log_approval(rec):                    # every approval to the store: retune + accountability
    db.insert(action=rec.action, reason=rec.agent_reason,
              conf=rec.confidence, approver=rec.who,
              latency=rec.decided_at - rec.asked_at, outcome=rec.result)

def retune(action_type):                  # derive threshold suggestions from history
    rows = db.query(action_type, last_days=14)
    if not rows: return                   # no history yet: nothing to suggest (mean() of an empty set raises)
    rate = mean(r.outcome == "approved" for r in rows)
    if rate > 0.95 and no_incidents(rows): suggest("ask→auto")  # always approved = redundant ask
    if any(r.outcome == "incident" for r in rows): suggest("auto→ask")

Failure modes: (1) No audit trail — no post-mortem, no accountability, thresholds stay guesswork forever. (2) Frozen thresholds, never retuned — neither redundant asks (fatigue) nor leaked autos (incidents) get fixed. (3) One granularity for all approvals — high- and low-risk in one queue, and the truly important gets drowned in noise.
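
Failure mode (3) and the batching advice can be sketched together: keep each high-impact item on its own screen and group low-impact asks by action type. The Pending record, the impact labels and the batch size of 10 are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pending:  # hypothetical ask-tier queue entry
    approval_id: str
    action_type: str
    impact: str  # "low" | "medium" | "high"
    summary: str


def build_screens(pending, batch_size=10):
    """Return a list of screens; each screen is a list of items shown together."""
    screens = []
    groups = defaultdict(list)
    for item in pending:
        if item.impact == "low":
            groups[item.action_type].append(item)  # batch similar low-risk asks
        else:
            screens.append([item])  # anything riskier gets the human's full attention
    for items in groups.values():
        for i in range(0, len(items), batch_size):
            screens.append(items[i:i + batch_size])
    return screens


pending = [Pending(f"r{i}", "rename_file", "low", f"rename draft {i}") for i in range(12)]
pending.append(Pending("w1", "wire_transfer", "high", "pay vendor invoice"))

screens = build_screens(pending)
print([len(s) for s in screens])  # [1, 10, 2]
```

Thirteen interruptions become three, and the one that matters is shown first and alone.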

Going deeper · Anthropic Agentic Misalignment (reduce autonomy / add oversight on sensitive actions), anthropic.com/research/agentic-misalignment

// Hands-on · add an async HITL layer to a background agent

Chain the four into a weekend project: give a background agent you run a layer that "suspends while you sleep, resumes from a tap on your phone," then red-team it yourself.

Routing: write a (confidence, impact) → {auto, ask, deny} table; proxy confidence with verifiable signals (retrieval hits / tool errors / self-consistency), never the model's self-report.
Async: turn the ask tier from input() into checkpoint-to-store + Slack/push + suspend-exit; don't block the worker.
Takeover: support approve / edit / takeover in the response, and always feed edits back into context.
Idempotency: dedupe by approval_id; callback replays / double-taps must not re-execute.
Audit: log every approval; after a week, use retune() to see which asks to demote and which autos to promote.
Red-team: deliberately leave an approval pending for 12 hours; verify the SLA timeout fires (default-deny / escalate) and the approval survives a process restart. If it doesn't, your state is still in memory.
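
The red-team step can be rehearsed without waiting 12 real hours by passing the clock in as a parameter. A minimal sketch, assuming a hypothetical approvals table in SQLite, an 8-hour SLA and a default-deny policy; it closes and reopens the database file to imitate a process restart.

```python
import os
import sqlite3
import tempfile
import time


def open_queue(path):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS approvals ("
        " id TEXT PRIMARY KEY,"
        " asked_at REAL NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    db.commit()
    return db


def ask(db, aid, now=None):
    now = time.time() if now is None else now
    db.execute("INSERT INTO approvals (id, asked_at) VALUES (?, ?)", (aid, now))
    db.commit()


def expire_overdue(db, sla_seconds, now=None):
    """Default-deny every approval still pending past its SLA; return the count."""
    now = time.time() if now is None else now
    cur = db.execute(
        "UPDATE approvals SET status = 'denied_timeout'"
        " WHERE status = 'pending' AND asked_at <= ?",
        (now - sla_seconds,),
    )
    db.commit()
    return cur.rowcount


path = os.path.join(tempfile.mkdtemp(), "approvals.db")
t0 = 1_000_000.0  # fake clock so the demo is deterministic

db = open_queue(path)
ask(db, "a1", now=t0)              # will be 12 hours old
ask(db, "a2", now=t0 + 11 * 3600)  # will be 1 hour old
db.close()                         # simulate the process exiting

db = open_queue(path)              # simulated restart: state must still be there
expired = expire_overdue(db, sla_seconds=8 * 3600, now=t0 + 12 * 3600)
status = dict(db.execute("SELECT id, status FROM approvals"))
print(expired, status)  # 1 {'a1': 'denied_timeout', 'a2': 'pending'}
```

In a real deployment something has to call expire_overdue on a schedule (cron, a timer in the worker); swapping the UPDATE for an escalation notice gives the "escalate to backup" variant.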

Once you've built this, you'll instinctively ask of any "autonomous agent product": what does it do while it asks a human (block or suspend), can I edit its intermediate work, does it keep an approval trail — instead of being dazzled by the "one-click full-auto" in the demo.

// GLOSSARY

Human-in-the-Loop (HITL): Engineering that embeds human judgment into the agent's execution loop, not as after-the-fact review.
Confidence Routing: Routing actions to auto / ask / deny by (confidence × impact) to decide whether to interrupt a human.
Durable Execution: An execution model where state persists outside the process and resumes from a checkpoint after a crash / restart.
Checkpoint: A serializable snapshot of agent state — the vehicle for suspend and resume.
Idempotent Resume: Replaying the same approval runs the resume logic once, with no duplicate side effects.
Human Takeover: A human executes several steps, then hands control back to the agent.
Trajectory Editing: A human edits the agent's intermediate plan / draft and feeds it back into context before continuing.
Approval Queue: The queue of ask-tier actions awaiting human judgment; supports batching, timeout, escalation.
Escalation / SLA: On approval timeout, auto-escalate to a backup approver or decide by default policy.
Approval Fatigue: Too many approvals lead to blind clicking, degrading HITL into a rubber stamp.

// Deeper Thinking

§1 proxies confidence with "verifiable signals." But many open tasks (writing a strategy memo) have no verifiable signal — how do you set auto / ask there?

With no verifiable signal, fall back to impact (back to Day 31's reversibility axis): if the open task's output is just "a draft for you to read," impact is low → auto-produce but default into review; if it auto-sends (email / tweet), impact is high → force ask. In other words, confidence routing and reversibility gating are two complementary axes, not either/or: use confidence when you have a verifiable signal, fall back to impact / reversibility when you don't. When neither is strong, fail-safe to ask. The real danger is "no signal and default auto."

Async approval needs durable execution, state in a store, idempotent callbacks — isn't that over-engineering for a personal project? When is input() enough?

There's a clean line: does the agent run outside your attention? If you're at the terminal watching it step through, input() is plenty; don't add a queue. Synchronous blocking starts to bite the moment (a) it runs in the background / on cron, (b) a single run spans beyond your attention window (tens of minutes+), or (c) multiple instances run concurrently: workers freeze, and a crash loses all progress. The criterion isn't "project size," it's "are you present when it breaks." If not, you need durable execution. If you are, the machinery is just overhead.

edit / takeover let a human change the agent's intermediate work. But what if the human edits worse than the agent, or is unsure too? Is HITL just shifting blame to a tired person?

A real concern. HITL doesn't create judgment, it only places it — place it wrong and it's just blame-shifting. Two principles avoid that: (1) Only require human intervention where the human is genuinely better (domain knowledge, value trade-offs, your private context); leave pure compute / retrieval to the agent. (2) Give the human the context to decide, not just a yes/no — attach the agent's reason, confidence, alternatives. If the human can only rubber-stamp, that's not HITL, it's a ritual of liability transfer. When the human is also unsure, the right move is usually escalate or abstain, not force an approval.

§4 retunes thresholds from history, which means using the past to decide the future. If you approved lazily in the past, won't the retuned auto threshold cement bad habits into automation?

Yes, and this is the core trap of HITL audit: it amplifies your approval bias. The no_incidents guard in retune() is the first defense: it only suggests demotion when a class is "always approved and no incidents," not on "always approved" alone. The sturdier practice is to never let retuning change thresholds automatically: it only proposes (suggest(), not apply()), and you accept or reject the proposals in a periodic review. Feeding your bias into a system that then automates the bias is HITL's most insidious failure, so the audit itself must be audited.
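
A suggestion-only retune can be sketched as a pure function that returns a proposal and changes nothing. The Record fields, the minimum sample size of 20 and the choice to store the human decision separately from the downstream incident flag are assumptions of this sketch, not part of §4's schema; the 0.95 approval rate is the one from §4.

```python
from dataclasses import dataclass


@dataclass
class Record:  # hypothetical audit row for one action
    action_type: str
    routed: str    # "auto" or "ask"
    decision: str  # "approved" / "rejected"; "" when routed auto
    incident: bool  # did this action later cause a problem?


def suggest_retune(rows, min_samples=20, approve_rate=0.95):
    """Return "auto→ask", "ask→auto" or None. Never applies anything itself."""
    if any(r.incident for r in rows if r.routed == "auto"):
        return "auto→ask"  # a leaked auto outweighs any approval statistics
    asked = [r for r in rows if r.routed == "ask"]
    if len(asked) < min_samples:
        return None  # too little evidence to propose a demotion
    rate = sum(r.decision == "approved" for r in asked) / len(asked)
    if rate > approve_rate and not any(r.incident for r in rows):
        return "ask→auto"
    return None


history = [Record("rename_file", "ask", "approved", False)] * 30
leaked = history + [Record("rename_file", "auto", "", True)]

print(suggest_retune(history))      # ask→auto
print(suggest_retune(history[:5]))  # None
print(suggest_retune(leaked))       # auto→ask
```

The return value is meant to land in a review queue, so a human approves the threshold change the same way they approve any other ask.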

How does this confidence routing relate to Day 31's reversibility gate? When do you use which? Do they conflict?

No conflict: they are two layers of defense in depth. Day 31's reversibility gate is the safety floor: an irreversible action always goes to the human gate, no matter how high the confidence. It is a hard constraint that routing can't bypass. Today's confidence routing is the efficiency layer on top: in the gray zone of "should we ask," it uses confidence to set interruption frequency and save attention. Compose them: check reversibility first (irreversible → always ask, no confidence overrides it), then hand reversible actions to confidence routing for auto / ask. In one line: reversibility decides "may it be automated," confidence decides "is it worth asking." The former governs the safety floor, the latter the efficiency ceiling.
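
The composition order can be written down in a few lines. This is a sketch under assumptions: GatedAction and its fields are hypothetical, confidence is passed in as a number (or None when there is no verifiable signal), and the None branch follows the earlier answer about open-ended tasks by falling back to impact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatedAction:  # hypothetical action record
    reversible: bool
    impact: str  # "low" | "medium" | "high"
    authorized: bool = True


def decide(action: GatedAction, confidence: Optional[float]) -> str:
    if not action.authorized:
        return "deny"  # never reaches a human
    if not action.reversible:
        return "ask"   # safety floor: no confidence score overrides this
    if confidence is None:
        # No verifiable signal: fall back to impact, fail safe to ask.
        return "auto" if action.impact == "low" else "ask"
    if confidence >= 0.8 and action.impact == "low":
        return "auto"
    return "ask"


print(decide(GatedAction(reversible=False, impact="low"), 1.0))   # ask
print(decide(GatedAction(reversible=True, impact="low"), 1.0))    # auto
print(decide(GatedAction(reversible=True, impact="low"), 0.5))    # ask
print(decide(GatedAction(reversible=True, impact="high"), None))  # ask
```

The order of the checks is the design: each later check only sees actions the earlier, stricter ones let through.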

// Further Reading

Anthropic · Building Effective Agents — engineering principles for the agent-computer interface and human oversight
LangGraph · Human-in-the-loop — reference implementation of interrupt / resume / breakpoints / state editing
Temporal · Durable Execution — state persistence & replay for long workflows, the substrate for async approval
Microsoft · Guidelines for Human-AI Interaction (Amershi et al., CHI 2019) — 18 HCI guidelines, the basis for correctable / takeover design
Anthropic · Agentic Misalignment — why high-impact actions warrant lower autonomy and human oversight