Defer-not-hold: replacing held-HTTP approval with PreToolUse defer

The original CodePulse approval pipeline made a simple bet: when Claude Code asks for permission via a PreToolUse hook, hold the HTTP connection open until the user taps Allow or Deny on Telegram. The bet worked for a year. It also created a structural ceiling on how many concurrent decisions any single user could have in flight, how slow the user could be without the CLI giving up, and how gracefully the system could recover from any kind of network blip on the bridge.

When Claude Code v2.1.89 added a new defer permission decision, a signal (documented only later) that lets a hook return immediately and resolve the decision asynchronously, we knew the held-HTTP era was over. What we did not expect was that adopting it would take five implementation phases, eleven hardening releases, and a weeks-long pas de deux with undocumented CLI behavior. This is the post-mortem on the migration that landed under TAB-551.

What "held HTTP" actually cost us

The original pipeline worked like this. Claude Code wants to run a Bash command. It fires the PreToolUse hook with the tool name and arguments. Our hook script POSTs the payload to the local approval bridge and waits for the bridge to respond. The bridge sends a Telegram card to the user. The user taps Allow. The bridge writes the decision back to the still-open HTTP response. The hook script reads the response, returns the decision to the CLI, and the CLI either runs the tool or denies it.

The architecture is simple to draw and simple to reason about. It is also a wall of held connections. Every pending decision is one open HTTP request, blocked at read(), waiting on a human who answers in minutes rather than milliseconds. Our default timeout was 360 seconds. A user with three pending tool approvals on Telegram had three concurrent open HTTP requests sitting on the bridge for up to six minutes each. A user on a flight with their phone in airplane mode would find approval cards waiting for them on the ground, but the corresponding HTTP requests would have timed out twenty minutes earlier. The cards on Telegram would be tappable, but tapping them would do nothing, because the other end of the conversation had already terminated.
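The dead-card window follows directly from the fixed timeout. As a rough illustration (the function and field names are hypothetical, not CodePulse's code), a held request can only use a tap that arrives before the timeout expires:

```typescript
// Hypothetical model of the held-HTTP contract: one open request per decision.
const HELD_TIMEOUT_MS = 360 * 1000; // the 360-second default described above

interface HeldRequest {
  openedAtMs: number; // when the hook's POST reached the bridge
}

// A tap only counts while the hook is still blocked on the open response.
function tapStillCounts(request: HeldRequest, tapAtMs: number): boolean {
  return tapAtMs - request.openedAtMs < HELD_TIMEOUT_MS;
}

const heldRequest: HeldRequest = { openedAtMs: 0 };
const quickTap = tapStillCounts(heldRequest, 30 * 1000); // tapped after 30 seconds
const lateTap = tapStillCounts(heldRequest, 26 * 60 * 1000); // tapped after 26 minutes
```

In this model the late tap has nothing to answer: the card is still visible, but the request behind it is gone.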

We documented the failure modes in our approval pipeline post, but the deeper problem was structural: the approval was synchronous from the CLI's perspective and asynchronous from the human's perspective, with the bridge bearing the entire impedance mismatch on a fixed timeout. Any latency above the timeout broke the contract. Any network blip dropped a decision. Any user who put their phone away for an hour came back to a card that looked alive and was actually dead.

We had wallpapered over the worst of it with a held-connection pool, retry logic on the hook script side, and a generous timeout. None of that addressed the underlying shape. The shape was wrong. We needed the CLI to let go of the question while the human thought about it, and reattach later when the human had answered.

What `defer` does

Claude Code v2.1.89 added a third permission decision alongside allow and deny. From the changelog:

Added defer permission decision to PreToolUse hooks — headless sessions can pause at a tool call and resume with -p --resume to have the hook re-evaluate.

The semantics are exactly the shape we needed. The hook returns permissionDecision: "defer" immediately, wrapped in the hookSpecificOutput envelope shown below. The CLI exits cleanly, saves the deferred tool to its session state on disk, and is gone. No held connection. No background process. The session is durably paused. When the user is ready, the CLI is restarted with claude -p --resume <sessionId> and the same PreToolUse hook fires again with the same tool input. The hook checks the bridge for a stored decision (allow, deny, or defer again) and returns it.

```jsonc
// Hook returns this on first call:
{ "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "defer"
} }

// CLI exits. Bridge stores the question. Telegram card lives on.

// User taps Allow. Bridge stores: pendingApprovals.set(toolCallId, "allow")

// Resume fires. Same hook called with same tool input.
// Hook checks bridge: { permissionDecision: "allow" }
// CLI runs the tool.
```

The architectural shift is small to describe and large in consequence. The CLI no longer holds anything. The bridge no longer holds a connection. The decision lives in the bridge's pendingApprovals store between the defer and the resume. Our resume-per-message pattern, already the spine of CodePulse's CLI integration, could absorb defer cleanly because every message was already a --resume invocation. We just needed the bridge to remember which tool calls had been deferred and what decision the user had given.
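The re-evaluation loop fits in a few lines. What follows is a minimal sketch rather than the bridge's real code: `evaluateHook` and the `Decision` type are hypothetical names, and a plain in-memory `Map` stands in for whatever storage the bridge actually uses.

```typescript
// Hypothetical sketch of the defer/resume decision lookup.
type Decision = "allow" | "deny" | "defer";

interface HookOutput {
  hookSpecificOutput: {
    hookEventName: "PreToolUse";
    permissionDecision: Decision;
  };
}

// Stand-in for the bridge's record of user decisions, keyed by tool call id.
const pendingApprovals = new Map<string, Decision>();

// Called on every PreToolUse firing, first call and resumes alike.
function evaluateHook(toolCallId: string): HookOutput {
  // No stored answer yet: defer (again) so the CLI can exit.
  const decision: Decision = pendingApprovals.get(toolCallId) ?? "defer";
  return {
    hookSpecificOutput: { hookEventName: "PreToolUse", permissionDecision: decision },
  };
}

// First firing: nothing is stored, so the hook defers.
const firstCall = evaluateHook("tool-1");
// The user taps Allow on Telegram and the bridge records it.
pendingApprovals.set("tool-1", "allow");
// Resume: the same hook fires again and returns the stored decision.
const resumedCall = evaluateHook("tool-1");
```

The hook itself is stateless in this sketch. Everything it needs on resume is the tool call id and the stored answer.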

[Figure: sequence diagram contrasting the held-HTTP architecture (CLI waits, bridge holds) with the defer architecture (CLI exits, bridge persists, resume re-asks)]

The migration of the approval pipeline began the day v2.1.89 dropped. The migration finished — in the sense of having shipped through all eleven hardening releases — three weeks later. Almost none of those three weeks was spent on the core mechanism. They were spent on the corners.

What the docs did not say

Anthropic shipped defer in v2.1.89 with one paragraph in the changelog and no documentation page. Three GitHub issues open at the time we started — #41791 (hooks docs omit defer), #41794 (headless docs omit deferred resumption), #42309 (--resume prompt cache behavior with deferred tools) — captured the gap. We had to learn the behavior by running the CLI and watching what came out. The behaviors we discovered, in roughly the order we discovered them:

The deferred tool's payload is not echoed in the stream-json output. When the CLI exits on a defer, it does not emit a "session deferred" event in NDJSON. We had to detect "session deferred" by absence — the session ended without a result event of type success or error. Building reliable detection on absence is harder than building it on presence.
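Detection by absence can be sketched as a scan over the captured NDJSON. This is an illustration under assumptions: the `type: "result"` field name reflects how we read the stream-json output, and `classifyRun` is a hypothetical helper, not a CLI API.

```typescript
// Hypothetical sketch: classify a finished headless run from its NDJSON output.
type RunOutcome = "completed" | "deferred-or-aborted";

function classifyRun(ndjson: string): RunOutcome {
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // ignore partial or non-JSON lines
    }
    // Assumed shape: a terminal event carries type "result".
    if (typeof event === "object" && event !== null && (event as { type?: unknown }).type === "result") {
      return "completed";
    }
  }
  // No result event was seen: the run ended without finishing.
  return "deferred-or-aborted";
}
```

Absence alone cannot tell a defer from a crash, which is why the outcome is named `deferred-or-aborted`. A real detector would presumably cross-check the bridge's own record that the hook answered defer for that session.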

hookSpecificOutput.hookEventName is required for defer to register. v2.3.81 was a hotfix for this exact bug. We were emitting {permissionDecision: "defer"} without the surrounding hookSpecificOutput.hookEventName: "PreToolUse" envelope. The CLI was silently treating the response as malformed and falling back to the default permission behavior — which was allow, not defer. Tools were running before the user had approved them. We caught the bug only because a user reported a Bash command running without a Telegram card appearing, and a packet capture of the hook response showed the missing field. The fix added the explicit hookEventName to every defer response we emit. Wire format details matter, and the wire format details for defer were entirely undocumented.
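One way to keep the envelope from going missing again is to build every response through a single function and check the result before it is written out. The helper names below are hypothetical. The field names are the ones from the wire format above.

```typescript
// Hypothetical helpers for building and checking PreToolUse hook responses.
type PermissionDecision = "allow" | "deny" | "defer";

function buildPreToolUseResponse(decision: PermissionDecision): string {
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: decision,
    },
  });
}

// Returns true only when the full envelope is present.
function hasPreToolUseEnvelope(rawResponse: string): boolean {
  const parsed: unknown = JSON.parse(rawResponse);
  if (typeof parsed !== "object" || parsed === null) return false;
  const output = (parsed as { hookSpecificOutput?: unknown }).hookSpecificOutput;
  if (typeof output !== "object" || output === null) return false;
  const fields = output as { hookEventName?: unknown; permissionDecision?: unknown };
  return fields.hookEventName === "PreToolUse" && typeof fields.permissionDecision === "string";
}

const wrapped = hasPreToolUseEnvelope(buildPreToolUseResponse("defer"));
const bare = hasPreToolUseEnvelope('{"permissionDecision":"defer"}');
```

The bare form is the shape that caused the silent-allow bug. A check like this in the hook's own tests would have flagged it before a user did.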

Defer only applies to tool events. We had to verify experimentally that Notification, SessionStart, Stop, and UserPromptSubmit hooks ignore permissionDecision: defer entirely. They do. The CLI logs nothing about it. The hook returns and execution continues as if the field were absent. Useful to know; impossible to discover without trying.

Resume timing matters more than expected. A claude -p --resume <id> issued less than ~250ms after the CLI exits sometimes loses the deferred state. In v2.3.79 we added a 500ms minimum resume delay, measured from the moment the bridge sees the CLI exit. v2.3.80 separately shortened the deferKey, because the original format exceeded Telegram's 64-byte callback data limit, which left the whole resume callback invisible to the user.
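The delay rule reduces to a small calculation. This sketch assumes the bridge timestamps the CLI exit itself. `resumeWaitMs` is a hypothetical name.

```typescript
// Hypothetical sketch of the minimum resume delay described above.
const MIN_RESUME_DELAY_MS = 500;

// How much longer to wait before spawning `claude -p --resume <id>`.
function resumeWaitMs(cliExitedAtMs: number, nowMs: number): number {
  const elapsedMs = nowMs - cliExitedAtMs;
  return Math.max(0, MIN_RESUME_DELAY_MS - elapsedMs);
}

const tappedQuickly = resumeWaitMs(1000, 1100); // tap 100ms after exit
const tappedLater = resumeWaitMs(1000, 61000); // tap a minute after exit
```

A caller would pass the result to a timer before launching the resume. Taps that arrive well after the exit pay no extra latency.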

Multi-defer behavior is undefined. TAB-551 explicitly notes "Maximum defer count per tool/session: no documentation exists." We tested up to ten consecutive defers on the same tool call and found it works. We do not know what happens at one hundred. We have not tested. Production has not yet hit the corner.

The cumulative effect of this learning is captured in TAB-551 itself, which explicitly opens with ⚠️ RISK: Blocked until official documentation provided and lists six "Key Unknowns." We shipped anyway because the win was large enough to justify the risk, and we shipped incrementally because we did not trust ourselves to land it correctly in one cut.

Five phases, eleven hardening releases

The implementation broke into five phases that landed across releases v2.3.74 through v2.3.78, plus eleven follow-up releases (v2.3.79 through v2.3.89) that landed each new corner case as we found it.

| Phase | Release | What landed |
| --- | --- | --- |
| 1: bridge defer infrastructure | v2.3.74 | `pendingApprovals` ManagedMap, defer response type, callback handler |
| 2: hook script defer support | v2.3.75 | PS1 + sh hooks parse `permissionDecision: defer` from bridge response |
| 3: Telegram defer flow | v2.3.76 | Defer button on approval card, callback dispatcher, resume trigger |
| 4: UX polish | v2.3.78 | Paused indicator, resume feedback, session-expiry error card |
| 5: multi-decision races | v2.3.79 | Eviction guard for completed decisions, session-expiry timeout |

The hardening releases each fixed one specific behavior we had not anticipated:

v2.3.80 — deferKey shortened from base64-of-hash to d:{base36-of-hash} to fit Telegram's 64-byte callback data limit. The original key was getting truncated mid-string.
v2.3.81 — hookEventName: "PreToolUse" envelope added to defer responses (the silent-allow bug above).
v2.3.82 — Delegate-session scoping. Defer was firing on interactive terminal sessions where users expected the synchronous wait. We added a flag to scope defer to delegate-spawned sessions only.
v2.3.83 — Non-delegate auto-allow. Interactive sessions now bypass defer entirely so the user sees the synchronous prompt they expect.
v2.3.84 — Guard ordering. The non-delegate guard had to move above the TCVF verification pipeline so interactive sessions skipped the entire defer pipeline rather than going through it and being rejected.
v2.3.85 — Smart continuation. A 5-minute rolling window auto-resumes the CLI session for follow-up messages, so the user does not lose context after a deferred-and-resolved tool call.
v2.3.86 — UTF-8 stdin encoding fix on the PS1 hook (CP1252 -> UTF-8 for emoji and Unicode in deferred-card content).
v2.3.87 — Wait Quietly suppression scoped to delegate sessions only (interactive sessions need the Stop hook to fire normally).
v2.3.88 — Auto-approve notification restored. The non-delegate path was suppressing "Bash auto-approved" status messages that interactive users actually want to see.
v2.3.89 — Learning + notifications restored. Non-delegate auto-allows now record engine learning patterns and send Telegram notifications. Interactive sessions are first-class again.
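The v2.3.80 key format is easy to illustrate. The sketch below is not the production key derivation: the source does not say which hash is used, so a 32-bit FNV-1a-style hash stands in to keep the example dependency-free, and `makeDeferKey` is a hypothetical name.

```typescript
// Telegram limits callback_data to 64 bytes.
const TELEGRAM_CALLBACK_DATA_MAX_BYTES = 64;

// FNV-1a-style 32-bit hash over UTF-16 code units (a stand-in, see lead-in).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Builds a key of the form d:{base36-of-hash}.
function makeDeferKey(sessionId: string, toolCallId: string): string {
  const key = "d:" + fnv1a32(`${sessionId}:${toolCallId}`).toString(36);
  // The key is ASCII only, so its length in characters equals its length in bytes.
  if (key.length > TELEGRAM_CALLBACK_DATA_MAX_BYTES) {
    throw new Error("deferKey exceeds Telegram's callback data limit");
  }
  return key;
}

const deferKey = makeDeferKey("session-abc", "tool-1");
```

A 32-bit hash keeps the key to at most nine characters but collides far too easily for real use. A production key would want a wider hash, which still fits comfortably under 64 bytes in base36.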

The pattern across these eleven releases is consistent: every corner case was a place where the binary split between "delegate session" and "interactive session" had subtleties we had not modeled. The defer mechanism worked correctly from v2.3.74 onward; what took eleven releases was figuring out which sessions should use it and how to gracefully handle the ones that should not.
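The shape the routing settled into can be sketched as a single guard placed ahead of everything else. All names here are hypothetical, and the real pipeline has more steps (TCVF verification, learning) than this illustration shows.

```typescript
// Hypothetical sketch of guard ordering for an incoming hook request.
type SessionKind = "delegate" | "interactive";
type Route = "auto-allow" | "defer-pipeline";

interface HookRequest {
  sessionKind: SessionKind;
  toolName: string;
}

function routeRequest(request: HookRequest, notify: (message: string) => void): Route {
  // Guard first: interactive sessions skip verification and defer entirely,
  // but still produce the status message their users expect.
  if (request.sessionKind !== "delegate") {
    notify(`${request.toolName} auto-approved`);
    return "auto-allow";
  }
  // Only delegate-spawned sessions reach the verification and defer steps.
  return "defer-pipeline";
}

const sentMessages: string[] = [];
const record = (message: string): void => {
  sentMessages.push(message);
};
const interactiveRoute = routeRequest({ sessionKind: "interactive", toolName: "Bash" }, record);
const delegateRoute = routeRequest({ sessionKind: "delegate", toolName: "Bash" }, record);
```

Putting the guard at the top is the point of v2.3.84: an interactive request never enters a pipeline that would only reject it.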

[Figure: timeline of releases v2.3.74 through v2.3.89 showing the 5 implementation phases as larger milestones and the 11 hardening releases as smaller follow-up dots]

The win

Three measurable things changed.

Held HTTP connection count dropped to zero. Before defer, our worst-case load on the local approval bridge was N concurrent decisions × M users × 360 seconds of held connection. After defer, the bridge holds nothing — every approval state is durably stored in pendingApprovals and looked up on resume. The bridge can run with a default thread pool. Memory pressure on long sessions disappeared.

The user can take arbitrarily long to respond. A user who tapped Allow forty-five minutes after the original tool call now resolves cleanly. The CLI was not waiting; the bridge was just remembering. There is no timeout to extend. The approval state lives until the bridge process restarts (or the explicit eviction TTL fires, currently 24 hours).
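A store with a 24-hour eviction TTL might look like the following. This is a sketch under assumptions: `ApprovalStore` and its methods are hypothetical names, the store is in-memory (consistent with state being lost on bridge restart), and eviction is done lazily on lookup for brevity.

```typescript
// Hypothetical sketch of a decision store with a 24-hour eviction TTL.
const APPROVAL_TTL_MS = 24 * 60 * 60 * 1000;

type FinalDecision = "allow" | "deny";

interface StoredApproval {
  decision: FinalDecision;
  storedAtMs: number;
}

class ApprovalStore {
  private readonly entries = new Map<string, StoredApproval>();

  record(toolCallId: string, decision: FinalDecision, nowMs: number): void {
    this.entries.set(toolCallId, { decision, storedAtMs: nowMs });
  }

  // Returns the stored decision, or undefined if none exists or it has expired.
  lookup(toolCallId: string, nowMs: number): FinalDecision | undefined {
    const entry = this.entries.get(toolCallId);
    if (entry === undefined) return undefined;
    if (nowMs - entry.storedAtMs >= APPROVAL_TTL_MS) {
      this.entries.delete(toolCallId); // evict lazily on lookup
      return undefined;
    }
    return entry.decision;
  }
}

const approvalStore = new ApprovalStore();
approvalStore.record("tool-1", "allow", 0);
const after45Minutes = approvalStore.lookup("tool-1", 45 * 60 * 1000);
const after25Hours = approvalStore.lookup("tool-1", 25 * 60 * 60 * 1000);
```

A resume 45 minutes after the decision was stored still finds it. Only the TTL, not any connection timeout, bounds how long an answer stays valid.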

Network blips no longer drop decisions. A connection to the bridge that fails between defer and resume is recoverable — the next resume reads the same stored decision. Before defer, the same blip would silently lose the human's response and require them to re-approve, sometimes without any indication that the previous tap had been lost.

Two unmeasurable but real things changed too.

The approval architecture composes with other CLI features. Smart continuation (v2.3.85) auto-resumes the session for follow-up messages. The auto-resume relies on the same claude -p --resume <id> mechanism that defer requires. Once defer was in, smart continuation was a one-paragraph change instead of a separate architectural lift.
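The rolling window itself is a small piece of state. In this sketch, `ContinuationWindow` is a hypothetical name, and we assume "rolling" means each new message restarts the five-minute clock.

```typescript
// Hypothetical sketch of a 5-minute rolling continuation window.
const CONTINUATION_WINDOW_MS = 5 * 60 * 1000;

class ContinuationWindow {
  private lastActivityMs: number | null = null;

  // Call whenever the session sees a message or a resolved tool call.
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True while a follow-up message should auto-resume the same CLI session.
  shouldAutoResume(nowMs: number): boolean {
    return this.lastActivityMs !== null && nowMs - this.lastActivityMs <= CONTINUATION_WINDOW_MS;
  }
}

const minute = 60 * 1000;
const continuation = new ContinuationWindow();
continuation.touch(0);
const atFourMinutes = continuation.shouldAutoResume(4 * minute);
const atSixMinutes = continuation.shouldAutoResume(6 * minute);
continuation.touch(4 * minute); // a follow-up message extends the window
const atEightMinutes = continuation.shouldAutoResume(8 * minute);
```

When the check passes, the follow-up message is sent through the same claude -p --resume <id> path that defer already uses, which is why the feature was cheap to add.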

The pipeline is no longer dependent on Telegram round-trip latency. Pre-defer, a slow Telegram delivery (rate limiting, network congestion, the user's phone in low-battery mode) could push the bridge timeout. Post-defer, Telegram latency does not affect anything except the user's perception of speed. The bridge does not care.

What it took to ship through the documentation gap

The most honest thing we can say about this migration is that we shipped a lot of bugs, fixed them, shipped a lot more, and kept iterating. Eleven hardening releases is not a clean roll-out. It is a visible iteration trail in our release log that any user can read, which we accept as the cost of moving on a feature whose contract was being defined by behavior rather than by docs.

A few practices made the iteration tolerable rather than catastrophic.

We rolled out behind a delegate-only flag. Interactive sessions were excluded from defer for the first three releases. Only delegate-spawned sessions exercised the new path. This meant the bug surface was a fraction of the user base, and the bug reports we got back were mostly from delegate-mode users (advanced users who tolerate iteration better than first-time users).

Every release had a one-paragraph "what we found and how we fixed it" entry in docs/RELEASES.md. The eleven hardening releases together produced a usable case study of the defer mechanism's behavior — useful both internally and to anyone integrating with Claude Code hooks at scale. We linked back to the relevant GitHub issues on Anthropic's repo so the upstream documentation gap was visible alongside our workaround.

We left the synchronous held-HTTP path intact for several releases. The defer pipeline lived alongside the held-HTTP pipeline behind a config flag. Until v2.3.78, you could disable defer entirely and fall back. We removed the held-HTTP path only after the eleventh hardening release (v2.3.89) had been in production for a week without regressions. Killing the rollback path before you trust the new path is brittle.

We did not try to fix every documented unknown before shipping. The "Maximum defer count per tool/session" question is still open. We did not block the migration on it. We documented the unknown, set a conservative default (single-defer per tool call), and shipped. Some questions are not worth answering empirically until production exercises them.

When this kind of architectural migration is worth doing

The migration cost roughly fifty engineering hours across three weeks of calendar time. It pays back the first time a user has more than a handful of pending approvals, or sits with a phone in their pocket for an hour, or has a flaky cellular connection on a train. None of those scenarios were rare. The pre-defer pipeline was correct on a fast network with an attentive user; it was fragile on the network and user shapes that real production has.

The structural lesson is the same one we have written about for `if` filters and main-vs-tag workflow resolution: when the CLI ships a feature that subtracts a load-bearing assumption from your architecture (synchronous wait, in our case), the right reaction is to migrate. The wrong reaction is to keep the old assumption and pile workarounds on top of it. Workarounds compound. Migrations end.

Three rules generalised from the defer migration.

Adopt new CLI features eagerly, but behind flags. Claude Code's pace is fast. Features arrive every week. Features that subtract assumptions from your architecture are worth more than features that add capabilities. Adopt the subtractive ones first.

Document what the docs do not. The five "things we discovered the docs did not say" above are now in a private engineering note. The next person on our team who touches defer reads that note before they touch the bridge. If we had documented less, every corner case would have been re-discovered next year.

Ship the migration in phases that each preserve the old path. The cost is a longer total migration; the benefit is the ability to roll back any single phase without rolling back the whole thing. Eleven hardening releases is a lot. Eleven hardening releases that each shipped behind a flag and could be reverted cleanly is acceptable.

If your integration with Claude Code (or any external CLI) is currently holding open connections to bridge between the CLI's synchronous expectations and a human's asynchronous reality, defer is probably the answer. The pattern generalises to any async-decision flow that uses --resume semantics: the question is durable, the response is durable, the connection is not.

We have run two months of releases since v2.3.89. Held-HTTP timeout rates dropped to zero, support tickets about "approval not registering" dropped to zero, and the smart-continuation feature that depends on defer has shipped cleanly on top of the same architecture. The migration cost was real. The structural payoff continues to compound.

