19 releases in 4 days: how delegate mode went from prototype to production

Nineteen releases in four days. From March 27 to March 30, CodePulse shipped v2.3.29 through v2.3.47 — each one a focused change that moved delegate mode from a working prototype into production-grade infrastructure. This is the story of what changed, why it changed, and what it means for how you interact with Claude Code through Telegram.

The pace was not accidental. Delegate mode — where you describe a task in natural language and CodePulse hands it to Claude Code for autonomous execution — touches every layer of the system. The plan generator, the approval bridge, the MCP tool whitelist, the git state detector, the task completion flow, and the intent classifier all had to evolve together. Fixing one layer exposed assumptions in the next. The fastest way to ship reliable software was to ship each fix individually, test it in production, and move to the next layer.

Figure: Overview of the 19-release sprint, four themes across four days

The Task Complete redesign that started everything

The first major change in this sprint was not a bug fix — it was a feature that redefined how delegate mode communicates results. Version 2.3.33 replaced the old result card with a completely new Task Complete card (TAB-518).

The old card was minimal: a text summary and a "Push" button. It assumed every task produced a git commit and every user wanted to push immediately. That assumption was wrong for read-only tasks, analysis requests, and multi-step workflows where you want to review changes before committing.

The new Task Complete card shows everything you need to decide what happens next. A mode badge tells you whether the task ran in delegate or approval mode. A 300-character CLI summary captures what Claude actually did. Duration appears in a clean Xm Ys format. Files changed and line delta give you the scope at a glance. Two buttons — View Changes and Dismiss — replace the old push-first flow.
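As a rough illustration of the two formatted fields on the card, the sketch below renders a duration as "Xm Ys" and caps a summary at 300 characters. The function names are hypothetical and the three-dot truncation marker is an assumption; only the output format and the 300-character limit come from the description above.

```typescript
// Hypothetical helpers; names are illustrative, not CodePulse's actual API.

/** Format a duration in milliseconds as "Xm Ys". */
function formatDuration(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

/** Cap the CLI summary at `limit` characters (300 on the card). */
function truncateSummary(text: string, limit = 300): string {
  if (text.length <= limit) return text;
  // Assumption: mark truncation with three dots, staying within the limit.
  return text.slice(0, limit - 3) + "...";
}
```

For example, a task that ran for 125 seconds would display as "2m 5s".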

Figure: Task Complete card anatomy with mode badge, summary, metrics, and action buttons

Status-aware actions: six cards for six states

The real complexity lives behind the View Changes button. When you tap it, CodePulse detects the current git state — uncommitted changes, committed but not pushed, already pushed, or clean working tree — and combines it with the bridge session state (alive or dead) to determine which actions make sense. This produces one of six card variants, each showing only the actions that are actually available.

If the bridge session is still alive and there are uncommitted changes, you see "Ask Claude to Commit", which resumes the CLI session. If the session has died (for example, you closed the terminal or the process timed out), you see "Commit", which runs the operation directly. If changes are committed but not yet pushed, "Push" appears. If the tree is clean, the card confirms that no changes were made.
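The selection logic can be pictured as a small lookup over git state and session state. This is a minimal sketch built only from the cases described above: the type names, the `pickCard` function, and the message strings are hypothetical, and it does not reproduce the exact six variants. The already-pushed case is not spelled out in this post, so it is shown with no actions as an assumption.

```typescript
type GitState = "uncommitted" | "committed" | "pushed" | "clean";
type SessionState = "alive" | "dead";

interface CardVariant {
  message: string;
  actions: string[];
}

// Hypothetical selector: maps the detected state to the actions worth showing.
function pickCard(git: GitState, session: SessionState): CardVariant {
  switch (git) {
    case "uncommitted":
      // A live session can be resumed; a dead one needs a direct commit.
      return session === "alive"
        ? { message: "Uncommitted changes", actions: ["Ask Claude to Commit"] }
        : { message: "Uncommitted changes", actions: ["Commit"] };
    case "committed":
      return { message: "Committed, not pushed", actions: ["Push"] };
    case "pushed":
      // Assumption: nothing left to do once changes are pushed.
      return { message: "Already pushed", actions: [] };
    case "clean":
      return { message: "No changes were made", actions: [] };
  }
}
```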

The redesign removed nine stale test cases, the entire resultKeyboard module, sendResultCard, handleResultCallback, and every piece of dead code associated with the old push-first flow. The cleanup also eliminated an entire category of state that was stored but never properly cleaned up: storedPlans entries that lingered after plan cancellation, commit proposal approval, and push confirmation.

Race conditions and state corruption

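The same week the Task Complete card shipped, three separate state-handling bugs surfaced: a race on the shared approval mode, a flood of post-task stop notifications, and stale history leaking into plan generation.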

The approval mode race (v2.3.31)

Version 2.3.31 fixed a race condition where concurrent fire-and-forget executions corrupted the shared approval mode state (TAB-517). When multiple tasks ran simultaneously — a common pattern in delegate mode where you might approve one task while another is already executing — the approval mode could flip between states unpredictably. One task would set the mode to "delegate," another would read it as "approval," and the resulting behavior was non-deterministic. The fix added synchronization to ensure approval mode transitions are atomic.
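The post does not say which synchronization primitive was used. One common way to make read-modify-write sequences atomic in a single-threaded async runtime is to queue them on a promise chain, sketched below. `ModeStore` and `withLock` are hypothetical names, not the shipped implementation.

```typescript
type ApprovalMode = "approval" | "delegate";

// Hypothetical store: every transition runs with exclusive access to the mode.
class ModeStore {
  private mode: ApprovalMode = "approval";
  private tail: Promise<void> = Promise.resolve();

  /** Run fn after all previously queued calls have finished. */
  withLock<T>(
    fn: (get: () => ApprovalMode, set: (m: ApprovalMode) => void) => Promise<T>
  ): Promise<T> {
    const run = this.tail.then(() =>
      fn(() => this.mode, (m) => { this.mode = m; })
    );
    // Keep the queue moving even if fn rejects.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}
```

With this shape, a task that sets the mode and then awaits other work still reads back its own value, because the next queued task cannot start until the first one settles.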

The stop storm (v2.3.36)

After a delegate task completes, Claude generates repeated idle messages — acknowledgments, summaries, and "anything else?" prompts. Each one triggers a stop hook, which generates a Telegram card, which creates noise on your phone. Version 2.3.36 added post-task suppression: after the Task Complete card is sent, a 50-stop limit activates. Subsequent stops are silently consumed. The counter resets when you send the next message, ensuring the suppression never interferes with real interactions.
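A minimal sketch of this suppression, assuming the limit means "swallow up to 50 stops after the card is sent" and that a new user message clears it. The class and method names are hypothetical.

```typescript
const MAX_SUPPRESSED_STOPS = 50; // limit taken from the description above

// Hypothetical tracker for post-task stop suppression.
class StopSuppressor {
  private remaining = 0;

  /** Call right after the Task Complete card is sent. */
  onTaskComplete(): void {
    this.remaining = MAX_SUPPRESSED_STOPS;
  }

  /** Call when the user sends a new message: suppression ends. */
  onUserMessage(): void {
    this.remaining = 0;
  }

  /** Returns true if this stop hook should produce a Telegram card. */
  shouldNotify(): boolean {
    if (this.remaining > 0) {
      this.remaining -= 1; // consume the stop silently
      return false;
    }
    return true;
  }
}
```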

The stale history injection (v2.3.30)

This was the subtlest bug. When generating a plan for a new delegate task, the plan generator was injecting the full conversation history — including messages from previous, unrelated tasks — into the prompt for Haiku. The result was plans about previous requests instead of the current one. A user asking "refactor the auth middleware" might get a plan about "fix the login bug" because that was the previous conversation topic. The fix was surgical: the plan generator now uses only the current message context (TAB-516).

Extractive plan generation: preserving every word you said

Version 2.3.40 introduced the most architecturally significant change in this sprint: extractive plan generation (TAB-529).

The previous plan generator worked by creative summarization. Haiku received your message and wrote a plan in its own words. This was lossy — constraints got dropped, cross-references were forgotten, and negative instructions ("do NOT commit") disappeared entirely. If you said "check the blog section first, then write the article, but don't commit until I approve," the plan might say "1. Research blog templates 2. Write article 3. Commit changes." That third step directly contradicts your instruction.

Figure: Extractive vs creative plan generation, showing how constraints survive the pipeline

The new extractive approach works differently. Instead of creative summarization, Haiku extracts structured sections directly from your message:

Summary — what the task is about
Steps — the ordered list of actions
Constraints — conditions that must be respected
Cross-check — files or references to verify against
Do NOT — explicit negative instructions

The original user message travels alongside the structured plan to Claude CLI, ensuring no detail is lost. The plan card now displays the Constraints, Cross-check, and Do NOT sections visibly, so you can verify the plan was interpreted correctly before execution begins.
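The five sections map naturally onto a typed object. The sketch below shows one possible shape and how the original message could be appended to the prompt sent to the CLI. The field names, `buildCliPrompt`, and the prompt layout are assumptions for illustration, not the actual schema.

```typescript
// Hypothetical shape for an extracted plan; field names are illustrative.
interface ExtractivePlan {
  summary: string;
  steps: string[];
  constraints: string[];
  crossCheck: string[];
  doNot: string[];
}

/** Render the plan and append the untouched user message. */
function buildCliPrompt(plan: ExtractivePlan, originalMessage: string): string {
  const section = (title: string, items: string[]): string =>
    items.length > 0
      ? `${title}:\n${items.map((item) => `- ${item}`).join("\n")}`
      : "";
  return [
    `Summary: ${plan.summary}`,
    section("Steps", plan.steps),
    section("Constraints", plan.constraints),
    section("Cross-check", plan.crossCheck),
    section("Do NOT", plan.doNot),
    // The original wording travels with the plan so nothing is lost.
    `Original message:\n${originalMessage}`,
  ]
    .filter((part) => part !== "")
    .join("\n\n");
}
```

Empty sections are omitted, so a message with no negative instructions produces no "Do NOT" block.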

Two supporting fixes shipped alongside this feature. The maxTokens for plan generation was bumped from 512 to 1024 tokens to prevent truncated JSON responses from the more detailed extractive prompt (v2.3.40). And a JSON repair safeguard was added for cases where Haiku still exceeds the limit — instead of falling back to a generic three-step plan, the parser now closes unclosed strings, arrays, and objects, preserving as much structured plan data as possible (v2.3.42).
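To make the repair idea concrete, here is a minimal sketch that closes an unclosed string and then any open arrays and objects. It is an illustration under assumptions, not the shipped parser: `repairTruncatedJson` is a hypothetical name, and the sketch only handles truncation inside a string value or between elements. Output cut mid-key, mid-number, or mid-escape would still fail to parse and need a fallback.

```typescript
/** Best-effort repair of JSON that was cut off mid-stream. */
function repairTruncatedJson(text: string): string {
  const closers: string[] = []; // closing brackets still owed, innermost last
  let inString = false;
  let escaped = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let repaired = text;
  if (inString) {
    // Drop a dangling backslash, then close the string.
    if (escaped) repaired = repaired.slice(0, -1);
    repaired += '"';
  } else {
    // Remove a trailing comma or whitespace left by the cut.
    repaired = repaired.replace(/[\s,]+$/, "");
  }
  return repaired + closers.reverse().join("");
}
```

A caller would still wrap `JSON.parse` in a try/catch and fall back to a generic plan when the repaired text does not parse.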

The $0 cost display (v2.3.42)

A small but important UX fix shipped with the JSON repair: the plan card was showing estimated API cost instead of "$0.00 (Max Plan)" for users on the Claude Max subscription. Delegate mode uses the Max Plan at zero actual cost, and showing a misleading dollar amount implied API charges that did not exist.

MCP tools: the whitelist that blocked everything

Two releases — v2.3.38 and v2.3.39 — tell the story of getting MCP tools working in delegate mode (TAB-528).

The delegate mode bridge launches Claude CLI with an --allowedTools whitelist. This whitelist originally contained exactly 10 standard tools: Read, Write, Edit, Bash, Glob, Grep, and a few others. Every MCP tool — from Linear to GitHub to Supabase to Stripe — was excluded. When a delegate task needed to create a GitHub issue or query a Linear project, it failed silently.

Version 2.3.38 fixed this by dynamically discovering configured MCP servers from ~/.claude/settings.json and adding them to the allow list. But the fix had a syntax bug that v2.3.39 caught: the mcp__server__* wildcard pattern silently fails in Claude Code's --allowedTools flag (Claude Code GitHub issue #13077). The trailing __* suffix was the problem. Changing to mcp__server without the suffix correctly matched all tools from each configured server.
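The discovery step can be sketched as follows. The `mcpServers` key and the `ClaudeSettings` shape are assumptions about the settings file, `buildAllowedTools` is a hypothetical name, and only the six tools named above are listed because the full set of ten is not given here.

```typescript
// The six standard tools named in the text; the full list of ten is not given.
const STANDARD_TOOLS = ["Read", "Write", "Edit", "Bash", "Glob", "Grep"];

// Assumed shape of the parsed settings file: server configs keyed by name.
interface ClaudeSettings {
  mcpServers?: Record<string, unknown>;
}

/** Build the allow list passed to the CLI's --allowedTools flag. */
function buildAllowedTools(settings: ClaudeSettings): string[] {
  const serverNames = Object.keys(settings.mcpServers ?? {});
  // Use the bare server prefix: per the text, "mcp__server__*" silently fails.
  const mcpEntries = serverNames.map((name) => `mcp__${name}`);
  return [...STANDARD_TOOLS, ...mcpEntries];
}
```

For a settings object with `linear` and `github` servers, the result would end with `mcp__linear` and `mcp__github`.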

This two-release sequence is a pattern that repeated throughout the sprint. The first fix addresses the root cause, production testing reveals a subtle edge case, and a follow-up release patches it. The total time from v2.3.38 to v2.3.39 was under two hours.

The environment variable that broke health checks

Version 2.3.35 fixed an environment conflict that perfectly illustrates the layered complexity of running inside Claude Code (TAB-524).

When the CodePulse service launches from a Claude Code terminal session, it inherits the CLAUDECODE=1 environment variable. This variable is how Claude Code signals to its subprocesses that they are running inside a Claude Code session. The problem: CodePulse's health checker used the same variable to detect whether it was running inside Claude Code. The service misidentified itself, health checks reported FAIL, and the diagnostic dashboard showed spurious errors.

The fix strips CLAUDECODE=1 from process.env at startup. Simple, but it required understanding the full chain: Claude Code sets the variable, the shell inherits it, the service reads it, and the health checker interprets it.
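In code, the startup step could look like the sketch below. The function name is hypothetical, and it takes the environment as a parameter so it can be exercised without touching the real process.

```typescript
type Env = Record<string, string | undefined>;

/** Remove the inherited Claude Code marker so later checks do not see it. */
function stripInheritedClaudeCodeFlag(env: Env): void {
  delete env["CLAUDECODE"];
}
```

At service startup this would be called once with `process.env`, before the health checker reads any variables.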

A related fix in v2.3.37 (TAB-527) addressed the health check pipeline from the other direction. The premium auto-enable and CLI path resolution functions wrote their results to config and bridge objects but not to process.env. The health checker read from process.env. The result was false warnings about missing configuration that was actually present. Syncing process.env after both operations brought the health report from "1 fail, 2 warnings" to clean.

The Unicode saga: ship, break, revert

Versions 2.3.41 and 2.3.43 form a cautionary tale about compiled binaries and character encoding.

The idea was straightforward: replace \uXXXX escape sequences with literal UTF-8 characters across source files. The source code would be cleaner, diffs would be more readable, and modern editors would display the characters correctly.

It worked perfectly in development. It broke everything in production. Bun's compiled binary handles escape sequences correctly but mangles literal multi-byte UTF-8 characters. Every emoji and special character across every Telegram card — including button labels — rendered as diamond replacement characters.

The revert in v2.3.43 restored the original escape sequences. The lesson was expensive but clear: what works in bun run does not necessarily work in bun build --compile. The compiled binary and the development runtime have different text encoding paths, and that difference is invisible until you deploy.
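For readers who want to see what the escaped form looks like, the helper below converts literal non-ASCII characters in a string into \uXXXX sequences. It is purely illustrative: this post only states that the revert restored the original escapes, and `toUnicodeEscapes` is a hypothetical name, not a tool used in the release.

```typescript
/** Replace every non-ASCII UTF-16 code unit with a \uXXXX escape. */
function toUnicodeEscapes(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0x7e) {
      // Pad to four hex digits; astral characters become two escapes
      // (a surrogate pair), which is valid in JavaScript string literals.
      out += "\\u" + ("0000" + code.toString(16)).slice(-4);
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}
```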

Diagnostic improvements you will never see (but will appreciate)

Version 2.3.44 added four diagnostic logging improvements that exist purely for debugging production issues (TAB-530):

User message preview (200 characters) during plan generation, so support can see what the user asked without accessing the full message
Raw Haiku JSON response (2000 characters) to trace plan extraction failures back to the exact model output
CLI prompt prefix (500 characters) in both approval and delegate bridges, showing exactly what Claude received
Increased /support log capture from 200KB to 500KB, ensuring complete diagnostic data in support requests
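The three preview limits above could be centralized in one place, as in this small sketch. The constant and function names are hypothetical; only the character counts come from the list.

```typescript
// Character limits from the list above; the key names are illustrative.
const PREVIEW_LIMITS = {
  userMessage: 200,
  haikuResponse: 2000,
  cliPrompt: 500,
} as const;

/** Return the leading slice of text that is safe to write to the log. */
function logPreview(text: string, kind: keyof typeof PREVIEW_LIMITS): string {
  return text.slice(0, PREVIEW_LIMITS[kind]);
}
```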

These are the changes that make the difference between a five-minute support resolution and a two-hour investigation. Nobody notices good logging until something breaks.

Content pushes and behavioral rules (v2.3.45)

Version 2.3.45 fixed two issues that affected the daily workflow of using delegate mode (TAB-531).

The TCVF (Test Coverage Verification Framework) T3 gate was blocking git push for changelog and blog .mdx file changes with "no tests observed during session." This made sense for code changes — you should not push untested code. But for content-only changes like blog posts and changelogs, requiring tests is absurd. The fix checks whether all edited files are content-only formats (.md, .mdx, .txt, .yaml, images) before requiring tests.
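A sketch of the content-only check, under stated assumptions: the exact image extensions are not listed above, so the set here is a guess, and treating an empty file list as "tests required" is a conservative choice of this example. The function names are hypothetical.

```typescript
// Text formats from the description above, plus assumed image extensions.
const CONTENT_EXTENSIONS = new Set([
  ".md", ".mdx", ".txt", ".yaml",
  ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", // assumption
]);

/** Lowercased extension of the file name, or "" if there is none. */
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot).toLowerCase() : "";
}

/** The test gate applies unless every edited file is content-only. */
function testsRequired(editedFiles: string[]): boolean {
  const contentOnly =
    editedFiles.length > 0 &&
    editedFiles.every((file) => CONTENT_EXTENSIONS.has(extensionOf(file)));
  return !contentOnly;
}
```

A push touching `blog/post.mdx` and `CHANGELOG.md` would skip the gate, while adding `src/index.ts` to the same push would bring it back.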

The second fix added a behavioral rule across all intent types preventing Claude from suggesting PR creation or GitHub compare links after delegate tasks. Users manage the git workflow through the Task Complete card, not through Claude's suggestions. This is a small behavioral constraint that eliminates a common source of confusion.

Intent classification: why your task was ignored (v2.3.46)

Version 2.3.46 fixed the most user-visible bug in the sprint (TAB-533). Users in delegate mode sending natural language tasks like "check the blog section and write an article about changes" were receiving "I'm in question-only mode" — a response meant for simple questions, not tasks.

The root cause was the intent classifier's word-pattern detection. Messages that did not start with action verbs (like "implement," "fix," "create") were classified as simple_question. In delegate mode, this classification routed the message to a question-answering path instead of the Claude CLI execution path.

The fix overrides simple_question to multi_step when in delegate mode, ensuring task messages are always routed to Claude CLI. Other intents — git_action, project_switch, continuation — are preserved because they have their own correct handlers.
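The override itself is a one-line rule. In the sketch below the intent names come from the text, while the type and function names are hypothetical and the intent list may not be exhaustive.

```typescript
type Intent =
  | "simple_question"
  | "multi_step"
  | "git_action"
  | "project_switch"
  | "continuation";
type BridgeMode = "approval" | "delegate";

/** In delegate mode, treat "simple question" classifications as tasks. */
function resolveIntent(classified: Intent, mode: BridgeMode): Intent {
  if (mode === "delegate" && classified === "simple_question") {
    return "multi_step";
  }
  return classified; // all other intents keep their own handlers
}
```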

The hang timer that killed long operations (v2.3.47)

The final release in this sprint — v2.3.47 — fixed a timeout conflict that caused the CLI process to be killed during legitimate operations.

The HANG_TIMEOUT_MS was set to 180 seconds. The APPROVAL_BRIDGE_TIMEOUT was 360 seconds. When a user took more than three minutes to respond to an approval request, the hang timer killed the CLI process before the approval timeout expired. The same happened during long MCP calls that exceeded three minutes.

The fix raised the hang timeout to 420 seconds, above the 360-second approval timeout, and added approval-aware detection. When the hang timer fires, it checks whether an approval request is pending. If one is, the deadline is extended instead of the process being killed. Long approval waits and slow MCP calls no longer cause process termination.
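The decision made when the hang timer fires can be sketched as a pure function. The constant name matches the one in the text; `onHangTimeout`, the return type, and the choice to extend by one full timeout are assumptions of this example.

```typescript
const HANG_TIMEOUT_MS = 420 * 1000; // new value from the fix described above

type HangDecision =
  | { action: "kill" }
  | { action: "extend"; newDeadline: number };

/** Decide what to do when the hang timer fires at time `now` (ms). */
function onHangTimeout(now: number, approvalPending: boolean): HangDecision {
  if (approvalPending) {
    // Assumption: push the deadline out by one full timeout window.
    return { action: "extend", newDeadline: now + HANG_TIMEOUT_MS };
  }
  return { action: "kill" };
}
```

Keeping the decision separate from the timer makes the approval-aware branch easy to test without waiting seven minutes.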

What 19 releases in 4 days tells you

This sprint was not planned as 19 releases. It started as "ship the Task Complete card" and evolved as production testing revealed layer after layer of assumptions that needed fixing. The stale history bug surfaced during plan testing. The MCP whitelist issue appeared when a delegate task tried to use GitHub. The Unicode encoding broke when the first compiled binary went out.

Each release followed the same pattern: identify the issue, write the fix, test it, ship it, move to the next layer. The release pipeline that v2.1.7 established — conventional commits, automated versioning, ed25519-signed builds, Cloudflare R2 distribution — made this pace sustainable. No manual version bumps, no unsigned binaries, no deployment ceremonies.

Figure: The four-day release timeline organized by theme

The result is a delegate mode that handles the full lifecycle: you describe a task, the extractive plan generator preserves every constraint, Claude executes with full MCP access, the Task Complete card shows you what happened, and status-aware actions let you commit, push, or discard based on the actual git state. Every layer traces its operations. Every timeout respects the approval bridge. Every content push skips irrelevant test gates.

Nineteen releases is not a number we are proud of for its own sake. It is the number that was required to close the gap between "delegate mode works in testing" and "delegate mode works when real users send real tasks from their phones at unpredictable times." That gap was 19 fixes wide, and now it is closed.

Running an older version? Download CodePulse to get v2.3.47 with the complete delegate mode overhaul. The free tier includes approval mode, Telegram bridge, and auto-updates — upgrade to Premium to unlock delegate mode, AI commit review, Genius Supervisor, and voice input.