You were running a CodePulse session. The bridge process was alive on port 18321. The /health endpoint cheerfully reported hooksTriggered=42, version: "2.3.142", no errors. Your Claude Code terminal was busy editing files, running tests, executing Bash commands. And nothing — absolutely nothing — was reaching Telegram.
This is the story of that outage, the architectural diagnosis behind it, and the four releases that made the failure mode structurally impossible. It is also the story of the smallest possible change to fix it: not a feature, not a refactor — a single file dropped beside a script.
The symptom: hooks ran but the bridge never saw them
The user noticed it mid-session. Code-mode tool calls — Bash(echo test), Edit(file.ts), Read(./README.md) — used to surface in Telegram as auto-approval toasts within milliseconds. Suddenly, silence. No toasts. No approval cards. No errors. The bot was alive; the approval bridge was responsive; the hooksTriggered counter on /health was even incrementing once per minute as some MCP server somewhere phoned home.
But the user's actual interactive Claude Code session was producing zero hook calls into the bridge. From the bridge's perspective, the session didn't exist.
The first instinct on seeing this was wrong. We assumed the bridge had a bug — maybe a regression in the TAB-642 cross-session contamination fix, maybe the /select Code/Real-life selector had wedged into a stale state. Two hours of bridge debugging produced zero progress. Every code path looked correct. Every assertion held.
Then we looked one layer down: at the hook script itself.
The diagnosis: env vars are the wrong primitive
The CodePulse hook script (codepulse-pulse.ps1 on Windows, codepulse-pulse.sh on Linux/macOS) is what connects Claude Code to the bridge. Claude Code fires the script on every tool-call lifecycle event — PreToolUse, Stop, SessionStart, etc. — and the script POSTs the event to localhost:18321/approval-api. No script firing → no bridge POST → no Telegram surface.
How does the script know where to find CodePulse? Pre-fix, it had a three-tier resolver chain:
- Check
$env:CODEPULSE_HOME(the user-scope environment variable set at install time) - Fall back to reading the same value from
HKCU:\Environment\CODEPULSE_HOMEin the registry - Fall back to inferring the install location from the script's own path
If all three failed, the script silently exited at line 47 — no log, no diagnostic, no /health change. This was deliberate: hooks must NEVER block Claude Code, and any unexpected error inside a hook should resolve to a no-op exit. But it had a brutal failure mode. If the env var disappeared, every layer of the chain fell over silently and the bridge stopped seeing hooks.
We checked the user's machine. [Environment]::GetEnvironmentVariable('CODEPULSE_HOME', 'User') returned empty. Get-Item HKCU:\Environment showed no CODEPULSE_HOME key. The install log from earlier that morning confirmed setx had successfully written it. Six hours later, it was gone.
The likely culprit: any one of the well-known Windows environment-variable failure modes. setx's 1024-character Path-truncation race that clips other variables when the cumulative Path length exceeds the threshold. Antivirus scanners that periodically wipe HKCU. Corporate profile-cleaner tools that reset user-scope settings. Elevation context drift between admin and non-admin sessions. Any of these will silently delete environment variables that the user thought were permanent.
The architectural error was deeper than any specific failure mode, though. An environment variable is a runtime-discoverable global. It assumes the operating system will preserve a piece of static install configuration across sessions, processes, and time. Operating systems make no such guarantee. Environment variables are convenient but fragile — and CodePulse was treating one as load-bearing.
The fix: an anchor file beside the script
The replacement primitive was the simplest possible. The installer writes a single text file — codepulse-home.txt — next to the hook script. Inside it: one line, the absolute path to the CodePulse install. The hook reads that file as resolver tier #1, before anything else.
$AnchorPath = Join-Path $PSScriptRoot 'codepulse-home.txt'
if (Test-Path -LiteralPath $AnchorPath -PathType Leaf) {
$AnchorValue = (Get-Content -LiteralPath $AnchorPath -Raw -Encoding UTF8).Trim()
if ($AnchorValue -and (Test-Path -LiteralPath $AnchorValue -PathType Container) -and
(Test-Path -LiteralPath (Join-Path $AnchorValue '.env') -PathType Leaf)) {
$CodePulseHome = $AnchorValue
}
}
Why this works where the env var didn't: the file lives in the same directory as the hook script. Whatever installed the hook script also installed the anchor file. They share a lifetime. There is no operating system layer between them. An antivirus scanner wiping the registry doesn't touch the file. A profile-cleaner resetting environment variables doesn't touch the file. The file is gone only if the hook script is also gone — at which point Claude Code can't run it anyway.
The env var didn't disappear from the resolver chain. It moved to tier #2, then registry to tier #3, then script-location inference to tier #4. The chain still works for legacy installs that pre-date the anchor file. But all four tiers are now fallbacks — the anchor file is the primary path.
The self-heal flag: surfacing the silent failure
The original outage was made worse by /health reporting hooksTriggered=42 while the bridge had received zero hooks from the user's session. The number incremented from unrelated MCP traffic, masking the fact that the user-facing surface was dark.
The fix was a diagnostic flag the hook stamps on resolver-chain failure. When all four tiers fail, the script atomically writes a JSON file to %APPDATA%\CodePulse\diagnostics\hook-unresolved.flag (or ~/.codepulse/diagnostics/hook-unresolved.flag on Linux/macOS) containing the timestamp, the list of resolver tiers tried, and the script's own location. The bridge's /health endpoint reads this flag and surfaces:
{
"hooksHealthy": false,
"hooksReason": "resolver-failed:anchor-missing,env-empty-or-invalid,inference-no-env"
}
The flag is written atomically — printf > tmp.$$ followed by mv -f, which is a POSIX-atomic rename on the same filesystem. A concurrent reader hitting the unlink-rename window observes either the old content or no file at all (treated as healthy by the reader), never a torn JSON parse. The flag is also self-clearing: every successful hook run removes it, so a healthy hook restores hooksHealthy: true within one Claude tool-call.
If the original outage had happened on a system with this flag in place, /health would have surfaced the failure within minutes — not after two hours of bridge debugging.
Why the install no longer breaks itself
The other side of this story is what happens when the anchor file gets written. The installer's Step 5 used to set CODEPULSE_HOME in the registry; if that step failed (transient antivirus lock on HKCU:\Environment, locked ~/.bashrc because of a sync conflict, restricted-shell environment), the installer routed through a Read-FailureChoice prompt that defaulted to Abort in silent mode and triggered a full rollback.
Steps 1 through 4 of the install — copying the hook script, writing the anchor file, registering hooks in settings.json, validating Bash permissions — would have completed successfully. The hook would have been functional. But a transient antivirus lock that lasted three seconds during Step 5 would silently uninstall everything and exit-1 the installer.
The fix mirrors the same pattern the approval pipeline uses for transient bridge errors: warn-and-continue. A registry-write hiccup at Step 5 now logs a WARN line in the install log and proceeds to the summary. The exit code becomes 2 (partial completion with warnings) instead of 1 (rollback fired). The NSIS-driven user-facing wizard treats exit code 2 as "completed with warnings" — the user sees a finished install with a small caveat in the support log, not a failed install with a confusing rollback message.
The install rule is now: never destroy a working install over a transient persistence hiccup. The anchor file makes the env var optional; the env var's failure mode should not be fatal.
Cross-platform parity: porting the same fix to bash
The Linux and macOS hook (codepulse-pulse.sh) had the same architectural fragility, with different specific triggers: profile-cleaner sweeps wiping ~/.bashrc, sudo invocations stripping environment variables, stale shell sessions whose env was loaded before install, systemd user units or launchd plists with restricted env passthrough. We built the same four-tier resolver chain in bash, with the same atomic flag write, the same self-clear semantics, and the same JSON payload format that the bridge's /health endpoint can consume.
The implementation differs in three deliberate places. POSIX has no analogue to the Windows registry, so the bash port is four tiers (anchor → env → script-location inference → fail-with-flag) where Windows is five. The flag location is ~/.codepulse/diagnostics/hook-unresolved.flag instead of the %APPDATA% path. The atomic rename uses mv -f instead of PowerShell's Move-Item -Force.
The bridge's readHookHealthFlag() was extended to probe both paths — Windows %APPDATA% and Unix $HOME/.codepulse — and pick the freshest by mtime. On a developer's machine running both shells (Windows host with WSL or git-bash), the latest hook signal wins regardless of which platform produced it. On tied mtime, Unix wins, so a developer working primarily in Linux/macOS sees their host shell's signal first.
First Linux: validating bash code from a Windows-only dev box
The CodePulse dev environment runs on Windows. The bash hook had been tested under git-bash on Windows — bash 5+ but with Windows path semantics — for the entire history of the project. git-bash is close to but not identical to real Linux bash, and the gap is exactly where cross-platform regressions hide.
We added a new GitHub Actions workflow — tests-sh-linux.yml — that runs on ubuntu-latest and validates two things: the bash syntax of every modified .sh file (bash -n), and the new TAB-668 test file that exercises the resolver chain. We deliberately scoped the Linux job narrowly to dodge the documented Vitest mock-hoisting flakes that prevent the full TypeScript test suite from running on Linux without false negatives.
The first run of the new workflow was green on the v2.3.147 commit that introduced the bash port. That was the project's first real-Linux green check. Until that point, the bash hook code had been shipping without ever having been validated on a real Linux runner.
The workflow is currently a non-gating early-warning system — it runs on every dev push but doesn't block semantic-release tagging. After a soak period of stable runs, it will graduate into the release gate chain.
What "structurally impossible" means
The original outage had three independent failure modes — env-var disappeared, registry-write failed, script-location inference produced wrong path. Each was rare. All three triggering simultaneously, on a single user, in a single session, was rare-cubed. But "rare" is not "impossible," and the cost of the failure was a silently dark Telegram surface that took two hours of debugging to diagnose.
The same situation today fails differently. The env var disappearing is irrelevant — anchor file wins. The registry getting corrupted is irrelevant — anchor file wins. Script-location inference producing the wrong path is irrelevant — anchor file wins. All three of those failure modes can fire simultaneously and the user sees no impact. If the anchor file itself somehow disappears, the chain falls back to the same three tiers it relied on before, and if all four fail, the diag flag lights up /health within milliseconds.
The architectural lesson generalizes beyond hooks. Static install configuration belongs in static files, not runtime-discoverable globals. Env vars and registry keys are two layers of operating-system state that you don't control. The file system is one layer of operating-system state that you do control. When the installer places a script, it should also place every piece of configuration that script needs to find. Self-contained at the install site is more robust than every alternative.
The four releases
The story shipped across four versions in 12 hours. v2.3.145 introduced the anchor-file resolver and the diag flag. v2.3.146 collapsed three duplicate path-resolution blocks in the installer to a single canonical variable, and demoted the env-var write step from fatal-with-rollback to warn-and-continue. v2.3.147 ported the entire architecture to the bash hook and added the first real-Linux CI workflow.
Each release shipped with adversarial review (twice for v2.3.147 given its cross-platform risk) and a dedicated regression test that pins the specific failure mode the release prevented. Each release ships its own changelog entry, its own commit, its own release tag — strictly linear dev history matching every prior CodePulse release. Zero force-pushes, zero merge bubbles, zero pre-existing test regressions.
The user-visible surface looks the same. /select Code still picks the Code mode. /select Real-life still picks Real-life. Tool calls in your Claude Code terminal still surface as Telegram approval cards. The difference is that the layer underneath is now structurally hardened against the failure that triggered the EPIC.
Ready to put your Claude Code sessions on a hardened foundation? Download CodePulse and run the v2.3.147 installer — the anchor-file resolver lands automatically. The free tier includes the full hook resolver and /health observability. Upgrade to Premium for AI commit review, voice input, and the rest of the platform.