The fix that broke half our users (and the one that brought them back)

The first sign came as a single message from a long-time Code-mode user. "Stopped seeing my hook approvals in Telegram after the last update. Did something change?" One report could be a config drift. Two reports — same day, different users — was a signal. Six reports by the next morning was an outage. We'd shipped a fix the previous afternoon. The fix had silenced exactly the thing it was supposed to protect. This is the story of how we got there, what the symptoms looked like, and the precision gate that finally drew the right line.

The leak that started it

A few days earlier, a Real-life mode user reported a strange noise in their Telegram chat. They were talking to Gmail through CodePulse — summarize my unread emails — and every few seconds, an auto-approval toast popped into the chat. "Auto-approved: Bash(git status)." "Auto-approved: Edit(file.ts)." The user was baffled. They weren't running Claude Code. They had no active Code-mode session. But Code-mode hook decisions were appearing in their Real-life chat.

The cause turned out to be cross-session contamination. CodePulse runs a hook server that intercepts every tool call across every Claude session on the machine. When the hook server resolves a decision (allow, deny, defer), it fires a callback that surfaces the decision in Telegram. The callback was global. It didn't care which session the hook came from or which mode the active Telegram topic was in. If any hook fired anywhere, the toast went to whichever topic was currently focused in Telegram.

That was fine when there was only one mode. With Real-life mode in the mix, it was a leak. The fix seemed obvious: if the active Telegram topic is in Real-life mode, suppress the Code-mode hook toasts. We shipped it. The leak stopped. The user's Real-life chat stayed clean. We moved on.

The fix that broke half our users

The next afternoon, the messages started coming. Code-mode users were saying that hook approvals had stopped appearing in their Telegram chats. Not "rare hook approvals" — all of them. Every Bash command, every Edit, every WebFetch. The auto-approval toasts that used to surface in Telegram were now silent. The Code-mode interactive flow — see the AI's tool call, approve or deny in Telegram — had effectively gone dark.

We pulled the logs. The hook server was running. The hook callback was being invoked. The callback returned, and... nothing reached Telegram. Somewhere between the callback firing and the message rendering, the toasts were being swallowed.

It took an hour of staring at the diff to find it. The fix from the previous day had been simpler than it should have been. The condition we'd written was: "if Real-life mode is active anywhere in the bot, suppress non-delegate hooks." The intent had been to protect Real-life chats from Code-mode chatter. The implementation suppressed Code-mode hooks for everyone, in every chat, the moment Real-life mode existed in the system. Since most CodePulse users had Real-life mode enabled at least somewhere — a different topic, a different group — the suppression fired for nearly every user.
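To make the failure concrete, here is a minimal reconstruction of that over-broad condition. It is a sketch, not the code we shipped: the names TopicState, anyRealLifeActive, and shouldSuppressGlobal are hypothetical, and only the shape of the condition comes from the description above.

```typescript
// Hypothetical reconstruction of the over-broad check; all names are illustrative.
type Mode = "code" | "real-life";

interface TopicState {
  chatId: string;
  mode: Mode;
}

// True if ANY topic in the bot is in Real-life mode: a global property.
function anyRealLifeActive(topics: TopicState[]): boolean {
  return topics.some((t) => t.mode === "real-life");
}

// The flawed rule: it never asks which chat the toast is headed for.
function shouldSuppressGlobal(topics: TopicState[]): boolean {
  return anyRealLifeActive(topics);
}

const topics: TopicState[] = [
  { chatId: "work-code", mode: "code" },
  { chatId: "personal-inbox", mode: "real-life" },
];

// A hook destined for "work-code" is suppressed only because an
// unrelated chat happens to be in Real-life mode.
const suppressedForCodeChat = shouldSuppressGlobal(topics);
```

Nothing in shouldSuppressGlobal takes a destination chat as input, so the function cannot tell the two chats apart.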

Figure: The symptom. Code-mode hook auto-approvals stop reaching Telegram after the previous day's leak fix.

The leak was sealed. The room was also sealed. We'd cured the noise by closing the door, and users couldn't get out either.

What we learned from staring at the wreckage

The instinct after a regression like this is to roll back. Rolling back was the wrong move here. The leak that the fix had been targeting was real, and a rollback would have brought the noise back to Real-life users. We needed a fix that surgically separated the two cases (Code-mode hooks surfaced in Code-mode chats, hook noise suppressed in Real-life chats) without falling back to the global behavior.

The diff revealed the underlying problem. The boolean we'd checked, realLifeMode, was a global. It was true if any topic anywhere in the bot was currently set to Real-life mode. It told us nothing about the specific session whose hook had just fired, and it told us nothing about the specific topic the toast would have rendered into. The condition operated at the wrong granularity.

A better condition needed three things. First, it needed to know which session the hook belonged to. Second, it needed to know which mode that session was running in. Third, it needed to be able to ask "is the user currently focused on a Real-life chat that this hook would interrupt?" — independent of any other Real-life chat the user might also have open elsewhere.

Of those three things, the first two were easy to plumb through. The hook callback receives a session ID. We can map the session ID to the bridge process that spawned it. The bridge process knows which mode it was started in. So "which session is this hook from, and which mode is that session in?" was a query we could answer correctly with a few lines of plumbing.
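The session-to-mode lookup can be sketched roughly as follows. This assumes a simple in-memory registry; BridgeProcess, registerSession, and modeForSession are hypothetical names for illustration, not CodePulse's actual API.

```typescript
// Hypothetical sketch of the session -> bridge -> mode lookup.
type Mode = "code" | "real-life";

interface BridgeProcess {
  pid: number;
  mode: Mode; // fixed at the moment the bridge was started
}

// Assumed registry, filled in whenever a bridge spawns a session.
const bridgeBySession = new Map<string, BridgeProcess>();

function registerSession(sessionId: string, bridge: BridgeProcess): void {
  bridgeBySession.set(sessionId, bridge);
}

// Returns undefined for unknown sessions so the caller has to
// choose a policy explicitly instead of inheriting a silent default.
function modeForSession(sessionId: string): Mode | undefined {
  return bridgeBySession.get(sessionId)?.mode;
}
```

Returning undefined for an unregistered session is a design choice in this sketch; it keeps "unknown" distinct from either mode.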

The harder question was the third one — "is the user focused on a Real-life chat?" For that, we needed the concept of an "active gate."

The active gate

The active gate is a state-machine variable that tracks, per Telegram chat, whether Real-life mode is currently attached — meaning the user has actively switched into Real-life mode in this chat and a Real-life session is running. The gate is local to the chat (each Telegram thread has its own), and it transitions deliberately.

When the user runs /select Real-life, the gate flips to attached. When they run /select Code or /select Exit, the gate flips to detached. The gate is not affected by what happens in other chats. It is not affected by what happens in other topics. It tracks one thing: is Real-life mode the active mode for this specific Telegram surface right now?
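As a rough sketch, the transitions above reduce to a small per-chat state machine. The names here (Gate, SelectTarget, handleSelect, gateByChat) are assumptions for illustration, and the sketch ignores the "a Real-life session is running" half of the attached condition.

```typescript
// Hypothetical per-chat gate; one entry per Telegram chat or topic.
type Gate = "attached" | "detached";
type SelectTarget = "Real-life" | "Code" | "Exit";

const gateByChat = new Map<string, Gate>();

// Pure transition function: only /select Real-life attaches the gate.
function nextGate(target: SelectTarget): Gate {
  return target === "Real-life" ? "attached" : "detached";
}

// Applies a /select command to exactly one chat and leaves the rest alone.
function handleSelect(chatId: string, target: SelectTarget): Gate {
  const gate = nextGate(target);
  gateByChat.set(chatId, gate);
  return gate;
}
```

Because handleSelect writes only the entry for its own chatId, a /select in one chat cannot change the gate in another.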

With the gate in place, the suppression logic became precise. A hook fires. We know its session, and through the session, its mode. We know the chat it would render into, and through the chat, whether the active gate is attached. The suppression rule becomes: only suppress when the hook is from a Code-mode session AND the destination chat has the Real-life gate attached.
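Written as a predicate, the rule is a two-input AND. The sketch below uses hypothetical names (HookContext, shouldSuppress) and assumes the session mode and destination gate have already been looked up.

```typescript
// Hypothetical predicate for the precise suppression rule.
type Mode = "code" | "real-life";
type Gate = "attached" | "detached";

interface HookContext {
  sessionMode: Mode;     // mode of the session that fired the hook
  destinationGate: Gate; // gate of the chat the toast would render into
}

// Suppress only when a Code-mode hook would land in a chat
// whose Real-life gate is attached. Every other case surfaces.
function shouldSuppress(ctx: HookContext): boolean {
  return ctx.sessionMode === "code" && ctx.destinationGate === "attached";
}
```

Of the four input combinations, exactly one suppresses, which is what keeps Code-mode chats informative.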

That rule has the shape we wanted from the start. Real-life chats stay quiet. Code-mode chats stay informative. The leak across the modes is sealed without the rooms being sealed.

Figure: The active-gate state machine. Per-chat attached/detached transitions on /select commands, gating cross-mode hook suppression.

Implementing the gate

The implementation took a few hundred lines spread across the bridge dispatcher, the hook resolver, and the topic state store. The gate variable lives in the topic state record alongside the active mode and the active session ID. The /select command handlers update the gate when they update the mode. The hook callback consults the gate when it decides whether to surface or suppress a toast.

The trickiest part was the transition handling. A user can flick between modes faster than hooks can finish — /select Real-life, run a query, /select Code, /select Real-life again, all inside ten seconds. The hooks from the in-flight sessions can outlive the mode changes. We had to decide, for each hook, whether to gate it on the mode at the moment the hook fired (request time) or the mode at the moment the toast would render (render time). We picked render time, with a snapshot taken when the toast is queued, so a mid-flight mode change can't reroute a half-rendered toast into the wrong chat.
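One way to read the snapshot approach is sketched below: the destination chat and its gate state are frozen when the toast is queued, and rendering consults only the frozen values. This is an illustrative interpretation under assumed names (QueuedToast, enqueueToast, drain), not the shipped dispatcher.

```typescript
// Hypothetical toast queue that freezes routing state at queue time.
type Gate = "attached" | "detached";

interface QueuedToast {
  text: string;
  chatId: string;           // destination frozen at queue time
  gateAtQueue: Gate;        // gate state frozen at queue time
  fromCodeSession: boolean; // whether a Code-mode session fired the hook
}

const gateByChat = new Map<string, Gate>();
const queue: QueuedToast[] = [];

function enqueueToast(text: string, chatId: string, fromCodeSession: boolean): void {
  // A chat with no recorded gate state is treated as detached.
  const gateAtQueue = gateByChat.get(chatId) ?? "detached";
  queue.push({ text, chatId, gateAtQueue, fromCodeSession });
}

// Renders everything pending, using only the snapshot on each toast.
function drain(): string[] {
  const pending = queue.splice(0);
  return pending
    .filter((t) => !(t.fromCodeSession && t.gateAtQueue === "attached"))
    .map((t) => `${t.chatId}: ${t.text}`);
}
```

In this sketch, a /select that lands between enqueueToast and drain changes gateByChat but not the toast already in the queue, so the toast's fate is decided once.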

The other subtle part was making the gate observable. If the gate gets stuck attached when the user thinks they've exited, hooks will be silently suppressed and the user will think CodePulse has gone dark again. We added a tiny diagnostic: every time a hook is suppressed by the gate, the suppression is logged with the session ID, the chat ID, and the gate state. If a user reports "my hooks are silent," we can grep the log and see immediately whether the gate is the cause.
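A suppression log line along these lines would carry the three fields mentioned above. The field names, the event label, and the one-JSON-object-per-line format are assumptions made for this sketch, not the actual CodePulse log schema.

```typescript
// Hypothetical structured log record for a gate suppression.
interface SuppressionEvent {
  event: "hook_suppressed";
  sessionId: string;
  chatId: string;
  gate: "attached" | "detached";
  at: string; // ISO 8601 timestamp
}

// Builds one JSON line per suppression so it can be found with grep.
function formatSuppression(
  sessionId: string,
  chatId: string,
  gate: "attached" | "detached",
  now: Date = new Date(),
): string {
  const record: SuppressionEvent = {
    event: "hook_suppressed",
    sessionId,
    chatId,
    gate,
    at: now.toISOString(),
  };
  return JSON.stringify(record);
}
```

With a fixed event label, searching the log for hook_suppressed together with a chat ID is enough to tell whether the gate explains a "my hooks are silent" report.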

The gate ships with sensible defaults. If a chat has no recorded gate state — for example, the user has never run /select in that chat — the gate defaults to detached. The suppression rule is "suppress only when attached," so the default behavior is to surface hooks, not suppress them. New users see Code-mode hooks immediately, the way they should.

Why this fix held when the previous one didn't

The previous fix was a single-line condition: if (realLifeMode) suppress. It treated Real-life mode as a global property of the bot. The active gate is a per-chat state machine with explicit transitions. It treats Real-life mode as a property of a specific Telegram surface, queryable independently from anything else happening elsewhere.

The single-line condition was easier to write. It was easier to read. But it operated at the wrong level of abstraction. The bug it was trying to prevent — Code-mode noise leaking into Real-life chats — was a per-chat phenomenon. The condition that fixes a per-chat phenomenon has to be a per-chat condition. Anything global will either over-suppress or under-suppress.

This is a recurring pattern in CodePulse's architecture. The temptation to use a global flag is strong, especially when the feature seems "modal" — the bot is in Code mode or Real-life mode. But "the bot" is not a single user-facing surface. Each Telegram chat is its own conversation. Each topic is its own context. The right granularity for any feature that affects user perception is the granularity of the surface the user is looking at.

We've started encoding this as a design principle for the bridge: if a flag controls behavior across multiple user-facing surfaces, the flag is operating at the wrong scope. Replace it with a per-surface state machine before it ships.

The lessons we wrote into the runbook

A regression that's caught in 18 hours is the system working. It's not a victory, but it's not a disaster either. We got reports, we found the cause, we shipped a fix. The total user-facing dark window was less than a day. What hurts is when a regression goes uncaught for weeks — and that almost always happens when monitoring is silent on the relevant signal. Add the diagnostic at the same time you ship the fix.

A fix that "seals the leak" but also seals the room is not a fix. Suppression at the wrong granularity feels safe — you're choosing the conservative side of an ambiguity — but it shifts the bug from "noise leaking out" to "signal not getting through." Both are bugs. The fix has to address the leak without erasing the signal.

The condition you write reflects the granularity of your model. If you write if (mode === 'rl') at module scope, you've decided that mode is a global property of the module. That decision will hold or break depending on whether your users actually experience the feature globally. For a multi-chat bot, the decision broke. We replaced it with if (chat.activeGate === 'attached'), and the model finally matched what users actually experience.

Diagnostics are part of the fix, not an add-on. The first thing we did after shipping the active gate was add the suppression-event log line. Without it, the next regression (and there will be one) would be much harder to diagnose. The log is cheap. The next outage is expensive.

Where the active gate is now

The active gate has been in production since v2.3.135, and the regression class hasn't reappeared. Code-mode users see their hooks again. Real-life-mode users see their conversational threads stay clean. The two modes coexist in the same Telegram surface without bleeding into each other.

The gate has also unlocked features we couldn't have built before it. Per-topic policy stores use the gate to decide which store to read on every approval decision. Slim Real-life cards check the gate to decide whether to render the chrome-heavy or chrome-light version. The gate has become the load-bearing primitive for any feature that needs to know "what mode is this user actually using right now."

That feels right. A primitive that started as a regression fix is now the thing that keeps several other features honest. The hour we spent debugging the silenced toasts paid for itself many times over.

The active gate is shipping in CodePulse v2.3.135 and later. Download CodePulse to get the precision-gated build, run /select Real-life and /select Code to feel the per-chat boundary, and check the features page for the rest of what the gate enables. Premium plans on the pricing page include AI commit review and the full Real-life toolkit.