How we keep your AI's Gmail habits from polluting your code reviews

You allow Claude to call list_emails three times in a row. The bot learns the pattern: when you're working with Gmail, this read tool is safe to auto-approve. Convenient. But what happens the next morning when you switch into a code review and Claude wants to call a tool literally named list_emails against a hypothetical Gmail-clone npm package? Should the auto-approval carry over? If the answer feels obvious — "no, of course not, that's a totally different context" — you've already grasped why CodePulse builds an isolation boundary into its approval system. The hard part is making that boundary structural, not aspirational.

This is the story of why approval-pattern leakage is a real risk, what a leak would actually look like in production, and the per-topic policy that makes the leak structurally impossible.

The naive design and why it fails

The simplest way to build an approval engine is one global pattern store. Every approval the user grants — Allow Bash(git status), Allow list_emails, Allow Read(README.md) — gets recorded in one big map keyed by tool name plus argument signature. Next time the same combination shows up, the engine consults the map and auto-approves if the user has approved this exact pattern enough times to clear a confidence threshold.

This works beautifully when the user has one AI workflow. CodePulse, however, runs at least two: the Code mode that drives your repo through Claude Code, and the Real-life Mode that drives your inbox, calendar, and other MCP services through a separate Claude session. Both modes run inside the same Telegram chat, both use the same approval-card UI, both call the same approval engine for routing decisions. And the moment two distinct workflows share one pattern store, contamination becomes inevitable.

The contamination doesn't even need to be malicious to be wrong. Imagine Claude is doing some code review work and it needs to call a tool that happens to share a name with a Real-life tool — a hypothetical read_file that exists in both your code editor MCP server and your Drive MCP server, with totally different semantics. With one global store, the user's three earlier Allow read_file taps in the Drive context unlock auto-approval for every read_file call in the repo context. The user never made that decision; the engine inferred it across a context the user would have wanted treated separately.
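To make the failure concrete, here is a minimal sketch of the naive design. The class name `GlobalPatternStore`, the threshold of 3, and the `tool:args` key format are assumptions for illustration, not CodePulse's actual code.

```typescript
// Hypothetical sketch of the naive design: one map shared by every workflow.
const THRESHOLD = 3; // assumed number of Allow taps before auto-approval

class GlobalPatternStore {
  private counts = new Map<string, number>();

  // Key is tool name plus argument signature. There is no notion of context.
  private key(tool: string, argSignature: string): string {
    return `${tool}:${argSignature}`;
  }

  recordAllow(tool: string, argSignature: string): void {
    const k = this.key(tool, argSignature);
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }

  shouldAutoApprove(tool: string, argSignature: string): boolean {
    return (this.counts.get(this.key(tool, argSignature)) ?? 0) >= THRESHOLD;
  }
}

const naiveStore = new GlobalPatternStore();

// Three Allow taps on read_file while working in the Drive context...
for (let i = 0; i < 3; i++) naiveStore.recordAllow("read_file", "path");

// ...and a read_file call from the repo context is now auto-approved,
// because the store cannot tell the two contexts apart.
const leakedIntoRepo = naiveStore.shouldAutoApprove("read_file", "path");
console.log(leakedIntoRepo); // true
```

Nothing in the key records where the approval was granted, so the store has no way to refuse the second context.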

The variant of this problem that bit us in production was subtler. There was no name collision between different tools; the leak crossed trust contexts instead. Approving Bash ls three times in an interactive Code-mode session taught the engine to auto-allow Bash ls for delegate work the user had never authorized. The interactive session was a power user editing files at their own desk; the delegate session was a long-running agent the user wanted to supervise tightly. Same tool name, same argument shape, totally different trust contexts. The pattern leaked because we hadn't separated those contexts.

Figure: Naive global pattern store. Every learned approval is shared across every workflow, so contamination is inevitable when contexts diverge.

The four boundaries we ship

Once we accepted that "one global engine" was the wrong model, we drew four boundaries. Each one separates a class of approval state from another class. Each boundary is enforced structurally — meaning the code path doesn't even let cross-boundary state flow. None of them rely on developer discipline.

Boundary B1: read tools auto-learn, write tools never do. A pure read like list_emails or search_files is idempotent and has no side effects: at worst you ran a query you didn't intend, and the answer is harmless. A write like send_email or create_event changes state outside the chat and may be impossible to undo. We never auto-approve writes from learned patterns. Every write asks every time, full stop. This isn't really an isolation boundary in the spatial sense, but it is the first line of defense: it decides what the engine even considers learnable.

Boundary B2: explicit denials are sticky. When the user taps Deny, that decision persists for that pattern in that context until explicitly revoked. We never decay denials toward auto-approve, even if the user later accepts similar requests. Trust is asymmetric: it builds slowly through repeated approvals and collapses immediately on a single deny.

Boundary B3: UI chrome stays in its mode. The Code-mode approval card with [Reply][Wait Quietly][Stop] chrome never appears in Real-life threads. The Real-life slim card with action verb and target never appears in Code threads. We discovered this leak the hard way during the Phase 6 redesign that split our bridge in two — once chrome started crossing modes, every fix was patching a boundary that should have been structural.

Boundary B4: learned approval patterns stay in their mode and topic. This is the deep one. The Code-mode ApprovalEngine.recordDecision store is physically separate from the Real-life policy store. Each Telegram topic (Gmail, Calendar, Linear, Code, whatever) maintains its own learned-pattern store. A pattern recorded in the Gmail topic does not exist in the Code topic, period.
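Of the four, B2's asymmetry is the easiest to misread, so here is a minimal sketch of it as per-pattern state. The `PatternState` shape, the threshold of 3, and the choice to reset the allow count on a denial are all assumptions; the only behavior taken from the description above is that a denial blocks auto-approval until it is explicitly revoked.

```typescript
// Hypothetical per-pattern state: approvals accumulate, a denial latches.
interface PatternState {
  allows: number;
  denied: boolean;
}

const ALLOW_THRESHOLD = 3; // assumed value

function recordAllow(state: PatternState): PatternState {
  // Later approvals never clear an earlier denial.
  return { ...state, allows: state.allows + 1 };
}

function recordDeny(state: PatternState): PatternState {
  // A single Deny collapses trust immediately (resetting the count is an assumption).
  return { ...state, allows: 0, denied: true };
}

function revokeDenial(state: PatternState): PatternState {
  // Only an explicit revocation lifts the denial.
  return { ...state, denied: false };
}

function canAutoApprove(state: PatternState): boolean {
  return !state.denied && state.allows >= ALLOW_THRESHOLD;
}

let state: PatternState = { allows: 0, denied: false };
state = recordAllow(recordAllow(recordAllow(state)));
const trustedBefore = canAutoApprove(state); // true

state = recordDeny(state);
state = recordAllow(recordAllow(recordAllow(state)));
const trustedAfterDeny = canAutoApprove(state); // false: the denial is sticky
```

The denied flag is checked before the count, which is what keeps later approvals from quietly overriding a Deny.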

How the read/write classifier works

Every tool call goes through a classifier before it reaches the policy engine. The classifier looks at the tool name and decides whether the operation is a read (auto-learnable, low-stakes, idempotent) or a write (never auto-learnable, irreversible, always asks).

The classifier rules are explicit and conservative. We start from a small allowlist of read tools — list_*, read_*, search_*, get_*, find_* prefixes — and treat everything else as write. We err toward false-write classifications because the cost of asking the user to approve a read they could have skipped is one tap; the cost of silently auto-approving a write is potentially unrecoverable. Edge cases like compose_template (creates a template object, no side effect) get manually classified into the read bucket.
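A minimal sketch of those rules, assuming a plain prefix check plus a manual override set. The names `classifyTool`, `READ_PREFIXES`, and `MANUAL_READS` are hypothetical; the prefixes and the compose_template exception come from the description above.

```typescript
type ToolKind = "read" | "write";

// Read prefixes named in the post.
const READ_PREFIXES = ["list_", "read_", "search_", "get_", "find_"];

// Manually reviewed edge cases that are reads despite their names.
const MANUAL_READS = new Set<string>(["compose_template"]);

function classifyTool(toolName: string): ToolKind {
  if (MANUAL_READS.has(toolName)) return "read";
  if (READ_PREFIXES.some((prefix) => toolName.startsWith(prefix))) return "read";
  // Default to write: a false "write" costs one tap,
  // a false "read" could silently auto-approve something unrecoverable.
  return "write";
}

console.log(classifyTool("list_emails"));      // "read"
console.log(classifyTool("send_email"));       // "write"
console.log(classifyTool("compose_template")); // "read"
console.log(classifyTool("delete_thread"));    // "write"
```

Because the default branch is write, a tool nobody has reviewed yet can only ever cost the user an extra tap.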

For each tool that classifies as read, the policy engine consults the per-topic store. If the user has approved this pattern N times in this topic without revoking, the engine auto-approves and shows a toast notification. If there are not yet enough confirmations, it asks. For a write, the policy engine ignores any pattern history and asks every single time.

The rules surface in the slim approval cards so users always know what's happening. A read that is auto-approved from a learned pattern shows a "this is auto-approving (3 of 3 prior allows)" footer. A write shows "this action is irreversible, approve carefully."
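The decision flow and card footers described above can be sketched as one pure function. This is an illustration under assumptions: `decide`, the `Decision` type, and N = 3 are hypothetical, and `priorAllows` stands in for whatever count the current topic's store returns.

```typescript
type Kind = "read" | "write";

type Decision =
  | { action: "auto-approve"; footer: string }
  | { action: "ask"; footer?: string };

const N = 3; // assumed threshold, chosen to match the "3 of 3" footer example

// priorAllows is the allow count for this pattern in the current topic only.
function decide(kind: Kind, priorAllows: number): Decision {
  if (kind === "write") {
    // Pattern history is deliberately ignored for writes.
    return { action: "ask", footer: "this action is irreversible, approve carefully" };
  }
  if (priorAllows >= N) {
    return {
      action: "auto-approve",
      footer: `this is auto-approving (${N} of ${N} prior allows)`,
    };
  }
  // Not enough confirmations yet: show the normal approval card.
  return { action: "ask" };
}

const readLearned = decide("read", 3);   // auto-approve
const readNew = decide("read", 1);       // ask
const writeAlways = decide("write", 99); // ask, regardless of history
```

The write branch returns before the count is even read, so no amount of history can change its outcome.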

Per-topic policy stores

This is the structural piece. Each Telegram topic gets its own policyStore instance. The store is keyed on the topic ID at construction; the topic ID never changes for the lifetime of that thread. The bridge looks up the right store by reading the incoming hook's topic context and dispatches the approval through that store and only that store.

When the user is in their Gmail topic and approves list_emails, the engine records the pattern in policyStore[gmailTopicId]. When they switch to their Calendar topic and approve list_events, the engine records it in policyStore[calendarTopicId]. The two stores share zero state. They live in different memory regions, are persisted to different rows in the local SQLite database, and reload into different objects on bridge restart.
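A minimal in-memory sketch of that layout, assuming one store object per topic ID behind a small registry. `PolicyStore` and `PolicyRegistry` are hypothetical names, the topic IDs are placeholder strings, and SQLite persistence is left out.

```typescript
class PolicyStore {
  // The topic ID is fixed at construction and never reassigned.
  readonly topicId: string;
  private readonly allows = new Map<string, number>();

  constructor(topicId: string) {
    this.topicId = topicId;
  }

  recordAllow(pattern: string): void {
    this.allows.set(pattern, (this.allows.get(pattern) ?? 0) + 1);
  }

  allowCount(pattern: string): number {
    return this.allows.get(pattern) ?? 0;
  }
}

class PolicyRegistry {
  private readonly stores = new Map<string, PolicyStore>();

  // Look up (or lazily create) the store for the topic named in the incoming hook.
  forTopic(topicId: string): PolicyStore {
    let store = this.stores.get(topicId);
    if (!store) {
      store = new PolicyStore(topicId);
      this.stores.set(topicId, store);
    }
    return store;
  }
}

const registry = new PolicyRegistry();
registry.forTopic("gmail").recordAllow("list_emails");
registry.forTopic("calendar").recordAllow("list_events");

// The Gmail approval is invisible from the Calendar store.
const gmailCount = registry.forTopic("gmail").allowCount("list_emails");       // 1
const calendarCount = registry.forTopic("calendar").allowCount("list_emails"); // 0
```

Each store owns its own map and exposes no method that accepts another topic's ID, so there is no call that could read across the boundary.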

Figure: Per-topic isolation. Gmail, Calendar, and Code each have their own learned-pattern store. No state crosses the boundary between them.

What about cross-topic operations — like asking Claude in your Calendar topic "find me an open slot, then email Jane to propose it"? Claude routes the calendar search through Calendar topic and the email send through Gmail topic. Each tool call is approved in its native topic's policy. The user might tap two Allow buttons (one for list_events, one for send_email), but each tap only writes to the topic that matches the tool. There is no path where Calendar topic's history can influence Gmail topic's policy.
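The routing in that example can be sketched as follows. How CodePulse resolves a tool to its native topic is not described here, so the `NATIVE_TOPIC` table, the `ToolCall` shape, and `recordAllowTap` are all assumptions made for illustration.

```typescript
interface ToolCall {
  tool: string;
  requestedFrom: string; // topic where the user typed the request
}

// Hypothetical tool-to-topic table.
const NATIVE_TOPIC = new Map<string, string>([
  ["list_events", "calendar"],
  ["send_email", "gmail"],
]);

// Allow taps grouped by the topic they were recorded in.
const tapsByTopic = new Map<string, string[]>();

// Records the tap in the tool's native topic and returns that topic.
// call.requestedFrom is never consulted when choosing where to record.
function recordAllowTap(call: ToolCall): string {
  const topic = NATIVE_TOPIC.get(call.tool);
  if (topic === undefined) {
    throw new Error(`no native topic known for ${call.tool}`);
  }
  const taps = tapsByTopic.get(topic) ?? [];
  taps.push(call.tool);
  tapsByTopic.set(topic, taps);
  return topic;
}

// "Find me an open slot, then email Jane", asked from the Calendar topic.
const plan: ToolCall[] = [
  { tool: "list_events", requestedFrom: "calendar" },
  { tool: "send_email", requestedFrom: "calendar" },
];
const landedIn = plan.map(recordAllowTap); // ["calendar", "gmail"]
```

Recording a tap is not the same as learning from it: send_email is a write, so it would still ask every time no matter how many taps accumulate.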

How a leak would have looked, and how we know we don't have one

We pinned this guarantee with a regression test that deliberately does the wrong thing: an interactive Code-mode session approves Bash ls three times, and the test then checks whether a delegate (or Real-life) session has gained any auto-approve permission for Bash ls. The test asserts the delegate engine's pattern store is unchanged. If a future refactor accidentally points the delegate session at the global ApprovalEngine instead of the per-mode store, this test catches it before merge.

The test sits in tests/unit/approval-bridge-server.test.ts under the heading "Boundary B4 — interactive recordDecision must not leak into delegate engine." It ran on every PR throughout the Real-life (RL) phase rollout and caught two near-misses during Phase 6, where well-intentioned refactors briefly unified the stores. Both were caught before merge. Without that test, the leak would have shipped silently and we wouldn't have known until a user noticed their delegate session running tools they hadn't authorized.
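The shape of that check can be sketched without a test framework. This is not the actual test from the repository: `ApprovalEngineSketch`, its methods other than the `recordDecision` name, the `Bash(ls)` pattern string, and the threshold of 3 are stand-ins chosen for illustration.

```typescript
// Hypothetical stand-in for an approval engine with its own pattern store.
class ApprovalEngineSketch {
  private readonly allows = new Map<string, number>();

  recordDecision(pattern: string, decision: "allow" | "deny"): void {
    if (decision === "allow") {
      this.allows.set(pattern, (this.allows.get(pattern) ?? 0) + 1);
    } else {
      this.allows.delete(pattern);
    }
  }

  canAutoApprove(pattern: string, threshold = 3): boolean {
    return (this.allows.get(pattern) ?? 0) >= threshold;
  }

  // Serialized view of the store, used to detect any change at all.
  snapshot(): string {
    return JSON.stringify(Array.from(this.allows.entries()));
  }
}

// Returns true when interactive approvals leave the delegate engine untouched.
function checkNoLeak(
  interactive: ApprovalEngineSketch,
  delegate: ApprovalEngineSketch,
): boolean {
  const before = delegate.snapshot();
  for (let i = 0; i < 3; i++) interactive.recordDecision("Bash(ls)", "allow");
  const unchanged = delegate.snapshot() === before;
  return unchanged && !delegate.canAutoApprove("Bash(ls)");
}

// Separate engines: the boundary holds.
const isolated = checkNoLeak(new ApprovalEngineSketch(), new ApprovalEngineSketch());

// The refactor mistake: both sessions point at one engine, so the check fails.
const shared = new ApprovalEngineSketch();
const unified = checkNoLeak(shared, shared);

console.log(isolated, unified); // true false
```

The second call models the near-miss described above: once the two sessions share an instance, the snapshot comparison fails immediately.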

Why this matters for trust

A user is making decisions about how much trust to extend to an AI agent. The agent can execute tools that have real consequences. The decisions the user makes — Allow this pattern, Deny that one, ask me again every time — are commitments. The engine has to honor those commitments precisely.

If the engine generalizes a commitment too aggressively, the user starts losing confidence in their own decisions. They tap Allow and don't know which other future decisions they just made for themselves. That's worse than the friction of one extra tap. Trust in an approval system is built by being predictable about what each tap implies. The B4 boundary is one of the things that makes those implications predictable: you only ever approve the topic you're in.

There's a deeper reason too. The same Claude model is doing the work in every topic. The user's safety relies on knowing that the approval engine — the human-controlled gate between Claude and the world — applies the user's specific intent to the specific context. Bleeding between contexts is the failure mode that erodes the entire premise of supervised AI.

What we deliberately don't ship

A cross-topic "this user is generally permissive about reads" mode. Other tools ship that and call it convenience. We don't, because it explicitly violates the trust contract. If you want to extend a permission to another topic, you do it in that topic by approving the pattern there. The engine doesn't decide for you that approvals are transitive.

A "trust the agent" auto-mode for writes. Same reason. The whole CodePulse premise is that you supervise the agent. Allowing it to send emails or create calendar events without per-action approval would dissolve the supervision into a marketing claim.

Pattern decay over time. Approvals don't fade. We learned in early prototypes that decay made the engine feel arbitrary — a pattern would auto-approve for two days, then suddenly start asking again, with no obvious trigger. Predictable rules beat smart-but-opaque rules.

How to think about it as a user

When you tap Allow on an approval card, you're recording one decision in one topic. That's it. The engine doesn't extrapolate to other topics. It doesn't generalize to "similar" tools. It doesn't escalate from read auto-approval to write auto-approval, ever.

If you want to revoke an auto-approve pattern, the /policy command shows you the learned patterns per topic and lets you remove any of them. Each revocation also stays scoped to its topic. The mental model is: each topic is its own little world of trust decisions, and CodePulse never lets them mix.

The boundary work isn't visible in the same way that Real-life Mode itself is visible. You don't think about it when you're sending a Gmail draft; you just tap Allow and the message goes. But it's the reason you can tap that Allow with confidence — because three weeks ago when you let Claude run your code reviews, those approvals stayed in the code topic and never spilled into the topic where it now wants to send an email on your behalf.

Ready to use AI assistance with explicit, isolated approvals? Download CodePulse and configure your MCP toolkit through the /setup wizard. The free tier includes the per-topic approval policy and read/write classifier. Upgrade to Premium for AI commit review, voice input, and the rest of the platform.