A context window is a budget. Model vendors publish the maximum number of tokens the window can hold. Most of the interesting engineering is about what you put in it. The mistake I kept making in the first few months of running an agentic development environment was treating the context window as a container you fill, rather than as an artifact you manufacture. The fix, once I finally built it, is a skill called /deep-context that pre-assembles a task-specific file before the task starts. The file is the context for the task. The task is spawned as a sub-session that reads only that file, plus the brief.
This is a worked description of the pipeline, the incident that forced it, the benchmark that gates its release, and the places where it has already caught things a default context load would have missed.
The incident that forced it
The school governors’ application was the forcing function. The corpus is roughly 918 megabytes of school documents, compressed to around 150,000 tokens after extraction. The application loads that corpus into the context window for every query. For the first few weeks, this seemed to work. Then I asked it a question about a specific set of meeting minutes. The answer it gave was plausible, fluent, and wrong. The correct minutes were in the corpus. The model had not found them.
I went looking for why. Liu et al.’s “Lost in the Middle” paper, published in 2023, quantified the failure mode: relevant information buried in the middle of a long context is found around 25% of the time, compared to around 42% at the end of the context. The drop is sharper for information that is not semantically obvious from surrounding text. School minutes are exactly the kind of document where the relevance marker (a specific date, a specific agenda item) is a single token surrounded by several thousand tokens of procedural boilerplate.
The fix for the governors’ application was query-type routing. Date-specific queries start with keyword search against a SQLite Full-Text Search index (FTS5) and inject only the matching minutes into context. Open-ended policy questions start with semantic search against a ChromaDB index and inject the top matches. Full-corpus loading is the fallback, not the default. Accuracy on the class of question that had been failing moved from “usually wrong” to “usually right”. That experience convinced me context management is a correctness requirement, not an optimisation.
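The routing is simple enough to sketch. The following is a minimal stand-in, not the production code: the date pattern, table schema, and query-term handling are my assumptions, and the semantic branch (ChromaDB) is omitted; only the routing decision and the FTS5 keyword path are shown, against an in-memory index.

```python
import re
import sqlite3

# Hypothetical date pattern: "12 March 2024" or ISO "2024-03-12".
DATE_PAT = re.compile(r"\b(\d{1,2}\s+\w+\s+\d{4}|\d{4}-\d{2}-\d{2})\b")

def route(query: str) -> str:
    """Date-specific queries go to keyword search; open-ended ones to semantic."""
    return "keyword" if DATE_PAT.search(query) else "semantic"

def keyword_search(db: sqlite3.Connection, query: str, k: int = 5) -> list[str]:
    """FTS5 MATCH over the minutes index; only the hits get injected into context."""
    terms = " OR ".join(re.findall(r"\w+", query))
    rows = db.execute(
        "SELECT doc_id FROM minutes WHERE minutes MATCH ? ORDER BY rank LIMIT ?",
        (terms, k),
    ).fetchall()
    return [r[0] for r in rows]

# Tiny in-memory stand-in for the real index.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE minutes USING fts5(doc_id, body)")
db.execute("INSERT INTO minutes VALUES (?, ?)",
           ("2024-03-12", "Minutes of the meeting held 12 March 2024: admissions review."))
db.execute("INSERT INTO minutes VALUES (?, ?)",
           ("2024-06-05", "Minutes of the meeting held 5 June 2024: safeguarding audit."))
hits = keyword_search(db, "minutes 12 March 2024")
```

The point of the sketch is the shape: classification first, then the cheapest retrieval path that answers that class of question, with full-corpus loading nowhere in the happy path.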
The question that followed was: if this is true for the governors’ app, where is it also true for me in my own coding sessions?
The answer was “everywhere”. Every high-stakes task I do as a coder is a task where the relevant context is scattered across topic files, previous conversations, code, and the raw session logs that underlie the search indices. The default context load is “whatever is auto-loaded by the harness plus whatever Claude decides to read once the task starts”. That default is fine for small tasks. For architectural changes, migrations, and incidents, it misses things. The pattern the governors’ app exposed was general.
What /deep-context does
/deep-context <brief> is a skill that runs before a high-stakes task and produces a file called context.md. The file is under 50,000 tokens (tunable), structured, and deduplicated. The task is spawned as a sub-session with context.md as its only input. The sub-session does not need to search. The search has already happened.
The skill has four stages. Each stage exists to do one job.
Stage 1: pre-filter
The corpus of things to consider spans three separate stores: the topic files (curated current truth, hand-maintained); compressed session summaries (one per closed session, generated by a separate compression pass, stored at memory/sessions/compressed/YYYY/); and the raw session transcripts (the JSON Lines (JSONL) files Claude Code writes, about 800 of them, totalling roughly 600 megabytes).
The pre-filter runs five queries against these stores and takes the union of the matches. The queries are: time window (last 90 days by default, extended when the brief contains keywords implying history), topic overlap with the brief, file-path overlap with the brief, keyword match via FTS5, and semantic match via ChromaDB. The union typically returns 20 to 80 candidate sessions plus all relevant topic files. This is the pool the next stage works with.
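The union logic can be sketched as follows. This is an illustration, not the production pre-filter: the history keyword set is invented, and the FTS5 and ChromaDB calls are replaced by pre-computed per-session scores so the sketch stays self-contained.

```python
from datetime import date

# Keywords that widen the time window are an assumption for this sketch.
HISTORY_KEYWORDS = {"history", "originally", "regression", "why"}

def prefilter(brief: str, sessions: list[dict], today: date) -> set[str]:
    """Union of the five candidate queries from stage 1. Each session dict
    carries id, date, topics, paths, and keyword/semantic scores that stand
    in for the live FTS5 and ChromaDB lookups."""
    window = 365 if any(k in brief.lower() for k in HISTORY_KEYWORDS) else 90
    words = set(brief.lower().split())
    picked = set()
    for s in sessions:
        if (today - s["date"]).days <= window:   # 1. time window
            picked.add(s["id"])
        if words & set(s["topics"]):             # 2. topic overlap with the brief
            picked.add(s["id"])
        if words & set(s["paths"]):              # 3. file-path overlap
            picked.add(s["id"])
        if s["fts_score"] > 0:                   # 4. keyword match (FTS5 stand-in)
            picked.add(s["id"])
        if s["semantic_score"] > 0.7:            # 5. semantic match (Chroma stand-in)
            picked.add(s["id"])
    return picked
```

Because it is a union rather than an intersection, a session only needs to clear one of the five bars to reach stage 2; recall is cheap here and precision is the fan-out's job.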
Stage 2: fan-out
Three agents run in parallel. Agent A reads the topic-file index, then the topic files it thinks are relevant. Agent B reads the candidate compressed sessions from stage 1. Agent C walks the codebase via Glob and Grep. None of them reads the full raw session transcripts; they read the compressed summaries only.
Each agent returns two outputs. A set of relevant excerpts (bounded, deduplicated, with sources tagged). A list of session identifiers flagged for deeper reading. The flagged-session list is the way the fan-out says “the compressed summary is suggestive but not enough; someone should read the raw transcript before answering this part of the brief”.
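The two-output contract can be written down as a small schema. The type names here are mine, not the pipeline's, and the merge shows one plausible way to deduplicate across the three agents:

```python
from dataclasses import dataclass, field

@dataclass
class Excerpt:
    text: str
    source: str  # e.g. "topic:memory/topics/x.md" or "session:2026-04-01-a"

@dataclass
class AgentResult:
    excerpts: list[Excerpt]
    flagged_sessions: list[str] = field(default_factory=list)

def merge(results: list[AgentResult]) -> AgentResult:
    """Union the agents' outputs: dedupe excerpts by text (keeping the
    first-seen source) and flagged sessions by id, preserving order."""
    seen, excerpts = set(), []
    for r in results:
        for e in r.excerpts:
            if e.text not in seen:
                seen.add(e.text)
                excerpts.append(e)
    flagged = list(dict.fromkeys(s for r in results for s in r.flagged_sessions))
    return AgentResult(excerpts, flagged)
```

The flagged-session list is deliberately just identifiers: the fan-out defers the expensive raw-transcript read to the aggregator rather than doing it itself.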
Stage 3: aggregation
The aggregator reads the flagged session transcripts (pre-stripped to remove tool definitions and truncated tool outputs) and assembles the final context.md. The structure is fixed:
# Context for: <brief>
## Recent state
(from topics. What IS true now.)
## Relevant history
(from compressed + raw re-reads. How we got here, what we tried, what failed.)
## Unresolved threads
(flagged but not acted on; touches this brief)
## Files likely to touch
(from code fan-out. Targeted list, not dump.)
## Citations
(every claim tagged: [topic:path] | [session:id] | [raw:id] | [code:path:line])
Every claim in the context file is tagged by source. The tagging is mechanical, done at aggregation time, not self-reported by the model. This matters because the model’s own account of where information came from is untrustworthy; the tags are the record the system itself kept.
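Because the tag grammar is fixed, tagging can also be checked mechanically. A minimal lint pass over a generated context.md might look like this (the skip rules for headings and parenthetical scaffolding are a simplification of mine):

```python
import re

# Matches the tag grammar from the Citations section:
# [topic:path] | [session:id] | [raw:id] | [code:path:line]
TAG = re.compile(r"\[(topic|session|raw|code):[^\]\s]+\]\s*$")

def untagged_claims(context_md: str) -> list[str]:
    """Return claim lines that lack a trailing provenance tag.
    Headings, blank lines, and parenthetical scaffolding are skipped."""
    bad = []
    for line in context_md.splitlines():
        s = line.strip()
        if not s or s.startswith("#") or s.startswith("("):
            continue
        if not TAG.search(s):
            bad.append(s)
    return bad
```

A non-empty return means the aggregator emitted a claim the system cannot trace, which is exactly the failure mode the tags exist to catch.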
Stage 4: sub-session
The original task is spawned as a fresh Claude session with the brief and context.md, and nothing else. The sub-session has the full context it needs, pre-arranged, pre-deduplicated, with provenance. It does not need to search for anything, so it does not waste context window on search outputs. The whole context budget is pre-committed to substantive information.
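The spawn itself is a one-liner against whatever harness you use. The sketch below assumes Claude Code's headless `claude -p` invocation; the prompt wording is mine, and you would adapt the command to your own tooling:

```python
import subprocess
from pathlib import Path

def spawn_subsession(brief: str, context_path: Path) -> list[str]:
    """Build the command for a fresh headless session that sees only the
    brief and the manufactured context file (assumes `claude -p`)."""
    prompt = (
        f"{brief}\n\n"
        f"All context for this task is in {context_path}. "
        "Read it first; do not search elsewhere."
    )
    return ["claude", "-p", prompt]

cmd = spawn_subsession("Migrate the index writer", Path("context.md"))
# subprocess.run(cmd, check=True)  # uncomment to actually launch the sub-session
```

The important property is what is absent: no session resume, no auto-loaded memory, nothing the pipeline did not deliberately put in context.md.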
The single point of failure
The aggregator is the most dangerous component. If it fabricates, the whole pipeline fabricates. If it drops something load-bearing, the whole pipeline misses it. This is the piece I stress-tested before shipping.
The test was a golden-context benchmark. I picked three past tasks where I knew the right context ex post: a specific iOS SwiftUI state-capture bug, a LaunchAgent mass-failure root-cause analysis, and an architecture discussion about whether to decompose a monolith. For each task I hand-curated the ideal context.md from memory and session logs: every claim that should be there, every citation that should be there, every piece of failed-attempt history that should be there. Then I ran the aggregator against the same briefs and scored the generated output against the golden answer.
Ship threshold was 80% claim coverage. Below that, iterate. Above that, ship. The first version of the aggregator averaged 61% across the three benchmarks. The second version averaged 74%. The version that shipped averaged 87%. The scoring is partially automatic (citation-tag matching) and partially manual (I read the outputs and ticked off the claims myself on a spreadsheet). I would not build this pipeline again without the benchmark.
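The automatic half of the scoring reduces to a coverage ratio and a threshold check. This is a sketch of the arithmetic, not the scoring script itself; in practice matching was partly manual, whereas here it is exact set membership:

```python
def claim_coverage(golden: set[str], generated: set[str]) -> float:
    """Fraction of golden-context claims present in the generated context.
    Exact matching is a simplification; the real pass included manual tick-off."""
    if not golden:
        return 1.0
    return len(golden & generated) / len(golden)

def ship(score: float, threshold: float = 0.80) -> bool:
    """The release gate: at or above threshold ships, below iterates."""
    return score >= threshold
```

Note the gate is on coverage of the golden set, not on the size of the generated file: an aggregator that pads context.md with extra material gains nothing here.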
The benchmark is not a one-time gate. It is a regression test. Any future change to the aggregator prompt, the pre-filter, or the fan-out has to be rerun against the golden set before being accepted. This is the discipline that keeps the pipeline from drifting.
Precedence, or why topics beat compressed beats raw
The pipeline reads from three layers that can disagree. The governing rule is in CLAUDE.md:
## Memory precedence
When information conflicts across layers:
1. Topics. CURATED CURRENT TRUTH. Wins.
2. Compressed sessions. RECALL/ROUTING. Not authoritative on facts.
3. Raw JSONL. ARBITRATION when summary and topic disagree.
4. Chroma/FTS5. DERIVED, regenerate from source.
Compressed entries are APPEND-ONLY.
/dream may promote insights FROM compressed INTO topics, never the reverse.
Topics are the hand-maintained truth. Compressed sessions are summaries generated from raw sessions; they are routing, not facts. Raw sessions are the final arbiter when compressed summaries and topic files disagree. The vector and keyword indices are derived and can be regenerated at any time; they are not a source of truth.
This precedence matters because without it, a contamination event (a bad compressed summary gets written because the compression prompt had a bug that week) can bias retrieval for months. With the precedence rule, raw arbitrates. If a claim in context.md is sourced only from compressed sessions and conflicts with a topic, the topic wins. If the raw transcript shows something different from both, the raw transcript wins.
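The arbitration rule above is mechanical enough to write down. This is my reading of the CLAUDE.md rule as code, under the assumption that raw is only consulted when the curated and compressed layers disagree; derived indices never appear because they are not a source:

```python
def resolve(topic=None, compressed=None, raw=None):
    """Conflict resolution across memory layers: topics beat compressed;
    the raw transcript arbitrates when those two disagree; Chroma/FTS5
    are derived and never count as a source."""
    if topic is not None and compressed is not None and topic != compressed:
        # Disagreement between curated truth and a summary: go to raw.
        return raw if raw is not None else topic
    if topic is not None:
        return topic
    if compressed is not None:
        return compressed
    return raw
```

The contamination scenario maps directly onto the first branch: a bad week of compression prompts produces wrong `compressed` values, and the rule routes every resulting conflict to the transcript instead of letting the summary win by recency.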
When /deep-context runs and when it does not
Explicit invocation only. I do not auto-trigger it from plan mode, because the overhead (running the pipeline takes a few minutes and some compute) is only worth it for tasks that warrant the preparation.
The invocation heuristic: if a task warrants plan mode (architectural changes, migrations, multi-file refactors in core systems, anything with blast radius), it warrants /deep-context. If it is a small edit, a quick question, or iterating on a user interface, it does not.
I run /deep-context maybe twice a week. The rest of the time the default context load is sufficient and the overhead is not justified.
What it has caught
Three classes of problem, each of which I can name with specifics.
Forgotten failed attempts. Several times the pipeline surfaced a previous, abandoned attempt at the same problem. The reasons it had been abandoned were captured in a compressed summary of the old session. Without /deep-context, I would have repeated the attempt and hit the same failure. With it, the failure history was in context.md and I skipped it.
Un-migrated consumers. Architectural changes that touch multiple files frequently miss downstream consumers. The code fan-out stage explicitly searches for references to the identifiers being changed and lists them. Several times this list has been longer than I expected; the ones I had not thought of were the ones that would have broken.
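The consumer search is the easiest part of the fan-out to show concretely. In the real pipeline the agent uses Grep; this sketch does the same walk in pure Python, with the file glob and output shape as my assumptions:

```python
import re
import tempfile
from pathlib import Path

def find_consumers(root: Path, identifier: str, glob: str = "*.py") -> list[tuple[str, int]]:
    """Every (file, line-number) that references an identifier being
    changed: the list that catches un-migrated downstream consumers."""
    pat = re.compile(rf"\b{re.escape(identifier)}\b")
    hits = []
    for path in sorted(root.rglob(glob)):
        for n, line in enumerate(path.read_text().splitlines(), 1):
            if pat.search(line):
                hits.append((path.name, n))
    return hits

# Demo on a throwaway tree: one consumer file, one unrelated file.
root = Path(tempfile.mkdtemp())
(root / "writer.py").write_text("from store import write_index\nwrite_index()\n")
(root / "other.py").write_text("x = 1\n")
consumers = find_consumers(root, "write_index")
```

The word-boundary anchors matter: without them, renaming `write_index` would also flag `write_index_v2`, and the list stops being trustworthy.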
Policy reversals. In a few cases the pipeline surfaced a decision from months earlier that I had forgotten and was about to reverse without justification. The compressed summary captured the why behind the original decision. Sometimes reading it changed my mind; in other cases it confirmed the reversal was correct and I proceeded with the reasoning intact.
None of these is exotic. They are the normal consequence of being the only person maintaining a system that has accumulated several hundred decisions and several hundred thousand lines of transcript. They are the things a good human colleague with a long memory would catch, and I do not have that colleague. /deep-context is my substitute for one.
The limits
I want to be specific about where the pipeline does not help.
It does not help when the relevant information is not in any of the three stores. If a decision was made verbally and never written down, the pipeline cannot know about it. The pipeline’s coverage is the coverage of the written record. I maintain the written record carefully for exactly this reason.
It does not help when the task is small. The five minutes of overhead to run the pipeline is not justified for an edit that would take three minutes anyway. The rule of thumb is “if the task costs more than twenty minutes of my attention, the pipeline is worth it”.
It does not help when the task has no history. New kinds of work, where the relevant context is external documentation or a single specification, do not benefit from a pipeline that searches internal history. For those, I read the specification directly.
It does not replace judgement. The output of /deep-context is a context file, not a decision. The task that runs against it is a sub-session doing the work; the sub-session is still capable of getting the work wrong. The pipeline gives it the best possible starting point, not a guaranteed ending point.
Building it
The code lives under ~/code/deep_context/ on the Mac Mini. The skill definition is at ~/.claude/skills/deep-context/SKILL.md. The compressed-session schema is a small YAML frontmatter block plus a markdown body (target 500 to 1500 tokens per entry, hard cap 2000). The full pipeline, with pre-strip, complex-session routing (Sonnet or Opus depending on session complexity), unified index writes to ChromaDB and FTS5, and the aggregator prompt, is around 1,500 lines of Python. It will be published in the control plane repository along with the rest of the system.
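The schema caps are enforceable with a few lines. This validator is a sketch of mine, not the production check: it approximates token count as characters divided by four, which is a stated assumption rather than a real tokenizer:

```python
def validate_compressed(entry: str) -> list[str]:
    """Check a compressed-session entry against the schema caps: YAML
    frontmatter, then a markdown body targeting 500-1500 tokens with a
    2000-token hard cap. Token count approximated as chars/4 (assumption)."""
    problems = []
    if not entry.startswith("---\n"):
        problems.append("missing YAML frontmatter")
    body = entry.split("---", 2)[-1]  # text after the closing frontmatter fence
    tokens = len(body) // 4
    if tokens > 2000:
        problems.append(f"over hard cap: ~{tokens} tokens")
    elif not 500 <= tokens <= 1500:
        problems.append(f"outside target range: ~{tokens} tokens")
    return problems
```

Running a check like this at compression time keeps entries in the band where the fan-out can read dozens of them without blowing its own context budget.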
If you want to build a similar pipeline for your own setup, the three components I would strongly recommend inheriting are: the precedence rule (topics over compressed over raw), the golden-context benchmark (three tasks, 80% threshold), and the explicit-invocation policy (no auto-trigger). Everything else is tunable; those three are load-bearing.
The underlying shift
The broader change in how I work is that I have stopped treating context as a free resource. Before this pipeline, my assumption was that the model would read the relevant things when it needed to and the context window was “big enough”. Both assumptions were wrong for the kinds of tasks I care about most. The fix, once I made it, is not subtle: manufacture the context before the task starts, benchmark the manufacturing, enforce a precedence rule between layers. The pipeline is not elegant; it is a scaffold. But it is the scaffold that makes the important tasks reliable, and reliability is the binding constraint on how ambitious a task I am willing to start.
Context windows have grown dramatically. Context quality has not grown at the same rate. The gap is where the engineering happens.
The pipeline is working end-to-end as of 2026-04-23. The backfill to compress the full historical session corpus is paced against the subscription quota and will run over the coming days. Golden-context benchmark scores above are from the production aggregator; regressions require a new benchmark run before any aggregator change is accepted.