Tim Trailor

/autonomous

A persistent task runner with a delivery guarantee.

The sentence that forced this was "email me when you are done". I had left the laptop running overnight on a long task, closed the lid, and assumed the agent would email me when it finished. No email arrived. The task had completed, or had probably completed, and the session had died somewhere in the process. I had no record of the result and no way to recover it.

/autonomous is the skill I now invoke instead. It kicks off a background runner that executes the task, retries on failure, and is responsible for delivery. Delivery is not "the agent said it sent an email"; it is "the SMTP server returned 250 OK to this script, and the script wrote a local file as a fallback in case the email also fails."
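The delivery step, in that spirit, fits in a few lines of Python. A sketch, not the real runner: the fallback path, sender address, and SMTP host here are placeholder assumptions, and the only load-bearing idea is that the file is written before the email is attempted.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

FALLBACK = Path("results") / "last_result.txt"   # hypothetical fallback path

def deliver(result: str, to_addr: str, host: str = "localhost", port: int = 25) -> bool:
    """Write the result locally first, then try to email it.

    Returns True only if the SMTP server accepted the message:
    smtplib raises unless every stage of the exchange got a 2xx
    response, so "no exception" is the 250 OK check.
    """
    FALLBACK.parent.mkdir(parents=True, exist_ok=True)
    FALLBACK.write_text(result)          # the record exists before SMTP is tried

    msg = EmailMessage()
    msg["Subject"] = "autonomous task finished"
    msg["From"] = "runner@localhost"     # hypothetical sender
    msg["To"] = to_addr
    msg.set_content(result)

    try:
        with smtplib.SMTP(host, port, timeout=10) as server:
            server.send_message(msg)
        return True
    except (smtplib.SMTPException, OSError):
        return False                     # the file on disk is still the record
```

Writing the file first means the email failing, or the server lying, costs nothing: the result is already on disk.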

The runner reads three things in order: its task description, the results of the last attempt, and a running log. On each wake it decides whether the work is done, whether to retry, or whether to escalate. "Escalate" means send me an ntfy push on my phone saying the autonomous task is stuck. This has happened twice; both times the escalation was the thing that made me go and look at what had gone wrong.

The discipline the skill enforces is small and boring. Write the result to a file. Send the email from the runner, not from the agent. Keep a log. Do not trust that the parent session is still alive when the work finishes. It has run jobs as short as forty seconds and as long as thirteen hours. None have silently failed.

Full essay: "Email me when done": a persistent task runner with a delivery guarantee →

/debate

Three AI models, same question, parallel answers, then informed rounds.

Claude, Gemini 2.5 Pro, and GPT-5.4. Three models from three different families. /debate sends the same question to all three at once in Round 0, with no knowledge of what the others will say. From Round 1 onwards, each model sees the other two models' answers and is asked to respond. I read the final round.
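The protocol itself is small. A sketch, with a hypothetical ask(model, prompt) callable standing in for the three real clients:

```python
MODELS = ["claude", "gemini-2.5-pro", "gpt-5.4"]

def debate(question: str, ask, rounds: int = 2) -> dict:
    """Round 0 is blind; each later round shows every model the others' answers."""
    # Round 0: every model answers the bare question, no cross-contamination.
    answers = {m: ask(m, question) for m in MODELS}
    for _ in range(rounds):
        # Each model gets the question plus the other two answers from the
        # previous round (the comprehension reads the old dict, builds a new one).
        answers = {
            m: ask(m, question + "\n\nOther answers:\n"
                   + "\n---\n".join(a for o, a in answers.items() if o != m))
            for m in answers
        }
    return answers
```

The blind round falls out of the structure: nothing from another model can reach a prompt until the first dict has been fully built.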

Blind Round 0 is the part that earned the skill its place. It catches the cases where two models would have converged toward a wrong answer if they had seen each other early. A single-model review says "this looks fine". A debate opens with three separate reads and forces the disagreements out before any consensus forms.

The value is not speed. A single-model review is faster and cheaper. The value is that the debate has caught two categories of error that single-model review missed in my own usage: a plausible-looking migration plan that would have lost data, and an architecture proposal that recreated a previously-deleted pattern. Both were caught in blind Round 0, by the two models that were not the primary.

I run /debate before any irreversible architectural decision. I do not run it for small edits or quick fixes. The confidence trap is real: three models agreeing feels like strong evidence, and it often is. When they agree incorrectly, they agree confidently. The protocol is not a substitute for me reading the answer.

Full essay: Three-way AI model debate as a pre-commit gate →

/dream

Nightly consolidation that promotes session insights into canonical memory.

The memory system has two tiers. The canonical tier is a set of hand-curated topic files in plain Markdown. They are the source of truth. The ambient tier is every session's compressed conversation log, indexed by both semantic embedding and keyword. The ambient tier is large and noisy. The canonical tier is small and curated.

/dream is what connects them. It runs once a day. It reads the sessions that have happened since the previous run. It identifies material that has stabilised into a genuine pattern and should be promoted into the canonical tier: a new feedback rule, a lesson from an incident, a fact that has been mentioned in three sessions and is clearly now load-bearing. It drafts the promotion as a diff against the relevant topic file and either applies it or queues it for me to review.

The skill has a specific failure mode I tune it for: over-promotion. A session in which I complained about a tool once is not a pattern. A session in which I corrected the agent twice on the same point is. The drafting prompt asks for evidence of repetition or confirmation before any promotion is accepted. Without that filter the canonical tier starts to rot.
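The over-promotion filter reduces to a counting rule. A sketch, assuming candidates arrive as (session_id, insight) pairs:

```python
from collections import defaultdict

def promotable(candidates):
    """candidates: iterable of (session_id, insight) pairs.

    An insight is promoted only if it appears in at least two
    distinct sessions -- one complaint is not a pattern.
    """
    seen = defaultdict(set)
    for session_id, insight in candidates:
        seen[insight].add(session_id)
    return [i for i, sessions in seen.items() if len(sessions) >= 2]
```

Counting distinct sessions, not mentions, is the point: saying the same thing three times in one session is still one data point.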

Memory that does not consolidate becomes memory that cannot be searched. The canonical tier is the part I read; the ambient tier is the part I search when I need something specific. /dream is the pipe that keeps both honest.

Full essay: Memory that sleeps: a tiered memory architecture with daily consolidation →

/deep-context

Pre-assemble the context file before the task starts.

A context window is a budget. The mistake I kept making in the first few months of running an agentic environment was treating it as a container you fill, rather than as an artifact you manufacture. Files got read on demand, relevant passages pulled from memory mid-conversation, and the window filled up with whatever the agent happened to look at first. By the time the agent got to the actual task, the budget was half spent on things the task did not need.

/deep-context inverts the order. Before the task runs, a planning pass builds a task-specific context file: the relevant code extracts, the applicable lessons, the prior related decisions, the specific constraints. The file is bounded in size. It is checked against a quality bar before the task is allowed to start. The task itself then runs as a sub-session that reads only this file and the brief.
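The quality bar can be a plain function over the assembled file. The section names and byte budget here are assumptions, not the real ones:

```python
MAX_BYTES = 60_000   # hypothetical budget: a deliberate fraction of the window

REQUIRED = ["## Task", "## Code extracts", "## Prior decisions", "## Constraints"]

def check_context(text: str) -> list[str]:
    """Quality bar for a pre-assembled context file.

    Returns a list of problems; the task is only allowed to start
    when the list is empty.
    """
    problems = []
    size = len(text.encode())
    if size > MAX_BYTES:
        problems.append(f"over budget: {size} > {MAX_BYTES} bytes")
    for section in REQUIRED:
        if section not in text:
            problems.append(f"missing section: {section}")
    return problems
```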

The sub-session cannot go off and read random files. It cannot exceed its context budget because the budget is the file. It cannot forget the relevant prior decisions because those were included up front. What it produces is scoped to what the file said. If the file was wrong, the output is wrong. That is now a visible failure rather than a silent one.

The shift from "hope the context ends up right" to "build the context deliberately" was the single largest quality improvement I made in my own use.

Full essay: Context as a first-class artifact: the /deep-context pipeline →

Lessons as code

Twenty-three postmortem patterns, checked before every non-trivial change.

I keep a file called lessons.md. It has twenty-three numbered patterns in it. Each pattern is a specific way I have broken my own system and the rule I now apply to prevent a recurrence. Every pattern has happened to me at least twice. The file is read at the start of every session.

Before any non-trivial change, a skill called /lessons-check compares the proposed change against the patterns and flags any match. "You are about to add a LaunchAgent with no health check. Pattern 7: silent daemon failure, 2026-02-14 and 2026-03-31." The flag is a stop, not a warning. I have to decide whether the match is real before the change is allowed to ship.

The patterns themselves are not clever. They are boring, specific, and backed by a dated incident. "Do not hardcode paths in settings.json that differ between the laptop and the Mac Mini." "Never trust that a process is running as evidence that the feature works." "If a helper script rewrites an OAuth token, the backup goes in /tmp, never in the project folder." The value is that they get checked, not that they are novel.
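The check itself can be thin. A sketch, with a made-up pattern entry modeled on Pattern 7 above; the trigger regex and the entry format are assumptions, not the real file:

```python
import re

# Hypothetical pattern format: each entry carries a regex that fires on a
# proposed diff, plus the dated incidents that earned the pattern its slot.
PATTERNS = [
    (7, r"LaunchAgent(?!.*health)", "silent daemon failure",
     ["2026-02-14", "2026-03-31"]),
]

def lessons_check(diff: str) -> list[str]:
    """Return one flag per matched pattern.

    A non-empty list is a stop, not a warning: the change does not
    ship until every flag has been judged a real match or a false one.
    """
    flags = []
    for number, trigger, name, incidents in PATTERNS:
        if re.search(trigger, diff, re.IGNORECASE | re.DOTALL):
            flags.append(f"Pattern {number}: {name}, {' and '.join(incidents)}")
    return flags
```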

The portable part is the pattern itself, not my specific list. Any agentic development environment accumulates incidents. The question is whether the next change knows about them. A file plus a pre-flight skill is the thinnest version of that answer I have found.

Full essay: Lessons as code: turning postmortems into pre-flight checks →