agents
Multi-model debate, subagents, review pipelines.
- Story: 918MB, an Ofsted inspection, and a governor who is not a developer. My kids' school was rated Requires Improvement and facing re-inspection. The evidence base was 1,650 files and 918 megabytes. No governor was going to read all of it. So we built a tool that could.
- Essay: Context as a first-class artifact: the /deep-context pipeline. Stop hoping that relevant information will fit in the context window. Start manufacturing a task-specific context file before the task begins. The mechanism, the receipts, and the benchmark that gates it shipping.
- Essay: "Email me when done": a persistent task runner with a delivery guarantee. Long-running tasks fail silently if the session dies before the result is ready. This is the runner I built to make "email me when done" actually mean that. Retry loop, fallback email paths, and a last-ditch file.
- Story: From model to agent: what changed when I stopped predicting and started investigating. Why the regression models that came out of the hackathon got replaced within weeks by three agentic tools. The short version: probability scores without narrative are not what analysts need.
- Story: One hour, one marketing list. A vague ask ("give me a list of prospects that look like X") turned into a working pipeline across three data sources in under sixty minutes. A small build, but the speed is the point.
- Essay: Three-way AI model debate as a pre-commit gate: receipts from several months of use. Running Claude, Gemini and GPT-5.4 on the same question in parallel: blind in Round 0, informed from Round 1 onward. What works, what does not, and the confidence trap I did not predict.