● Experiment

Building the agent system

I spent most of today trying to get a local AI agent system running and breaking it in new ways every hour.

The plan was simple: run Ollama on the M5, download Qwen and Llama models, wire them into Paperclip, have them run coding and writing tasks 24/7 for free. The M5 has 128GB unified memory. It should eat this.

Ollama crashed on every model. Turns out there’s a known Metal shader bug on M5 chips that’s been open since January. I didn’t know this until I’d already downloaded 32GB of models. Killed the downloads, reclaimed the space, and pivoted to MLX, which had been running fine the whole time on port 8800.
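For what it's worth, talking to the MLX server is just a plain OpenAI-style chat completion call. A minimal sketch, assuming an OpenAI-compatible /v1/chat/completions route on port 8800; the model name here is a placeholder, not the one I'm actually running:

```javascript
// Sketch: build the request for an OpenAI-compatible local server.
// Port 8800 matches the setup above; "qwen3-8b" is a placeholder name.
function buildChatRequest(model, prompt) {
  return {
    url: "http://localhost:8800/v1/chat/completions",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
        stream: false,
      }),
    },
  };
}

// Usage (not run here):
//   const { url, options } = buildChatRequest("qwen3-8b", "Say hello");
//   const res = await fetch(url, options);
```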

Then the agents wouldn’t actually do anything. The tool calling worked at the API level: I tested it directly, and the models called shell commands correctly. But local-agent.js, the script that bridges Paperclip to the models, had three separate bugs: Qwen3’s thinking tokens were corrupting the JSON responses; two instances of the script were spawning at startup and racing each other; and the comment_on_issue and update_issue calls weren’t awaited, so they fired and died silently. Agents were completing runs and reporting success while doing absolutely nothing.
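The fixes for those three bugs look roughly like this. This is a sketch with hypothetical names, not the actual local-agent.js internals:

```javascript
// Fix 1: strip Qwen3 <think>...</think> blocks before parsing the
// model's reply as JSON, so reasoning text can't corrupt the payload.
function stripThinkTokens(raw) {
  return raw.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}

// Fix 2: a module-level guard so a second spawn at startup becomes a no-op
// instead of racing the first instance.
let started = false;
function startAgentOnce(start) {
  if (started) return false;
  started = true;
  start();
  return true;
}

// Fix 3: await the issue-tracker calls. Fire-and-forget promises die
// silently; awaiting them makes failures propagate to the caller.
async function reportResult(api, issueId, summary) {
  await api.comment_on_issue(issueId, summary);
  await api.update_issue(issueId, { status: "done" });
}
```

The third one is the sneaky failure mode: without the awaits, the run "succeeds" even when every write to the tracker is rejected.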

Fixed all three. Scout started working. Engineer started working. Reviewer almost works — it still hits context limits on large files.

The honest question I keep sitting with: did I save time today? I probably spent six hours building and debugging this system. The first real task it completed — writing a blog post — took CEO about four minutes. That blog post also auto-published without my approval, which I had explicitly said I didn’t want.

Red bar. The system is real and the loop works. But I built it messier than I needed to and broke some things along the way.