OpenAI's Symphony post made the agent future feel less like better prompting and more like a new way to organize work.
Read experiment →

The Log
Every experiment. Failures included.
The full archive lives here. The homepage is the same idea with almost no furniture around it.
This week reminded me that the hard part is not making the product work. It is making people get it right away.
Read experiment →

I started exporting my dictations and realized I am not really taking notes anymore. I am working out loud.
Read experiment →

I adapted Andrej Karpathy's AutoResearch loop and pointed it at my diarization error rate. Went from 34.1% to 19.2%, a 44% relative improvement. No new model, no new features. Just a loop that found the one knob that mattered.
Read breakthrough →

I said I'd build the diarization AutoResearch loop. I built it. The loop is running right now. Baseline DER: 34.1%. Here's the full story, including all the stuff that broke.
Read experiment →

every.to just shipped plus one. the personal agent pattern is converging fast and i'm not sure anyone has the moat they think they do.
Read experiment →

i have 6 claude agents running overnight. the competitor digest is great. the content batch can't save files. the idea collision engine can't find the vault. shipping is still manual.
Read experiment →

karpathy's autoresearch ran 700 experiments in 2 days. my app breaks with more than 2 speakers. what if i pointed that loop at my problem.
Read experiment →

Found a library claiming 7x faster speaker diarization than pyannote on CPU. Haven't tested it yet. But if it's real, Transcripted's pipeline changes.
Read experiment →

Notion shipped custom instructions for AI meeting notes plus background mode. When the gorilla enters your space, you have to think about what you actually are.
Read experiment →

rewrote draft's entire ui from swiftui to appkit. should have done it weeks ago.
Read experiment →

Found a Show HN that's basically Transcripted. Local transcription, Obsidian vault integration, Claude Code skills. Same niche, same tools.
Read experiment →

Six agents running overnight. Zero features shipped to the actual product.
Read experiment →

Scheduled Claude Code tasks ran overnight on Transcripted while I slept. Security audit, CLAUDE.md sync, refactor sweep, code review. Four PRs. All merged before I had coffee.
Read experiment →

I spent weeks trying to get a local agent system reliable. It wasn't. Now I'm trying something completely different.
Read experiment →

The floating pill is completely rebuilt. Onboarding is smarter. Model downloads actually tell you what's happening. And the nightly agents caught two security issues I would have missed.
Read experiment →

I've been building the wrong version of this app.
Read experiment →

I spent most of today trying to get a local AI agent system running, and breaking it in new ways every hour.
Read experiment →

Shipped Transcripted v0.3.0: replaced Sortformer with pyannote for offline diarization, cut onboarding from 6 screens to 3, cleaned up the floating pill and transcript UI, and fixed a batch of edge-case bugs.
Read breakthrough →

Incremental improvements to things that shipped rough. The core is solid. The surface layer is catching up.
Read experiment →

A Claude Code session that started with 'install Draft' and ended with a feature I might scrap entirely.
Read experiment →

Shipped Transcripted v0.2.0: the aurora recording animation is now the only UI, idle branding is removed for a cleaner pill, and the whole ML pipeline processes 11 minutes of audio in 5.4 seconds on an M5.
Read breakthrough →

transcripted.app is live. Free meeting recorder for Mac. 100% local, no cloud, no bot in your meeting. Speaker diarization that learns your team.
Read breakthrough →

Bought the domain. Connected Cloudflare. Deployed from GitHub. The whole thing took one Claude Code session. First green bar for the site itself.
Read breakthrough →

Ran 50 users through Draft's onboarding flow. 12% made it to their first generated message. The mic permission prompt is killing momentum.
Read experiment →

Zero paid conversions in the first week at $9/month. The free tier is too good, so nobody hits the paywall. Time to rethink the wedge.
Read experiment →

After 200+ accepted drafts from 10 beta users, Draft's style matching hit 94% accuracy. The RLHF loop is working. Users are editing less over time.
Read breakthrough →

GUIs were always a compromise. Now that AI agents are real, the command line is back, not because developers are nostalgic, but because it's the only interface that actually works for AI.
Read experiment →