Transcripted v0.3.0: PyAnnote, simpler onboarding, UX cleanup

Biggest update to Transcripted yet. Four areas: new diarization engine, simpler onboarding, UX cleanup, and bug fixes.

PyAnnote replaces Sortformer for offline speaker diarization. This is the headline change. Sortformer worked, but PyAnnote gives better accuracy on overlapping speech and short turns — the exact scenarios that matter in real meetings. Still fully on-device, still runs in the same pipeline slot after Parakeet TDT V3 transcription. The switch was a clean drop-in: same input format, better output. Speaker detection went from “usually right” to “reliably right” on our test recordings.
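For anyone curious what "same pipeline slot" means in practice: diarization produces speaker turns, and those get matched to the transcript segments that came out of transcription. Below is a minimal sketch of that matching step, assuming a simple largest-overlap rule. The function name, tuple layouts, and the rule itself are illustrative assumptions, not Transcripted's actual code or PyAnnote's API.

```python
def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    transcript_segments: list of (start_sec, end_sec, text)   # hypothetical layout
    speaker_turns:       list of (start_sec, end_sec, speaker) # hypothetical layout
    """
    labeled = []
    for seg_start, seg_end, text in transcript_segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            # Length of the time interval shared by the segment and the turn
            # (negative or zero when they do not overlap).
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        # best_speaker stays None if no turn overlaps the segment at all.
        labeled.append((best_speaker, text))
    return labeled


segments = [(0.0, 2.0, "Hi"), (2.0, 5.0, "Hello there")]
turns = [(0.0, 2.2, "S1"), (2.2, 5.0, "S2")]
labeled = assign_speakers(segments, turns)
```

Because the matching only needs start and end times plus a label, a better diarizer can slot in without touching the transcription side, which fits the "same input format, better output" description above.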

Onboarding cut from 6 screens to 3. The old flow had too many stops before you could record anything. The welcome, permissions, and model setup are now consolidated into three focused steps with real download progress bars, so you know exactly where things stand. No more guessing whether the 1.2 GB model is still downloading or stuck.

UX overhaul on the floating pill and transcript views. The floating pill got cleaner interactions — less visual noise when you’re in a call. Transcript viewing got a similar pass: tighter layout, better scroll behavior, more intuitive interactions when you’re reviewing what was said. Nothing radical, just removing friction that accumulated over the last few releases.

Bug fixes: capped utterance merge distance so the pipeline stops gluing unrelated sentences together, fixed micro-cluster absorption that was occasionally merging distinct speakers, corrected the footer speaker count, and fixed dynamic model retention so downloaded models don’t get garbage collected when they shouldn’t be.
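To make the merge-distance fix concrete, here is a rough sketch of what capping the merge distance can look like: consecutive utterances from the same speaker are joined only when the silence between them is under a threshold. The `Utterance` type, the `merge_utterances` name, and the 1.5 second default are all hypothetical and chosen for illustration; the real pipeline's data model and threshold may differ.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_utterances(utterances, max_gap=1.5):
    """Merge same-speaker neighbors only if the gap between them is <= max_gap seconds."""
    merged = []
    for utt in sorted(utterances, key=lambda u: u.start):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.speaker == utt.speaker
            and utt.start - prev.end <= max_gap
        ):
            # Close enough in time: extend the previous utterance.
            merged[-1] = Utterance(
                prev.speaker, prev.start, max(prev.end, utt.end),
                prev.text + " " + utt.text,
            )
        else:
            # Different speaker, or the gap exceeds the cap: start a new utterance.
            merged.append(utt)
    return merged


utts = [
    Utterance("A", 0.0, 2.0, "Hello."),
    Utterance("A", 2.5, 4.0, "How are you?"),
    Utterance("A", 30.0, 32.0, "Next topic."),
]
result = merge_utterances(utts)
```

Without the cap, all three utterances would collapse into one block even though the third starts 26 seconds later. With it, the first two merge and the third stays separate.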

Net result: the speaker detection pipeline is now built on a stronger foundation, and the app gets out of your way faster. Green bar — PyAnnote is the kind of upgrade where you swap one component and the whole system gets better.

Download: github.com/r3dbars/transcripted/releases