← Back
● Experiment

The memory layer

I’ve been building Transcripted as a meeting transcription app. That’s the wrong framing and I only just realized it.

I ran 63 minutes of meeting audio through the pipeline last night. Full transcript, every speaker identified by name, timestamps on every utterance. Done in 20 seconds. Nothing left my Mac. That part I knew.

What I hadn’t fully thought through is what that transcript is actually for.

It’s not for reading. Nobody reads their meeting transcripts. It’s for your agent. When you have a personal AI that knows your context — and everyone will, sooner than most people think — it needs to know what you decided in that meeting three weeks ago. What you promised. What the other person said they’d do. Who Sarah is and why she matters.

That’s what Transcripted actually produces. Not a document. A context feed. Speaker-labeled, timestamped, persistent identity across meetings because of the voice fingerprinting. The kind of structured memory that a personal agent can actually use.

I’ve been describing it as “local meeting recorder.” The more accurate description is: the memory component of a local agent stack.

This matters because the whole privacy argument only holds if the stack is local end to end. Granola just pushed their users to MCP. Their MCP server feeds their cloud data into your agent. That’s not private — that’s just a different company in the loop. Transcripted’s answer to that is: your data never left your machine in the first place. There’s nothing to pipe.

The red bar: I shipped the right thing for the wrong reason. The transcription quality, the speaker diarization, the voice fingerprinting — all of that is correct. But I was optimizing for “good transcript” when I should have been optimizing for “context an agent can use.” They’re related but not the same.

The pivot is mostly framing. Some of it is features — richer JSON output, better metadata, making it trivial to point any agent at the folder. But mostly it’s just being honest about what the app is for.

Agent-first, local-first, context layer. That’s the product.