How the transcript becomes notes
A recording lands in S3. From there to a clean recap is a short chain of steps, and only two of them use a model. First Amazon Transcribe turns the audio into text with speaker labels and a timestamp on every line. Then one Bedrock call writes a short summary, and a second pulls out the decisions and action items. Both model calls are told to use only the transcript and to cite the line behind every claim. Nothing here invents; the model’s whole job is to read what was said and lay it out clearly.
Key takeaways
- Amazon Transcribe makes the transcript — text with speaker labels and a timestamp per line.
- Two Bedrock passes: one writes the summary, one pulls out decisions and action items.
- Both passes cite the transcript line behind every claim, so nothing is invented.
- Long meetings (over ~an hour) use Claude Sonnet 4.6; normal ones use the cheaper Haiku 4.5.
- The transcript and the draft notes are saved to S3, so a run can be re-checked later.
The notes flow, per meeting
Transcription: text with speaker labels and timestamps
When a new file lands in s3://mn-recordings/, a small Lambda starts an Amazon Transcribe job. Transcribe reads the audio — and pulls audio out of video files on its own, so an MP4 works exactly like an M4A. It turns on speaker labels (Transcribe calls this speaker diarization — the feature that figures out who said what and tags each line), so the transcript reads like a script: Speaker 1: ..., Speaker 2: ..., each with a timestamp. If the sidecar named the attendees in speaking order, a small step maps Speaker 1 to a real name; if not, the names stay generic until the organizer fixes them at confirm time.
Transcribe writes its output JSON back to S3. A short Lambda flattens it into a plain transcript — one line per turn, each line carrying its timestamp — and saves that too. The flat transcript is what the model reads, and the timestamps are what let every action item link back to the exact moment it was said.
Two grounded passes, not one
It would be possible to ask the model for everything in a single call. The notetaker doesn’t, for a simple reason: a summary and an action-item list are different jobs, and splitting them keeps each prompt short and each output easy to check.
Pass 1 writes the summary. The prompt is short: “Here is the transcript. Write a summary of at most N sentences in the style described. Use only what is in the transcript. Do not add background the speakers didn’t mention.” The style and length come from the style doc, so the team controls the tone without touching code.
Pass 2 pulls the action items. The prompt asks for a list of decisions and a list of action items in a fixed shape: for each item, the task, the owner, the due date, and the transcript line behind it. It’s told explicitly: “Return JSON only. For each action item include the verbatim transcript line it came from and its timestamp. If the owner or the date wasn’t said out loud, set it to needs-confirm — do not guess.” That last instruction is the whole game, and Part 4 is about how it’s enforced.
Choosing the model by meeting length
A 20-minute standup and a two-hour strategy offsite are different problems. The short one is easy: Claude Haiku 4.5 reads it, writes a tidy summary, and pulls a handful of action items, for a fraction of a cent. The long one has more threads to follow, more half-decisions, and more chances to confuse who owns what — so for transcripts over about an hour, the pipeline switches to Claude Sonnet 4.6, which reasons more carefully across a long conversation. The cutoff is a number in the style doc; most meetings fall under it and use the cheap model.
Both models run on Bedrock through Global cross-Region inference, which spreads the call across regions for capacity without you managing any of that. The notetaker never holds a model open or runs one on a schedule — it calls Bedrock twice per meeting and that’s it.
What gets saved, and why
Three things land in S3 for every meeting: the Transcribe output, the flat transcript, and the draft notes (summary, decisions, action items, needs-confirm). All three are versioned. If a recap later looks wrong — an owner that shouldn’t have been on an item, a date that reads oddly — you can open the transcript and the draft and see exactly what the model had in front of it. Because every action item carries the transcript line and timestamp behind it, checking a claim is a matter of reading one line, not re-listening to the whole meeting.
Next post: the four checks that keep the notes honest — how cite-or-drop, owner resolution, date sanity, and the needs-confirm queue make sure nothing invented ever reaches the recap.
All posts