Part 4 of 7 · Transcription archive series ~5 min read

How a question finds the moment

Someone types “what did the Acme buyer say about the deadline?” into the search box. The archive has to turn that into a vector, find the closest chunks across the whole back-catalogue, drop anything the asker isn’t allowed to see, and write a short answer with a direct quote that links to the exact second. Get any of those wrong and the search is worse than useless: a quote from a call they can’t open, an answer with no source, a wall of half-matches. Four small gates sit between the question and the answer.

Key takeaways

  • The question is embedded with the same Titan V2 model the chunks used, so they’re measured the same way.
  • S3 Vectors returns the closest chunks, each with its recording, timestamp, and access tag.
  • The access filter drops anything the asker can’t see before the answer is ever written.
  • Haiku 4.5 writes a short answer with a direct quote — grounded only in the chunks it was handed.
  • The result links to the exact second in the audio, and the whole search is logged.
Four gates between a typed question and the answer with a quote A horizontal flow diagram. On the far left, a "Question typed" box: a person typed a plain-language question into the search box, and the request carries who is asking and which teams they belong to. Four gates sit in a row to the right, each drawn as a vertical bar. Gate 1: Embed query — the question is sent to the same Titan Text Embeddings V2 model used for the chunks, returning a 1024-number vector measured on the same ruler as everything in the index. Gate 2: Match — the query vector is sent to S3 Vectors, which returns the closest chunks across the whole archive, each carrying its recording id, its start time, the people, the topic, and its access tag. Gate 3: Access filter — the asker's teams are checked against each returned chunk's access tag; any chunk the asker isn't allowed to see is dropped here, before the answer is written, so a restricted quote can never leak. Gate 4: Answer compose — the top few allowed chunks are handed to Claude Haiku 4.5 with an instruction to answer only from these chunks, quote directly, and name the recording and time; if nothing relevant survives the filter, it returns "no matching moment found" rather than inventing one. After all four gates, the result ships to the search box: a short answer, a direct quote, the recording's name and date, and a play link that jumps to the exact second. A note at the bottom: every search is logged to DynamoDB with who asked, the question, and what came back. Question plain words, plus who is asking Gate 1 Embed query same model as the chunks used (Titan V2) question → vector Gate 2 Match chunks S3 Vectors finds closest chunks each with recording, time, access tag Gate 3 Access filter asker's teams vs access tag? drop what they can't see Gate 4 Answer + quote Haiku 4.5 reads allowed chunks + quote, name, time; or "none found" Result — short answer, direct quote, recording, play link no relevant allowed chunk → "no matching moment found" every search logged to DDB tx-searchlog — who, what, results Access is checked before the answer — a restricted quote never reaches the wrong person.
Fig 4. Four gates between the question and the answer. Embed the query. Match the closest chunks. Filter by access. Compose with a quote. Then ship the answer with a play link and log the search.

Gate 1: embed the query

The question is sent to the same Titan Text Embeddings V2 model that turned every chunk into a vector back in Part 3. That sameness is the whole point: because the question and the chunks are measured by the same model, “closeness” in that 1024-number space actually means “about the same thing.” A question embedded by a different model couldn’t be compared to the chunks at all — it would be measuring with a different ruler.

This is a single, cheap model call — a fraction of a cent. It’s the only embedding cost on the search path, because the chunks were already embedded once, at index time.

Gate 2: match the closest chunks

The query vector goes to S3 Vectors, which returns the handful of chunks closest to it across the whole archive — usually the top ten or twenty. Each returned chunk carries its metadata from index time: which recording it came from, its start time in that recording, the people, the topic, and its access tag. At this point the system has candidates, but it hasn’t yet checked whether the asker is allowed to see them, and it hasn’t written a word of answer. It just has a short list of the most relevant moments in the back-catalogue.

Matching by meaning is what makes this better than keyword search. The question “did they push back on price?” finds the moment somebody said “that number feels high for what we’re getting” — a match no keyword search for “price” would ever make.

Gate 3: the access filter

Before anything is shown or summarized, the search drops every candidate chunk the asker isn’t allowed to see. The request carries who is asking and which teams they belong to; each chunk carries an access tag. If the tags don’t match — a salesperson asking about an HR recording, say — the chunk is removed from the list. This happens before the answer model ever sees the chunks, which is the only safe order: you can’t accidentally quote from a recording that was filtered out before the quote was written. Part 5 covers the access model in full; the important thing here is where the filter sits in the flow.

If the filter removes everything — the only relevant moments are in recordings the asker can’t see — the search returns “no matching moment found” and logs the empty result. It never hints at what it hid.

Gate 4: compose the answer, then ship

The top few surviving chunks are handed to Claude Haiku 4.5 with a short, strict instruction: “Answer the question using only these chunks. Quote directly. Name the recording and the time. If the chunks don’t answer the question, say so — do not guess.” Grounding the model in only the retrieved chunks is what keeps the answer honest: it can’t invent a quote that isn’t in the transcript, because the only material it’s given is the transcript.

The result that lands in the search box is a short answer (“On the April 2 Acme call, their VP said they needed to go live before June 30”), a direct quote, the recording’s name and date, and a play link that jumps to the exact second — built from the chunk’s start time. The asker hears it in the speaker’s own voice if they want to.

Every search — question, asker, the recordings that came back, and the timestamp — writes a row to tx-searchlog in DynamoDB. That log is both an audit trail and a quiet signal: if the same question keeps returning nothing, maybe a recording never got filed, or a topic tag is wrong.

Why the gates exist

None of these gates are exotic. They’re the order a careful person would follow if they were doing the search by hand: figure out what’s being asked, find the relevant moments, set aside anything you’re not allowed to share, and then answer with the actual words and where to find them. Putting them in code as four small steps — with access checked before the answer is written — makes the safe behavior part of the design, not something you’re trusting each search to remember.

Next post: the access model in full — how recordings get tagged, how sensitive ones stay out of open search entirely, and how the search log gives you a years-long record of who asked what.

All posts