How a question finds the moment

Key takeaways

The question is embedded with the same Titan V2 model the chunks used, so they’re measured the same way.
S3 Vectors returns the closest chunks, each with its recording, timestamp, and access tag.
The access filter drops anything the asker can’t see before the answer is ever written.
Haiku 4.5 writes a short answer with a direct quote — grounded only in the chunks it was handed.
The result links to the exact second in the audio, and the whole search is logged.

Four gates on every search

Fig 4. Four gates between the question and the answer. Embed the query. Match the closest chunks. Filter by access. Compose with a quote. Then ship the answer with a play link and log the search.

Gate 1: embed the query

The question is sent to the same Titan Text Embeddings V2 model that turned every chunk into a vector back in Part 3. That sameness is the whole point: because the question and the chunks are measured by the same model, “closeness” in that 1024-number space actually means “about the same thing.” A question embedded by a different model couldn’t be meaningfully compared to the chunks at all; it would be measuring with a different ruler.

This is a single, cheap model call — a fraction of a cent. It’s the only embedding cost on the search path, because the chunks were already embedded once, at index time.
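As a rough sketch of that single call, the function below builds a Titan Text Embeddings V2 request and reads the vector back out of the response. The function name embed_query, the choice to pass the Bedrock runtime client in as an argument, and the normalize setting are assumptions for illustration; the settings should match whatever was used when the chunks were embedded.

```python
import json

# Bedrock model ID for Titan Text Embeddings V2.
MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_query(bedrock_runtime, question: str, dimensions: int = 1024) -> list:
    """Embed a question with the same model and size used for the chunks.

    `bedrock_runtime` is expected to behave like boto3.client("bedrock-runtime").
    """
    body = json.dumps({
        "inputText": question,
        "dimensions": dimensions,  # must match the dimension of the chunk vectors
        "normalize": True,         # assumption: chunks were normalized at index time too
    })
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    vector = payload["embedding"]
    if len(vector) != dimensions:
        # A size mismatch means the query cannot be compared with the index.
        raise ValueError(f"expected {dimensions} numbers, got {len(vector)}")
    return vector
```

In a real deployment the client would come from boto3.client("bedrock-runtime"); passing it in keeps the function easy to exercise with a stand-in client.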

Gate 2: match the closest chunks

The query vector goes to S3 Vectors, which returns the handful of chunks closest to it across the whole archive — usually the top ten or twenty. Each returned chunk carries its metadata from index time: which recording it came from, its start time in that recording, the people, the topic, and its access tag. At this point the system has candidates, but it hasn’t yet checked whether the asker is allowed to see them, and it hasn’t written a word of answer. It just has a short list of the most relevant moments in the back-catalogue.

Matching by meaning is what makes this better than keyword search. The question “did they push back on price?” finds the moment somebody said “that number feels high for what we’re getting” — a match no keyword search for “price” would ever make.
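To make “closest” concrete, here is a small in-memory sketch of matching by cosine similarity. It is a stand-in for what the vector store does, not the S3 Vectors API: it scans every chunk exactly, uses toy three-number vectors, and the field names (vector, metadata, score) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, chunks, k=10):
    """Return the metadata of the k chunks closest to the query, best first."""
    scored = [(cosine_similarity(query_vector, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Each result carries its index-time metadata plus the similarity score.
    return [dict(c["metadata"], score=score) for score, c in scored[:k]]


chunks = [
    {"vector": [1.0, 0.0, 0.0], "metadata": {"recording": "call-a", "start_seconds": 120}},
    {"vector": [0.0, 1.0, 0.0], "metadata": {"recording": "call-b", "start_seconds": 45}},
    {"vector": [0.9, 0.1, 0.0], "metadata": {"recording": "call-c", "start_seconds": 300}},
]
closest = top_k([1.0, 0.0, 0.0], chunks, k=2)
# closest lists call-a first, then call-c; call-b points in a different direction.
```

Nothing in the comparison looks at the words themselves, which is why a question about price can land on a sentence that never says “price.”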

Gate 3: the access filter

Before anything is shown or summarized, the search drops every candidate chunk the asker isn’t allowed to see. The request carries who is asking and which teams they belong to; each chunk carries an access tag. If the tags don’t match — a salesperson asking about an HR recording, say — the chunk is removed from the list. This happens before the answer model ever sees the chunks, which is the only safe order: you can’t accidentally quote from a recording that was filtered out before the quote was written. Part 5 covers the access model in full; the important thing here is where the filter sits in the flow.

If the filter removes everything — the only relevant moments are in recordings the asker can’t see — the search returns “no matching moment found” and logs the empty result. It never hints at what it hid.
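A minimal sketch of this gate, assuming each candidate is a dict with a single access_tag field and the asker's teams arrive as a list of strings (both hypothetical shapes; Part 5 describes the real access model). It also assumes a chunk with no tag should be dropped rather than shown.

```python
NO_MATCH_MESSAGE = "no matching moment found"


def filter_by_access(candidates, asker_teams):
    """Keep only the chunks whose access tag matches one of the asker's teams."""
    allowed = set(asker_teams)
    # Assumption: a chunk with a missing tag fails closed and is dropped.
    return [c for c in candidates if c.get("access_tag") in allowed]


def gate_three(candidates, asker_teams):
    """Apply the filter and return what the answer step is allowed to see."""
    visible = filter_by_access(candidates, asker_teams)
    if not visible:
        # Same message whether nothing matched or everything was hidden,
        # and no count of what was removed.
        return {"chunks": [], "message": NO_MATCH_MESSAGE}
    return {"chunks": visible, "message": None}
```

Only the list returned here is passed on to the answer step, so a filtered-out chunk never reaches the model.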

Gate 4: compose the answer, then ship

The top few surviving chunks are handed to Claude Haiku 4.5 with a short, strict instruction: “Answer the question using only these chunks. Quote directly. Name the recording and the time. If the chunks don’t answer the question, say so; do not guess.” Grounding the model in only the retrieved chunks is what keeps the answer honest: the only material it’s handed is the transcript, which leaves it little room to invent a quote that isn’t there.
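One way the grounded prompt might be assembled is sketched below. The instruction wording follows the one quoted above; the chunk fields (recording, start_seconds, text), the bracketed label format, and the default of five chunks are assumptions for illustration, and the actual call to the model is left out.

```python
INSTRUCTION = (
    "Answer the question using only these chunks. Quote directly. "
    "Name the recording and the time. If the chunks don't answer the "
    "question, say so; do not guess."
)


def format_timestamp(seconds):
    """Render a start time in seconds as H:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def build_prompt(question, chunks, max_chunks=5):
    """Combine the instruction, the surviving chunks, and the question."""
    lines = [INSTRUCTION, "", "Chunks:"]
    for chunk in chunks[:max_chunks]:
        # Label each chunk so the answer can name the recording and the time.
        label = f"[{chunk['recording']} @ {format_timestamp(chunk['start_seconds'])}]"
        lines.append(f"{label} {chunk['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Because the prompt contains nothing but the instruction, the labeled chunks, and the question, anything the access filter removed is simply absent from what the model reads.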

The result that lands in the search box is a short answer (“On the April 2 Acme call, their VP said they needed to go live before June 30”), a direct quote, the recording’s name and date, and a play link that jumps to the exact second — built from the chunk’s start time. The asker hears it in the speaker’s own voice if they want to.
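Building the play link from the chunk's start time could look like the sketch below. The URL shape, the parameter names recording and t, and the example.com address are all hypothetical; the only idea taken from the text is that the start time is carried in the link as a whole number of seconds.

```python
from urllib.parse import urlencode


def play_link(base_url, recording_id, start_seconds):
    """Build a link that opens a recording at a chunk's start time."""
    # Round down so playback begins at or just before the quoted words.
    query = urlencode({"recording": recording_id, "t": int(start_seconds)})
    return f"{base_url}?{query}"


link = play_link("https://example.com/play", "acme-call", 754.6)
# link == "https://example.com/play?recording=acme-call&t=754"
```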

Every search writes a row to tx-searchlog in DynamoDB recording the question, the asker, the recordings that came back, and the timestamp. That log is both an audit trail and a quiet signal: if the same question keeps returning nothing, maybe a recording never got filed, or a topic tag is wrong.
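A sketch of what one log row might contain follows. The attribute names (asker, searched_at, question, recording_ids, result_count) are hypothetical, since the table's key schema isn't described here; `table` is expected to behave like a boto3 DynamoDB Table resource for tx-searchlog.

```python
from datetime import datetime, timezone


def build_log_item(question, asker, recording_ids, now=None):
    """Assemble one search-log row; an empty result is logged like any other."""
    now = now or datetime.now(timezone.utc)
    return {
        "asker": asker,
        "searched_at": now.isoformat(),
        "question": question,
        "recording_ids": list(recording_ids),
        "result_count": len(recording_ids),  # 0 marks a search that found nothing
    }


def log_search(table, question, asker, recording_ids, now=None):
    """Write the row and return it, so callers can inspect what was logged."""
    item = build_log_item(question, asker, recording_ids, now)
    table.put_item(Item=item)
    return item
```

Storing an explicit result count makes the “same question keeps returning nothing” pattern easy to query for later.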

Why the gates exist

None of these gates is exotic. They’re the order a careful person would follow if they were doing the search by hand: figure out what’s being asked, find the relevant moments, set aside anything you’re not allowed to share, and then answer with the actual words and where to find them. Putting them in code as four small steps, with access checked before the answer is written, makes the safe behavior part of the design rather than something each search has to get right on its own.

Next post: the access model in full — how recordings get tagged, how sensitive ones stay out of open search entirely, and how the search log gives you a years-long record of who asked what.
