How a transcript becomes searchable

Key takeaways

A vector is a list of numbers that captures meaning; two ways of saying the same thing land near each other.
Each transcript is split into short chunks, every chunk carrying its start time in the recording.
Titan Text Embeddings V2 turns each chunk into a 1024-number vector — the one model call in this step.
Vectors land in S3 Vectors, each tagged with its recording, its timestamp, and its access tag.
Indexing runs once per recording. Search later is cheap because the hard work is already done.

The indexing flow, per recording

Fig 3. The indexing flow, per recording. Split into timed chunks, drop the empty ones, embed each with Titan V2, attach metadata, and write to S3 Vectors. Once every chunk is written, the catalogue row is flagged searchable.
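
The flow in the figure can be sketched as one small function. This is a minimal illustration, not the actual implementation: the function name, the chunk dictionary shape, and the injected `embed`, `write`, and `mark_searchable` callables are all assumptions made for the example.

```python
from typing import Callable, Iterable


def index_recording(
    recording_id: str,
    chunks: Iterable[dict],
    embed: Callable[[str], list],
    write: Callable[[list], None],
    mark_searchable: Callable[[str], None],
    access_tag: str,
) -> int:
    """Embed and store every non-empty chunk, then flag the recording.

    Each chunk is assumed to look like {"text": str, "start_seconds": float}.
    """
    records = []
    for i, chunk in enumerate(chunks):
        text = chunk["text"].strip()
        if not text:
            # Drop empty chunks before spending a model call on them.
            continue
        records.append(
            {
                "key": f"{recording_id}#{i}",
                "vector": embed(text),
                "metadata": {
                    "recording_id": recording_id,
                    "start_seconds": chunk["start_seconds"],
                    "access_tag": access_tag,
                },
            }
        )
    write(records)
    # Flag the catalogue row only after every chunk has been written.
    mark_searchable(recording_id)
    return len(records)
```

Passing the embed and write steps in as plain callables keeps the sketch independent of any particular AWS client and makes it easy to exercise with stand-ins.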

Chunking: short, timed, and a little overlapping

A whole transcript is too big to embed as one vector: the meaning of an hour-long call doesn’t fit in a single list of numbers. So the transcript is cut into short passages, roughly a paragraph each, with the cuts placed at natural speaker turns and sentence boundaries. Every chunk keeps the start time of its first word, because that timestamp is what later lets search jump straight to the moment in the audio.

Chunks overlap a little — each one carries the last sentence or two of the chunk before it. That overlap means a thought that spans a chunk boundary (“...and the budget, which by the way, is firm at fifty thousand”) is still captured whole in at least one chunk, instead of being split in a way that hides it from search. The chunk size and overlap are plain settings in the rules doc; the defaults work for normal meetings.
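
Here is one way overlapping, timed chunks could be built from timestamped sentences. It is a sketch under assumptions: the `Sentence` and `Chunk` types, the word-count budget, and the default values are invented for illustration and are not the settings from the rules doc.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    start_seconds: float
    text: str


@dataclass
class Chunk:
    start_seconds: float  # start time of the chunk's first word
    text: str


def _make_chunk(sentences):
    return Chunk(
        start_seconds=sentences[0].start_seconds,
        text=" ".join(s.text for s in sentences),
    )


def chunk_sentences(sentences, max_words=120, overlap_sentences=1):
    """Group sentences into chunks, repeating the tail of each chunk in the next."""
    chunks = []
    current = []
    fresh = 0  # sentences in `current` that no earlier chunk has carried
    for sentence in sentences:
        words = sum(len(s.text.split()) for s in current) + len(sentence.text.split())
        if fresh and words > max_words:
            chunks.append(_make_chunk(current))
            # Carry the last sentence or two forward as the overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
            fresh = 0
        current.append(sentence)
        fresh += 1
    if fresh:
        chunks.append(_make_chunk(current))
    return chunks
```

Because the overlap is carried forward as whole sentences, a chunk's start time is the start of its first overlapped sentence, so a search hit still lands at the beginning of the thought rather than midway through it.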

Very short chunks and silent stretches — long pauses, hold music, the thirty seconds before everyone joined — are dropped before embedding. There’s no point spending a model call to index “[silence].”
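
A filter along these lines would do the dropping. The marker strings and the five-word minimum are assumptions for the example; the real markers depend on what the transcription output uses for non-speech.

```python
import re

# Hypothetical non-speech markers; adjust to match the transcript format.
NON_SPEECH = re.compile(r"\[(?:silence|music|inaudible)\]", re.IGNORECASE)


def is_worth_embedding(text: str, min_words: int = 5) -> bool:
    """Return True when a chunk has enough spoken words to be worth a model call."""
    spoken = NON_SPEECH.sub(" ", text)
    return len(spoken.split()) >= min_words


def keep_chunks(texts, min_words: int = 5):
    """Filter out chunks that are silence, hold music, or too short to matter."""
    return [t for t in texts if is_worth_embedding(t, min_words)]
```

Running the filter before embedding, rather than after, is what saves the model calls.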

Embedding: turning words into numbers that mean something

Each kept chunk is sent to Titan Text Embeddings V2, which returns a vector — a list of 1024 numbers. The trick of embeddings is that the numbers capture meaning: two chunks that say the same thing in different words land close together in that 1024-dimensional space, and two chunks about different things land far apart. That’s what lets a search for “deadline” find a chunk that only ever said “before year-end.” The same model will turn the search question into a vector later, so questions and chunks are measured on the same ruler.
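
To make "close together" concrete, here is a small sketch of cosine similarity, one common way to measure how near two vectors are. The post does not say which distance metric the index uses, so treat cosine as an assumption. The three-number vectors are made up for illustration; real Titan vectors have 1024 numbers.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy vectors, invented for the example (not real model output).
deadline = [0.9, 0.1, 0.0]
before_year_end = [0.8, 0.2, 0.1]
lunch_order = [0.0, 0.2, 0.9]

# The two phrasings of the same idea score higher than the unrelated pair.
print(cosine_similarity(deadline, before_year_end) > cosine_similarity(deadline, lunch_order))
```

A score near 1 means the two vectors point the same way, and a score near 0 means they have little in common.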

Embedding is the one model call in this step, and it’s a cheap one: a fraction of a cent per chunk. A typical hour-long meeting is a few dozen chunks, so indexing a recording costs a cent or two. And it happens once per recording, unless you later choose to re-index. After that, the recording can be searched as many times as you like for almost nothing.
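
The model call itself could look roughly like this. It is a sketch, assuming the Amazon Bedrock runtime API and the `amazon.titan-embed-text-v2:0` model id; the function name and the choice to pass the client in are illustrative, and the request fields should be checked against the current Bedrock documentation.

```python
import json

# Assumed Bedrock model id for Titan Text Embeddings V2.
MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_chunk(client, text: str, dimensions: int = 1024) -> list:
    """Send one chunk to the embedding model and return its vector.

    `client` is expected to behave like boto3.client("bedrock-runtime").
    """
    body = json.dumps(
        {"inputText": text, "dimensions": dimensions, "normalize": True}
    )
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   vector = embed_chunk(boto3.client("bedrock-runtime"), "The budget is firm.")
```

The same function can embed the search question later, which keeps questions and chunks on the same ruler.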

Writing to S3 Vectors

The vectors go into S3 Vectors, AWS’s built-in vector store. Each vector is written with metadata attached: the recording id, the chunk’s start time, the people, the topic, and — importantly — the access tag. That access tag is what lets search filter results by who’s allowed to see them before any answer is written, which Part 5 covers in detail. Storing the access tag on the vector means the filter happens at the search itself, not as an afterthought.
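
A write might be sketched like this. The metadata field names, the key format, and the batch size are assumptions for illustration, and the `put_vectors` request shape reflects the boto3 `s3vectors` client as I understand it, so verify it against the current documentation before relying on it.

```python
def build_vector_record(
    recording_id, chunk_index, start_seconds, embedding, people, topic, access_tag
):
    """Shape one chunk's vector and metadata for an S3 Vectors write."""
    return {
        "key": f"{recording_id}#{chunk_index:05d}",  # hypothetical key scheme
        "data": {"float32": [float(x) for x in embedding]},
        "metadata": {
            # Hypothetical field names; the access tag is what search filters on.
            "recording_id": recording_id,
            "start_seconds": start_seconds,
            "people": list(people),
            "topic": topic,
            "access_tag": access_tag,
        },
    }


def write_vectors(client, bucket_name, index_name, records, batch_size=100):
    """Write records in batches; returns the number of calls made.

    `client` is expected to behave like boto3.client("s3vectors").
    The batch size is an assumption; check the service's per-call limit.
    """
    calls = 0
    for start in range(0, len(records), batch_size):
        client.put_vectors(
            vectorBucketName=bucket_name,
            indexName=index_name,
            vectors=records[start:start + batch_size],
        )
        calls += 1
    return calls
```

Keeping the access tag in the vector's own metadata is what allows a query to filter on it at search time instead of filtering results afterwards.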

S3 Vectors is the right fit here because the archive’s search volume is bursty and low: a handful of searches a day, not thousands a second. You pay for the storage and the searches you actually run, with no always-on index server humming in the background. For an SMB archive that mostly sits quiet and occasionally gets a question, those economics are exactly right.

Re-indexing without re-transcribing

The full transcript is kept in S3 even after indexing. That matters for one reason: if the embedding model is upgraded down the line, or you change the chunk size, you can re-index the whole archive straight from the stored transcripts — no need to re-transcribe, which is the expensive part. Transcription happens once per recording, ever; indexing can be redone cheaply whenever it’s worth it.

Next post: how a plain-language question turns into a vector, matches the closest chunks, gets filtered by access, and comes back as a short answer with a direct quote linked to the exact second.
