A transcription archive on AWS for a few dollars a month

Key takeaways

Three sources for recordings: a watched Drive folder, a forward-to-an-address lane, and a scheduled pull from your meeting tool.
Every recording is transcribed once, filed by date, people, and topic, and split into short timed chunks for search.
Ask a plain-language question; the closest chunks come back with timestamps and a quote linked to the second in the audio.
Each recording carries an access tag; search only returns what the asker is allowed to see, and every search is logged.
Designed on AWS for about $4/month at typical small-business volume.

The whole system on one page

Before any code, here’s the shape of what we’re designing.

Fig 1. Three sources outside, three pieces inside AWS. Recordings flow in from a Drive folder, a forwarding lane, and a meeting-tool pull. The Intake transcribes and files. The Index makes it searchable. Search returns the right moment with a quote.

What you set up once (the outside)

Recordings. A Google Drive folder that you drop audio or video into. New recordings can also enter via two other lanes covered in Part 2 — a forward-to-an-address lane (forward a meeting recording to a dedicated address and it gets filed) and a scheduled pull (a small connector grabs finished recordings from your meeting tool’s cloud on a timer). Whatever the lane, the file lands in S3, and that’s where the work begins.
A rules folder. Two short Google Docs in a Drive folder. The rules doc covers the topic tags you care about (deals, support, hiring, product), the people aliases (so “Maria,” “M. Santos,” and her email all map to one person), and which recordings are open to everyone versus restricted. The access doc lists who belongs to which team, so the search box knows who is allowed to see what. Both are plain docs a person can edit without a deploy.
People. The staff who search the archive. Each person types a plain-language question into a small search box — “what did the customer say about pricing on the Acme call?” — and gets back a short answer, a direct quote, the name and date of the recording, and a link that plays the audio from the exact second. They only see results from recordings they’re allowed to see.
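To make the rules doc concrete, here is a minimal sketch of how its people aliases and topic tags might look once parsed into Python. The structure, the names (`ALIASES`, `TOPIC_KEYWORDS`, `resolve_person`, `tag_topic`), the canonical id format, and the keyword lists are all assumptions for illustration; the actual doc layout is up to you.

```python
from typing import Optional

# Hypothetical parsed form of the rules doc. All values are illustrative.
ALIASES = {
    "maria": "maria.santos",
    "m. santos": "maria.santos",
    "maria@example.com": "maria.santos",
}

TOPIC_KEYWORDS = {
    "deals": ["pricing", "contract", "renewal"],
    "support": ["ticket", "outage", "bug"],
    "hiring": ["candidate", "interview", "offer"],
    "product": ["roadmap", "feature", "release"],
}


def resolve_person(name: str) -> Optional[str]:
    """Map a speaker label, invitee name, or email to one canonical person id."""
    return ALIASES.get(name.strip().lower())


def tag_topic(transcript: str, default: str = "untagged") -> str:
    """Pick the topic whose keywords occur most often in the transcript."""
    text = transcript.lower()
    counts = {
        topic: sum(text.count(word) for word in words)
        for topic, words in TOPIC_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default
```

Because both tables are plain data, editing the doc changes the behaviour without touching the code, which is the point of keeping the rules in Drive.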

What runs when a recording arrives (the inside)

The intake. A recording lands in S3 and starts the pipeline. Amazon Transcribe turns the speech into text, with speaker labels and a timestamp on every word. A plain-Python step then files it: it reads the date from the file, matches the speakers and the “invited” list against the people aliases, and tags a topic from a short keyword pass. The result is one catalogue row per recording in DynamoDB — title, date, people, topic, access tag, and a link to the transcript — plus the full transcript saved in S3.
The index. The transcript is split into short timed chunks (roughly a paragraph each, carrying the start time in the recording). Titan Text Embeddings V2 turns each chunk into a vector — a list of numbers that captures the chunk’s meaning, so two ways of saying the same thing land near each other. The vectors are written to S3 Vectors, AWS’s built-in vector store, each one tagged with the recording it came from and its access tag. This is the one step that makes the back-catalogue searchable by meaning instead of exact words.
Search. Someone types a question. The same Titan model turns the question into a vector, and S3 Vectors returns the closest chunks, with anything the asker isn’t allowed to see filtered out before a single word is read or quoted. Claude Haiku 4.5 reads the top few allowed chunks and writes a short answer with a direct quote. The result links to the exact second in the audio. Every search writes a row to the search log: who asked, what they asked, what came back, and when.
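The index and search steps can be sketched in plain Python. This is a stand-in under stated assumptions, not the series' implementation: a real deployment would get vectors from the embedding model and query S3 Vectors rather than rank in memory. The `Chunk` fields, the 120-word chunk size, and the choice to filter before ranking are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    recording_id: str
    access_tag: str        # e.g. "everyone" or a team name (assumed scheme)
    start_seconds: float   # where this chunk begins in the recording
    text: str
    vector: list           # would come from the embedding model


def chunk_words(words, max_words=120):
    """Group (start_seconds, word) pairs into chunks that keep their start time."""
    chunks = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        chunks.append((group[0][0], " ".join(word for _, word in group)))
    return chunks


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(question_vector, chunks, allowed_tags, top_k=3):
    """Drop chunks the asker may not see, then rank the rest by similarity."""
    visible = [c for c in chunks if c.access_tag in allowed_tags]
    visible.sort(key=lambda c: cosine(question_vector, c.vector), reverse=True)
    return visible[:top_k]
```

Filtering first means a restricted chunk never reaches the ranking step, let alone the model that writes the answer.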

In plain words

Your sales lead asks: “Didn’t the Acme buyer mention a hard deadline on one of our calls?” Nobody remembers which call. The lead opens the search box and types the question. The archive turns it into a vector, finds the three closest moments across forty recorded calls, and drops the two from teams the lead can’t see. Haiku 4.5 then writes: “On the April 2 Acme call, their VP said they needed to be live before their fiscal year-end on June 30.” Below the answer is the quote and a play link that jumps to 14 minutes 22 seconds into that recording. The lead clicks, hears it in the buyer’s own voice, and forwards the moment to the account team. The whole thing took fifteen seconds, and the search is logged.

The cost of running this is about $4 a month at typical small-business volume. The cost of not running it is the deadline nobody remembered, the promise made on a call that nobody can find, or the hours a junior spends scrubbing through old recordings to confirm what was actually said.

Design rules that shaped every decision

Every answer points to the exact moment — a quote and a play link, not a list of files to scrub.
Transcribe once, index once. The expensive work happens when a recording arrives, not when someone searches.
Access is checked before the answer is written. You never see a quote from a recording you can’t open.
Every search is logged (who asked, what they asked, and what came back), building an audit trail that reaches back years.
The rules and access docs live in Drive. Changing a tag, a person alias, or who-sees-what doesn’t need a deploy.
The archive finds the record. It never decides anything — staff still own the call.
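Two of these rules, the play link and the search log, reduce to very little code. The sketch below is illustrative only: the `?t=` URL scheme, the example domain, and the log field names are assumptions, and a real log row would go to a table rather than a JSON string.

```python
import json
from datetime import datetime, timezone


def play_link(base_url: str, start_seconds: float) -> str:
    """Build a link that starts playback at a given second (hypothetical URL scheme)."""
    return f"{base_url}?t={int(start_seconds)}"


def format_timestamp(start_seconds: float) -> str:
    """Render seconds as minutes:seconds for display next to the quote."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"{minutes}:{seconds:02d}"


def log_row(asker: str, question: str, recording_ids: list) -> str:
    """One JSON line per search: who asked, what they asked, what came back, when."""
    return json.dumps({
        "asker": asker,
        "question": question,
        "returned": recording_ids,
        "asked_at": datetime.now(timezone.utc).isoformat(),
    })
```

For a chunk starting 862 seconds in, `format_timestamp(862)` gives `14:22`, and `play_link("https://archive.example.com/rec/acme-0402", 862)` ends in `?t=862`.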

Why this shape

Most teams “keep” their recordings the way they keep junk drawers: everything is technically in there, and nothing can be found. The meeting tool stores the files but names them after timestamps and forgets them after ninety days. Somebody’s notes capture a fraction of what was said, filtered through what they thought mattered at the time. And human memory, of course, fails the moment the person who was on the call goes on holiday or leaves.

The setup above keeps the recordings where they already pile up, but adds a small system that reads each one as it arrives, files it cleanly, and makes the whole back-catalogue answer questions in plain language. The expensive work — transcription and indexing — happens once, when the recording lands. Searching is cheap and fast. And because access is checked before any answer is written, the archive can hold sensitive calls without becoming a leak. The system is invisible most days; visible only the moment somebody needs to know what was actually said.

The next four posts walk through each piece in turn: how a recording gets filed, how a transcript becomes searchable, how a question finds the moment, and how the archive stays private. One diagram per post. A cost breakdown and a final engineering reference at the end.
