How a recording gets filed

Key takeaways

Three intake lanes feed one archive: a watched Drive folder, a forwarding lane, and a meeting-tool pull.
Every lane lands the file in S3, which starts the same filing pipeline.
Amazon Transcribe turns speech into text with speaker labels and a timestamp on every word.
A plain-Python step files each recording: date from the file, people from the speakers, topic from a keyword pass.
The result is one catalogue row in DynamoDB plus the full transcript in S3.

Three lanes into one archive

Fig 2. Three lanes converge on one S3 bucket. Whatever the door, the file triggers Transcribe and then the filing step. The result is one catalogue row in DynamoDB and the full transcript in S3, ready for indexing in Part 3.

Lane 1: the Drive folder

The simplest lane. Open the watched folder in Drive, drop in an audio or video file, done. A small Lambda, drive-sync, runs every few minutes, lists the folder via the Drive API, and copies any file it has not already copied to s3://tx-audio/. The S3 PUT starts the pipeline. The folder stays the place a person browses by hand; S3 is where the system does its work. You don’t lose the original; it stays in Drive too.
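As a rough illustration of the "copy anything new" decision, here is a minimal sketch. It assumes drive-sync remembers the IDs of files it has already copied (the post does not say how "new" is detected) and that each listing entry carries the Drive API fields id, name, and mimeType. The function name plan_copies and the drive/ key prefix are hypothetical.

```python
AUDIO_VIDEO_PREFIXES = ("audio/", "video/")


def plan_copies(drive_files, already_copied_ids, prefix="drive/"):
    """Return (file_id, s3_key) pairs for media files not yet copied.

    drive_files: dicts with "id", "name", "mimeType" (Drive API file fields).
    already_copied_ids: set of Drive file IDs copied on earlier runs
    (an assumption; the real Lambda may track state differently).
    """
    plan = []
    for f in drive_files:
        if f["id"] in already_copied_ids:
            continue  # copied on an earlier run
        if not f.get("mimeType", "").startswith(AUDIO_VIDEO_PREFIXES):
            continue  # not a recording, leave it alone
        # Hypothetical key layout: the Drive ID keeps same-named files apart.
        plan.append((f["id"], f"{prefix}{f['id']}/{f['name']}"))
    return plan
```

The actual download from Drive and upload to S3 are left out; the point is only that the selection step can be a small, easily tested function.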

This lane covers the cases where you already have a recording on your laptop — a phone call you recorded, a webinar export, a voice memo — and you just want it in the archive. Drag, drop, and the next search will find it.

Lane 2: forwarding (the lane most teams actually use)

Set up a dedicated inbound address — something like archive@your-company.com — via Amazon SES. When a meeting tool emails you “your recording is ready” with the file attached, or when a teammate has a recording they want kept, they forward it to that address and the archive takes it from there. SES writes the raw email to s3://tx-raw-mime/. The S3 PUT triggers a parser Lambda that walks the email, finds the audio or video attachment (or a link to it), pulls the file, and stores it in s3://tx-audio/ — the same place Lane 1 lands. From there it’s the same pipeline.

If the forward carries a link instead of an attachment (some tools email a download link rather than the file), the parser follows the link, downloads the recording, and stores it. The original email is kept in S3 for audit, so you can always see how a given recording got in.
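The attachment-or-link decision can be sketched with Python's standard email package. This is a minimal sketch, not the parser itself: the function name find_recording, the extension list, and the "take the first link in the body" rule are all assumptions, and a real parser would likely need tool-specific rules for picking the right link.

```python
import re
from email import message_from_bytes, policy

# Assumed list of extensions; adjust to what your tools actually send.
MEDIA_EXTENSIONS = (".mp3", ".mp4", ".m4a", ".wav", ".webm")
LINK = re.compile(r"https?://[^\s<>\"']+")


def find_recording(raw_mime: bytes):
    """Return ("attachment", filename, data), ("link", url, None), or None."""
    msg = message_from_bytes(raw_mime, policy=policy.default)

    # walk() also descends into emails forwarded as attachments.
    for part in msg.walk():
        if part.is_multipart():
            continue
        name = part.get_filename() or ""
        is_media = part.get_content_type().startswith(("audio/", "video/"))
        if is_media or name.lower().endswith(MEDIA_EXTENSIONS):
            return ("attachment", name, part.get_payload(decode=True))

    # No media attachment: fall back to the first link in the body.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    match = LINK.search(text)
    if match:
        return ("link", match.group(0).rstrip(".,)"), None)
    return None
```

Downloading the linked file and writing to s3://tx-audio/ are omitted here; the sketch only covers how the parser might decide what it is looking at.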

Lane 3: meeting-tool pull

Most recordings these days are born in a meeting tool — the video-call app the team already uses. Those tools keep finished recordings in their own cloud and offer an API to list and download them. Lane 3 is a small connector Lambda that runs on a schedule (every couple of hours), asks the meeting tool for recordings finished since the last run, downloads any new ones to s3://tx-audio/, and lets the pipeline take over. The meeting tool’s credentials live in Secrets Manager.
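The "finished since the last run" check can be pictured as a simple timestamp cursor. The sketch below assumes the meeting tool returns each recording with an ISO 8601 end time in a field called ended_at; that field name and the function new_recordings are hypothetical, since every tool's API is shaped differently.

```python
from datetime import datetime


def new_recordings(recordings, last_seen):
    """Return (recordings finished after last_seen, updated cursor).

    recordings: dicts with an "ended_at" ISO 8601 string (assumed field name).
    last_seen: timezone-aware datetime saved by the previous run.
    """
    parsed = [(datetime.fromisoformat(r["ended_at"]), r) for r in recordings]
    fresh = sorted((p for p in parsed if p[0] > last_seen), key=lambda p: p[0])
    # Advance the cursor only as far as the newest recording actually seen.
    cursor = fresh[-1][0] if fresh else last_seen
    return [r for _, r in fresh], cursor
```

Storing the returned cursor only after the downloads succeed would mean a failed run is retried rather than skipped; that ordering is a design suggestion, not something the post specifies.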

This is the lane that makes the archive feel automatic. Nobody has to remember to upload anything — a recorded call simply shows up in the archive a couple of hours later, transcribed and searchable. A team that doesn’t use a supported meeting tool loses nothing; Lanes 1 and 2 still cover them.

Transcribe, then file

Whatever lane the file came through, the rest is the same. The S3 PUT on tx-audio starts an Amazon Transcribe job. Transcribe turns the speech into text, labels who spoke each part (speaker one, speaker two, and so on), and stamps a time on every word.

When the job finishes, a filing Lambda reads the result and does three plain-Python things. It takes the recording date from the file’s metadata or its name. It matches the speaker labels and any “invited” list against the people aliases in the rules doc, so the catalogue says “Maria and the Acme buyer” rather than “speaker one and speaker two”. And it tags a topic with a short keyword pass against the tag list. It then writes one row to the tx-catalogue DynamoDB table (title, date, people, topic, access tag, transcript link) and saves the full transcript to s3://tx-transcripts/.
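The three filing steps can be sketched in plain Python along these lines. Everything here is an assumption made for illustration: the shape of the rules doc (an aliases map and a tags map of single-word keywords), the YYYY-MM-DD pattern in file names, the tie-breaking rule, and all function names. The access tag and transcript link are left out.

```python
import re
from datetime import date, datetime

DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def recording_date(filename: str, last_modified: datetime) -> date:
    """Prefer a YYYY-MM-DD date in the file name; fall back to metadata."""
    m = DATE_IN_NAME.search(filename)
    if m:
        try:
            return date(*map(int, m.groups()))
        except ValueError:
            pass  # digits that are not a real date, e.g. 2024-13-40
    return last_modified.date()


def resolve_people(labels, aliases):
    """Map raw labels to display names; unknown labels pass through."""
    people = []
    for label in labels:
        name = aliases.get(label.strip().lower(), label)
        if name not in people:
            people.append(name)
    return people


def tag_topic(transcript, tag_keywords, default="untagged"):
    """Pick the tag whose keywords occur most often in the transcript."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    counts = {
        tag: sum(words.count(k.lower()) for k in keywords)
        for tag, keywords in tag_keywords.items()
    }
    # Sorting first makes ties resolve to the alphabetically first tag.
    best = max(sorted(counts), key=lambda t: counts[t], default=None)
    return best if best is not None and counts[best] > 0 else default


def file_recording(filename, last_modified, labels, transcript, rules):
    """Build the deterministic part of one catalogue row."""
    return {
        "title": filename.rsplit(".", 1)[0],
        "date": recording_date(filename, last_modified).isoformat(),
        "people": resolve_people(labels, rules["aliases"]),
        "topic": tag_topic(transcript, rules["tags"]),
    }
```

Because each function depends only on its inputs and the rules, the same recording with the same rules always produces the same row, which is the property the filing step is after.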

No model runs in this step. Filing is deterministic on purpose: the same recording always files the same way, and a person can correct a mis-tagged topic or a mismatched speaker by editing the rules doc, with no deploy. The next post is where the AI starts to earn its place — turning that transcript into something you can search by meaning.

Why everything funnels to one pipeline

Three doors in, but only one pipeline behind them. That’s deliberate. If each lane filed recordings its own way, “why is this recording tagged wrong?” would mean checking three code paths. Funneling every file through the same S3 bucket means transcription, filing, and indexing happen exactly once, the same way, no matter how the recording arrived. Each lane is only responsible for getting a recording in; its job ends the moment the file lands, and the shared pipeline takes over from there.
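One way to picture the single entry point: the function that prepares the Transcribe job sees only an S3 event record, so it cannot tell which lane delivered the file. The sketch below builds the keyword arguments for boto3's start_transcription_job from one such record. The function name transcribe_request, the en-US language code, and the speaker limit of 10 are assumptions, and the call to Transcribe itself is left out.

```python
import re
from urllib.parse import unquote_plus


def transcribe_request(s3_record, language_code="en-US", max_speakers=10):
    """Build start_transcription_job kwargs from one S3 event record."""
    bucket = s3_record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(s3_record["s3"]["object"]["key"])
    # Job names allow only letters, digits, '.', '_' and '-', up to 200 chars.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "_", key)[:200]
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "LanguageCode": language_code,
        "Settings": {
            "ShowSpeakerLabels": True,  # speaker one, speaker two, ...
            "MaxSpeakerLabels": max_speakers,
        },
    }
```

A Lambda would pass the result along as transcribe.start_transcription_job(**kwargs). Job names must be unique within an account, so a real handler would probably add a timestamp or hash to the name.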

Next post: how the filed transcript becomes searchable — split into timed chunks, turned into vectors, and written to S3 Vectors so a question can match by meaning.
