Part 5 of 7 · Transcription archive series ~5 min read

How the archive stays private

An archive of everything your business ever said on a call is powerful, and dangerous if anyone can search all of it. The sales team shouldn’t surface an HR call. A contractor shouldn’t find a board discussion. So the archive is built around three controls: every recording carries an access tag that search filters on before any answer is written, sensitive recordings can be locked out of open search entirely, and every search is logged. This post walks through all three, and what the log gives you.

Key takeaways

  • Three controls per recording: tag access (who may see it), lock sensitive (keep it out of open search), log search (record every query).
  • The access tag is stored on every chunk’s vector, so the filter happens at the search itself.
  • Locked recordings never appear in open search; they need a named, logged, direct request.
  • Every search writes a row to the audit trail — who asked, what, what came back, and when.
  • The default is private: a recording with no tag is visible only to its owner until someone sets one.

Three controls on every recording

[Figure: a filed recording (title, date, people, transcript link) passes through three controls: Tag access, Lock sensitive, and Log search. Every access decision and every search writes a row (timestamp, user, action, detail) to the tx-access and tx-searchlog DynamoDB tables, which are kept for years.]
Fig 5. Three controls per recording, all feeding one audit trail. Tag access decides who may see it. Lock sensitive keeps the most private recordings out of open search. Log search captures every query. Every decision is recorded.

Control 1: tag access (who may see it)

Every recording gets an access tag when it’s filed. The tag is one of three kinds: a team (“sales,” “support,” “leadership”), a named person, or everyone. The rules doc sets defaults — recordings from the sales meeting tool default to the sales team, recordings forwarded by a manager default to that manager — and a person can override the tag on any recording by editing the catalogue.
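As a rough illustration of how those defaults might resolve, here is a minimal sketch. The `RULES` mapping, the `Recording` fields, the `team:` / `person:` tag spelling, and the order in which the rules are tried are all assumptions made for this example, not the archive’s actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the rules doc: recording source -> default tag.
# Tags are spelled "team:<name>", "person:<name>", or "everyone".
RULES = {
    "sales-meeting-tool": "team:sales",
}


@dataclass
class Recording:
    recording_id: str
    owner: str
    source: str                          # where the recording came from
    forwarded_by: Optional[str] = None   # manager who forwarded it, if any
    override_tag: Optional[str] = None   # set when someone edits the catalogue


def resolve_access_tag(rec: Recording) -> str:
    """Return the access tag to store for this recording."""
    if rec.override_tag:                 # a manual catalogue edit always wins
        return rec.override_tag
    if rec.forwarded_by:                 # forwarded recordings default to the forwarder
        return f"person:{rec.forwarded_by}"
    if rec.source in RULES:              # source-based default from the rules
        return RULES[rec.source]
    return f"person:{rec.owner}"         # nothing matched: owner only


print(resolve_access_tag(Recording("r1", "dana", "sales-meeting-tool")))
print(resolve_access_tag(Recording("r2", "dana", "phone-upload")))
```

The last line of the function is the one that matters most: when no rule applies, the narrowest audience is chosen rather than the widest.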

The important detail is where the tag lives. When the transcript is indexed (Part 3), the access tag is copied onto every chunk’s vector. So when search runs (Part 4), the filter can drop disallowed chunks at the match step itself, without a second lookup. The asker’s teams are checked against each chunk’s tag, and anything they can’t see is gone before the answer model is even called. A salesperson searching the whole archive simply never sees a sentence from an HR call — not because the answer hid it, but because those chunks were filtered out before any answer existed.
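The filter-before-answer idea can be sketched in a few lines. This is an in-memory illustration only: the `Chunk` and `Asker` shapes, the tag spelling, and the `visible_matches` helper are hypothetical, and a real vector index would more likely apply the same check as a metadata filter inside the query itself.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    recording_id: str
    text: str
    score: float        # similarity to the query, higher is closer
    access_tag: str     # copied from the recording at index time


@dataclass
class Asker:
    name: str
    teams: frozenset


def may_see(asker: Asker, tag: str) -> bool:
    """Check one chunk's tag against the asker's identity and teams."""
    if tag == "everyone":
        return True
    kind, _, value = tag.partition(":")
    if kind == "team":
        return value in asker.teams
    if kind == "person":
        return value == asker.name
    return False        # unrecognised tag: fail closed


def visible_matches(asker: Asker, matches: list, top_k: int = 5) -> list:
    """Drop disallowed chunks first, then keep the closest top_k."""
    allowed = [c for c in matches if may_see(asker, c.access_tag)]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:top_k]


asker = Asker("sam", frozenset({"sales"}))
matches = [
    Chunk("hr-01", "performance plan details", 0.97, "team:hr"),
    Chunk("sales-07", "renewal pricing", 0.81, "team:sales"),
    Chunk("allhands-02", "holiday schedule", 0.40, "everyone"),
]
for chunk in visible_matches(asker, matches):
    print(chunk.recording_id, chunk.score)
```

Only the list returned by `visible_matches` would ever be handed to the answer model, so the highest-scoring HR chunk never reaches it even though it was the closest match.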

A recording with no tag at all defaults to its owner only. The system fails closed: when in doubt, fewer people can see it, not more.

Control 2: lock sensitive (out of open search)

Some recordings shouldn’t turn up in a broad search even for people whose tag would technically allow it: a termination call, a legal discussion, a board session. For those, the tag isn’t enough; you don’t want them surfacing from a vague question that happened to match a phrase. So a recording can be marked sensitive. Sensitive recordings are still transcribed and indexed, but their chunks are flagged so the normal search box never returns them, no matter who’s asking or how close the match is.

Reaching a locked recording takes a deliberate, named request — an authorized person opening that specific recording by id, which is itself logged. The point is that the most private material can’t leak out of an innocent broad search. You have to mean to open it, and the system remembers that you did.
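A minimal sketch of the two paths, assuming a boolean `sensitive` flag on each chunk and an in-memory list standing in for the access log table. The function names, the `authorized_users` set, and the choice to log denied attempts as well as granted ones are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Chunk:
    recording_id: str
    text: str
    sensitive: bool = False   # copied from the recording's lock flag


access_log = []  # in-memory stand-in for the access log table


def open_search(matches: list) -> list:
    """Normal search box: locked chunks are never returned."""
    return [c for c in matches if not c.sensitive]


def open_locked_recording(index: list, recording_id: str, user: str,
                          authorized_users: set) -> list:
    """Direct request by id. The attempt is logged whether or not it succeeds."""
    allowed = user in authorized_users
    access_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": "open_locked",
        "detail": {"recording_id": recording_id, "allowed": allowed},
    })
    if not allowed:
        raise PermissionError(f"{user} may not open {recording_id}")
    return [c for c in index if c.recording_id == recording_id]


index = [
    Chunk("board-03", "acquisition discussion", sensitive=True),
    Chunk("sales-07", "renewal pricing"),
]
print([c.recording_id for c in open_search(index)])
print([c.recording_id for c in open_locked_recording(index, "board-03", "lee", {"lee"})])
print(len(access_log))
```

Writing the log row before the permission check means a refused request still leaves a trace.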

Control 3: log search (record every query)

The first two controls decide what comes back. The third records what was asked. Every search writes a row to the tx-searchlog table holding the asker, the exact question, the recordings that were returned, and the timestamp. This is per query, not per recording: even a search that returned nothing is logged, because “who went looking for the layoff discussion” is itself worth knowing.
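To make the shape of a log row concrete, here is a small sketch that builds one per query. The field names and the `query_id` key are hypothetical, and a plain list stands in for the table so the example runs without any cloud setup.

```python
import uuid
from datetime import datetime, timezone

search_log = []  # in-memory stand-in for the search log table


def log_search(asker: str, question: str, returned_recording_ids: list) -> dict:
    """Append one row per query, including queries that returned nothing."""
    row = {
        "query_id": str(uuid.uuid4()),                      # hypothetical row key
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": asker,
        "action": "search",
        "question": question,                               # the exact text asked
        "returned": list(returned_recording_ids),
        "result_count": len(returned_recording_ids),
    }
    search_log.append(row)
    return row


log_search("sam", "what did we promise Acme on renewal pricing?", ["sales-07"])
log_search("sam", "layoff discussion", [])   # empty result, still logged
print(len(search_log))
```

The second call is the case the post cares about: nothing came back, but the question itself is now on record.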

The log earns its place in three ways. It’s an audit trail — if a quote turns up somewhere it shouldn’t, you can see who searched for it and when. It’s an early-warning system — a spike in searches for a sensitive topic is a signal worth a human glance. And it’s a quality signal — if the same reasonable question keeps returning nothing, a recording probably never got filed or a topic tag is wrong. The log is written by a path with append-only permission, so a search can record itself but can’t rewrite history.
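The quality-signal use is the easiest of the three to sketch. Assuming each log row carries a `question` string and a `returned` list (both hypothetical field names), a few lines can surface questions that keep coming back empty:

```python
from collections import Counter


def recurring_misses(rows: list, min_count: int = 3) -> list:
    """Return (question, count) pairs for questions that repeatedly found nothing."""
    misses = Counter(
        " ".join(row["question"].lower().split())   # normalise case and spacing
        for row in rows
        if not row["returned"]                      # only empty-result searches
    )
    return [(q, n) for q, n in misses.most_common() if n >= min_count]


rows = [
    {"question": "Where is the Q3 pricing call", "returned": []},
    {"question": "where is the q3  pricing call", "returned": []},
    {"question": "where is the Q3 pricing call", "returned": []},
    {"question": "onboarding checklist", "returned": ["ops-02"]},
    {"question": "team offsite notes", "returned": []},
]
print(recurring_misses(rows))
```

The threshold of three is arbitrary. The same grouping, keyed on topic instead of question text and bucketed by day, would be one way to approach the early-warning spike check.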

How the three stack up

The controls are layers, not alternatives. Tag access decides the normal who-sees-what. Lock sensitive removes the riskiest recordings from open search no matter the tag. Log search records every query against both. A recording can be tagged to the sales team, and that’s the everyday control. A recording can additionally be locked, and now even sales can’t stumble on it. And every attempt to find either is written down. Each layer is simple on its own; together they let an SMB hold sensitive recordings in the same archive as everyday ones without turning the search box into a leak.

None of this replaces good judgment about what to record in the first place. But it means the archive earns trust the way a careful filing clerk would: it knows who’s allowed in which drawer, it keeps the locked drawer locked, and it writes down every time someone comes asking.

Next post: the cost breakdown. The whole archive runs in coffee-money territory at SMB volume; Part 6 explains exactly where the dollars go and why transcription dominates the bill.
