How a long post gets loaded
The repurposer only works on what you give it. So the first job is making it easy to hand it a long piece, whatever shape that piece is in. There are three ways one gets in: you drop a doc in a Drive folder, you forward a recording transcript to a dedicated address, or you paste a link to a published post. The first is for things you wrote. The other two exist because in real life the best source is often a webinar nobody transcribed cleanly or a post already live on your own site.
Key takeaways
- Three intake lanes feed one source store: a Drive folder, a transcript forwarding lane, and a paste-a-link lane.
- The forwarding lane cleans transcripts — drops timestamps and speaker labels, keeps the words.
- The link lane fetches a published page and strips the menus, ads, and footer down to plain text.
- Every loaded piece ends up in S3, and the drafter reads from there. With versioning enabled on the bucket, earlier versions of a source are kept automatically.
- The Drive folder stays the place you control. The other lanes are conveniences that write into the store.
Three lanes into one store
Lane 1: the Drive folder itself
The simplest lane. Open the Drive folder, drop in the doc you want to reuse, done. It works for anything you wrote yourself: a blog post, a case study, a long email you’re proud of. A small Lambda, source-sync, runs every few minutes, exports any new or changed doc as plain text via the Drive API, and writes it to s3://cr-source-store/. The drafter reads from S3, not Drive directly. That keeps Drive API calls predictable, and with versioning enabled on the bucket, S3 keeps the old version whenever you edit the source and re-run.
This lane covers the case where the long piece already exists as a doc and you just want a week of short posts out of it. Most of what a small team repurposes goes in this way.
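The "new or changed" check in source-sync can be sketched as below. This is a minimal illustration, not the actual Lambda: it assumes change detection compares the Drive `modifiedTime` field against the value recorded at the last sync, and the `drive/<file id>.txt` key layout is hypothetical. The Drive export call and the S3 write are left out so the sketch runs on its own.

```python
from datetime import datetime


def parse_drive_time(stamp: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2024-05-01T10:00:00.000Z'."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def docs_to_sync(drive_files, last_synced):
    """Return the Drive files that are new or changed since the last sync.

    drive_files: list of dicts with 'id', 'name' and 'modifiedTime'.
    last_synced: dict mapping file id -> modifiedTime recorded at last sync
                 (where this state lives is an assumption of the sketch).
    """
    pending = []
    for f in drive_files:
        seen = last_synced.get(f["id"])
        if seen is None or parse_drive_time(f["modifiedTime"]) > parse_drive_time(seen):
            pending.append(f)
    return pending


def source_key(file_id: str) -> str:
    """Hypothetical key layout: one stable key per doc inside the bucket."""
    return f"drive/{file_id}.txt"
```

Reusing one key per doc is what lets bucket versioning do the work: each re-export becomes a new version of the same object rather than a new object.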
Lane 2: transcript forwarding (the messy-source lane)
Recordings are the richest source and the worst-formatted one. A webinar, a podcast, a sales call you have permission to reuse — the transcript is full of good lines buried under timestamps, speaker labels, and filler. Set up a dedicated inbound address — something like repurpose@your-company.com — via Amazon SES. Forward the transcript (paste it in the body, or attach the file your recorder spat out) and the system takes it from there. SES writes the raw message to s3://cr-raw-inbound/. The S3 PUT triggers a cleaner Lambda.
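Before any cleaning happens, the Lambda has to get the transcript out of the raw message, whether it was pasted into the body or attached as a file. A minimal sketch using only Python's standard `email` package follows. The function name and the rule "an attached text file wins over the body" are assumptions for illustration, and reading the object from S3 is left out.

```python
from email import message_from_bytes, policy


def transcript_from_email(raw: bytes) -> str:
    """Return the transcript text from a raw MIME message.

    Assumption: when a text attachment is present it is the transcript,
    and the body is just a short forwarding note.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Look for an attached text file first (.txt, .vtt and similar).
    for part in msg.iter_attachments():
        if part.get_content_maintype() == "text":
            return part.get_content()

    # Otherwise fall back to the plain-text body of the message.
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""
```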
The cleaner strips the parts you don’t want to post — the 00:14:32 timestamps, the Speaker 1: labels, the “um”s and “you know”s — and keeps the actual sentences as plain text. This part is plain code, not a model: a few rules that handle the common transcript formats. The cleaned text goes into the same source store as Lane 1, tagged as a transcript so the drafter knows it’s spoken words and can lean a little more on tightening them up. Nothing is invented in the cleanup; sentences are only ever dropped or trimmed, never added.
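To make "a few rules" concrete, here is a rough sketch of what such a cleaner could look like. The patterns are assumptions, not the real rule set: they cover a leading `00:14:32`-style timestamp, a leading `Speaker N:` label, and the fillers "um", "uh", and a comma-delimited "you know". Text is only ever removed, never added.

```python
import re

# Timestamp at the start of a line: 14:32, 00:14:32, [00:14:32], (00:14:32)
LEADING_TIMESTAMP = re.compile(r"^\s*[\[(]?\d{1,2}:\d{2}(?::\d{2})?[\])]?\s*")
# "Speaker 1:" style label at the start of a line
LEADING_SPEAKER = re.compile(r"^\s*speaker\s+\d+\s*:\s*", re.IGNORECASE)
# Fillers: "um," / "uh," / ", you know," set off by commas, or a bare um/uh
FILLER = re.compile(
    r"(?:,\s*)?\b(?:um+|uh+|you know)\b,\s*|\b(?:um+|uh+)\b\s*",
    re.IGNORECASE,
)


def clean_transcript(raw: str) -> str:
    """Drop timestamps, speaker labels and fillers; keep the sentences."""
    kept = []
    for line in raw.splitlines():
        line = LEADING_TIMESTAMP.sub("", line)
        line = LEADING_SPEAKER.sub("", line)
        line = FILLER.sub(" ", line)
        line = re.sub(r"\s+", " ", line).strip()
        if re.search(r"\w", line):  # skip lines with no words left
            kept.append(line)
    return "\n".join(kept)
```

Anchoring the timestamp and label rules to the start of a line is a deliberate trade-off in this sketch: a time mentioned mid-sentence ("we start at 10:30") is left alone, at the cost of missing inline timestamps that some recorders emit.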
Lane 3: paste a link
Sometimes the best source is already published, on your own blog or as a guest post you wrote for someone else. Retyping it into a doc is a waste. Lane 3 is a small form (a single text box backed by a Lambda function URL) where you paste the URL. A fetcher Lambda pulls the page and strips it down to just the article: out go the navigation menus, the cookie banner, the sidebar, the ads, the footer; what’s left is the title and the body text, written into the source store.
The fetcher only pulls pages you paste in. It doesn’t crawl, and it doesn’t fetch anything on its own. That’s on purpose: the source should always be a thing you chose to repurpose, not something the system decided to grab. Paste-a-link is entirely opt-in; a team that never uses it loses nothing.
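The strip-down step can be sketched with Python's standard `html.parser`. This is an illustration under stated assumptions rather than the fetcher's actual code. It only drops content inside semantic tags such as `nav`, `aside`, and `footer`; cookie banners and ads that live in plain `div`s would need class-based rules or a readability library. The names `ArticleExtractor`, `extract_article`, and `is_fetchable` are hypothetical, and the HTTP fetch itself is left out so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript", "form"}
BLOCK_TAGS = {"p", "h2", "h3", "li", "blockquote"}


class ArticleExtractor(HTMLParser):
    """Collect the <title> and the text blocks outside of SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0    # > 0 while inside a skipped element
        self.in_title = False
        self.title = ""
        self.blocks = []
        self._current = None   # text pieces of the block being read

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self._current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False
        elif tag in BLOCK_TAGS and self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.blocks.append(text)
            self._current = None

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif self.skip_depth == 0 and self._current is not None:
            self._current.append(data)


def extract_article(html: str) -> dict:
    """Reduce a page to {'title': ..., 'body': ...} as plain text."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {"title": " ".join(parser.title.split()),
            "body": "\n\n".join(parser.blocks)}


def is_fetchable(url: str) -> bool:
    """Accept only the pasted http(s) URL; no other schemes, no crawling."""
    parts = urlparse(url)
    return parts.scheme in {"http", "https"} and bool(parts.netloc)
```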
Why everything funnels into one store
Three lanes in, but only one place the drafter actually reads from. That’s deliberate. A blog post, a transcript, and a fetched web page arrive in three completely different shapes; if the drafter had to handle each shape, there would be three times as much to get wrong. Cleaning every piece down to plain text at the door means the drafter only ever sees one kind of thing: a title and a body. The convenience lanes make it easier to get a piece in, but they all pass through the same cleanup and land in the same store.
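One way to picture the "one kind of thing" every lane produces is a single small record. The sketch below is an assumption about how that could be represented: the `SourcePiece` name, the `kind` values, and the JSON encoding are all hypothetical, and the transcript tag could equally be stored as S3 object metadata next to a plain-text body.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SourcePiece:
    """The single shape every lane writes and the drafter reads."""
    title: str
    body: str
    kind: str  # hypothetical tag values: "doc", "transcript", "web"


def to_store_object(piece: SourcePiece) -> bytes:
    """Serialize a piece for the source store (JSON is an assumption)."""
    if not piece.body.strip():
        raise ValueError("refusing to store a piece with an empty body")
    return json.dumps(asdict(piece), ensure_ascii=False).encode("utf-8")
```

Whatever the real encoding is, the point stands: each lane does its own cleanup, then hands over the same three fields.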
Next post: how the drafter reads a loaded piece, splits it into passages, and pulls the handful of points actually worth posting — each one carrying the exact passage it came from.