Part 2 of 7 · Content repurposer series · ~4 min read

How a long post gets loaded

The repurposer only works on what you give it. So the first job is making it easy to hand it a long piece, whatever shape that piece is in. There are three ways one gets in: you drop a doc in a Drive folder, you forward a recording transcript to a dedicated address, or you paste a link to a published post. The first is for things you wrote. The other two exist because in real life the best source is often a webinar nobody transcribed cleanly or a post already live on your own site.

Key takeaways

  • Three intake lanes feed one source store: a Drive folder, a transcript forwarding lane, and a paste-a-link lane.
  • The forwarding lane cleans transcripts: it drops timestamps, speaker labels, and filler words, and keeps the sentences.
  • The link lane fetches a published page and strips the menus, ads, and footer down to plain text.
  • Every loaded piece is mirrored to S3, so you get versioning for free and the drafter reads from there.
  • The Drive folder stays the place you control. The other lanes are conveniences that write into the store.

Three lanes into one store

[Figure: three intake lanes funnel into one source store]

  • Lane 1 · Drive folder: drop a post or doc in the folder; source-sync mirrors it to S3 every few minutes; you control the folder.
  • Lane 2 · Transcript forwarding (SES inbound): forward a transcript to the repurpose address; SES writes the message to S3; the cleaner drops timestamps and speaker labels and keeps the words.
  • Lane 3 · Paste a link: paste a post URL into a small form backed by a Function URL; the fetcher Lambda pulls the page and strips menus, ads, and footer down to the article text.
  • Source store in S3 (versioning on, one cleaned piece per file): title · kind · source link · loaded date · plain-text body · status. The drafter reads each piece from here.
Fig 2. Three lanes converge on one source store. The Drive folder is the place you control; the transcript lane and the link lane are conveniences that load pieces for you. Everything lands as cleaned plain text in S3 so the drafter can read it the same way no matter where it came from.

Lane 1: the Drive folder itself

The simplest lane. Open the Drive folder, drop in the doc you want to reuse, done. It works for anything you wrote yourself — a blog post, a case study, a long email you’re proud of. A small Lambda — source-sync — runs every few minutes, exports any new or changed doc as plain text via the Drive API, and writes it to s3://cr-source-store/. The drafter reads from S3, not Drive directly. That keeps Drive API calls predictable and gives you S3 versioning for free, so if you edit the source and re-run, the old version is still there.
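The post does not say how source-sync decides a doc is "new or changed". One plausible approach, sketched here as an assumption, is to compare each file's Drive modifiedTime against the value recorded at the last sync. The function names and the drive/ key prefix are hypothetical, and the export call to the Drive API and the S3 write are left out.

```python
from datetime import datetime


def parse_drive_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2024-05-01T10:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def docs_to_sync(listing, synced):
    """Return the files that are new or changed since the last run.

    listing: dicts with 'id', 'name', 'modifiedTime' (as a Drive file listing might return)
    synced:  dict mapping file id -> modifiedTime string saved at the last sync (assumed state)
    """
    pending = []
    for f in listing:
        last = synced.get(f["id"])
        # New file: never synced. Changed file: modified after the recorded sync time.
        if last is None or parse_drive_time(f["modifiedTime"]) > parse_drive_time(last):
            pending.append(f)
    return pending


def store_key(file_id: str) -> str:
    """Hypothetical key layout inside the cr-source-store bucket."""
    return f"drive/{file_id}.txt"
```

Because the bucket is versioned, writing a changed doc to the same key keeps the earlier text as a previous version rather than overwriting it.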

This lane covers the case where the long piece already exists as a doc and you just want a week of short posts out of it. Most of what a small team repurposes goes in this way.

Lane 2: transcript forwarding (the messy-source lane)

Recordings are the richest source and the worst-formatted one. A webinar, a podcast, a sales call you have permission to reuse — the transcript is full of good lines buried under timestamps, speaker labels, and filler. Set up a dedicated inbound address — something like repurpose@your-company.com — via Amazon SES. Forward the transcript (paste it in the body, or attach the file your recorder spat out) and the system takes it from there. SES writes the raw message to s3://cr-raw-inbound/. The S3 PUT triggers a cleaner Lambda.
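Since the transcript can arrive either pasted in the body or as an attachment, the cleaner first has to pull the text out of the raw message SES stored. A minimal sketch using Python's standard email package follows. The function name extract_transcript and the "prefer a text attachment, otherwise use the plain-text body" rule are assumptions, and reading the object from s3://cr-raw-inbound/ is left out.

```python
from email import message_from_bytes, policy


def extract_transcript(raw: bytes) -> str:
    """Return the transcript text from a raw forwarded email.

    Assumed rule: a text attachment wins over the body, because a recorder's
    exported file is usually the full transcript.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Look for the first attachment whose content is text (for example a .txt export).
    for part in msg.iter_attachments():
        if part.get_content_maintype() == "text":
            return part.get_content()

    # Otherwise fall back to the plain-text body, if there is one.
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""
```

Non-text attachments (a PDF or a Word file, say) are ignored in this sketch and would need their own handling.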

The cleaner strips the parts you don’t want to post — the 00:14:32 timestamps, the Speaker 1: labels, the “um”s and “you know”s — and keeps the actual sentences as plain text. This part is plain code, not a model: a few rules that handle the common transcript formats. The cleaned text goes into the same source store as Lane 1, tagged as a transcript so the drafter knows it’s spoken words and can lean a little more on tightening them up. Nothing is invented in the cleanup; sentences are only ever dropped or trimmed, never added.
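The rules described above might look something like the sketch below. The patterns are illustrative assumptions, not the cleaner's actual rule set: they cover leading timestamps like 00:14:32 (optionally bracketed), "Speaker N:" labels, and a few fillers. Every rule deletes text and none inserts words, which matches the "dropped or trimmed, never added" constraint.

```python
import re

# Leading timestamp, e.g. "00:14:32", "[00:14:32]" or "14:32".
TIMESTAMP = re.compile(r"^\s*[\[(]?\d{1,2}:\d{2}(?::\d{2})?[\])]?\s*")
# Leading generic label, e.g. "Speaker 1:". Named speakers would need another rule.
SPEAKER = re.compile(r"^\s*Speaker\s+\d+\s*:\s*", re.IGNORECASE)
# Stand-alone "um"/"uh", with a trailing comma if present.
UM_UH = re.compile(r"\b(?:um+|uh+)\b,?\s*", re.IGNORECASE)
# "you know" only when set off by commas, so "do you know why" is left alone.
YOU_KNOW = re.compile(r",\s*you know,\s*", re.IGNORECASE)


def clean_line(line: str) -> str:
    """Strip timestamp, speaker label, and fillers from one transcript line."""
    line = TIMESTAMP.sub("", line)
    line = SPEAKER.sub("", line)
    line = UM_UH.sub("", line)
    line = YOU_KNOW.sub(" ", line)
    return line.strip()


def clean_transcript(raw: str) -> str:
    """Clean every line and drop the ones that end up empty."""
    cleaned = (clean_line(line) for line in raw.splitlines())
    return "\n".join(line for line in cleaned if line)
```

For example, the line "00:14:32 Speaker 1: Um, so the launch, you know, went better than we hoped." comes out as "so the launch went better than we hoped." Caption formats with time ranges, such as WebVTT cues, are not handled here and would need their own rules.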

Lane 3: paste a link

Sometimes the best source is already published — on your own blog, or a guest post you wrote for someone else. Retyping it into a doc is a waste. Lane 3 is a small form (a single text box backed by a Function URL) where you paste the URL. A fetcher Lambda pulls the page and strips it down to just the article: out go the navigation menus, the cookie banner, the sidebar, the ads, the footer; what’s left is the title and the body text, written into the source store.
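The stripping step can be sketched with the standard library's html.parser. This is a simplified assumption about how the fetcher might work, not its actual implementation: it skips whole elements by tag name (nav, header, footer, aside, script, style, form), so cookie banners and ads that live in ordinary div elements would need extra rules or a dedicated readability library. The class and function names are hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser

# Elements whose entire content is dropped (assumed list).
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form", "noscript"}
# Elements that start and end a line of text.
BLOCK_TAGS = {"p", "div", "li", "br", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleText(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0      # > 0 while inside a skipped element
        self.in_title = False
        self.title_parts = []
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag == "title":
            self.in_title = True
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "title":
            self.in_title = False
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth:
            return
        (self.title_parts if self.in_title else self.parts).append(data)


def extract_article(html: str):
    """Return (title, body) as plain text, one line per block element."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    body = "\n".join(line for line in lines if line)
    title = " ".join("".join(parser.title_parts).split())
    return title, body
```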

The fetcher only pulls pages you paste in — it doesn’t crawl, and it doesn’t go fetch anything on its own. That’s on purpose: the source should always be a thing you chose to repurpose, not something the system decided to grab. Paste-a-link is the most opt-in lane; a team that never uses it loses nothing.

Why everything funnels into one store

Three lanes in, but only one place the drafter actually reads from. That’s deliberate. A blog post, a transcript, and a fetched web page arrive in three completely different shapes; if the drafter had to handle each shape, there would be three times as much to get wrong. Cleaning every piece down to plain text at the door means the drafter only ever sees one kind of thing: a title and a body. The convenience lanes are first-class ways to get a piece in, but they all pass through the same cleanup and land in the same store.
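To make the "one kind of thing" idea concrete, here is a sketch of a single record shape that all three lanes could write. The field names follow the figure's store fields (title, kind, source link, loaded date, plain-text body, status). The JSON serialization, the allowed kind values, and the default status are assumptions for illustration, not the store's documented format.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical kind values, one per intake lane.
KINDS = {"doc", "transcript", "link"}


@dataclass
class SourcePiece:
    title: str
    kind: str          # which lane the piece came from
    source_link: str   # Drive file, inbound message, or published URL
    loaded_date: str   # ISO date, e.g. "2024-05-01"
    body: str          # cleaned plain text
    status: str = "loaded"  # assumed default

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")


def to_store_json(piece: SourcePiece) -> str:
    """Serialize one piece for writing to the source store."""
    return json.dumps(asdict(piece), ensure_ascii=False, indent=2)
```

With a shape like this, the drafter can tell a transcript from a doc by reading kind, without knowing anything about how the piece was loaded.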

Next post: how the drafter reads a loaded piece, splits it into passages, and pulls the handful of points actually worth posting — each one carrying the exact passage it came from.
