Part 7 of 7 · Content repurposer series ~8 min read

Engineering reference: the content repurposer architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, the S3 Vectors index, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow for the review desk. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, S3 Vectors, and EventBridge Scheduler are all available there. A second region for resilience isn’t worth the setup at SMB volume: the realistic failure here is a draft that shows up an hour late, not lost work during a regional outage, and nothing in this system is on a hard real-time path. One AWS account dedicated to the repurposer (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

Figure: AWS topology of the content repurposer. Three regions of the diagram are stacked vertically inside one AWS account boundary.

Top region, ingress. Three boxes show the three intake lanes: a Drive folder sync via the source-sync Lambda, triggered every few minutes by EventBridge Scheduler, which mirrors loaded docs to s3://cr-source-store/; an SES inbound rule set (cr-inbound-rules) with action S3 PUT to s3://cr-raw-inbound/, plus the intake-cleaner Lambda that strips transcripts to plain text; and a fetcher Lambda on a Function URL that pulls a pasted post URL and strips it to article text.

Middle region, processing. The points Lambda is triggered by an S3 PUT on the source store; it splits the piece into passages, calls Titan Text Embeddings V2 to embed each one into the S3 Vectors index cr-passages (1024 dimensions, one vector per passage), calls Bedrock Haiku 4.5 to score each passage, keeps the top few, and emits one event per kept point to the EventBridge default bus: cr.point_picked. Dropped passages emit no event.

Bottom region, drafting and approval. The drafter Lambda is triggered by an EventBridge rule on cr.point_picked; it reads the voice and rules docs from s3://cr-rules-source/, drafts each format with Bedrock Haiku 4.5 (Sonnet 4.6 for hard pieces), runs the source-check against the S3 Vectors index, trims to length, and writes each draft to DynamoDB cr-drafts, then posts it to the review desk in Slack with Approve, Edit, and Skip buttons. Slack button clicks land on a Function URL Lambda, approve-handler, that updates cr-drafts and cr-audit and, on approve, queues the draft on EventBridge Scheduler to drip out or sends it to the review channel.

CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $15 monthly threshold, posting to SNS topic cr-cost-alarm. A note at the bottom: every draft is grounded, and every approval is logged to cr-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the source store), processing (split, embed, score, emit a point event), drafting and approval (the drafter writes, the review desk shows, the approval is recorded). Every Lambda is event- or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • source-sync — EventBridge Scheduler target, fires every few minutes. Uses the Google Drive API (service-account credentials in Secrets Manager under cr/drive/sa) to export any new or changed doc in the source folder as plain text and write it to s3://cr-source-store/<piece-id>.txt only if it has changed since the last sync. Same pattern syncs the voice and rules docs to s3://cr-rules-source/. Memory: 256 MB. Timeout: 30 s.
  • intake-cleaner — S3 PUT trigger on s3://cr-raw-inbound/. Parses the forwarded MIME, pulls the transcript body (or attachment), and strips timestamps, speaker labels, and filler with a small set of format rules (Otter, Zoom, Fireflies, and plain-paste formats handled; unknown formats fall back to a generic line-and-timestamp regex). Writes the cleaned plain text into s3://cr-source-store/ tagged kind=transcript. No Bedrock. Memory: 256 MB. Timeout: 30 s.
  • fetcher — Lambda Function URL (the paste-a-link form). Fetches the single pasted URL, runs a readability extraction (trafilatura, with a readability-lxml fallback) to strip navigation, ads, and footer down to the article body, and writes plain text into s3://cr-source-store/ tagged kind=web. Only fetches the exact URL submitted; no crawling, with an allowlist of schemes and a request timeout. Memory: 512 MB. Timeout: 30 s.
  • points — S3 PUT trigger on s3://cr-source-store/. Splits the piece into passages (paragraph- and topic-boundary chunking, ~3–5 sentences each), calls Titan Text Embeddings V2 to embed each passage into the S3 Vectors index cr-passages, calls Bedrock Haiku 4.5 to score each passage for postability, keeps the top N per the rules doc, and emits one cr.point_picked event per kept point with the point and its source passage as payload. Dropped passages emit nothing. Memory: 512 MB. Timeout: 120 s.
  • drafter — EventBridge rule on cr.point_picked. Reads the voice and rules docs, drafts each requested format with Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0); routes the hard pieces (kind=transcript or a length/complexity flag) to Claude Sonnet 4.6 (global.anthropic.claude-sonnet-4-6-20250930-v1:0). Runs the source-check by embedding the draft and querying cr-passages for the nearest passage, dropping any claim not supported and re-prompting once if the draft drifted. Trims to the platform length. Writes each draft to cr-drafts and posts it to the Slack review desk. Memory: 512 MB. Timeout: 120 s.
  • approve-handler — Lambda Function URL, public with AuthType: NONE; verifies the Slack signature, which is computed over the request timestamp and the raw request body. Triggered by Slack interactive button clicks (Approve/Edit/Skip) and the Edit modal submission. Writes to cr-drafts and cr-audit; on approve, queues the draft on the post scheduler (a one-off EventBridge Scheduler schedule per drip slot) or routes it to the review channel; on skip, optionally requests a backup point from points. Memory: 256 MB. Timeout: 15 s.
  • drip — EventBridge Scheduler one-off target. Sends one approved draft to its channel at its scheduled time (Slack channel post, or a webhook to the configured post scheduler). Reads cr-drafts, marks the draft sent, writes a sent row to cr-audit. No Bedrock. Memory: 256 MB. Timeout: 15 s.
  • recap — EventBridge Scheduler target, weekly Sunday 6pm. Reads the past week’s cr-audit; sends a recap to a configured Slack channel: pieces repurposed, drafts approved, edited, skipped, and the approve rate per format. The message is a plain summary table; no Bedrock. Memory: 256 MB.
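The first and last steps of points (splitting a piece into passages, then emitting one event per kept point) can be sketched roughly as below. This is a minimal illustration, not the series' actual code: it only splits on blank lines and caps passage length, so the topic-boundary part of the chunking is left out, and the Source value, the use of cr.point_picked as the detail type, and the Detail field names are all assumptions.

```python
import json
import re

# Sentence boundary: whitespace that follows ., ! or ? (a simple heuristic).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_passages(text: str, max_sentences: int = 5) -> list[str]:
    """Split on blank lines, then cap each passage at max_sentences sentences."""
    passages: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        sentences = [s.strip() for s in SENTENCE_END.split(paragraph) if s.strip()]
        for start in range(0, len(sentences), max_sentences):
            passages.append(" ".join(sentences[start:start + max_sentences]))
    return passages


def point_picked_entry(piece_id: str, passage_id: str, point: str, passage: str) -> dict:
    """Build one PutEvents entry for a kept point.

    Assumed, not from the series: the Source string and the Detail field names.
    """
    detail = {
        "piece_id": piece_id,
        "passage_id": passage_id,
        "point": point,
        "passage": passage,
    }
    return {
        "Source": "cr.points",
        "DetailType": "cr.point_picked",
        "EventBusName": "default",
        "Detail": json.dumps(detail),
    }
```

The entries would then go to the EventBridge PutEvents API, which accepts at most 10 entries per call, so a piece with more kept points than that needs batching.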

Storage

  • DynamoDB · cr-drafts — one row per draft. PK (piece_id, draft_id); attributes: format (thread/post/caption), point_tier, passage_ref (S3 Vectors id of the source passage), model_text, final_text, status (pending/approved/edited/skipped/sent). On-demand.
  • DynamoDB · cr-audit — one row per write action of any kind. PK (draft_id, ts); attributes: action (approve/edit/skip/sent), by_user, before, after, passage_ref. On-demand. No TTL — this is the long-term audit trail.
  • DynamoDB · cr-pieces — one row per loaded piece. PK piece_id; attributes: title, kind (drive/transcript/web), source_link, loaded_at, passage_count, status. On-demand.
  • S3 Vectors · cr-passages — one vector per passage, Titan Text Embeddings V2 at 1024 dimensions, with metadata (piece_id, passage_id, position, text). Queried by points (for context while scoring) and by the drafter for the source-check.
  • S3 · cr-source-store — cleaned plain-text pieces, one file each. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 2 years.
  • S3 · cr-rules-source — mirrored voice and rules docs as plain text. Versioning enabled.
  • S3 · cr-raw-inbound — raw inbound MIME from forwarded transcripts. Lifecycle to Glacier at 30 days; expiry at 1 year.
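To make the cr-audit shape concrete, here is a small sketch of how one audit row might be assembled before it is written. It reads "PK (draft_id, ts)" as a composite key with draft_id as the partition key and ts as the sort key, which is an assumption; the ISO-8601 timestamp format and the helper name audit_item are also illustrative choices.

```python
from datetime import datetime, timezone
from typing import Optional

# Actions listed for cr-audit in the storage section.
VALID_ACTIONS = {"approve", "edit", "skip", "sent"}


def audit_item(
    draft_id: str,
    action: str,
    by_user: str,
    before: Optional[str],
    after: Optional[str],
    passage_ref: str,
    now: Optional[datetime] = None,
) -> dict:
    """Build one cr-audit row; draft_id + ts is assumed to be the composite key."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown audit action: {action!r}")
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "draft_id": draft_id,
        "ts": ts,
        "action": action,
        "by_user": by_user,
        "before": before,
        "after": after,
        "passage_ref": passage_ref,
    }
```

With boto3, the returned dict could be passed as Item to Table("cr-audit").put_item. Keeping ts in the key means two actions on the same draft never overwrite each other, which is the property an audit trail needs.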

Bedrock

  • Foundation models. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile for passage scoring and the routine drafting; anthropic.claude-sonnet-4-6-20250930-v1:0 via its Global profile for the hard pieces only, selected by the drafter from source kind and a complexity flag.
  • Embeddings. amazon.titan-embed-text-v2:0 at 1024 dimensions, one call per passage, written to the S3 Vectors index cr-passages. This is what makes the grounding source-check a single nearest-neighbour lookup.
  • Quotas. Default account quotas are more than enough at SMB volume. The system only calls Bedrock when a piece is loaded; there is no background traffic.
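For the embeddings call, a minimal sketch of the per-passage request follows. The client is passed in (for example a boto3 "bedrock-runtime" client) so the function stays easy to exercise without AWS access; the function name and the choice to request normalized vectors are assumptions, not details from the series.

```python
import json

TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_passage(client, text: str, dimensions: int = 1024) -> list:
    """Embed one passage with Titan Text Embeddings V2 and return the vector.

    `client` is expected to behave like a boto3 bedrock-runtime client.
    """
    body = json.dumps({
        "inputText": text,
        "dimensions": dimensions,  # 1024 matches the cr-passages index
        "normalize": True,         # assumed: unit-length vectors
    })
    response = client.invoke_model(
        modelId=TITAN_MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

The dimension passed here has to match the dimension the S3 Vectors index was created with; a mismatch would surface when the vector is written, not when it is embedded.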

EventBridge Scheduler config

  • cr-source-sync: rate(5 minutes). Target: source-sync Lambda.
  • cr-weekly-recap: cron(0 18 ? * SUN *) in TZ_NAME. Target: recap Lambda.
  • Drip one-offs — created by approve-handler per approved draft at the slot the rules doc assigns (e.g. one a day at 9am local). Use at(yyyy-mm-ddThh:mm:ss) expressions with --action-after-completion DELETE so each schedule deletes itself after it fires drip.
  • Backup-refill one-offs — created by approve-handler on a skip when auto-refill is enabled, targeting points to draft a replacement from a backup point.
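A drip one-off could be built along these lines. The sketch only assembles the arguments for the Scheduler CreateSchedule call; the schedule name pattern, the Input payload, and both ARNs are hypothetical placeholders.

```python
import json
from datetime import datetime


def drip_schedule_kwargs(
    draft_id: str,
    when_local: datetime,
    tz_name: str,
    drip_lambda_arn: str,
    scheduler_role_arn: str,
) -> dict:
    """Arguments for a one-off schedule that invokes drip once, then deletes itself."""
    return {
        "Name": f"cr-drip-{draft_id}",  # hypothetical naming pattern
        "ScheduleExpression": f"at({when_local.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": tz_name,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "ActionAfterCompletion": "DELETE",
        "Target": {
            "Arn": drip_lambda_arn,
            "RoleArn": scheduler_role_arn,  # role Scheduler assumes to invoke drip
            "Input": json.dumps({"draft_id": draft_id}),  # assumed payload shape
        },
    }
```

In approve-handler this would be used as `boto3.client("scheduler").create_schedule(**kwargs)`. Passing the local wall-clock time together with ScheduleExpressionTimezone keeps "9am local" correct without converting to UTC by hand.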

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. repurpose.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set cr-inbound-rules: one rule with a recipient on that subdomain (e.g. in@repurpose.your-company.com; the address has to sit on the domain whose MX record points at SES) → spam scan → S3 PUT to s3://cr-raw-inbound/<message-id> → stop. The S3 PUT triggers intake-cleaner.
  • SES outbound for the weekly recap email (optional, if you prefer email to Slack): verify a sender identity at repurposer@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.
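Once SES has written the raw message to S3, intake-cleaner has to get from MIME bytes to clean text. The sketch below uses only the standard library and implements just the generic line-and-timestamp fallback; the regexes are illustrative guesses, not the per-vendor format rules, and quoted headers that a forwarding mail client adds to the body are not handled.

```python
import re
from email import message_from_bytes, policy

# Leading timestamp such as "00:05", "[00:01]" or "1:02:03" (assumed pattern).
TIMESTAMP = re.compile(r"^\s*\[?\d{1,2}:\d{2}(?::\d{2})?\]?\s*")
# Leading speaker label such as "Alice: " (heuristic; may also strip "Note: ").
SPEAKER = re.compile(r"^[A-Z][\w .'-]{0,40}:\s+")


def transcript_text(raw_mime: bytes) -> str:
    """Return the plain-text body, or the first text/plain attachment."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        for attachment in msg.iter_attachments():
            if attachment.get_content_type() == "text/plain":
                part = attachment
                break
    return part.get_content() if part is not None else ""


def clean_transcript(text: str) -> str:
    """Strip leading timestamps and speaker labels, dropping empty lines."""
    cleaned = []
    for line in text.splitlines():
        line = SPEAKER.sub("", TIMESTAMP.sub("", line)).strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)
```

A handler would read the object named in the S3 event, run `clean_transcript(transcript_text(raw))`, and write the result to the source store.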

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • points role: s3:GetObject on the source store; bedrock:InvokeModel on the Titan and Haiku ARNs; s3vectors:PutVectors + QueryVectors on cr-passages; events:PutEvents on the default bus; dynamodb:PutItem on cr-pieces.
  • drafter role: s3:GetObject on the rules source; bedrock:InvokeModel on the Haiku and Sonnet ARNs; s3vectors:QueryVectors on cr-passages; dynamodb:PutItem on cr-drafts; secretsmanager:GetSecretValue on the Slack bot token; outbound network to slack.com.
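  • approve-handler role: dynamodb:PutItem + UpdateItem on cr-drafts and cr-audit; scheduler:CreateSchedule for the drip one-offs, plus iam:PassRole on the execution role those schedules use to invoke drip; secretsmanager:GetSecretValue on the Slack signing secret and on the bot token (needed to open the Edit modal); events:PutEvents for the optional backup-refill.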
  • intake-cleaner role: s3:GetObject on cr-raw-inbound; s3:PutObject on cr-source-store. No Bedrock, no network egress.
  • source-sync and fetcher roles: secretsmanager:GetSecretValue on the Google service-account secret (source-sync only); s3:PutObject on the source and rules buckets; outbound network to www.googleapis.com (source-sync) or the open web with a scheme allowlist (fetcher).
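As one worked instance of "scoped to exact ARNs", the intake-cleaner role from the list above could carry a policy like the following, expressed here as a Python dict that renders to the JSON policy document. The Sid values and helper names are illustrative; logging permissions (for example the AWSLambdaBasicExecutionRole managed policy) would be attached separately.

```python
import json


def bucket_objects(bucket: str) -> str:
    """ARN pattern covering every object in a bucket, but not the bucket itself."""
    return f"arn:aws:s3:::{bucket}/*"


INTAKE_CLEANER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadRawInbound",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": bucket_objects("cr-raw-inbound"),
        },
        {
            "Sid": "WriteSourceStore",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": bucket_objects("cr-source-store"),
        },
    ],
}


def render(policy: dict) -> str:
    """Serialize the policy for an IAM API call or a template parameter."""
    return json.dumps(policy, indent=2)
```

If kind=transcript is stored as an S3 object tag rather than as object metadata, the write statement would also need s3:PutObjectTagging.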

Slack interactive flow

Drafts are posted to the review desk via the chat.postMessage Web API with Block Kit blocks: the draft text, the format and point tier, the source passage in a context block, and three action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the approve-handler Function URL. approve-handler verifies the Slack signing secret on the inbound request, parses the action_id (approve, edit, skip), opens a modal for Edit (pre-filled with the draft), and processes the response. Approve and Skip are one-tap; Edit submits through the modal.
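The review-desk message described above might be assembled like this. It is a sketch of the Block Kit payload only: the block_id pattern, the button styles, and the layout of the context line are assumptions, while the three action_id values come from the text.

```python
from typing import Optional


def review_blocks(draft_id: str, draft_text: str, fmt: str, point_tier: str, passage: str) -> list:
    """Block Kit blocks for one draft: text, context, and three action buttons."""

    def button(label: str, action_id: str, style: Optional[str] = None) -> dict:
        element = {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": action_id,
            "value": draft_id,  # lets approve-handler find the draft
        }
        if style:
            element["style"] = style
        return element

    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": draft_text}},
        {
            "type": "context",
            "elements": [
                {"type": "mrkdwn", "text": f"*{fmt}* · tier {point_tier}"},
                {"type": "mrkdwn", "text": f"Source: {passage}"},
            ],
        },
        {
            "type": "actions",
            "block_id": f"review-{draft_id}",  # hypothetical pattern
            "elements": [
                button("Approve", "approve", "primary"),
                button("Edit", "edit"),
                button("Skip", "skip", "danger"),
            ],
        },
    ]
```

The list goes into the blocks argument of chat.postMessage, alongside a plain text fallback. Section text is capped at 3000 characters, so long drafts would need splitting across sections.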

The Slack app needs chat:write and im:write, and the Interactivity URL configured. The bot token lives in Secrets Manager under cr/slack/bot-token; the signing secret is cr/slack/signing-secret.
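Because the Function URL is public, the signature check is the only thing standing between the internet and the drafts table, so it is worth spelling out. The sketch below follows Slack's documented v0 signing scheme (HMAC-SHA256 over "v0:timestamp:body", compared against the X-Slack-Signature header); the function name and the five-minute replay window as a parameter are illustrative choices.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(
    signing_secret: str,
    timestamp: str,
    body: str,
    signature: str,
    now: Optional[float] = None,
    max_skew_seconds: int = 300,
) -> bool:
    """Return True only for a fresh request whose signature matches the raw body."""
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > max_skew_seconds:
        return False  # stale or future-dated: possible replay
    base = f"v0:{timestamp}:{body}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(f"v0={digest}", signature or "")
```

In a Function URL handler the inputs would come from the x-slack-request-timestamp and x-slack-signature headers and from the request body exactly as received; if the event is flagged as base64-encoded, it has to be decoded before hashing.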

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Metric filters on "error", "throttle", and "timeout" turn matching log lines into CloudWatch metrics for alerting.
  • Alarms: drafter failure rate > 1% in 24h; source-check drop rate above a baseline you set after the first few weeks of real traffic (a spike means the model is drifting and the prompt or model choice needs a look); approve-handler signature-verification failures > 5/hour (might mean the Slack signing secret was regenerated).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic cr-cost-alarm subscribed to the admin’s email and Slack.
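Structured JSON logs are what make the alerting filters cheap to write. A minimal logging helper might look like the following; the field names (level, message, fn) are assumptions, and any consistent set would do.

```python
import json
import time


def log_event(level: str, message: str, **fields) -> str:
    """Print one JSON log line (Lambda forwards stdout to CloudWatch Logs)."""
    record = {"level": level, "message": message, "ts": round(time.time(), 3)}
    record.update(fields)
    line = json.dumps(record, default=str)  # default=str keeps odd types loggable
    print(line)
    return line
```

With lines shaped like this, a metric filter can match on a field rather than on free text, for example with the pattern `{ $.level = "ERROR" }`.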

Config and secrets

Service-account credentials for the Drive API live in Secrets Manager under cr/drive/sa. Slack bot token and signing secret under cr/slack/*. The post-scheduler webhook (if you drip to an external scheduler instead of a Slack channel) under cr/scheduler/webhook. The configured timezone, drip slots, format mix, top-N point count, and model-routing thresholds all live in Parameter Store under /cr/config/, with the voice and rules docs themselves in Drive (mirrored to S3). Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
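The "fetch on cold start, cache for the lifetime of the execution environment" pattern comes down to a module-level variable. A sketch, assuming the client behaves like a boto3 "ssm" client and that keys are stored directly under /cr/config/ (the individual parameter names are not given in the series):

```python
from typing import Optional

_CONFIG: Optional[dict] = None  # survives across warm invocations


def load_config(ssm_client, path: str = "/cr/config/") -> dict:
    """Read every parameter under `path` once, then serve it from memory."""
    global _CONFIG
    if _CONFIG is None:
        values = {}
        token = None
        while True:
            kwargs = {"Path": path, "Recursive": True, "WithDecryption": True}
            if token:
                kwargs["NextToken"] = token
            page = ssm_client.get_parameters_by_path(**kwargs)
            for parameter in page["Parameters"]:
                # "/cr/config/top_n" becomes "top_n"
                values[parameter["Name"].removeprefix(path)] = parameter["Value"]
            token = page.get("NextToken")
            if not token:
                break
        _CONFIG = values
    return _CONFIG
```

The trade-off is that a config change only takes effect when an execution environment is recycled, so a change that must apply immediately needs a redeploy or a cache with an expiry.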

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys), building and shipping with AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for cr-source-store and cr-rules-source so a bad edit can be rolled back by restoring the previous object version, and keep the S3 Vectors index in its own stack so re-indexing never forces a full app redeploy. Total deployable surface: eight Lambdas, three DynamoDB tables, one S3 Vectors index, three S3 buckets, one EventBridge rule on the default bus (plus the Scheduler schedules), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
