Engineering reference: the content repurposer architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, the S3 Vectors index, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow for the review desk. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, S3 Vectors, and EventBridge Scheduler are all in good shape there. A second region for resilience isn’t worth the setup at SMB volume — the failure mode here is a draft that shows up an hour late, not a regional outage, and nothing in this system is on a hard real-time path. One AWS account dedicated to the repurposer (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
- source-sync: EventBridge Scheduler target, fires every few minutes. Uses the Google Drive API (service-account credentials in Secrets Manager under cr/drive/sa) to export any new or changed doc in the source folder as plain text and write it to s3://cr-source-store/<piece-id>.txt only if it has changed since the last sync. Same pattern syncs the voice and rules docs to s3://cr-rules-source/. Memory: 256 MB. Timeout: 30 s.
- intake-cleaner: S3 PUT trigger on s3://cr-raw-inbound/. Parses the forwarded MIME, pulls the transcript body (or attachment), and strips timestamps, speaker labels, and filler with a small set of format rules (Otter, Zoom, Fireflies, and plain-paste formats handled; unknown formats fall back to a generic line-and-timestamp regex). Writes the cleaned plain text into s3://cr-source-store/ tagged kind=transcript. No Bedrock. Memory: 256 MB. Timeout: 30 s.
- fetcher: Lambda Function URL (the paste-a-link form). Fetches the single pasted URL, runs a readability extraction (trafilatura, with a readability-lxml fallback) to strip navigation, ads, and footer down to the article body, and writes plain text into s3://cr-source-store/ tagged kind=web. Only fetches the exact URL submitted; no crawling, with an allowlist of schemes and a request timeout. Memory: 512 MB. Timeout: 30 s.
- points: S3 PUT trigger on s3://cr-source-store/. Splits the piece into passages (paragraph- and topic-boundary chunking, ~3–5 sentences each), calls Titan Text Embeddings V2 to embed each passage into the S3 Vectors index cr-passages, calls Bedrock Haiku 4.5 to score each passage for postability, keeps the top N per the rules doc, and emits one cr.point_picked event per kept point with the point and its source passage as payload. Dropped passages emit nothing. Memory: 512 MB. Timeout: 120 s.
- drafter: EventBridge rule on cr.point_picked. Reads the voice and rules docs, drafts each requested format with Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0); routes the hard pieces (kind=transcript or a length/complexity flag) to Claude Sonnet 4.6 (global.anthropic.claude-sonnet-4-6-20250930-v1:0). Runs the source-check by embedding the draft and querying cr-passages for the nearest passage, dropping any claim not supported and re-prompting once if the draft drifted. Trims to the platform length. Writes each draft to cr-drafts and posts it to the Slack review desk. Memory: 512 MB. Timeout: 120 s.
- approve-handler: Lambda Function URL, public with AuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Approve/Edit/Skip) and the Edit modal submission. Writes to cr-drafts and cr-audit; on approve, queues the draft on the post scheduler (a one-off EventBridge Scheduler rule per drip slot) or routes it to the review channel; on skip, optionally requests a backup point from points. Memory: 256 MB. Timeout: 15 s.
- drip: EventBridge Scheduler one-off target. Sends one approved draft to its channel at its scheduled time (Slack channel post, or a webhook to the configured post scheduler). Reads cr-drafts, marks the draft sent, writes a sent row to cr-audit. No Bedrock. Memory: 256 MB. Timeout: 15 s.
- recap: EventBridge Scheduler target, weekly Sunday 6pm. Reads the past week’s cr-audit; sends a recap to a configured Slack channel: pieces repurposed, drafts approved, edited, skipped, and the approve rate per format. The message is a plain summary table; no Bedrock. Memory: 256 MB.
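To make the passage split in points concrete, here is a minimal sketch of paragraph-bounded chunking at roughly 3–5 sentences per passage. The function name, the naive sentence regex, and the even-spread rule are all assumptions for illustration; the real function also looks at topic boundaries, which this sketch does not attempt.

```python
import math
import re

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
# This is an assumption for illustration; abbreviations such as "e.g." split early.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def split_passages(text: str, max_sentences: int = 5) -> list[str]:
    """Split a piece into passages that never cross a paragraph boundary."""
    passages: list[str] = []
    for paragraph in _PARAGRAPH_BREAK.split(text.strip()):
        sentences = [s.strip() for s in _SENTENCE_END.split(paragraph) if s.strip()]
        if not sentences:
            continue
        # Spread sentences evenly across the fewest groups that respect the cap,
        # so a long paragraph does not leave a one-sentence tail.
        n_groups = math.ceil(len(sentences) / max_sentences)
        base, extra = divmod(len(sentences), n_groups)
        start = 0
        for i in range(n_groups):
            size = base + (1 if i < extra else 0)
            passages.append(" ".join(sentences[start:start + size]))
            start += size
    return passages
```

With the default cap of 5, any paragraph of three or more sentences comes out as passages of 3 to 5 sentences; shorter paragraphs stay whole as one short passage.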
Storage
- DynamoDB · cr-drafts: one row per draft. PK (piece_id, draft_id); attributes: format (thread/post/caption), point_tier, passage_ref (S3 Vectors id of the source passage), model_text, final_text, status (pending/approved/edited/skipped/sent). On-demand.
- DynamoDB · cr-audit: one row per write action of any kind. PK (draft_id, ts); attributes: action (approve/edit/skip/sent), by_user, before, after, passage_ref. On-demand. No TTL; this is the long-term audit trail.
- DynamoDB · cr-pieces: one row per loaded piece. PK piece_id; attributes: title, kind (drive/transcript/web), source_link, loaded_at, passage_count, status. On-demand.
- S3 Vectors · cr-passages: one vector per passage, Titan Text Embeddings V2 at 1024 dimensions, with metadata (piece_id, passage_id, position, text). Queried by the points scorer context and the drafter source-check.
- S3 · cr-source-store: cleaned plain-text pieces, one file each. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 2 years.
- S3 · cr-rules-source: mirrored voice and rules docs as plain text. Versioning enabled.
- S3 · cr-raw-inbound: raw inbound MIME from forwarded transcripts. Lifecycle to Glacier at 30 days; expiry at 1 year.
Bedrock
- Foundation models. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile for passage scoring and the routine drafting; anthropic.claude-sonnet-4-6-20250930-v1:0 via its Global profile for the hard pieces only, selected by the drafter from source kind and a complexity flag.
- Embeddings. amazon.titan-embed-text-v2:0 at 1024 dimensions, one call per passage, written to the S3 Vectors index cr-passages. This is what makes the grounding source-check a single nearest-neighbour lookup.
- Quotas. Default account quotas are more than enough at SMB volume. The system only calls Bedrock when a piece is loaded; there is no background traffic.
EventBridge Scheduler config
- cr-source-sync: rate(5 minutes). Target: source-sync Lambda.
- cr-weekly-recap: cron(0 18 ? * SUN *) in TZ_NAME. Target: recap Lambda.
- Drip one-offs: created by approve-handler per approved draft at the slot the rules doc assigns (e.g. one a day at 9am local). Use at(YYYY-MM-DDTHH:MM:SS) expressions with --action-after-completion DELETE so each rule self-cleans after it fires drip.
- Backup-refill one-offs: created by approve-handler on a skip when auto-refill is enabled, targeting points to draft a replacement from a backup point.
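A drip one-off could be built like this before approve-handler hands it to the Scheduler API. The schedule name pattern, the payload shape, and the ARNs in the usage note are hypothetical; ActionAfterCompletion is the API-side equivalent of the --action-after-completion CLI flag.

```python
import json
from datetime import datetime


def drip_schedule(draft_id: str, send_at: datetime, timezone: str,
                  lambda_arn: str, role_arn: str) -> dict:
    """Keyword arguments for the EventBridge Scheduler create_schedule call.

    send_at is a naive datetime read as local time in `timezone`.
    """
    return {
        # Assumption: draft ids only use characters valid in a schedule name.
        "Name": f"cr-drip-{draft_id}",
        "ScheduleExpression": f"at({send_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": timezone,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        # Delete the schedule once it has fired.
        "ActionAfterCompletion": "DELETE",
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,
            "Input": json.dumps({"draft_id": draft_id}),
        },
    }


# Usage (needs boto3 and AWS credentials; the ARNs are placeholders):
#   import boto3
#   boto3.client("scheduler").create_schedule(**drip_schedule(
#       "d-123", datetime(2026, 1, 5, 9, 0), "Asia/Singapore",
#       "arn:aws:lambda:ap-southeast-1:123456789012:function:drip",
#       "arn:aws:iam::123456789012:role/cr-scheduler-invoke"))
```

The RoleArn is the role Scheduler assumes to invoke the Lambda, so approve-handler typically also needs iam:PassRole on it alongside scheduler:CreateSchedule.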
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g. repurpose.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
- SES inbound rule set cr-inbound-rules: one rule with recipient repurpose@your-company.com → spam scan → S3 PUT to s3://cr-raw-inbound/<message-id> → stop. The S3 PUT triggers intake-cleaner.
- SES outbound for the weekly recap email (optional, if you prefer email to Slack): verify a sender identity at repurposer@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.
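What intake-cleaner does with the raw MIME object SES drops into S3 can be sketched with the standard library alone. The function name is hypothetical, and preferring a plain-text attachment over the message body is an assumption; the format-specific cleanup (timestamps, speaker labels) would run on the returned text afterwards.

```python
from email import message_from_bytes, policy


def extract_transcript(raw_mime: bytes) -> str:
    """Return the transcript text from a forwarded email, or "" if none is found."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    # Assumption: an attached text/plain file, if present, is the transcript export.
    for part in msg.iter_attachments():
        if part.get_content_type() == "text/plain":
            return part.get_content()
    # Otherwise fall back to the plain-text body of the message.
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""
```

In the Lambda, raw_mime would be the bytes read from the s3://cr-raw-inbound/ object named in the S3 event.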
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- points role: s3:GetObject on the source store; bedrock:InvokeModel on the Titan and Haiku ARNs; s3vectors:PutVectors + QueryVectors on cr-passages; events:PutEvents on the default bus; dynamodb:PutItem on cr-pieces.
- drafter role: s3:GetObject on the rules source; bedrock:InvokeModel on the Haiku and Sonnet ARNs; s3vectors:QueryVectors on cr-passages; dynamodb:PutItem on cr-drafts; secretsmanager:GetSecretValue on the Slack bot token; outbound network to slack.com.
- approve-handler role: dynamodb:PutItem + UpdateItem on cr-drafts and cr-audit; scheduler:CreateSchedule for the drip one-offs; secretsmanager:GetSecretValue on the Slack signing secret; events:PutEvents for the optional backup-refill.
- intake-cleaner role: s3:GetObject on cr-raw-inbound; s3:PutObject on cr-source-store. No Bedrock, no network egress.
- source-sync and fetcher roles: secretsmanager:GetSecretValue on the Google service-account secret (source-sync only); s3:PutObject on the source and rules buckets; outbound network to www.googleapis.com (source-sync) or the open web with a scheme allowlist (fetcher).
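As one worked example of the scoping, the intake-cleaner role's policy document might look like this, written as a Python dict so it can be dumped to JSON. The Sid values and the function name are hypothetical; the actions and buckets are the ones listed above.

```python
import json


def intake_cleaner_policy() -> dict:
    """IAM policy document: read raw inbound mail, write cleaned text, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadRawInbound",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::cr-raw-inbound/*",
            },
            {
                "Sid": "WriteSourceStore",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::cr-source-store/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(intake_cleaner_policy(), indent=2))
```

If the kind=transcript tag is written as an S3 object tag rather than object metadata, the role would also need s3:PutObjectTagging on the same resource. The CloudWatch Logs permissions every Lambda role needs are left out here.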
Slack interactive flow
Drafts are posted to the review desk via the chat.postMessage Web API with Block Kit blocks: the draft text, the format and point tier, the source passage in a context block, and three action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the approve-handler Function URL. approve-handler verifies the request signature against the Slack signing secret, parses the action_id (approve, edit, skip), opens a modal for Edit (pre-filled with the draft), and processes the response. Approve and Skip are one-tap; Edit submits through the modal.
The Slack app needs chat:write and im:write, and the Interactivity URL configured. The bot token lives in Secrets Manager under cr/slack/bot-token; the signing secret is cr/slack/signing-secret.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Metric filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
- Alarms: drafter failure rate > 1% in 24h; source-check drop rate > some threshold (a spike means the model is drifting and the prompt or model choice needs a look); approve-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic cr-cost-alarm, subscribed to the admin’s email and Slack.
Config and secrets
Service-account credentials for the Drive API live in Secrets Manager under cr/drive/sa. Slack bot token and signing secret under cr/slack/*. The post-scheduler webhook (if you drip to an external scheduler instead of a Slack channel) under cr/scheduler/webhook. The configured timezone, drip slots, format mix, top-N point count, and model-routing thresholds all live in Parameter Store under /cr/config/, with the voice and rules docs themselves in Drive (mirrored to S3). Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys), building and shipping with AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for cr-source-store and cr-rules-source so a bad edit can be rolled back in one click, and keep the S3 Vectors index in its own stack so re-indexing never forces a full app redeploy. Total deployable surface: around eight Lambdas, three DDB tables, one S3 Vectors index, three S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.