Engineering reference: the content moderator architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SQS review queue, the EventBridge config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). Lambda Function URLs, Bedrock Global cross-Region inference, SQS, and EventBridge are all available there. A second region for resilience isn’t worth the extra setup at SMB volume — the failure mode for an SMB is a flagged comment waiting an extra hour for review, not a regional outage. One AWS account dedicated to the moderator (separate from your other workloads) keeps the IAM blast radius small and lets one AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
- webhook-in: Lambda Function URL, AuthType: NONE; verifies each platform’s HMAC signature (secrets per platform in Secrets Manager under cm/webhook/<platform>) before doing anything. Writes the raw payload to s3://cm-raw/<item-id>.json and returns 200 fast so the platform doesn’t retry. All real work is deferred to the S3 PUT trigger. Memory: 256 MB. Timeout: 10 s.
- intake: S3 PUT trigger on s3://cm-raw/. Strips HTML, normalizes text, extracts author/links/length/area, and upserts one record to cm-items keyed by item_id (idempotent on platform retries). Runs the deterministic rule pass against the allow list, banned-word list, and blocked-domain list loaded from s3://cm-rules-source/. On pass it marks the item published; on hold it emits cm.hold; on borderline it async-invokes checker. Memory: 256 MB. Timeout: 30 s. No Bedrock calls.
- checker: async-invoked by intake for borderline items only. Reads rules.txt from s3://cm-rules-source/ and the worked-examples set for the area from cm-examples. Calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) with a strict JSON-only contract: {verdict, confidence, rule}. Applies the per-area confidence threshold from the rules doc, then emits cm.hold, cm.send_to_human, or cm.hold_notify, or marks the item published on a confident pass. Memory: 512 MB. Timeout: 30 s.
- dispatch: EventBridge rule on the three move events. Resolves the reviewer (per-area, then admin fallback), checks quiet hours, groups repeat offenders by author, formats the card from voice.txt, and sends via the Slack chat.postMessage Web API (Block Kit) or SES SendRawEmail. Enqueues the card reference to the cm-review-queue SQS queue and writes a cm-queue row so a re-drive won’t double-send. A cm.hold_notify bypasses quiet-hours batching. Memory: 256 MB. Timeout: 30 s.
- action-handler: Lambda Function URL, public with AuthType: NONE; verifies the Slack signing secret on the request body. Triggered by Slack interactive button clicks (Publish/Remove/Edit) and by email-link clicks. Calls the originating platform’s API to publish, remove, or post an edited version; writes the decision to cm-audit; on an overturn, appends a worked example to cm-examples; clears the cm-queue entry. Memory: 256 MB. Timeout: 15 s.
- drive-sync: EventBridge Scheduler target, every 15 minutes. Uses the Google Drive API (service-account credentials in Secrets Manager under cm/drive/sa) to export the rules and voice docs as plain text and write to s3://cm-rules-source/ only if they changed since the last sync. Memory: 256 MB. Timeout: 30 s.
- digest: EventBridge Scheduler target, weekly Monday 9am in TZ_NAME. Reads the past week’s cm-audit and cm-queue; calls Bedrock Haiku 4.5 once to write a short narrative summarizing holds, removals, overturns, and any dead-letter items; emails it via SES to the configured stakeholder list and posts a summary to a configured Slack channel. Memory: 512 MB. Timeout: 60 s.
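The deterministic rule pass in intake is the piece that keeps most items away from Bedrock, so it is worth seeing as code. What follows is a minimal sketch, not the production logic: the RuleLists container, the rule_pass function, and above all the precedence (allow-listed author passes, banned word or blocked domain holds, clean text with no links passes, anything else is borderline) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class RuleLists:
    """Hypothetical container for the three lists loaded from cm-rules-source."""
    allowed_authors: frozenset = frozenset()
    banned_words: frozenset = frozenset()
    blocked_domains: frozenset = frozenset()


def rule_pass(author, text, links, rules):
    """Return 'pass', 'hold', or 'borderline' without touching Bedrock."""
    # Assumption: an allow-listed author is published without further checks.
    if author in rules.allowed_authors:
        return "pass"
    # Whole-word matching, so "scampi" does not trip a banned "scam".
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if words & rules.banned_words:
        return "hold"
    for link in links:
        host = (urlparse(link).hostname or "").lower()
        # Match the blocked domain itself and any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in rules.blocked_domains):
            return "hold"
    # Assumption: link-free text with no hits passes; text that carries
    # links to unlisted domains is left for the checker.
    return "borderline" if links else "pass"
```

Keeping this a pure function over already-loaded lists means it can be unit-tested without S3 or DynamoDB, and the Lambda handler only has to load the lists and act on the returned string.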
Storage
- DynamoDB · cm-items: one row per item. PK item_id; attributes: platform, area, author, text, links, rule_pass, state (published/held/removed/edited), raw_s3_key. On-demand.
- DynamoDB · cm-queue: one row per dispatched card. PK (item_id, card_id); attributes: reviewer, sent_via (slack/email), verdict, rule, group_key. On-demand. Marks that a card was sent so re-drives don’t duplicate.
- DynamoDB · cm-audit: one row per write action of any kind. PK (item_id, ts); attributes: action (publish/remove/edit), by_user, rule, before, after. On-demand. No TTL: this is the long-term audit trail.
- DynamoDB · cm-examples: curated worked examples from moderator overturns. PK area; sort key ts; attributes: text, system_verdict, human_decision, rule. Capped per area (most recent N) by a small compaction step in digest. On-demand.
- S3 · cm-raw: raw inbound webhook payloads. Versioning enabled. Lifecycle to Glacier at 30 days; expiry at 2 years.
- S3 · cm-rules-source: mirrored rules and voice docs as plain text. Versioning enabled, so a bad Drive edit rolls back in one click.
- S3 · cm-originals: originals of edited items, kept so any edit-and-publish is reversible and auditable.
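The "most recent N per area" cap on cm-examples comes down to a small selection step. Here is one way it might look, assuming the rows for the table have already been read and that ts sorts chronologically (an ISO 8601 UTC string or an epoch number); the function name rows_to_delete and the keep_per_area parameter are illustrative, not taken from the real digest code.

```python
from collections import defaultdict


def rows_to_delete(rows, keep_per_area):
    """Return (area, ts) keys for every example beyond the newest N per area.

    rows: iterable of dicts with at least 'area' and 'ts', the partition
    and sort key of cm-examples.
    """
    by_area = defaultdict(list)
    for row in rows:
        by_area[row["area"]].append(row["ts"])
    doomed = []
    for area, stamps in by_area.items():
        stamps.sort(reverse=True)  # newest first
        doomed.extend((area, ts) for ts in stamps[keep_per_area:])
    return doomed
```

Each returned pair is a complete primary key, so the caller can hand it straight to a DynamoDB delete for that table.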
Bedrock
- Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: checker for the borderline verdict, and digest for the weekly narrative. A heavier reasoning path on Claude Sonnet 4.6 isn’t justified here: the verdict is a short, well-scoped classification, and Haiku 4.5 with worked examples handles it cheaply.
- Embeddings. Not used. The house rules are a short doc fed straight into the prompt; deterministic lists plus a few worked examples beat vector retrieval at this scale. No Knowledge Base, no S3 Vectors. (If a customer’s rules ever grew past a single prompt, Amazon Titan Text Embeddings V2 at 1024 dimensions into Amazon S3 Vectors would be the path; it isn’t needed at SMB volume.)
- Quotas. Default account quotas are more than enough at SMB volume. The rule pass keeps most items off Bedrock entirely.
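The {verdict, confidence, rule} contract only helps if the checker treats anything outside it as "ask a person". A hedged sketch of that routing step follows. The verdict vocabulary ("pass" / "hold"), a confidence in the range 0 to 1, and the notify_rules set that decides when a hold becomes cm.hold_notify are all assumptions; the source does not spell them out.

```python
import json

PUBLISH = "publish"  # not an event: the checker marks the item published


def route_verdict(raw_model_text, threshold, notify_rules=frozenset()):
    """Map the model's JSON reply to the next step for a borderline item."""
    try:
        reply = json.loads(raw_model_text)
        verdict = reply["verdict"]
        confidence = float(reply["confidence"])
        rule = reply.get("rule")
    except (ValueError, KeyError, TypeError):
        # Anything that breaks the JSON-only contract goes to a person.
        return "cm.send_to_human"
    if verdict not in ("pass", "hold") or not 0.0 <= confidence <= 1.0:
        return "cm.send_to_human"
    if confidence < threshold:
        return "cm.send_to_human"
    if verdict == "pass":
        return PUBLISH
    # Assumption: some rules are serious enough to skip quiet-hours batching.
    notify = isinstance(rule, str) and rule in notify_rules
    return "cm.hold_notify" if notify else "cm.hold"
```

The threshold argument is the per-area value the checker has already looked up; the function itself stays free of AWS calls so the contract handling can be tested with plain strings.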
SQS review queue
- cm-review-queue: standard SQS queue holding card references awaiting a moderator. Visibility timeout 5 min; the dispatch path is the producer, the Slack/SES send is the consumer side.
- cm-review-dlq: dead-letter queue, maxReceiveCount: 5. Anything that fails to send repeatedly lands here instead of looping or vanishing; the weekly digest reads and reports DLQ depth.
- Grouping: dispatch computes a group_key of (author, rule) and folds new items for an existing open key into one card, so a burst of identical spam is one review, not fifty.
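The grouping rule is easiest to check with a toy version. This sketch uses an in-memory dict in place of the lookup against open cm-queue rows, which is an assumption about how dispatch finds an open card; the function name fold_into_cards is likewise illustrative.

```python
def fold_into_cards(open_cards, item):
    """Add an item to the open card for its (author, rule), or start one.

    open_cards: dict mapping group_key -> list of item_ids on that card.
    Returns True when a new card must be sent, False when the item was
    folded into a card that is already awaiting review.
    """
    group_key = (item["author"], item["rule"])
    if group_key in open_cards:
        open_cards[group_key].append(item["item_id"])
        return False
    open_cards[group_key] = [item["item_id"]]
    return True
```

Feeding fifty items from the same author under the same rule through this function produces one True and forty-nine False results, which is the "one review, not fifty" behavior described above.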
EventBridge config
- cm-move-rule: rule on the default bus matching cm.hold, cm.send_to_human, cm.hold_notify. Target: dispatch Lambda.
- cm-drive-sync: Scheduler rate(15 minutes). Target: drive-sync Lambda.
- cm-weekly-digest: Scheduler cron(0 9 ? * MON *) (every Monday at 9am) in TZ_NAME. Target: digest Lambda. Note that a day-of-week field of 2#1 would fire only on the first Monday of each month, not weekly.
- Notify path: a cm.hold_notify event is matched by the same cm-move-rule; dispatch reads the event detail and skips quiet-hours batching for it.
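For a concrete picture of the move events, here is a sketch of the rule pattern, a PutEvents entry, and the dispatch-side notify check. It assumes the move name travels in detail-type and that every event uses a source of "cm.moderator"; both are illustrative choices, as is the shape of the detail payload.

```python
import json

MOVE_EVENTS = ("cm.hold", "cm.send_to_human", "cm.hold_notify")

# Event pattern for cm-move-rule. Assumption: the move name is carried in
# detail-type and the source string "cm.moderator" is hypothetical.
MOVE_RULE_PATTERN = {
    "source": ["cm.moderator"],
    "detail-type": list(MOVE_EVENTS),
}


def build_entry(move, item_id, rule):
    """Build one entry for the Entries list of an events:PutEvents call."""
    if move not in MOVE_EVENTS:
        raise ValueError(f"unknown move event: {move}")
    return {
        "EventBusName": "default",
        "Source": "cm.moderator",
        "DetailType": move,
        # PutEvents expects Detail as a JSON string, not a dict.
        "Detail": json.dumps({"item_id": item_id, "rule": rule}),
    }


def skips_quiet_hours(event):
    """dispatch-side check: only cm.hold_notify bypasses batching."""
    return event["detail-type"] == "cm.hold_notify"
```

One rule with three detail-type values keeps the EventBridge surface at a single rule, and the notify decision stays inside dispatch where the quiet-hours logic already lives.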
Platform webhooks and APIs
- Each platform (community page, comment plugin, review source) is configured to POST new-content webhooks to the webhook-in Function URL with a shared signing secret.
- Per-platform API credentials for the publish/remove/edit calls live in Secrets Manager under cm/platform/<platform>. The action-handler dispatches to the right client by the item’s platform attribute.
- Platforms that don’t support editing a member’s content have the Edit button suppressed at card-compose time; only Publish and Remove are shown.
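Since webhook-in is a public Function URL, the signature check is the only gate in front of it. A minimal sketch of that check, assuming the platform signs the raw request body with HMAC-SHA256 and sends the hex digest in a header; real platforms differ in header name, encoding, and whether a timestamp is included, so treat the scheme here as a stand-in.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, raw_body: bytes, received_hex: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the exact bytes received."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    received = received_hex.strip().lower()
    # compare_digest runs in constant time, so a mismatch leaks no timing hint.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

The check has to run on the bytes exactly as they arrived, before any JSON parsing or re-serialization, because even a whitespace change alters the digest.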
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- webhook-in role: s3:PutObject on cm-raw; secretsmanager:GetSecretValue on the per-platform webhook secrets. Nothing else.
- intake role: s3:GetObject on cm-raw and cm-rules-source; dynamodb:PutItem on cm-items; events:PutEvents on the default bus; lambda:InvokeFunction on checker. No bedrock:*.
- checker role: s3:GetObject on cm-rules-source; dynamodb:Query on cm-examples; bedrock:InvokeModel on the Haiku ARN; events:PutEvents on the default bus.
- dispatch role: sqs:SendMessage on cm-review-queue; secretsmanager:GetSecretValue on the Slack bot token; ses:SendRawEmail from the verified sender identity; dynamodb:PutItem + Query on cm-queue; outbound network to slack.com.
- action-handler role: dynamodb:PutItem on cm-audit and cm-examples; dynamodb:UpdateItem on cm-items; secretsmanager:GetSecretValue on the Slack signing secret and the per-platform API secrets; s3:PutObject on cm-originals; outbound network to the platform API hosts.
- drive-sync and digest roles: drive-sync gets secretsmanager:GetSecretValue on the Google service-account secret and s3:PutObject on cm-rules-source; digest gets dynamodb:Query on cm-audit/cm-queue, sqs:GetQueueAttributes on the DLQ, bedrock:InvokeModel on the Haiku ARN, and ses:SendRawEmail.
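To show what "scoped to exact ARNs" looks like in practice, here is the webhook-in role written out as a policy document. It is a sketch: the account ID is a placeholder, and the Sid labels are made up for readability.

```python
import json

REGION = "ap-southeast-1"
ACCOUNT_ID = "123456789012"  # placeholder account ID


def webhook_in_policy():
    """Least-privilege policy document for the webhook-in role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteRawPayloads",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::cm-raw/*",
            },
            {
                "Sid": "ReadWebhookSecrets",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                # Secrets Manager appends a random suffix to each secret ARN,
                # so the trailing wildcard covers every cm/webhook/<platform>.
                "Resource": f"arn:aws:secretsmanager:{REGION}:{ACCOUNT_ID}:secret:cm/webhook/*",
            },
        ],
    }


print(json.dumps(webhook_in_policy(), indent=2))
```

The other roles follow the same shape: one statement per bullet item, one action (or a tight pair) per statement, and no wildcard resources beyond a key prefix.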
Slack interactive flow
Cards are posted via the chat.postMessage Web API with Block Kit blocks containing the action buttons (Publish/Remove/Edit). Button clicks are sent by Slack to the configured Interactivity request URL, which is the action-handler Function URL. action-handler verifies the Slack signing secret, parses the action_id (publish, remove, edit), opens a modal for Edit, and processes the decision on submit. Publish and Remove are one-tap.
The Slack app needs chat:write and im:write, plus the Interactivity URL configured. The bot token lives in Secrets Manager under cm/slack/bot-token; the signing secret is cm/slack/signing-secret.
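The signing-secret check in action-handler follows Slack's published v0 scheme: HMAC-SHA256 over "v0:", the X-Slack-Request-Timestamp header, and the raw body, compared against the X-Slack-Signature header. A minimal sketch; the function name and argument layout are illustrative, and the five-minute window is the value Slack's docs suggest for replay protection.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 60 * 5  # reject requests older than five minutes


def slack_request_is_valid(signing_secret, timestamp, raw_body, signature, now=None):
    """Verify the X-Slack-Signature header for one request.

    timestamp: value of the X-Slack-Request-Timestamp header (str).
    raw_body:  the unparsed request body (str), exactly as received.
    signature: value of the X-Slack-Signature header, e.g. "v0=ab12...".
    """
    now = time.time() if now is None else now
    try:
        if abs(now - int(timestamp)) > MAX_SKEW_SECONDS:
            return False  # stale or replayed request
    except (ValueError, TypeError):
        return False  # missing or malformed timestamp header
    base = f"v0:{timestamp}:{raw_body}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), base, hashlib.sha256).hexdigest()
    expected = "v0=" + digest
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Behind a Lambda Function URL the body can arrive base64-encoded (the event's isBase64Encoded flag), so it needs decoding before it is passed in as raw_body; interactive payloads are form-encoded, and the signature covers that form-encoded string, not the parsed JSON.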
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. A metric filter matching "error", "throttle", or "timeout" publishes a metric for alerting.
- Alarms: webhook-in 5xx rate > 1% in 5 min (dropped inbound content is the worst failure); cm-review-dlq depth > 0; action-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic cm-cost-alarm, which delivers to the on-call admin’s email and Slack.
Config and secrets
Google service-account credentials for the Drive API live in Secrets Manager under cm/drive/sa. Slack bot token and signing secret under cm/slack/*. Per-platform webhook signing secrets under cm/webhook/* and per-platform API credentials under cm/platform/*. The configured timezone, quiet-hours window, per-area confidence thresholds, and admin fallback reviewer live in Parameter Store under /cm/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
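"Fetch on cold start and cache" comes down to a module-level variable. Here is one possible shape, assuming config is read with the SSM GetParametersByPath API; the function names are illustrative, and the role would need ssm:GetParametersByPath on /cm/config/ for the default loader to work.

```python
_CONFIG_CACHE = None  # module scope survives across warm invocations


def load_from_parameter_store(path="/cm/config/"):
    """Fetch every parameter under the path, keyed by name minus the prefix."""
    import boto3  # imported lazily so the cache logic is testable offline

    ssm = boto3.client("ssm")
    params = {}
    paginator = ssm.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=path, Recursive=True, WithDecryption=True):
        for p in page["Parameters"]:
            params[p["Name"][len(path):]] = p["Value"]
    return params


def get_config(loader=load_from_parameter_store):
    """Return cached config, calling the loader only on a cold start."""
    global _CONFIG_CACHE
    if _CONFIG_CACHE is None:
        _CONFIG_CACHE = loader()
    return _CONFIG_CACHE
```

The trade-off of caching for the life of the execution environment is that a changed threshold or quiet-hours window only takes effect as environments recycle, which is usually acceptable for settings that change rarely.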
Deploy
GitHub Actions with OIDC into a deploy role — no long-lived AWS keys — running AWS SAM to ship the stack. The opinionated bits: turn on S3 versioning for cm-raw, cm-rules-source, and cm-originals; keep the dead-letter queue and its alarm in the same stack as the review queue so they ship together; and pin the EventBridge Scheduler timezone so the weekly digest doesn’t silently start running in UTC after a CI rotation. Total deployable surface: seven Lambdas, four DDB tables, three S3 buckets, one SQS queue plus its DLQ, one EventBridge rule on the default bus (plus the Scheduler rules), and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.