Engineering reference: the testimonial collector architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a missed testimonial, not a regional outage. One AWS account dedicated to the collector (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
drive-sync— EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager undertc/drive/sa) to export the candidate sheet as CSV and write tos3://tc-list-source/list.csvonly if the sheet has changed since the last sync. Same pattern syncs the rules and voice docs tos3://tc-rules-source/. Memory: 256 MB. Timeout: 30 s.ratings-hook— Lambda Function URL, public withAuthType: NONE; verifies a shared secret (in Secrets Manager undertc/ratings/secret) on each inbound request from the review tool. Reads the score; if it clears the threshold in the rules doc, writes a candidate row to the Drive sheet via the Sheets API with the moment set torating. Low scores are dropped. Memory: 256 MB. Timeout: 15 s.intake-ses-parser— S3 PUT trigger ons3://tc-raw-mime/. Parses MIME, extracts the email body and the original sender. Calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) to decide whether the message is genuine praise and, if so, propose a candidate row (name, email, one-line summary, confidence). Posts the proposal to Slack viachat.postMessagewith Approve/Edit/Discard buttons. Praise arrives as plain email text, so there is no document parsing on this path — no Textract. Memory: 512 MB. Timeout: 30 s.collector— EventBridge Scheduler target, daily at 9am local time (the schedule expression runs inTZ_NAMEset to the SMB’s timezone, e.g.Asia/Singapore). Readss3://tc-list-source/list.csvand the rules and voice docs. For each row, computesdays_since_moment, reads state fromtc-asksandtc-state, applies the never-nag cool-down, and decides on a move. Emits one event per row that needs action:tc.first_askortc.reminder, with the candidate context as the event payload. Wait/stop emit nothing. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.dispatch— EventBridge rule on the two ask events. Resolves the contact, checks quiet hours and holiday calendar, formats the ask from the voice template, and ships via SESSendRawEmailwith a signed reply-form link. On a quiet-hours or holiday defer, creates a one-off EventBridge Scheduler rule that re-invokesdispatchat the next available business minute. Writes a row totc-asksafter a successful send. Memory: 256 MB. Timeout: 30 s.reply-handler— Lambda Function URL, public withAuthType: NONE; serves the reply form (GET, with a signed token tying it to the candidate) and accepts the submission (POST). On submit: writes the raw reply and the permission choice totc-stateand the list, untouched, first. If permission is granted, calls Bedrock Haiku 4.5 once to clean the text into a quote and once for the faithfulness check, then posts a Slack review card viachat.postMessage. If permission is declined, marks the candidate declined and writes the year-long do-not-ask entry. Memory: 512 MB. Timeout: 30 s.signoff-handler— Lambda Function URL, public withAuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Approve/Edit/Discard). Writes totc-stateandtc-audit; on approve (or a small edit), copies the quote to the approved sheet via the Sheets API; on a large edit, flags for re-consent rather than auto-approving. Memory: 256 MB. Timeout: 15 s.digest— EventBridge Scheduler target, weekly Sunday 6pm. Readstc-asksandtc-statefor the past week; sends a digest message to a configured Slack channel summarizing asks sent, replies received, and quotes awaiting sign-off. No Bedrock; the message is a plain summary table. Memory: 256 MB.summary— EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’stc-asks,tc-state, andtc-audit; calls Bedrock Haiku 4.5 to write a one-paragraph narrative (asks, reply rate, approvals, best new quotes); emails it via SES to the configured stakeholder list. Memory: 512 MB.
Storage
- DynamoDB ·
tc-asks— one row per email sent. PK(customer_id, step); attributes:ask_date,sent_via(email),step(first_ask/reminder),moment. On-demand. No TTL. - DynamoDB ·
tc-state— one row per state change. PKcustomer_id; sort keystate_date; attributes:state(replied/declined/approved/discarded),permission(bool),raw_reply,clean_quote,do_not_ask_until(if declined). On-demand. - DynamoDB ·
tc-audit— one row per write action of any kind. PK(customer_id, ts); attributes:action,by_user,before,after. On-demand. No TTL — this is the long-term consent and approval trail. - DynamoDB ·
tc-published— mirror of the approved sheet for fast lookup of what’s live. PKcustomer_id; attributes:quote,approved_by,approved_at,permission_ref. On-demand. - S3 ·
tc-list-source— mirrored CSV from the Drive candidate sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years. - S3 ·
tc-rules-source— mirrored rules and voice docs as plain text. Versioning enabled. - S3 ·
tc-raw-mime— raw inbound MIME from forwarded praise. Lifecycle to Glacier at 30 days; expiry at 7 years. - S3 ·
tc-replies— archived raw reply text and the permission record per submission, kept for the consent trail.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. Three callsites:intake-ses-parser(praise classification),reply-handler(clean-up + faithfulness check), andsummary(monthly narrative). Heavier reasoning isn’t needed anywhere, soanthropic.claude-sonnet-4-6is left out; if quote clean-up ever needed more nuance, the reply-handler is the one callsite that would justify it. - Embeddings. Not used. Each reply is cleaned on its own and the list is structured rows; there’s nothing to retrieve. No Knowledge Base, no S3 Vectors, no Titan Text Embeddings V2.
- Quotas. Default account quotas are more than enough at SMB volume. The collector itself doesn’t call Bedrock; the model fires only on replies, forwarded praise, and the monthly summary.
EventBridge Scheduler config
tc-daily-tick—cron(0 9 * * ? *)in the SMB’s timezone. Target:collectorLambda.tc-drive-sync—rate(15 minutes). Target:drive-syncLambda.tc-weekly-digest—cron(0 18 ? * SUN *)in TZ. Target:digestLambda.tc-monthly-summary—cron(0 9 ? * 2#1 *)(first Monday at 9am) in TZ. Target:summaryLambda.- One-off rules — created on the fly by
dispatchwhen a quiet-hours or holiday defer is needed. Useat(YYYY-MM-DDTHH:MM:SS)expressions with--action-after-completion DELETEso the rule self-cleans.
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g.
kudos.your-company.com) toinbound-smtp.ap-southeast-1.amazonaws.com. - SES inbound rule set
tc-inbound-rules: one rule with recipientkudos@your-company.com→ spam scan → S3 PUT tos3://tc-raw-mime/<message-id>→ stop. The S3 PUT triggersintake-ses-parser. - SES outbound for the asks, reminders, and the monthly summary: verify a sender identity at
hello@your-company.comwith DKIM and SPF on the parent domain. Out of sandbox by request. Keep a dedicated configuration set so bounce and complaint rates on the ask emails are tracked separately from transactional mail.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- collector role:
s3:GetObjecton the list, rules, and voice keys;dynamodb:Query+GetItemontc-asks,tc-state;events:PutEventson the default bus. Nobedrock:*. - dispatch role:
events:ListSchedules+CreateSchedulefor the deferred one-offs;secretsmanager:GetSecretValueon the reply-form signing secret;ses:SendRawEmailfrom the verified sender identity;dynamodb:PutItemontc-asks. - reply-handler role:
dynamodb:PutItemontc-stateandtc-audit;s3:PutObjectontc-replies;bedrock:InvokeModelon the Haiku ARN;secretsmanager:GetSecretValueon the Slack bot token and the reply-form signing secret; outbound network toslack.com. - signoff-handler role:
dynamodb:PutItemontc-state,tc-audit,tc-published;secretsmanager:GetSecretValueon the Sheets-API service-account secret and the Slack signing secret; outbound network tosheets.googleapis.com. - intake-ses-parser role:
s3:GetObjectontc-raw-mime;bedrock:InvokeModelon the Haiku ARN;secretsmanager:GetSecretValueon the Slack bot token. - drive-sync and ratings-hook roles:
secretsmanager:GetSecretValueon the Google service-account secret (and the ratings shared secret);s3:PutObjecton the list and rules buckets; outbound network towww.googleapis.com.
Slack interactive flow
The alert and review messages are posted via the chat.postMessage Web API with Block Kit blocks containing the action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the signoff-handler Function URL (the praise-approval buttons from intake-ses-parser share the same handler, keyed by action_id). signoff-handler verifies the Slack signing secret on the inbound request, parses the action_id (approve, edit, discard, praise_approve, praise_edit, praise_discard), opens a modal when needed (Edit opens a modal; Approve and Discard are one-tap), and processes the response when the modal is submitted.
The Slack app needs chat:write and the Interactivity URL configured. The bot token lives in Secrets Manager under tc/slack/bot-token. The signing secret is tc/slack/signing-secret.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms: collector Lambda failures > 0 in a day (the daily tick is the one piece that has to run); SES bounce or complaint rate above the SES threshold (a noisy ask list is a real risk); signoff-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic
tc-cost-alarmsubscribed to the on-call admin’s email and Slack.
Config and secrets
Service-account credentials for the Drive and Sheets APIs live in Secrets Manager under tc/drive/sa (one service account with scopes for both). Slack bot token and signing secret under tc/slack/*. The ratings shared secret under tc/ratings/secret and the reply-form signing key under tc/reply/signing. SES sender identity lives in IAM and the verified-domain config. The configured timezone, holiday list reference, quiet-hours window, per-moment timings, never-nag windows, rating threshold, and max_edit_distance all live in Parameter Store under /tc/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both tc-list-source and tc-rules-source so a bad Drive edit can be rolled back in one click, and version the EventBridge Scheduler timezone setting so you don’t accidentally start running the daily tick in UTC after a CI rotation. Total deployable surface: around nine Lambdas, four DDB tables, four S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts