Part 7 of 7 · Testimonial collector series ~8 min read

Engineering reference: the testimonial collector architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a missed testimonial, not a regional outage. One AWS account dedicated to the collector (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the testimonial collector A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three intake lanes — a Drive sheet sync via the drive-sync Lambda triggered every 15 minutes by EventBridge Scheduler that mirrors the candidate CSV to s3://tc-list-source/, an SES inbound rule set with action S3 PUT to s3://tc-raw-mime/ plus the parser Lambda intake-ses-parser that calls Bedrock Haiku 4.5 to propose a candidate row for Slack approval, and a ratings-hook Function URL Lambda that accepts 5-star ratings from your review tool and adds candidates directly. Middle region: scheduled processing. The collector Lambda is triggered daily at 9am local by EventBridge Scheduler; it reads s3://tc-list-source/list.csv, iterates rows, computes days_since_moment per candidate, looks up the timing in s3://tc-rules-source/rules.txt, reads ask and reply state from DynamoDB, and emits one of two events to the EventBridge default bus per candidate that needs an action: tc.first_ask or tc.reminder. Bottom region: dispatch and sign-off. The dispatch Lambda is triggered by an EventBridge rule on those two event types; it resolves the contact, checks quiet hours and the holiday calendar, fetches the ask template from s3://tc-rules-source/voice.txt, and sends the ask via SES outbound with a link to the reply form. The reply-handler Function URL Lambda serves the reply form, stores the raw reply and permission, calls Bedrock Haiku 4.5 to clean it into a quote and to run a faithfulness check, and posts a Slack review card. The signoff-handler Function URL Lambda processes the Approve, Edit, and Discard button clicks, and on approve copies the quote to the approved sheet via the Google Sheets API. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $15 monthly threshold, posting to SNS topic tc-cost-alarm. A note at the bottom: nothing is published without permission and approval — and every interaction is logged to tc-audit. Ingress Lambda · drive-sync every 15 min Sheets API → s3://tc-list-source/ list.csv SES inbound rule set tc-inbound-rules action: S3 PUT s3://tc-raw-mime/ trigger: intake-ses-parser Lambda · ratings-hook Function URL 5-star from review tool shared-secret verified → candidate row Drive candidate list canonical store · mirrored to S3 Scheduled processing EventBridge Scheduler cron(0 9 * * ? *) in TZ_NAME target: collector Lambda + deferred one-offs Lambda · collector reads CSV from S3 + rules.txt + voice.txt computes days, picks one of four moves EventBridge default bus tc.first_ask tc.reminder (wait/stop → no event) one ask, one reminder Dispatch & sign-off Lambda · dispatch resolves contact, quiet hours, holidays; SES outbound ask + reply-form link Lambda · reply-handler Function URL form; stores raw + consent; Haiku clean + check; posts Slack card Lambda · signoff-handler Approve / Edit / Discard clicks; on approve copies to approved sheet Nothing is published without permission and approval — every interaction is logged to tc-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the list), scheduled processing (the daily collector tick emitting events), dispatch and sign-off (the ask ships, the reply is cleaned, and the human decision is recorded). Every Lambda is event- or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • drive-sync — EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under tc/drive/sa) to export the candidate sheet as CSV and write to s3://tc-list-source/list.csv only if the sheet has changed since the last sync. Same pattern syncs the rules and voice docs to s3://tc-rules-source/. Memory: 256 MB. Timeout: 30 s.
  • ratings-hook — Lambda Function URL, public with AuthType: NONE; verifies a shared secret (in Secrets Manager under tc/ratings/secret) on each inbound request from the review tool. Reads the score; if it clears the threshold in the rules doc, writes a candidate row to the Drive sheet via the Sheets API with the moment set to rating. Low scores are dropped. Memory: 256 MB. Timeout: 15 s.
  • intake-ses-parser — S3 PUT trigger on s3://tc-raw-mime/. Parses MIME, extracts the email body and the original sender. Calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) to decide whether the message is genuine praise and, if so, propose a candidate row (name, email, one-line summary, confidence). Posts the proposal to Slack via chat.postMessage with Approve/Edit/Discard buttons. Praise arrives as plain email text, so there is no document parsing on this path — no Textract. Memory: 512 MB. Timeout: 30 s.
  • collector — EventBridge Scheduler target, daily at 9am local time (the schedule expression runs in TZ_NAME set to the SMB’s timezone, e.g. Asia/Singapore). Reads s3://tc-list-source/list.csv and the rules and voice docs. For each row, computes days_since_moment, reads state from tc-asks and tc-state, applies the never-nag cool-down, and decides on a move. Emits one event per row that needs action: tc.first_ask or tc.reminder, with the candidate context as the event payload. Wait/stop emit nothing. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.
  • dispatch — EventBridge rule on the two ask events. Resolves the contact, checks quiet hours and holiday calendar, formats the ask from the voice template, and ships via SES SendRawEmail with a signed reply-form link. On a quiet-hours or holiday defer, creates a one-off EventBridge Scheduler rule that re-invokes dispatch at the next available business minute. Writes a row to tc-asks after a successful send. Memory: 256 MB. Timeout: 30 s.
  • reply-handler — Lambda Function URL, public with AuthType: NONE; serves the reply form (GET, with a signed token tying it to the candidate) and accepts the submission (POST). On submit: writes the raw reply and the permission choice to tc-state and the list, untouched, first. If permission is granted, calls Bedrock Haiku 4.5 once to clean the text into a quote and once for the faithfulness check, then posts a Slack review card via chat.postMessage. If permission is declined, marks the candidate declined and writes the year-long do-not-ask entry. Memory: 512 MB. Timeout: 30 s.
  • signoff-handler — Lambda Function URL, public with AuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Approve/Edit/Discard). Writes to tc-state and tc-audit; on approve (or a small edit), copies the quote to the approved sheet via the Sheets API; on a large edit, flags for re-consent rather than auto-approving. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads tc-asks and tc-state for the past week; sends a digest message to a configured Slack channel summarizing asks sent, replies received, and quotes awaiting sign-off. No Bedrock; the message is a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s tc-asks, tc-state, and tc-audit; calls Bedrock Haiku 4.5 to write a one-paragraph narrative (asks, reply rate, approvals, best new quotes); emails it via SES to the configured stakeholder list. Memory: 512 MB.

Storage

  • DynamoDB · tc-asks — one row per email sent. PK (customer_id, step); attributes: ask_date, sent_via (email), step (first_ask/reminder), moment. On-demand. No TTL.
  • DynamoDB · tc-state — one row per state change. PK customer_id; sort key state_date; attributes: state (replied/declined/approved/discarded), permission (bool), raw_reply, clean_quote, do_not_ask_until (if declined). On-demand.
  • DynamoDB · tc-audit — one row per write action of any kind. PK (customer_id, ts); attributes: action, by_user, before, after. On-demand. No TTL — this is the long-term consent and approval trail.
  • DynamoDB · tc-published — mirror of the approved sheet for fast lookup of what’s live. PK customer_id; attributes: quote, approved_by, approved_at, permission_ref. On-demand.
  • S3 · tc-list-source — mirrored CSV from the Drive candidate sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
  • S3 · tc-rules-source — mirrored rules and voice docs as plain text. Versioning enabled.
  • S3 · tc-raw-mime — raw inbound MIME from forwarded praise. Lifecycle to Glacier at 30 days; expiry at 7 years.
  • S3 · tc-replies — archived raw reply text and the permission record per submission, kept for the consent trail.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Three callsites: intake-ses-parser (praise classification), reply-handler (clean-up + faithfulness check), and summary (monthly narrative). Heavier reasoning isn’t needed anywhere, so anthropic.claude-sonnet-4-6 is left out; if quote clean-up ever needed more nuance, the reply-handler is the one callsite that would justify it.
  • Embeddings. Not used. Each reply is cleaned on its own and the list is structured rows; there’s nothing to retrieve. No Knowledge Base, no S3 Vectors, no Titan Text Embeddings V2.
  • Quotas. Default account quotas are more than enough at SMB volume. The collector itself doesn’t call Bedrock; the model fires only on replies, forwarded praise, and the monthly summary.

EventBridge Scheduler config

  • tc-daily-tickcron(0 9 * * ? *) in the SMB’s timezone. Target: collector Lambda.
  • tc-drive-syncrate(15 minutes). Target: drive-sync Lambda.
  • tc-weekly-digestcron(0 18 ? * SUN *) in TZ. Target: digest Lambda.
  • tc-monthly-summarycron(0 9 ? * 2#1 *) (first Monday at 9am) in TZ. Target: summary Lambda.
  • One-off rules — created on the fly by dispatch when a quiet-hours or holiday defer is needed. Use at(YYYY-MM-DDTHH:MM:SS) expressions with --action-after-completion DELETE so the rule self-cleans.

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. kudos.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set tc-inbound-rules: one rule with recipient kudos@your-company.com → spam scan → S3 PUT to s3://tc-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses-parser.
  • SES outbound for the asks, reminders, and the monthly summary: verify a sender identity at hello@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request. Keep a dedicated configuration set so bounce and complaint rates on the ask emails are tracked separately from transactional mail.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • collector role: s3:GetObject on the list, rules, and voice keys; dynamodb:Query + GetItem on tc-asks, tc-state; events:PutEvents on the default bus. No bedrock:*.
  • dispatch role: events:ListSchedules + CreateSchedule for the deferred one-offs; secretsmanager:GetSecretValue on the reply-form signing secret; ses:SendRawEmail from the verified sender identity; dynamodb:PutItem on tc-asks.
  • reply-handler role: dynamodb:PutItem on tc-state and tc-audit; s3:PutObject on tc-replies; bedrock:InvokeModel on the Haiku ARN; secretsmanager:GetSecretValue on the Slack bot token and the reply-form signing secret; outbound network to slack.com.
  • signoff-handler role: dynamodb:PutItem on tc-state, tc-audit, tc-published; secretsmanager:GetSecretValue on the Sheets-API service-account secret and the Slack signing secret; outbound network to sheets.googleapis.com.
  • intake-ses-parser role: s3:GetObject on tc-raw-mime; bedrock:InvokeModel on the Haiku ARN; secretsmanager:GetSecretValue on the Slack bot token.
  • drive-sync and ratings-hook roles: secretsmanager:GetSecretValue on the Google service-account secret (and the ratings shared secret); s3:PutObject on the list and rules buckets; outbound network to www.googleapis.com.

Slack interactive flow

The alert and review messages are posted via the chat.postMessage Web API with Block Kit blocks containing the action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the signoff-handler Function URL (the praise-approval buttons from intake-ses-parser share the same handler, keyed by action_id). signoff-handler verifies the Slack signing secret on the inbound request, parses the action_id (approve, edit, discard, praise_approve, praise_edit, praise_discard), opens a modal when needed (Edit opens a modal; Approve and Discard are one-tap), and processes the response when the modal is submitted.

The Slack app needs chat:write and the Interactivity URL configured. The bot token lives in Secrets Manager under tc/slack/bot-token. The signing secret is tc/slack/signing-secret.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
  • Alarms: collector Lambda failures > 0 in a day (the daily tick is the one piece that has to run); SES bounce or complaint rate above the SES threshold (a noisy ask list is a real risk); signoff-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic tc-cost-alarm subscribed to the on-call admin’s email and Slack.

Config and secrets

Service-account credentials for the Drive and Sheets APIs live in Secrets Manager under tc/drive/sa (one service account with scopes for both). Slack bot token and signing secret under tc/slack/*. The ratings shared secret under tc/ratings/secret and the reply-form signing key under tc/reply/signing. SES sender identity lives in IAM and the verified-domain config. The configured timezone, holiday list reference, quiet-hours window, per-moment timings, never-nag windows, rating threshold, and max_edit_distance all live in Parameter Store under /tc/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both tc-list-source and tc-rules-source so a bad Drive edit can be rolled back in one click, and version the EventBridge Scheduler timezone setting so you don’t accidentally start running the daily tick in UTC after a CI rotation. Total deployable surface: around nine Lambdas, four DDB tables, four S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts