Engineering reference: the survey analyzer architecture

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock Global cross-Region inference, S3 Vectors, and EventBridge Scheduler are all available there. A second region for multi-region resilience isn’t worth the extra setup work: at SMB volume, the realistic cost of a regional incident is a weekly summary that goes out a day late. One AWS account dedicated to the analyzer (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the answer store), scheduled processing (the weekly grouper building the theme table), summary and flags (the summary ships and per-answer urgents fan out via SNS). Every Lambda is event- or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

form-submit — Lambda Function URL, AuthType: NONE; verifies a shared secret (HMAC over the body) the survey form includes. Cleans the answer text, runs the urgent check via Bedrock Haiku 4.5, writes the answer to sa-answers, and mirrors the raw payload to s3://sa-answers-source/. On an urgent or callback verdict, publishes to sa-urgent or appends to the callback queue. Memory: 256 MB. Timeout: 15 s.
intake-ses-parser — S3 PUT trigger on s3://sa-raw-mime/. Parses MIME, strips signatures and quoted reply trails, and extracts one or more answers from the body or from a CSV/XLSX attachment (openpyxl for spreadsheets). For genuinely ambiguous layouts only, calls Bedrock Haiku 4.5 to split the text into discrete answers. Each extracted answer runs the same clean + urgent-check + store path as form-submit. Memory: 512 MB. Timeout: 60 s.
drive-sync — EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under sa/drive/sa) to export the answer sheet as CSV and write to s3://sa-answers-source/answers.csv only if the sheet has changed since the last sync. A small importer step reads rows not yet seen and runs them through the clean + urgent-check + store path. Same pattern syncs the rules and voice docs to s3://sa-rules-source/. Memory: 256 MB. Timeout: 30 s.
grouper — EventBridge Scheduler target, weekly (Sunday 10pm local in TZ_NAME). Reads the period’s answers from sa-answers. Calls Bedrock Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0, 1024-dim) for each new answer and upserts into the sa-vectors S3 Vectors index. Pulls the vectors back, clusters with HDBSCAN in scikit-learn/hdbscan, drops clusters below min_theme_size, then calls Bedrock Haiku 4.5 (global.anthropic.claude-haiku-4-5-20251001-v1:0) once per surviving cluster to name it and confirm a representative quote (the answer nearest the cluster centroid). Writes the theme table to sa-themes. Memory: 1024 MB. Timeout: 300 s.
summary — EventBridge Scheduler target, weekly, 30 minutes after grouper on its own schedule (sa-weekly-summary), or alternatively invoked at the tail of grouper. Reads sa-themes and the week’s sa-flags; calls Bedrock Sonnet 4.6 (global.anthropic.claude-sonnet-4-6-20250930-v1:0) to draft the summary; runs the count check (every number against the cluster sizes) and the quote check (every quote against sa-answers); shapes to the voice doc and length cap; sends via SES SendRawEmail. Writes the run to sa-runs. Memory: 512 MB. Timeout: 120 s.
callback-sweep — EventBridge Scheduler target, daily at 9am local. Reads the callback queue accumulated by intake, posts a single digest to the on-call Slack channel, and clears the entries it lists. No Bedrock. Memory: 256 MB. Timeout: 30 s.
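The shared-secret check in form-submit can be sketched as follows. This is a minimal sketch under assumptions the list above does not state: HMAC-SHA256, a hex-encoded signature, and a Function URL event that carries the body in event["body"] with an isBase64Encoded flag. How the signature reaches the function (a header, a form field) is left to the caller.

```python
import base64
import hashlib
import hmac


def raw_body(event: dict) -> bytes:
    """Return the request body as bytes, decoding base64 when the event says so."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    return body.encode("utf-8")


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 over the raw body (assumed scheme, not confirmed by the post)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, signature: str) -> bool:
    """Compare signatures in constant time so timing does not leak the digest."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The handler would reject the request before any Bedrock call when is_authentic returns False, which keeps unauthenticated traffic from costing model invocations.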

Storage

DynamoDB · sa-answers — one row per answer. PK answer_id; attributes: survey, received_at, rating, raw_text, clean_text, urgent (bool), theme_id (set by the weekly grouper). GSI on received_at for the weekly range read. On-demand.
DynamoDB · sa-themes — one row per theme per weekly run. PK (run_id, theme_id); attributes: name, count, quote_answer_id, centroid_ref. On-demand.
DynamoDB · sa-flags — one row per urgent-check outcome. PK (answer_id, ts); attributes: outcome (urgent/callback/normal), reason, notified. On-demand. No TTL — this is the long-term flag trail.
DynamoDB · sa-runs — one row per weekly run. PK run_id; attributes: window_start, window_end, n_answers, n_themes, draft, final, checks_passed. On-demand. Keeps each summary reproducible.
S3 Vectors · sa-vectors — the answer embeddings. 1024-dim, cosine distance, metadata {answer_id, received_at}. Queried for clustering and centroid lookup. No always-on cluster.
S3 · sa-answers-source — mirrored CSV from the Drive sheet and raw form payloads. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
S3 · sa-rules-source — mirrored rules and voice docs as plain text. Versioning enabled.
S3 · sa-raw-mime — raw inbound MIME from forwarded feedback. Lifecycle to Glacier at 30 days; expiry at 7 years.

Bedrock

Embeddings. amazon.titan-embed-text-v2:0, 1024 dimensions, normalized. One call per answer in grouper; results stored in the sa-vectors S3 Vectors index.
Cheap-path model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Callsites: the per-answer urgent check (in form-submit and intake-ses-parser), the optional answer-splitting in the parser, and the per-theme naming in grouper.
Heavier model. anthropic.claude-sonnet-4-6-20250930-v1:0 via global.anthropic.claude-sonnet-4-6-20250930-v1:0. One callsite: the weekly summary draft, where weighing several themes and striking a tone justifies the heavier model. Fires once a week.
Quotas. Default account quotas are more than enough at SMB volume. The per-answer Haiku check is the highest-frequency callsite; it’s a short prompt with a short JSON verdict.
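The embedding call in grouper might look like the sketch below. It assumes a boto3 bedrock-runtime client is passed in as client; the request fields (inputText, dimensions, normalize) and the embedding response field follow the Titan Text Embeddings V2 format.

```python
import json

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed(client, text: str) -> list[float]:
    """Return one 1024-dim normalized embedding for a single answer."""
    body = json.dumps({"inputText": text, "dimensions": 1024, "normalize": True})
    response = client.invoke_model(
        modelId=EMBED_MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; read and decode it once.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

Passing the client in, rather than creating it inside the function, keeps the function testable with a stub and lets the Lambda reuse one client across invocations.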

EventBridge Scheduler config

sa-weekly-group — cron(0 22 ? * SUN *) in the SMB’s timezone. Target: grouper Lambda.
sa-weekly-summary — cron(30 22 ? * SUN *) in TZ (30 minutes after the grouper). Target: summary Lambda.
sa-drive-sync — rate(15 minutes). Target: drive-sync Lambda.
sa-callback-sweep — cron(0 9 * * ? *) in TZ. Target: callback-sweep Lambda.
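The four schedules above can be expressed as keyword arguments for the EventBridge Scheduler create_schedule call. This is a sketch: the timezone value, the account ID, and the scheduler role ARN are placeholders, and the Lambda function names are assumed to match the names used in this post.

```python
TZ_NAME = "Asia/Singapore"  # assumption: stand-in for the SMB's configured timezone

SCHEDULES = [
    ("sa-weekly-group", "cron(0 22 ? * SUN *)", "grouper"),
    ("sa-weekly-summary", "cron(30 22 ? * SUN *)", "summary"),
    ("sa-drive-sync", "rate(15 minutes)", "drive-sync"),
    ("sa-callback-sweep", "cron(0 9 * * ? *)", "callback-sweep"),
]


def schedule_kwargs(name, expression, function_arn, role_arn, tz=TZ_NAME):
    """Build create_schedule arguments; the timezone is pinned for cron schedules."""
    kwargs = {
        "Name": name,
        "ScheduleExpression": expression,
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the stated time
        "Target": {"Arn": function_arn, "RoleArn": role_arn},
    }
    if expression.startswith("cron("):
        kwargs["ScheduleExpressionTimezone"] = tz
    return kwargs


def all_schedules(account_id, region, role_arn):
    """Expand SCHEDULES into one kwargs dict per schedule."""
    out = []
    for name, expression, function_name in SCHEDULES:
        arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
        out.append(schedule_kwargs(name, expression, arn, role_arn))
    return out
```

Each dict could then be passed as boto3.client("scheduler").create_schedule(**kwargs), or mirrored in the SAM template; setting the timezone explicitly is what keeps the Sunday 10pm run from drifting to UTC.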

SES and SNS

Set the MX record on a dedicated subdomain (e.g. feedback.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
SES inbound rule set sa-inbound-rules: one rule with recipient feedback@your-company.com → spam scan → S3 PUT to s3://sa-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses-parser.
SES outbound for the weekly summary: verify a sender identity at insights@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.
SNS topic sa-urgent: subscriptions for the on-call email and a Slack channel (via an AWS Chatbot configuration or a small relay Lambda). This is the path an angry answer takes within a minute of landing.
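Once SES has dropped the raw message into sa-raw-mime, the first step in intake-ses-parser is getting a clean text body out of it. The sketch below uses only the standard library; the rules for what counts as a quoted reply or a signature (lines starting with ">" and the conventional "-- " delimiter) are simplifying assumptions, and real mail clients need more cases.

```python
from email import message_from_bytes, policy


def plain_text_body(raw: bytes) -> str:
    """Return the text/plain part of a raw MIME message, or '' if there is none."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def strip_quoted(text: str) -> str:
    """Drop '>' quoted lines and everything after a '-- ' signature delimiter."""
    kept = []
    for line in text.splitlines():
        if line.strip() == "--":
            break  # signature starts here
        if line.lstrip().startswith(">"):
            continue  # quoted reply trail
        kept.append(line)
    return "\n".join(kept).strip()
```

The cleaned text is what would go on to the clean, urgent-check, and store path.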

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

form-submit role: dynamodb:PutItem on sa-answers and sa-flags; s3:PutObject on sa-answers-source; bedrock:InvokeModel on the Haiku ARN; sns:Publish on sa-urgent; secretsmanager:GetSecretValue on the shared-secret.
grouper role: dynamodb:Query on sa-answers (the received_at GSI) + PutItem on sa-themes; bedrock:InvokeModel on the Titan and Haiku ARNs; s3vectors:PutVectors + QueryVectors on the sa-vectors index. No SES, no Sonnet.
summary role: dynamodb:Query on sa-themes, sa-flags, and sa-answers (for the quote check) + PutItem on sa-runs; bedrock:InvokeModel on the Sonnet ARN; ses:SendRawEmail from the verified sender identity.
intake-ses-parser role: s3:GetObject on sa-raw-mime; dynamodb:PutItem on sa-answers and sa-flags; bedrock:InvokeModel on the Haiku ARN; sns:Publish on sa-urgent.
drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the answers and rules buckets; dynamodb:PutItem on sa-answers; outbound network to www.googleapis.com.
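To make "scoped to exact ARNs" concrete, here is a sketch of the form-submit policy as a Python dict. The actions are the ones listed above; every ARN is passed in by the caller, and the function name and parameter names are illustrative only.

```python
def form_submit_policy(
    *,
    answers_table_arn: str,
    flags_table_arn: str,
    source_bucket_arn: str,
    haiku_arns: list[str],
    urgent_topic_arn: str,
    secret_arn: str,
) -> dict:
    """IAM policy document for the form-submit role; no wildcard actions."""

    def allow(action, resource):
        return {"Effect": "Allow", "Action": action, "Resource": resource}

    return {
        "Version": "2012-10-17",
        "Statement": [
            allow("dynamodb:PutItem", [answers_table_arn, flags_table_arn]),
            # Object-level permission applies to keys inside the bucket.
            allow("s3:PutObject", f"{source_bucket_arn}/*"),
            allow("bedrock:InvokeModel", list(haiku_arns)),
            allow("sns:Publish", urgent_topic_arn),
            allow("secretsmanager:GetSecretValue", secret_arn),
        ],
    }
```

The other roles follow the same pattern with their own action lists. When calling through a cross-Region inference profile, haiku_arns generally has to include the profile ARN as well as the underlying model ARNs.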

Urgent check and clustering internals

The urgent check is a single Haiku call with a JSON-only contract: {"verdict": "urgent|callback|normal", "reason": "<one line>"}. The system prompt embeds the rules-doc definition of urgent and forbids any action beyond classification — the model never drafts a reply. A verdict of urgent triggers an sns:Publish; callback appends to the callback queue; normal stores and moves on. The verdict is written to sa-flags in all three cases.
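Because the contract is JSON-only, the caller can validate the model output strictly before acting on it. The following is a sketch of that validation and the three-way routing; the step names returned by next_step are hypothetical labels, not identifiers from the system.

```python
import json

ALLOWED_VERDICTS = {"urgent", "callback", "normal"}


def parse_verdict(model_text: str) -> dict:
    """Parse the model output and reject anything outside the contract."""
    try:
        data = json.loads(model_text)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != {"verdict", "reason"}:
        raise ValueError("expected exactly the keys 'verdict' and 'reason'")
    verdict, reason = data["verdict"], data["reason"]
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        raise ValueError("verdict must be urgent, callback, or normal")
    if not isinstance(reason, str) or "\n" in reason:
        raise ValueError("reason must be a single line of text")
    return data


def next_step(verdict: dict) -> str:
    """Map a validated verdict to the follow-up action (labels are illustrative)."""
    return {
        "urgent": "publish_to_sa_urgent",
        "callback": "append_to_callback_queue",
        "normal": "store_only",
    }[verdict["verdict"]]
```

Whatever the outcome, the verdict row is still written to sa-flags. How an invalid response is handled (retry, or a conservative default) is a policy choice the text above does not specify.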

Clustering uses HDBSCAN over the 1024-dim vectors, which finds dense groups without forcing a fixed cluster count and labels sparse points as noise (the long tail). min_cluster_size is derived from min_theme_size in the rules doc. The count for a theme is len(cluster_members) — a plain integer, never model-generated. The representative quote is the member with the smallest cosine distance to the cluster centroid, resolved back to its answer_id and pulled verbatim from sa-answers.
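The counting and quote-selection step can be sketched in plain Python. It assumes the clusterer has already produced one integer label per vector, with -1 marking noise as HDBSCAN does; the function and argument names are illustrative.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def summarize_clusters(labels, vectors, answer_ids, min_theme_size):
    """Return {label: {"count", "quote_answer_id"}} for clusters that survive."""
    members = defaultdict(list)
    for index, label in enumerate(labels):
        if label != -1:  # -1 is the noise label (the long tail)
            members[label].append(index)

    themes = {}
    for label, idxs in members.items():
        if len(idxs) < min_theme_size:
            continue
        dim = len(vectors[idxs[0]])
        centroid = [sum(vectors[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        nearest = min(idxs, key=lambda i: cosine_distance(vectors[i], centroid))
        themes[label] = {
            "count": len(idxs),  # plain integer, never model-generated
            "quote_answer_id": answer_ids[nearest],
        }
    return themes
```

The quote text itself is then read from sa-answers by quote_answer_id, so the summary can only ever quote something a respondent actually wrote.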

Observability and cost gates

CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Metric filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
Alarms: grouper or summary failures > 0 in a week (the weekly pass is the one piece that has to run); urgent-publish failures > 0 (a missed angry customer is the costly miss); count-check or quote-check rejections > 2 in a run (might mean a prompt regression).
X-Ray: off by default. Not worth the cost at SMB volume.
AWS Budgets: $30/month threshold, alarm at 80% and 100%, posts to SNS topic sa-cost-alarm subscribed to the on-call admin’s email and Slack.
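For the structured JSON logs, one line per event with a fixed set of top-level keys is enough for a metric filter to match on. A minimal sketch, with field names (ts, level, event) that are assumptions rather than the system's actual schema:

```python
import json
import time


def log_line(level: str, event: str, **fields) -> str:
    """Print one JSON log line and return it; extra fields are merged in."""
    record = {"ts": round(time.time(), 3), "level": level, "event": event, **fields}
    line = json.dumps(record, default=str)  # default=str keeps odd types loggable
    print(line)  # Lambda forwards stdout to CloudWatch Logs
    return line
```

With this shape, a filter pattern such as { $.level = "error" } would count error lines without depending on free-text matching.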

Config and secrets

Service-account credentials for the Drive and Sheets APIs live in Secrets Manager under sa/drive/sa. The form-submit shared secret lives under sa/form/secret; the Slack relay token (if used) under sa/slack/*. The SES sender identity is not a secret: it lives in the SES verified-domain config and is referenced in the summary role’s IAM policy. The configured timezone, the urgent definition reference, min_theme_size, the summary length cap, and the on-call and owner addresses all live in Parameter Store under /sa/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), create the S3 Vectors index in its own stack so a re-index doesn’t churn the rest, turn on S3 versioning for sa-answers-source and sa-rules-source so a bad Drive paste can be rolled back in one click, and pin the EventBridge Scheduler timezone so you don’t accidentally run the weekly pass in UTC after a CI rotation. Total deployable surface: six Lambdas, four DDB tables, one S3 Vectors index, three S3 buckets, four Scheduler schedules, one SES rule set, two SNS topics, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
