Engineering reference: the survey analyzer architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, the S3 Vectors index, Lambda inventory, IAM scopes, EventBridge Scheduler config, the DynamoDB schemas, and the SNS fan-out. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock Global cross-Region inference, S3 Vectors, and EventBridge Scheduler are all available there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume: the realistic failure mode for an SMB is a weekly summary that goes out a day late, not a regional outage. One AWS account dedicated to the analyzer (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
- form-submit: Lambda Function URL, AuthType: NONE; verifies a shared secret (HMAC over the body) the survey form includes. Cleans the answer text, runs the urgent check via Bedrock Haiku 4.5, writes the answer to sa-answers, and mirrors the raw payload to s3://sa-answers-source/. On an urgent or callback verdict, publishes to sa-urgent or appends to the callback queue. Memory: 256 MB. Timeout: 15 s.
- intake-ses-parser: S3 PUT trigger on s3://sa-raw-mime/. Parses MIME, strips signatures and quoted reply trails, and extracts one or more answers from the body or from a CSV/XLSX attachment (openpyxl for spreadsheets). For genuinely ambiguous layouts only, calls Bedrock Haiku 4.5 to split the text into discrete answers. Each extracted answer runs the same clean + urgent-check + store path as form-submit. Memory: 512 MB. Timeout: 60 s.
- drive-sync: EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under sa/drive/sa) to export the answer sheet as CSV and write to s3://sa-answers-source/answers.csv only if the sheet has changed since the last sync. A small importer step reads rows not yet seen and runs them through the clean + urgent-check + store path. Same pattern syncs the rules and voice docs to s3://sa-rules-source/. Memory: 256 MB. Timeout: 30 s.
- grouper: EventBridge Scheduler target, weekly (Sunday 10pm local in TZ_NAME). Reads the period’s answers from sa-answers. Calls Bedrock Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0, 1024-dim) for each new answer and upserts into the sa-vectors S3 Vectors index. Pulls the vectors back, clusters with HDBSCAN in scikit-learn/hdbscan, drops clusters below min_theme_size, then calls Bedrock Haiku 4.5 (global.anthropic.claude-haiku-4-5-20251001-v1:0) once per surviving cluster to name it and confirm a representative quote (the answer nearest the cluster centroid). Writes the theme table to sa-themes. Memory: 1024 MB. Timeout: 300 s.
- summary: EventBridge Scheduler target, weekly just after grouper (chained via the Scheduler flexible time window, or invoked at the tail of grouper). Reads sa-themes and the week’s sa-flags; calls Bedrock Sonnet 4.6 (global.anthropic.claude-sonnet-4-6-20250930-v1:0) to draft the summary; runs the count check (every number against the cluster sizes) and the quote check (every quote against sa-answers); shapes to the voice doc and length cap; sends via SES SendRawEmail. Writes the run to sa-runs. Memory: 512 MB. Timeout: 120 s.
- callback-sweep: EventBridge Scheduler target, daily at 9am local. Reads the callback queue accumulated by intake, posts a single digest to the on-call Slack channel, and clears the entries it lists. No Bedrock. Memory: 256 MB. Timeout: 30 s.
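The shared-secret check in form-submit could look roughly like the sketch below. The post only says "HMAC over the body", so the SHA-256 digest, the hex encoding, and the function names are assumptions for illustration.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC of the raw request body (SHA-256 is an assumption, not from the post)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC over the body and compare in constant time."""
    expected = sign_body(secret, body)
    # Compare as bytes so a non-ASCII signature is rejected instead of raising.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

The handler would read the secret from Secrets Manager (sa/form/secret), take the signature from whatever header the form sends, and reject the request before any Bedrock call if `is_authentic` returns False.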
Storage
- DynamoDB · sa-answers: one row per answer. PK answer_id; attributes: survey, received_at, rating, raw_text, clean_text, urgent (bool), theme_id (set by the weekly grouper). GSI on received_at for the weekly range read. On-demand.
- DynamoDB · sa-themes: one row per theme per weekly run. PK (run_id, theme_id); attributes: name, count, quote_answer_id, centroid_ref. On-demand.
- DynamoDB · sa-flags: one row per urgent-check outcome. PK (answer_id, ts); attributes: outcome (urgent/callback/normal), reason, notified. On-demand. No TTL; this is the long-term flag trail.
- DynamoDB · sa-runs: one row per weekly run. PK run_id; attributes: window_start, window_end, n_answers, n_themes, draft, final, checks_passed. On-demand. Keeps each summary reproducible.
- S3 Vectors · sa-vectors: the answer embeddings. 1024-dim, cosine distance, metadata {answer_id, received_at}. Queried for clustering and centroid lookup. No always-on cluster.
- S3 · sa-answers-source: mirrored CSV from the Drive sheet and raw form payloads. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
- S3 · sa-rules-source: mirrored rules and voice docs as plain text. Versioning enabled.
- S3 · sa-raw-mime: raw inbound MIME from forwarded feedback. Lifecycle to Glacier at 30 days; expiry at 7 years.
Bedrock
- Embeddings. amazon.titan-embed-text-v2:0, 1024 dimensions, normalized. One call per answer in grouper; results stored in the sa-vectors S3 Vectors index.
- Cheap-path model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Callsites: the per-answer urgent check (in form-submit and intake-ses-parser), the optional answer-splitting in the parser, and the per-theme naming in grouper.
- Heavier model. anthropic.claude-sonnet-4-6-20250930-v1:0 via global.anthropic.claude-sonnet-4-6-20250930-v1:0. One callsite: the weekly summary draft, where weighing several themes and striking a tone justifies the heavier model. Fires once a week.
- Quotas. Default account quotas are more than enough at SMB volume. The per-answer Haiku check is the highest-frequency callsite; it’s a short prompt with a one-token verdict.
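As a rough illustration of the embeddings call, the sketch below builds the Titan Text Embeddings V2 request body with the 1024-dimension, normalized settings listed above. The function names are hypothetical, and the client is passed in so the same code works with a boto3 bedrock-runtime client or a test stand-in.

```python
import json

TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def titan_embed_body(text: str, dimensions: int = 1024) -> str:
    """JSON request body for one answer: 1024-dim, normalized output."""
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})


def embed(client, text: str) -> list:
    """Embed one answer. `client` is assumed to expose bedrock-runtime's invoke_model."""
    response = client.invoke_model(
        modelId=TITAN_MODEL_ID,
        body=titan_embed_body(text),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

In grouper this would run once per new answer, with the returned vector upserted into sa-vectors alongside the {answer_id, received_at} metadata.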
EventBridge Scheduler config
- sa-weekly-group: cron(0 22 ? * SUN *) in the SMB’s timezone. Target: grouper Lambda.
- sa-weekly-summary: cron(30 22 ? * SUN *) in TZ (30 minutes after the grouper). Target: summary Lambda.
- sa-drive-sync: rate(15 minutes). Target: drive-sync Lambda.
- sa-callback-sweep: cron(0 9 * * ? *) in TZ. Target: callback-sweep Lambda.
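To show which fields carry the timezone, here is a minimal sketch that turns the four schedules above into EventBridge Scheduler CreateSchedule requests. The invoke role name, the account ID argument, and the helper name are assumptions; the post deploys these through SAM rather than API calls.

```python
SCHEDULES = [
    # (schedule name, expression, target Lambda), copied from the list above
    ("sa-weekly-group", "cron(0 22 ? * SUN *)", "grouper"),
    ("sa-weekly-summary", "cron(30 22 ? * SUN *)", "summary"),
    ("sa-drive-sync", "rate(15 minutes)", "drive-sync"),
    ("sa-callback-sweep", "cron(0 9 * * ? *)", "callback-sweep"),
]


def schedule_requests(timezone: str, account_id: str, region: str = "ap-southeast-1") -> list:
    """Build one CreateSchedule request dict per schedule, all pinned to `timezone`."""
    # Hypothetical role that Scheduler assumes to invoke the Lambdas.
    role_arn = f"arn:aws:iam::{account_id}:role/sa-scheduler-invoke"
    requests = []
    for name, expression, function_name in SCHEDULES:
        requests.append({
            "Name": name,
            "ScheduleExpression": expression,
            "ScheduleExpressionTimezone": timezone,  # without this, cron runs in UTC
            "FlexibleTimeWindow": {"Mode": "OFF"},
            "Target": {
                "Arn": f"arn:aws:lambda:{region}:{account_id}:function:{function_name}",
                "RoleArn": role_arn,
            },
        })
    return requests
```

Each dict can be passed as keyword arguments to a boto3 scheduler client's `create_schedule`, for example with `timezone="Asia/Singapore"`.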
SES inbound/outbound and SNS
- Set the MX record on a dedicated subdomain (e.g. feedback.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
- SES inbound rule set sa-inbound-rules: one rule with recipient feedback@your-company.com → spam scan → S3 PUT to s3://sa-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses-parser.
- SES outbound for the weekly summary: verify a sender identity at insights@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.
- SNS topic sa-urgent: subscriptions for the on-call email and a Slack channel (via an AWS Chatbot configuration or a small relay Lambda). This is the path an angry answer takes within a minute of landing.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- form-submit role: dynamodb:PutItem on sa-answers and sa-flags; s3:PutObject on sa-answers-source; bedrock:InvokeModel on the Haiku ARN; sns:Publish on sa-urgent; secretsmanager:GetSecretValue on the shared-secret.
- grouper role: dynamodb:Query on sa-answers (the received_at GSI) + PutItem on sa-themes; bedrock:InvokeModel on the Titan and Haiku ARNs; s3vectors:PutVectors + QueryVectors on the sa-vectors index. No SES, no Sonnet.
- summary role: dynamodb:Query on sa-themes, sa-flags, and sa-answers (for the quote check) + PutItem on sa-runs; bedrock:InvokeModel on the Sonnet ARN; ses:SendRawEmail from the verified sender identity.
- intake-ses-parser role: s3:GetObject on sa-raw-mime; dynamodb:PutItem on sa-answers and sa-flags; bedrock:InvokeModel on the Haiku ARN; sns:Publish on sa-urgent.
- drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the answers and rules buckets; dynamodb:PutItem on sa-answers; outbound network to www.googleapis.com.
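For a sense of what "scoped to exact ARNs" means in practice, the sketch below writes the form-submit role's policy as a Python dict. The account ID is a placeholder, and the Bedrock ARNs are passed in rather than hard-coded because the exact ARNs needed for a cross-Region inference profile depend on your setup.

```python
import json

REGION = "ap-southeast-1"
ACCOUNT_ID = "123456789012"  # placeholder account ID, not from the post


def form_submit_policy(bedrock_arns: list) -> dict:
    """Policy document for the form-submit role: one statement per service, no wildcard actions."""
    ddb = f"arn:aws:dynamodb:{REGION}:{ACCOUNT_ID}:table"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "dynamodb:PutItem",
             "Resource": [f"{ddb}/sa-answers", f"{ddb}/sa-flags"]},
            {"Effect": "Allow", "Action": "s3:PutObject",
             "Resource": "arn:aws:s3:::sa-answers-source/*"},
            {"Effect": "Allow", "Action": "bedrock:InvokeModel",
             "Resource": bedrock_arns},  # caller supplies the Haiku ARN(s)
            {"Effect": "Allow", "Action": "sns:Publish",
             "Resource": f"arn:aws:sns:{REGION}:{ACCOUNT_ID}:sa-urgent"},
            # Secrets Manager appends a random suffix to secret ARNs, hence the trailing wildcard.
            {"Effect": "Allow", "Action": "secretsmanager:GetSecretValue",
             "Resource": f"arn:aws:secretsmanager:{REGION}:{ACCOUNT_ID}:secret:sa/form/secret-*"},
        ],
    }
```

`json.dumps(form_submit_policy([...]), indent=2)` gives a document that can be pasted into a SAM template or an inline policy; the other four roles follow the same shape with their own actions and resources.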
Urgent check and clustering internals
The urgent check is a single Haiku call with a JSON-only contract: {"verdict": "urgent|callback|normal", "reason": "<one line>"}. The system prompt embeds the rules-doc definition of urgent and forbids any action beyond classification — the model never drafts a reply. A verdict of urgent triggers an sns:Publish; callback appends to the callback queue; normal stores and moves on. The verdict is written to sa-flags in all three cases.
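A minimal sketch of enforcing that JSON-only contract on the Lambda side follows. The function names and action labels are hypothetical, and raising on a malformed reply is an assumption; the post does not say how a bad response is handled.

```python
import json

VERDICTS = {"urgent", "callback", "normal"}


def parse_verdict(model_text: str) -> dict:
    """Validate the model's reply against the {"verdict", "reason"} contract."""
    try:
        data = json.loads(model_text)
    except json.JSONDecodeError as exc:
        raise ValueError("model reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply is not a JSON object")
    verdict = data.get("verdict")
    reason = data.get("reason")
    if not isinstance(verdict, str) or verdict not in VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if not isinstance(reason, str):
        raise ValueError("reason must be a string")
    return {"verdict": verdict, "reason": reason.strip()}


def actions_for(verdict: str) -> list:
    """Side effects per verdict; the sa-flags write happens in all three cases."""
    actions = ["write_sa_flags"]
    if verdict == "urgent":
        actions.append("publish_sa_urgent")
    elif verdict == "callback":
        actions.append("append_callback_queue")
    return actions
```

Keeping the routing in plain code means the model only ever classifies; everything that happens next is decided by the verdict string.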
Clustering uses HDBSCAN over the 1024-dim vectors, which finds dense groups without forcing a fixed cluster count and labels sparse points as noise (the long tail). min_cluster_size is derived from min_theme_size in the rules doc. The count for a theme is len(cluster_members) — a plain integer, never model-generated. The representative quote is the member with the smallest cosine distance to the cluster centroid, resolved back to its answer_id and pulled verbatim from sa-answers.
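The count and representative-quote steps can be sketched in plain Python, assuming HDBSCAN has already assigned members to a cluster. The helper names and the dict-of-vectors input shape are illustrative, not the post's actual code.

```python
import math


def centroid(vectors: list) -> list:
    """Component-wise mean of the member vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine_distance(a: list, b: list) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def summarize_cluster(members: dict) -> tuple:
    """members maps answer_id -> embedding. Returns (count, quote_answer_id)."""
    count = len(members)  # plain integer, never model-generated
    center = centroid(list(members.values()))
    quote_answer_id = min(members, key=lambda aid: cosine_distance(members[aid], center))
    return count, quote_answer_id
```

The returned answer_id is then used to pull the quote verbatim from sa-answers, so the text in the summary is never regenerated by a model.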
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Metric filters on "error", "throttle", and "timeout" feed a CloudWatch metric for alerting.
- Alarms: grouper or summary failures > 0 in a week (the weekly pass is the one piece that has to run); urgent-publish failures > 0 (a missed angry customer is the costly miss); count-check or quote-check rejections > 2 in a run (might mean a prompt regression).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $30/month threshold, alarm at 80% and 100%, posts to SNS topic sa-cost-alarm, subscribed to the on-call admin’s email and Slack.
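The count-check and quote-check rejections that feed the alarm above could come from checks along these lines. This is a simplified sketch: the regexes, the function names, and the idea of passing in a set of allowed numbers are assumptions, and it only recognizes straight double quotes and bare integers.

```python
import re


def unverified_numbers(draft: str, allowed: set) -> list:
    """Integers in the draft that do not match any allowed value (e.g. cluster sizes)."""
    return [int(m) for m in re.findall(r"\d+", draft) if int(m) not in allowed]


def unverified_quotes(draft: str, answer_texts: list) -> list:
    """Quoted spans in the draft that do not appear verbatim in any stored answer."""
    quotes = re.findall(r'"([^"]+)"', draft)
    return [q for q in quotes if not any(q in text for text in answer_texts)]


def rejections(draft: str, allowed: set, answer_texts: list) -> int:
    """Total rejection count for one draft, the figure the alarm would track."""
    return len(unverified_numbers(draft, allowed)) + len(unverified_quotes(draft, answer_texts))
```

A real check would need to handle dates, percentages, and typographic quotes, so treat the patterns here as a starting point.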
Config and secrets
Service-account credentials for the Drive and Sheets APIs live in Secrets Manager under sa/drive/sa. The form-submit shared secret lives under sa/form/secret; the Slack relay token (if used) under sa/slack/*. SES sender identity lives in IAM and the verified-domain config. The configured timezone, the urgent definition reference, min_theme_size, the summary length cap, and the on-call and owner addresses all live in Parameter Store under /sa/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
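The "fetch on cold start, cache for the lifetime of the execution environment" pattern can be sketched with a module-level cache. The function name is hypothetical, and the client is assumed to be a boto3 SSM client (or a stand-in) exposing `get_parameters_by_path`.

```python
from typing import Optional

# Module-level state survives across warm invocations of the same execution environment.
_CONFIG_CACHE: Optional[dict] = None


def load_config(ssm_client, prefix: str = "/sa/config") -> dict:
    """Return all parameters under `prefix`, fetching from Parameter Store only once."""
    global _CONFIG_CACHE
    if _CONFIG_CACHE is not None:
        return _CONFIG_CACHE
    values = {}
    kwargs = {"Path": prefix, "Recursive": True, "WithDecryption": True}
    while True:
        page = ssm_client.get_parameters_by_path(**kwargs)
        for param in page["Parameters"]:
            # "/sa/config/min_theme_size" -> "min_theme_size"
            key = param["Name"][len(prefix):].lstrip("/")
            values[key] = param["Value"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # results are paginated
    _CONFIG_CACHE = values
    return values
```

The trade-off is that a changed parameter is only picked up when a new execution environment starts, which is acceptable for settings that change rarely.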
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), create the S3 Vectors index in its own stack so a re-index doesn’t churn the rest, turn on S3 versioning for sa-answers-source and sa-rules-source so a bad Drive paste can be rolled back in one click, and pin the EventBridge Scheduler timezone so you don’t accidentally run the weekly pass in UTC after a CI rotation. Total deployable surface: around six Lambdas, four DDB tables, one S3 Vectors index, three S3 buckets, the Scheduler rules, one SES rule set, two SNS topics, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.