Engineering reference: the form intake router architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the Function URL setup, the SQS and dead-letter queue config, the DynamoDB schemas, and the idempotency design. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). Lambda Function URLs, SQS, SES, Bedrock cross-Region inference, and DynamoDB are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a downstream tool being briefly unreachable, which the queue already handles, not a regional outage. One AWS account dedicated to the router (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
intake-door— Lambda Function URL,AuthType: NONE, CORS locked to your site origins. On each POST it parses the body, writes the raw payload tos3://fir-raw/<submission_key>, performs a conditionalPutItemon thesubmissionstable keyed bysubmission_key(so a duplicate re-post is a no-op that returns the same 200), and returns the acknowledgment. It then invokescheckerasynchronously (InvocationType: Event) so the customer round-trip never waits on validation. Memory: 256 MB. Timeout: 10 s.email-adapter— S3 PUT trigger ons3://fir-raw-mime/. Parses the inbound MIME from the email-fallback lane, maps the email’s fields to the canonical submission shape with asubmission_keyderived from the message id, writes tos3://fir-raw/and thesubmissionstable, and invokeschecker. Falls back to a permissive field parser for plain-text bodies. Memory: 256 MB. Timeout: 30 s.checker— invoked byintake-door/email-adapter. Reads the saved submission ands3://fir-rules-source/rules.json. Validates required fields and formats; runs the deterministic spam stack (honeypot, time-on-page, per-address rate limit via thefir-rate-limittable, banned patterns). Calls Bedrock Haiku 4.5 only on a borderline spam score or an unknown category. Resolves the routing rule froms3://fir-rules-source/routing.json. Onrouted-ready, emits one SQS message per delivery to thefir-deliveriesqueue; onheld, updates status and stops. Memory: 512 MB. Timeout: 30 s.delivery-worker— SQS event source onfir-deliverieswithBatchSize: 1andReportBatchItemFailuresenabled (partial-batch responses). Per job, switches ondelivery_type:team_email/customer_replyvia SESSendEmail;crmvia the CRM API;sheetvia the Sheets API. Before sending it checksfir-deliveriesfor an existing row keyed by(submission_key, delivery_type); if present it acks the message without re-sending. On success it writes that row with a conditional put and anfir-auditentry. On failure it raises, letting SQS redrive per the queue’smaxReceiveCount. Memory: 256 MB. Timeout: 30 s.sheet-sync— EventBridge Scheduler target, every 15 minutes. Uses the Google Sheets API (service-account credentials in Secrets Manager underfir/google/sa) to export the routing sheet and the rules tab as JSON and write tos3://fir-rules-source/only if changed since the last sync. Memory: 256 MB. Timeout: 30 s.dlq-alarm— SQS event source onfir-deliveries-dlq. On any message, marks the affected submission’s delivery asdead-letteredinfir-deliveries, writesfir-audit, and publishes to thefir-cost-alarmSNS topic’s siblingfir-ops-alarmfor the on-call admin. Does not delete the DLQ message; an operator replays after the downstream tool recovers. Memory: 256 MB.replay— manual / admin Function URL (IAM-auth). Redrives messages fromfir-deliveries-dlqback tofir-deliveriesin batches, or re-drives a specific submission from its saved record. Used after an outage is resolved. Memory: 256 MB. Timeout: 60 s.held-review— Function URL behind the internal held-queue UI. Listsheldsubmissions with theirheld_reason; on release, re-runscheckerwith the spam stack bypassed for that one submission so a false-positive lead routes normally. Memory: 256 MB.
Storage and queues
- DynamoDB ·
submissions— one row per submission. PKsubmission_key; attributes:form_id,category,fields,status(received/checking/held/routed-ready),held_reason,received_at. On-demand. TTL on a copy of the raw fields at 90 days (the S3 payload is the long-term store). - DynamoDB ·
fir-deliveries— one row per completed or dead-lettered delivery. PK(submission_key, delivery_type); attributes:result(sent/dead-lettered/duplicate-skipped),target,sent_at,attempts. On-demand. The conditional write on this table is the de-dup guard. - DynamoDB ·
fir-audit— one row per action of any kind. PK(submission_key, ts); attributes:action,delivery_type,result,actor. On-demand. No TTL — long-term audit trail. - DynamoDB ·
fir-rate-limit— sliding-window counters for the spam rate check. PKsource_ip; attributecountwith a short TTL so windows self-expire. On-demand. - SQS ·
fir-deliveries— standard queue, visibility timeout 6× the worker timeout,maxReceiveCount: 6in the redrive policy tofir-deliveries-dlq. Backoff is the natural product of visibility-timeout redelivery. - SQS ·
fir-deliveries-dlq— dead-letter queue, 14-day retention, triggersdlq-alarm. - S3 ·
fir-raw— canonical raw payload per submission. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years. - S3 ·
fir-raw-mime— raw inbound MIME from the email-fallback lane. Lifecycle to Glacier at 30 days; expiry at 7 years. - S3 ·
fir-rules-source— mirrored routing table and rules as JSON. Versioning enabled so a bad sheet edit rolls back in one click.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites, both inchecker: the borderline-spam second-opinion and the category guess for generic contact forms. Heavier reasoning (Sonnet 4.6) is not used — neither call justifies it. - Embeddings. Not used. Routing is a deterministic table lookup keyed by
form_idand category; spam is rule-driven. No Knowledge Base, no S3 Vectors. - Quotas. Default account quotas are more than enough at SMB volume. The hot path doesn’t call Bedrock; only the minority of submissions that are borderline or uncategorized do.
Function URL and ingress
- The
intake-doorFunction URL isAuthType: NONE(public, by design — it’s a form endpoint) with CORSAllowOriginspinned to your site domains andAllowMethods: POST. The function rejects bodies over a small size cap and requires theform_idandsubmission_keyfields. - Abuse control is layered, not at the edge: the spam stack’s rate limit (the
fir-rate-limittable) plus the honeypot and time-on-page checks. A reserved-concurrency cap onintake-doorbounds a flood’s blast radius. - For the email-fallback lane, set the MX record on a dedicated subdomain (e.g.
forms.your-company.com) toinbound-smtp.ap-southeast-1.amazonaws.com; the SES inbound rule setfir-inbound-rulesdoes spam-scan → S3 PUT tos3://fir-raw-mime/<message-id>→ stop.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- intake-door role:
s3:PutObjectonfir-raw;dynamodb:PutItem(conditional) onsubmissions;lambda:InvokeFunctiononchecker. Nobedrock:*, no SES, no SQS. - checker role:
s3:GetObjectonfir-rawandfir-rules-source;dynamodb:GetItem+UpdateItemonsubmissionsandfir-rate-limit;sqs:SendMessageonfir-deliveries;bedrock:InvokeModelon the Haiku ARN. - delivery-worker role:
sqs:ReceiveMessage+DeleteMessageonfir-deliveries;ses:SendEmailfrom the verified sender identity;dynamodb:PutItem(conditional) onfir-deliveriesandPutItemonfir-audit;secretsmanager:GetSecretValueon the CRM and Sheets secrets; outbound network access to the CRM API host andsheets.googleapis.com. - dlq-alarm / replay roles:
sqs:*scoped tofir-deliveries-dlqand (replay only)sqs:SendMessageonfir-deliveries;sns:Publishonfir-ops-alarm;dynamodb:UpdateItemonfir-deliveriesandfir-audit. - sheet-sync role:
secretsmanager:GetSecretValueonfir/google/sa;s3:PutObjectonfir-rules-source; outbound network tosheets.googleapis.com.
Idempotency and exactly-once-effect
The system is “at-least-once” end to end and leans on two conditional writes to get exactly-once effect. At ingress, the submissions PutItem is conditional on attribute_not_exists(submission_key) — a duplicate re-post finds the row present, skips creation, and returns the original acknowledgment. At egress, the fir-deliveries PutItem is conditional on attribute_not_exists((submission_key, delivery_type)) — a redelivered SQS message whose work already completed is detected and acked without a second send. Both keys are derived deterministically (the browser-generated submission_key, and the fixed delivery_type enum), so retries always collide with their own prior success rather than creating a new effect.
SES outbound
- Verify a sender identity at
forms@your-company.comwith DKIM and SPF on the parent domain; request production access out of sandbox. - Two template families in the rules doc: the team-notification template (full submission, link to the held-review UI) and the per-form customer auto-reply template. Both rendered by
delivery-worker; no model in the rendering path. - Set the SES configuration set to publish bounce and complaint events to an SNS topic so a customer-reply address that hard-bounces is flagged rather than silently retried.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms:
fir-deliveries-dlqdepth > 0 (a delivery type is failing);intake-door5xx rate > 1% in 5 min (the door is the one piece that must always answer); checker error rate > 1% in 24h. - X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic
fir-cost-alarmsubscribed to the on-call admin’s email and Slack.
Config and secrets
Service-account credentials for the Sheets API live in Secrets Manager under fir/google/sa. CRM API tokens live under fir/crm/token. The SES sender identity lives in IAM and the verified-domain config. The allowed CORS origins, the spam thresholds, the rate-limit window, the category list, and the admin fallback address all live in Parameter Store under /fir/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for fir-raw and fir-rules-source so a bad payload or sheet edit can be rolled back, and keep the SQS redrive policy and DLQ in the same stack as the worker so maxReceiveCount and the queue are versioned together. Total deployable surface: around eight Lambdas, four DynamoDB tables, two SQS queues (main + DLQ), three S3 buckets, one EventBridge Scheduler rule, one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts