Engineering reference: the FAQ builder architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, S3 Vectors config, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, S3 Vectors, Bedrock cross-Region inference (Titan Text Embeddings V2 and Claude Haiku 4.5), and EventBridge Scheduler are all available there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume: if the region has a bad day, the worst outcome for an SMB is a delayed FAQ proposal, not a customer-facing outage. One AWS account dedicated to the builder (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
- drive-sync: EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Docs API (service-account credentials in Secrets Manager under fb/drive/sa) to export the live FAQ doc, the help docs, and any new chat transcripts and write them to s3://fb-docs-source/ only if changed since the last sync. The same pattern syncs the rules and voice docs to s3://fb-rules-source/. On a help-doc change, it enqueues the changed doc for re-chunking and re-embedding. Memory: 256 MB. Timeout: 30 s.
- intake: S3 PUT trigger on s3://fb-raw-mime/ (support email) and on new transcripts under s3://fb-docs-source/chat/. Parses MIME or transcript, extracts the customer’s question(s), strips signatures/greetings and PII (regex plus a small allowlist), then calls Bedrock Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0, 1024-dim) to embed each cleaned question and writes the vector to the fb-questions S3 Vectors index with metadata in fb-clusters. The manual lane reaches the same cleaning/embedding code path via ack-handler. Memory: 512 MB. Timeout: 60 s.
- doc-embed: triggered by drive-sync when a help doc changes. Chunks the doc into short passages (heading-aware, ~500 tokens), embeds each with Titan V2, and upserts to the fb-helpdocs S3 Vectors index keyed by (doc_id, chunk_id) with the source section as metadata. Old chunks for that doc are deleted first so retrieval never returns stale passages. Memory: 512 MB. Timeout: 120 s.
- grouper: EventBridge Scheduler target, daily at 6pm local (the schedule runs in TZ_NAME, e.g. Asia/Singapore). Reads the questions added since the last pass, queries fb-questions S3 Vectors for nearest neighbors, joins or starts clusters by the join_threshold distance, increments times-seen counters over the configured window, and applies min_asks_for_candidate from the rules doc. Writes cluster state to fb-clusters and invokes drafter per candidate. No Bedrock calls; the embedding already happened at intake. Memory: 512 MB. Timeout: 120 s.
- drafter: invoked by grouper per candidate (or refresh). Embeds the cluster’s representative question, queries fb-helpdocs S3 Vectors for the top k passages, calls Bedrock Claude Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) with a grounded system prompt (answer from passages only; cite source; return NOT COVERED otherwise). Validates the citation maps to a retrieved passage; on failure routes the cluster to the human-write queue. Formats to the voice doc and writes the proposal to fb-proposals. For ambiguous, multi-part clusters where Haiku’s grounding is weak, it can escalate a single retry to Claude Sonnet 4.6 (global.anthropic.claude-sonnet-4-6-20250930-v1:0), justified only because that retry is rare and a wrong public answer is expensive. Memory: 512 MB. Timeout: 60 s.
- dispatch: triggered when a proposal is written. Posts the proposal to the configured Slack channel via chat.postMessage with Block Kit Approve/Edit/Reject buttons and the source link. Writes nothing to the FAQ; it only surfaces the proposal for review. Memory: 256 MB. Timeout: 30 s.
- ack-handler: Lambda Function URL, public with AuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack button clicks (Approve/Edit/Reject), by the Edit modal submission, and by the manual-lane question form. On approve or edit, writes the entry to the live FAQ Google Doc via the Docs API, marks the cluster published in fb-clusters, and adds it to the covered set. On reject, records the reason and marks the cluster rejected. Every action writes to fb-audit. Memory: 256 MB. Timeout: 15 s.
- digest: EventBridge Scheduler target, weekly Friday 5pm. Reads fb-proposals and fb-clusters for the past week; posts a Slack summary of what published, what’s waiting in review, and which clusters hit the human-write queue (gaps in the docs). No Bedrock; a plain summary. Memory: 256 MB.
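The grouper's join-or-start step can be sketched roughly as follows. This is a minimal illustration rather than the builder's actual code: the Cluster dataclass, the assign_question helper, and the (cluster_id, distance) shape of `nearest` are all assumptions, and the windowed times-seen counting is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    cluster_id: str
    seen_count: int = 0
    state: str = "warm"  # warm -> candidate once asked often enough


def assign_question(
    clusters: dict[str, Cluster],
    nearest: Optional[tuple[str, float]],
    new_cluster_id: str,
    join_threshold: float,
    min_asks_for_candidate: int,
) -> Cluster:
    """Join the nearest cluster if it is close enough, else start a new one.

    `nearest` is (cluster_id, distance) for the closest stored question,
    as a vector query might return it, or None when the index is empty.
    """
    if nearest is not None and nearest[1] <= join_threshold:
        cluster = clusters[nearest[0]]
    else:
        cluster = clusters.setdefault(new_cluster_id, Cluster(new_cluster_id))
    cluster.seen_count += 1
    # Only warm clusters are promoted; published or rejected ones keep their state.
    if cluster.state == "warm" and cluster.seen_count >= min_asks_for_candidate:
        cluster.state = "candidate"
    return cluster
```

With join_threshold at 0.25 and min_asks_for_candidate at 3 (illustrative values), the third question that lands within 0.25 of an existing cluster flips that cluster to candidate, which is the point at which the grouper would invoke the drafter.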
Storage
- S3 Vectors · fb-questions: one vector per cleaned question. 1024-dim (Titan V2). Metadata: cluster_id, source (email/chat/manual), first_seen, seen_count. Used by the grouper’s nearest-neighbor search.
- S3 Vectors · fb-helpdocs: one vector per help-doc chunk. 1024-dim. Metadata: doc_id, chunk_id, section, source_url. Used by the drafter’s retrieval. Re-embedded on doc change.
- DynamoDB · fb-clusters: one row per cluster. PK cluster_id; attributes: centroid_ref, seen_count, window_counts, state (warm/candidate/published/rejected), covered_entry_id, priority. On-demand.
- DynamoDB · fb-proposals: one row per drafted proposal. PK (cluster_id, draft_ts); attributes: question, answer, source_ref, model, status (pending/approved/edited/rejected). On-demand.
- DynamoDB · fb-audit: one row per write action of any kind. PK (cluster_id, ts); attributes: action, by_user, before, after. On-demand. No TTL; this is the long-term audit trail.
- S3 · fb-docs-source: mirrored FAQ doc, help docs, and chat transcripts. Versioning enabled. Lifecycle to Glacier at 90 days for transcripts; FAQ and help docs kept hot.
- S3 · fb-rules-source: mirrored rules and voice docs as plain text. Versioning enabled.
- S3 · fb-raw-mime: raw inbound MIME from the support inbox. Lifecycle to Glacier at 30 days; expiry at 1 year (questions are already extracted into the pile; the raw MIME is only kept for audit).
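As a rough sketch of how doc-embed might produce the fb-helpdocs records listed above, the function below splits a doc at headings and emits one record per chunk with the doc_id, chunk_id, section, and source_url metadata. Several things here are assumptions: that the exported help doc is plain text with "#"-prefixed headings, that a whitespace word count is an acceptable stand-in for the ~500-token budget, and that the key is written as doc_id#chunk_id. The embedding and upsert calls are omitted.

```python
def chunk_doc(doc_id: str, text: str, source_url: str, max_words: int = 350) -> list[dict]:
    """Split a help doc into heading-aware chunks ready for embedding.

    max_words is a crude proxy for the token budget (assumption).
    """
    chunks: list[dict] = []
    section = ""
    buf: list[str] = []

    def flush() -> None:
        # Emit the buffered words as one chunk under the current section.
        if buf:
            chunk_id = len(chunks)
            chunks.append({
                "key": f"{doc_id}#{chunk_id}",  # hypothetical key format
                "text": " ".join(buf),
                "metadata": {
                    "doc_id": doc_id,
                    "chunk_id": chunk_id,
                    "section": section,
                    "source_url": source_url,
                },
            })
            buf.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()  # never let a chunk straddle two sections
            section = stripped.lstrip("#").strip()
            continue
        for word in stripped.split():
            buf.append(word)
            if len(buf) >= max_words:
                flush()
    flush()
    return chunks
```

Because chunk_id restarts at 0 on every run, deleting the doc's old chunks before upserting (as doc-embed does) avoids leaving stale higher-numbered chunks behind when a doc gets shorter.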
Bedrock
- Embeddings. amazon.titan-embed-text-v2:0 (Amazon Titan Text Embeddings V2), 1024-dim. Two main callsites: intake for the question vectors and doc-embed for the help-doc vectors. The drafter (representative question) and the manual lane in ack-handler also embed single questions. This is the dominant Bedrock cost: one embed per incoming question.
- Drafting model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite: drafter, on candidate and refresh clusters only.
- Escalation model. global.anthropic.claude-sonnet-4-6-20250930-v1:0 (Claude Sonnet 4.6), used only on a rare retry when Haiku’s grounded draft fails validation on a multi-part question. Capped at one retry per cluster per pass.
- Quotas. Default account quotas are more than enough at SMB volume. The grouper itself doesn’t call Bedrock; embeddings batch the day’s questions, and the drafter fires a few times a week.
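The draft, validate, escalate-once flow can be sketched like this. It is a minimal illustration under stated assumptions: call_model is a hypothetical stand-in for the Bedrock invocation, the "[source: key]" citation format is invented for the example, and multi_part is assumed to be decided elsewhere.

```python
import re
from typing import Callable

HAIKU_ID = "global.anthropic.claude-haiku-4-5-20251001-v1:0"
SONNET_ID = "global.anthropic.claude-sonnet-4-6-20250930-v1:0"
CITATION = re.compile(r"\[source:\s*([^\]]+)\]")  # hypothetical citation format


def validate_draft(draft: str, passage_keys: set[str]) -> bool:
    """A draft passes only if every citation maps to a retrieved passage."""
    if "NOT COVERED" in draft:
        return False
    cited = {match.strip() for match in CITATION.findall(draft)}
    return bool(cited) and cited <= passage_keys


def draft_with_escalation(
    question: str,
    passages: dict[str, str],
    call_model: Callable[[str, str, dict[str, str]], str],
    multi_part: bool,
) -> dict:
    """Try Haiku first; allow one Sonnet retry for multi-part questions."""
    models = [HAIKU_ID] + ([SONNET_ID] if multi_part else [])
    for model_id in models:
        draft = call_model(model_id, question, passages)
        if validate_draft(draft, set(passages)):
            return {"route": "proposal", "model": model_id, "answer": draft}
    # Nothing validated: hand the cluster to a human writer.
    return {"route": "human-write", "model": None, "answer": None}
```

Keeping the model list to at most two entries is what enforces the cap of one retry per cluster per pass.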
EventBridge Scheduler config
- fb-daily-group: cron(0 18 * * ? *) in the SMB’s timezone. Target: grouper Lambda.
- fb-drive-sync: rate(15 minutes). Target: drive-sync Lambda.
- fb-weekly-digest: cron(0 17 ? * FRI *) in TZ. Target: digest Lambda.
- Re-embed on demand: drive-sync invokes doc-embed directly when a help doc changes; no standing schedule needed.
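The three schedules above could be expressed as keyword arguments for the EventBridge Scheduler CreateSchedule API, for example via boto3's scheduler client. The sketch below only builds the argument dictionaries; the account ID, the Lambda function names in the ARNs, and the fb-scheduler-invoke role name are placeholders, not values from this system.

```python
ACCOUNT = "123456789012"  # placeholder account ID
REGION = "ap-southeast-1"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT}:role/fb-scheduler-invoke"  # hypothetical role


def schedule_kwargs(name, expression, function_name, timezone=None):
    """Build CreateSchedule arguments for one Lambda-targeted schedule."""
    kwargs = {
        "Name": name,
        "ScheduleExpression": expression,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": f"arn:aws:lambda:{REGION}:{ACCOUNT}:function:{function_name}",
            "RoleArn": ROLE_ARN,
        },
    }
    if timezone:
        # Without this field the cron expression is evaluated in UTC.
        kwargs["ScheduleExpressionTimezone"] = timezone
    return kwargs


def build_schedules(timezone):
    return [
        schedule_kwargs("fb-daily-group", "cron(0 18 * * ? *)", "grouper", timezone),
        schedule_kwargs("fb-drive-sync", "rate(15 minutes)", "drive-sync"),
        schedule_kwargs("fb-weekly-digest", "cron(0 17 ? * FRI *)", "digest", timezone),
    ]

# Usage (needs AWS credentials, so not executed here):
#   client = boto3.client("scheduler")
#   for kwargs in build_schedules("Asia/Singapore"):
#       client.create_schedule(**kwargs)
```

Passing the timezone explicitly on both cron schedules is the piece most worth pinning in version control, since omitting it silently moves the daily pass to 18:00 UTC.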
SES inbound
- Set the MX record on a dedicated subdomain (e.g. questions.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
- SES inbound rule set fb-inbound-rules: one rule matching the recipient domain questions.your-company.com (the rule has to match an address on the domain whose MX points at SES) → spam scan → S3 PUT to s3://fb-raw-mime/<message-id> → stop. The S3 PUT triggers intake.
- No SES outbound identity is required; the builder delivers proposals to Slack, not email. If you want email proposals, verify a sender and add an ses:SendRawEmail path to dispatch.
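Once SES has written the raw MIME to S3, intake has to turn it into cleaned question text. A minimal sketch using only the standard library follows; the greeting pattern, the "-- " signature rule, and the single email-address regex are illustrative assumptions, and a real PII pass would need more than this.

```python
import re
from email import message_from_bytes, policy

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
GREETING_RE = re.compile(r"^(hi|hello|hey|dear)\b[^\n]*$", re.IGNORECASE)


def extract_question_text(raw_mime: bytes) -> str:
    """Return the plain-text body minus greeting lines, signature, and emails."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    kept = []
    for line in text.splitlines():
        if line.rstrip() == "--":  # conventional "-- " signature delimiter
            break
        stripped = line.strip()
        if not stripped or GREETING_RE.match(stripped):
            continue
        kept.append(stripped)
    cleaned = " ".join(kept)
    # Mask email addresses so they never reach the embedding call.
    return EMAIL_RE.sub("[email]", cleaned)
```

In the Lambda, raw_mime would be the bytes of the S3 object named in the PUT event; splitting the cleaned text into individual questions is a separate step not shown here.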
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- intake role: s3:GetObject on fb-raw-mime and the chat prefix; bedrock:InvokeModel on the Titan ARN; s3vectors:PutVectors on fb-questions; dynamodb:PutItem on fb-clusters. No access to the drafting model.
- grouper role: s3vectors:QueryVectors on fb-questions; s3:GetObject on the rules doc; dynamodb:Query + UpdateItem on fb-clusters; lambda:InvokeFunction on drafter. No bedrock:*.
- drafter role: s3vectors:QueryVectors on fb-helpdocs; bedrock:InvokeModel on the Haiku and Sonnet ARNs, plus the Titan ARN for the representative-question embed; s3:GetObject on the voice doc; dynamodb:PutItem on fb-proposals.
- ack-handler role: dynamodb:PutItem on fb-clusters, fb-proposals, fb-audit; secretsmanager:GetSecretValue on the Docs-API service-account secret and the Slack signing secret; outbound network access to docs.googleapis.com and slack.com; bedrock:InvokeModel on the Titan ARN for the manual lane’s embed; s3vectors:PutVectors on fb-questions.
- dispatch role: secretsmanager:GetSecretValue on the Slack bot token; outbound network access to slack.com; dynamodb:GetItem on fb-proposals.
- drive-sync / doc-embed roles: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the docs and rules buckets; bedrock:InvokeModel on Titan and s3vectors:PutVectors/DeleteVectors on fb-helpdocs (doc-embed only); outbound network to www.googleapis.com.
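To make "scoped to exact ARNs" concrete, here is a sketch of what the intake role's policy document might look like, built as a Python dict. The account ID and the fb-vectors vector-bucket name are placeholders that do not come from this system, and the S3 Vectors ARN layout is written from memory, so check it against the current IAM documentation before relying on it.

```python
import json

REGION = "ap-southeast-1"
ACCOUNT = "123456789012"      # placeholder account ID
VECTOR_BUCKET = "fb-vectors"  # hypothetical vector bucket name


def intake_policy() -> dict:
    """Least-privilege policy sketch for the intake Lambda role."""
    def allow(sid, action, resource):
        return {"Sid": sid, "Effect": "Allow", "Action": action, "Resource": resource}

    return {
        "Version": "2012-10-17",
        "Statement": [
            allow("ReadRawInputs", "s3:GetObject", [
                "arn:aws:s3:::fb-raw-mime/*",
                "arn:aws:s3:::fb-docs-source/chat/*",
            ]),
            allow("EmbedWithTitan", "bedrock:InvokeModel",
                  f"arn:aws:bedrock:{REGION}::foundation-model/amazon.titan-embed-text-v2:0"),
            allow("WriteQuestionVectors", "s3vectors:PutVectors",
                  f"arn:aws:s3vectors:{REGION}:{ACCOUNT}:bucket/{VECTOR_BUCKET}/index/fb-questions"),
            allow("WriteClusterMetadata", "dynamodb:PutItem",
                  f"arn:aws:dynamodb:{REGION}:{ACCOUNT}:table/fb-clusters"),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(intake_policy(), indent=2))
```

Each statement names one action on one resource family, so the absence of any Claude model ARN is what keeps intake away from the drafting model.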
Slack interactive flow
Proposals are posted via the chat.postMessage Web API with Block Kit blocks containing the Approve/Edit/Reject buttons (the incoming webhook can’t carry interactive components). Slack sends button clicks to the configured Interactivity request URL, which is the ack-handler Function URL. ack-handler verifies the request signature against the Slack signing secret, parses the action_id (approve, edit, reject), opens a modal for Edit (pre-filled draft) and Reject (reason), and processes the response on modal submission. The manual-lane question form is a separate Block Kit modal opened from a slash command, which posts to the same Function URL.
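A proposal message of the kind described above might be assembled like this. The sketch builds only the JSON payload for chat.postMessage; the message layout, the use of cluster_id as the button value, and the button styles are assumptions made for illustration.

```python
def proposal_payload(channel, cluster_id, question, answer, source_url):
    """Build a chat.postMessage payload with Approve/Edit/Reject buttons."""
    def button(label, action_id, style=None):
        element = {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": action_id,  # what ack-handler switches on
            "value": cluster_id,     # assumption: carry the cluster ID back
        }
        if style:
            element["style"] = style
        return element

    return {
        "channel": channel,
        "text": f"FAQ proposal: {question}",  # fallback for notifications
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*Q:* {question}\n*A:* {answer}\n<{source_url}|Source>",
                },
            },
            {
                "type": "actions",
                "elements": [
                    button("Approve", "approve", "primary"),
                    button("Edit", "edit"),
                    button("Reject", "reject", "danger"),
                ],
            },
        ],
    }
```

dispatch would send this as JSON to the chat.postMessage endpoint with the bot token as a Bearer credential; the click payload Slack sends back then includes the action_id and value set here.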
The Slack app needs the chat:write scope, the commands scope for the manual-lane slash command, and the Interactivity URL configured. The bot token lives in Secrets Manager under fb/slack/bot-token; the signing secret under fb/slack/signing-secret.
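Because the Function URL is public, the signature check is the only thing standing between the internet and the FAQ doc. Slack's documented scheme signs the string v0:{timestamp}:{body} with HMAC-SHA256 using the signing secret, and sends the result in the X-Slack-Signature header alongside X-Slack-Request-Timestamp. A minimal sketch, where the function name and the injectable clock are assumptions:

```python
import hashlib
import hmac
import time


def verify_slack_signature(signing_secret, timestamp, body, signature,
                           now=None, max_skew=300):
    """Return True only for a fresh request whose signature matches.

    body must be the raw request bytes exactly as Slack sent them.
    """
    now = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except (TypeError, ValueError):
        return False
    if abs(now - ts) > max_skew:  # reject replays older than five minutes
        return False
    base = b"v0:" + str(timestamp).encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), str(signature).encode())
```

In a Function URL event the header names arrive lowercased and the body may be base64-encoded (check isBase64Encoded), so decode it back to the raw bytes before verifying.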
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. A metric filter matching "error", "throttle", or "timeout" feeds a CloudWatch metric for alerting.
- Alarms: grouper Lambda failures > 0 in a day (the daily pass is the one piece that has to run); drafter citation-validation failure rate > 30% in 24h (might mean the help docs drifted from how customers phrase questions); ack-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic fb-cost-alarm, subscribed to the on-call admin’s email and Slack.
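One way to turn matching log lines into an alarmable metric is a CloudWatch Logs metric filter per function. The sketch below builds arguments for the PutMetricFilter API; the FaqBuilder namespace, the metric and filter names, and the assumption that each log group is /aws/lambda/ followed by the function name are all illustrative.

```python
def error_metric_filter(function_name, namespace="FaqBuilder"):
    """Build PutMetricFilter arguments counting error-like log lines."""
    return {
        "logGroupName": f"/aws/lambda/{function_name}",
        "filterName": f"{function_name}-error-lines",
        # "?term" means OR: match a line containing any of the three words.
        "filterPattern": "?error ?throttle ?timeout",
        "metricTransformations": [
            {
                "metricName": f"{function_name}-error-lines",
                "metricNamespace": namespace,
                "metricValue": "1",  # count one per matching log event
                "defaultValue": 0,   # emit zero when nothing matches
            }
        ],
    }

# Usage (needs AWS credentials, so not executed here):
#   logs = boto3.client("logs")
#   logs.put_metric_filter(**error_metric_filter("grouper"))
```

Term matching in filter patterns is case-sensitive, so this only works if the structured logs emit these words in lowercase; a JSON field pattern on the log level would be a stricter alternative.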
Config and secrets
Service-account credentials for Drive and Docs APIs live in Secrets Manager under fb/drive/sa (one service account scoped to both). Slack bot token and signing secret under fb/slack/*. The join_threshold, min_asks_for_candidate, the count window, retrieval k, the configured timezone, and the review Slack channel all live in Parameter Store under /fb/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
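The fetch-on-cold-start pattern can be sketched with a module-level cache, which survives for as long as Lambda reuses the execution environment. The get_config helper and the way parameter names are shortened are assumptions; only the /fb/config/ prefix comes from the setup described above.

```python
_CONFIG = None  # module scope: persists across warm invocations

PREFIX = "/fb/config/"


def _fetch_from_ssm():
    """Read every parameter under the prefix from Parameter Store."""
    import boto3  # imported lazily so the module loads without AWS access

    ssm = boto3.client("ssm")
    values = {}
    paginator = ssm.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=PREFIX, Recursive=True):
        for param in page["Parameters"]:
            # "/fb/config/join_threshold" -> "join_threshold"
            values[param["Name"][len(PREFIX):]] = param["Value"]
    return values


def get_config(fetch=_fetch_from_ssm):
    """Return cached config, fetching once per execution environment."""
    global _CONFIG
    if _CONFIG is None:
        _CONFIG = fetch()
    return _CONFIG
```

The trade-off is that a changed parameter is only picked up when a new execution environment starts, which is acceptable for values like join_threshold that change rarely.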
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) and AWS SAM for the stack. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for fb-docs-source and fb-rules-source so a bad Drive edit can be rolled back to the previous object version, and keep the EventBridge Scheduler timezone setting in version control so a redeploy can’t silently move the daily pass to UTC. Keep the two S3 Vectors indexes (fb-questions, fb-helpdocs) in their own stack so a re-index never risks the rest. Total deployable surface: eight Lambdas, three DDB tables, two S3 Vectors indexes, three S3 buckets, three Scheduler schedules, one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.