Engineering reference: the staff policy answerer architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, the S3 Vectors index config, Lambda inventory, IAM scopes, the Slack app config, the DynamoDB schemas, and the retrieval pipeline. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). Bedrock (Claude Haiku 4.5 via Global cross-Region inference, Titan Text Embeddings V2), S3 Vectors, Lambda Function URLs, and SES inbound are all available there. A second region for resilience isn’t worth the setup at SMB volume — the failure mode is a staff member waiting a few minutes for HR instead of getting an instant answer, not a regional outage. One AWS account dedicated to the answerer (separate from other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system. All handbook content stays inside this account: the only data that leaves is the prompt-and-sections payload to Bedrock, which is not retained for training.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
spa-intake— Lambda Function URL,AuthType: NONE, verifies the Slack signing secret (spa/slack/signing-secret) on the raw request body before doing anything. Handles Slack URL verification, themessage(IM) andapp_mentionevents. Returns 200 within 3 s, then invokesanswererasynchronously with the normalized question. De-dupes on Slack’sX-Slack-Retry-Numheader so a slow downstream doesn’t double-answer. Memory: 256 MB. Timeout: 10 s.intake-email— S3 PUT trigger ons3://spa-raw-mime/. Parses MIME, strips quoted history and signatures, extracts the question text and sender, and invokesanswererwithreply_channel=email. Memory: 256 MB. Timeout: 30 s.answerer— invoked by the intake functions. Embeds the question with Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0, 1024-dim), queriesspa-handbook-indexin S3 Vectors for top-k (k=8) nearest sections by cosine similarity, applies the confidence floor (drop if top score <SIM_FLOOR, default 0.62), keeps the best 3–5, and calls Claude Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) with a grounded, citation-required prompt. Returns a structured result (answer text, cited section ids, confidence, off_limits flag) and invokesreply. Sonnet 4.6 (anthropic.claude-sonnet-4-6-...) is wired as an optional escalation for multi-part or cross-policy questions where Haiku declines, gated behind a flag and off by default. Memory: 512 MB. Timeout: 60 s.reply— invoked byanswerer. Runs the four guardrail gates (topic check againstspa-rules, citation trace-back, hedge downgrade, compose), formats per the voice template, attaches the deep section link, and ships via Slackchat.postMessage(bot tokenspa/slack/bot-token) or SESSendRawEmail. Writes a row tospa-log. Memory: 256 MB. Timeout: 30 s.indexer— EventBridge Scheduler target every 5 minutes, plus on-demand from the admin rebuild button. Uses the Google Drive + Docs API (service-account credentials in Secrets Manager underspa/drive/sa) to detect changed docs via the revision marker, exports each changed doc to text, splits on headings into sections (with a soft 1,200-token cap and overlap), computes a content hash per section, re-embeds only sections whose hash changed via Titan, and upserts them intospa-handbook-index(deleting vectors for removed sections). Writes a row tospa-audit. Memory: 1024 MB. Timeout: 120 s.gap-report— EventBridge Scheduler target, weekly Monday 9am inTZ_NAME. Scansspa-logfor the past week’soutcome=ask_hrrows, clusters them (embed each question, group by cosine proximity), and posts HR a ranked list of uncovered topics to the admin Slack channel. No model needed beyond the embeddings already inspa-log. Memory: 512 MB. Timeout: 60 s.
Storage
- S3 Vectors ·
spa-handbook-index— one vector per handbook section. 1024-dim (Titan V2), cosine distance. Metadata per vector:doc_id,section_id,heading,deep_link,content_hash,updated_at. Queried with a metadata filter to scope by doc when needed. - DynamoDB ·
spa-log— one row per question. PK(asker_id, ts); attributes:question,outcome(answered/ask_hr/off_limits),cited_sections,sim_top,reply_channel,q_embedding(reused by gap-report). On-demand. TTL 400 days on the raw question text; aggregates kept longer. - DynamoDB ·
spa-audit— one row per index refresh. PK(doc_id, ts); attributes:sections_touched,trigger(sync/manual),by_user(if manual). On-demand. No TTL — long-term freshness trail. - S3 ·
spa-handbook-source— mirrored plain-text export of each handbook doc, keyed bydoc_id/revision. Versioning enabled; this is what the deep link and the citation check resolve against. - S3 ·
spa-rules-source— mirrored rules and voice docs as plain text (off-limits list, escalation contacts, tone). Versioning enabled. - S3 ·
spa-raw-mime— raw inbound MIME from the email lane. Lifecycle to Glacier at 30 days; expiry at 1 year.
Bedrock
- Answer model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite:answerer. Prompt is grounded (sections only), citation-required, with an explicit instruction to decline rather than use outside knowledge.temperature: 0for deterministic answers. - Escalation model.
anthropic.claude-sonnet-4-6-...via its Global profile, behindESCALATE_TO_SONNET(default off). Only fires on multi-part questions Haiku declines and the search still has strong sections — the rare case where the reasoning, not the retrieval, is the bottleneck. - Embeddings.
amazon.titan-embed-text-v2:0, 1024-dim, normalized. Used byanswerer(query embedding),indexer(section embeddings), andgap-report(clustering). Embedding dim must match the index dim exactly. - Quotas. Default account quotas are plenty at SMB volume. The hot path is one Haiku call plus one Titan embedding per question.
EventBridge Scheduler config
spa-index-sync—rate(5 minutes). Target:indexerLambda.spa-gap-report—cron(0 9 ? * MON *)in the SMB’s timezone. Target:gap-reportLambda.- Manual reindex — not a Scheduler rule; the admin rebuild button invokes
indexerdirectly via the Function URL backing the Slack admin action.
Slack app config
The Slack app needs chat:write, im:write, im:history, and app_mentions:read. Event subscriptions point at the spa-intake Function URL: message.im and app_mention. Interactivity (the admin rebuild button and the ask-HR footer actions) also points at spa-intake, which routes admin actions to the indexer. The bot token lives in Secrets Manager under spa/slack/bot-token; the signing secret under spa/slack/signing-secret. The admin channel id and the per-topic HR contacts live in Parameter Store under /spa/config/.
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g.
policy.your-company.com) toinbound-smtp.ap-southeast-1.amazonaws.com. - SES inbound rule set
spa-inbound-rules: one rule with recipientpolicy@your-company.com→ spam scan → S3 PUT tos3://spa-raw-mime/<message-id>→ stop. The S3 PUT triggersintake-email. - SES outbound for email replies: verify a sender identity at
policy@your-company.comwith DKIM and SPF on the parent domain. Out of sandbox by request.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- spa-intake role:
secretsmanager:GetSecretValueon the Slack signing secret;lambda:InvokeFunctiononanswererandindexer. No Bedrock, no DynamoDB. - answerer role:
bedrock:InvokeModelon the Titan and Haiku ARNs (and the Sonnet ARN if escalation is enabled);s3vectors:QueryVectorsonspa-handbook-index;lambda:InvokeFunctiononreply;s3:GetObjectonspa-rules-source. - reply role:
secretsmanager:GetSecretValueon the Slack bot token;ses:SendRawEmailfrom the verified identity;dynamodb:PutItemonspa-log;s3:GetObjectonspa-handbook-source(deep-link + citation resolve); outbound network toslack.com. - indexer role:
secretsmanager:GetSecretValueonspa/drive/sa;bedrock:InvokeModelon the Titan ARN;s3vectors:PutVectors+DeleteVectorsonspa-handbook-index;s3:PutObjectonspa-handbook-sourceandspa-rules-source;dynamodb:PutItemonspa-audit; outbound network towww.googleapis.com. - gap-report role:
dynamodb:Queryonspa-log;secretsmanager:GetSecretValueon the Slack bot token; outbound network toslack.com.
Retrieval and grounding details
Chunking is heading-aware: each section is a heading plus its body, capped near 1,200 tokens with a small overlap so a rule that spans a page boundary isn’t split mid-sentence. The content_hash per section is what makes the sync incremental — unchanged hashes are skipped, so a one-line edit re-embeds one section, not the whole doc. The query keeps k=8 from S3 Vectors, applies SIM_FLOOR, then trims to the top 3–5 by score before the Haiku call. The grounded prompt requires the model to return JSON: {answer, cited_section_ids, declined}. The citation gate rejects any cited_section_id not in the pulled set; the hedge gate downgrades when sim_top < SIM_SOFT (default 0.70) or the answer contains hedge markers. Every threshold (SIM_FLOOR, SIM_SOFT, top-k, top-n) lives in Parameter Store so tuning needs no deploy.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms:
answerererror rate > 1% in 24h;spa-intakesignature-verification failures > 5/hour (might mean the Slack secret rotated);indexerfailures > 0 in a day (a stale index is a silent correctness bug); ask-HR rate spike > 2× baseline (might mean the handbook export broke). - X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $20/month threshold, alarm at 80% and 100%, posts to SNS topic
spa-cost-alarmsubscribed to the on-call admin’s email and Slack.
Config and secrets
Service-account credentials for the Drive and Docs APIs live in Secrets Manager under spa/drive/sa. Slack bot token and signing secret under spa/slack/*. SES sender identity lives in IAM and the verified-domain config. The off-limits topic list, per-topic HR contacts, timezone, retrieval thresholds, and the escalation flag all live in Parameter Store under /spa/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) and AWS SAM for the stack. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for spa-handbook-source and spa-rules-source so a bad Drive export can be rolled back, and keep the S3 Vectors index dimension pinned to 1024 to match Titan V2 — a dimension mismatch is a silent retrieval failure. Total deployable surface: six Lambdas, one S3 Vectors index, two DynamoDB tables, three S3 buckets, two EventBridge Scheduler rules, one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts