Engineering reference: the staff policy answerer architecture

Region and account shape

Default region: ap-southeast-1 (Singapore). Bedrock (Claude Haiku 4.5 via Global cross-Region inference, Titan Text Embeddings V2), S3 Vectors, Lambda Function URLs, and SES inbound are all available there. A second region for resilience isn’t worth the setup at SMB volume — the failure mode is a staff member waiting a few minutes for HR instead of getting an instant answer, not a regional outage. One AWS account dedicated to the answerer (separate from other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system. All handbook content stays inside this account: the only data that leaves is the prompt-and-sections payload to Bedrock, which is not retained for training.

Topology

Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the normalizer), retrieval and answer (embed, search S3 Vectors, ground on Haiku, while the indexer keeps the index fresh), reply and logging (the answer ships, gets logged, and feeds the gap report). Every Lambda is event- or schedule-driven.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

spa-intake — Lambda Function URL, AuthType: NONE, verifies the Slack signing secret (spa/slack/signing-secret) on the raw request body before doing anything. Handles Slack URL verification, the message (IM) and app_mention events. Returns 200 within 3 s, then invokes answerer asynchronously with the normalized question. De-dupes on Slack’s X-Slack-Retry-Num header so a slow downstream doesn’t double-answer. Memory: 256 MB. Timeout: 10 s.
intake-email — S3 PUT trigger on s3://spa-raw-mime/. Parses MIME, strips quoted history and signatures, extracts the question text and sender, and invokes answerer with reply_channel=email. Memory: 256 MB. Timeout: 30 s.
answerer — invoked by the intake functions. Embeds the question with Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0, 1024-dim), queries spa-handbook-index in S3 Vectors for top-k (k=8) nearest sections by cosine similarity, applies the confidence floor (drop if top score < SIM_FLOOR, default 0.62), keeps the best 3–5, and calls Claude Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) with a grounded, citation-required prompt. Returns a structured result (answer text, cited section ids, confidence, off_limits flag) and invokes reply. Sonnet 4.6 (anthropic.claude-sonnet-4-6-...) is wired as an optional escalation for multi-part or cross-policy questions where Haiku declines, gated behind a flag and off by default. Memory: 512 MB. Timeout: 60 s.
reply — invoked by answerer. Runs the four guardrail gates (topic check against spa-rules, citation trace-back, hedge downgrade, compose), formats per the voice template, attaches the deep section link, and ships via Slack chat.postMessage (bot token spa/slack/bot-token) or SES SendRawEmail. Writes a row to spa-log. Memory: 256 MB. Timeout: 30 s.
indexer — EventBridge Scheduler target every 5 minutes, plus on-demand from the admin rebuild button. Uses the Google Drive + Docs API (service-account credentials in Secrets Manager under spa/drive/sa) to detect changed docs via the revision marker, exports each changed doc to text, splits on headings into sections (with a soft 1,200-token cap and overlap), computes a content hash per section, re-embeds only sections whose hash changed via Titan, and upserts them into spa-handbook-index (deleting vectors for removed sections). Writes a row to spa-audit. Memory: 1024 MB. Timeout: 120 s.
gap-report — EventBridge Scheduler target, weekly Monday 9am in TZ_NAME. Scans spa-log for the past week’s outcome=ask_hr rows, clusters them (embed each question, group by cosine proximity), and posts HR a ranked list of uncovered topics to the admin Slack channel. No model needed beyond the embeddings already in spa-log. Memory: 512 MB. Timeout: 60 s.

Storage

S3 Vectors · spa-handbook-index — one vector per handbook section. 1024-dim (Titan V2), cosine distance. Metadata per vector: doc_id, section_id, heading, deep_link, content_hash, updated_at. Queried with a metadata filter to scope by doc when needed.
DynamoDB · spa-log — one row per question. PK (asker_id, ts); attributes: question, outcome (answered/ask_hr/off_limits), cited_sections, sim_top, reply_channel, q_embedding (reused by gap-report). On-demand. TTL 400 days on the raw question text; aggregates kept longer.
DynamoDB · spa-audit — one row per index refresh. PK (doc_id, ts); attributes: sections_touched, trigger (sync/manual), by_user (if manual). On-demand. No TTL — long-term freshness trail.
S3 · spa-handbook-source — mirrored plain-text export of each handbook doc, keyed by doc_id/revision. Versioning enabled; this is what the deep link and the citation check resolve against.
S3 · spa-rules-source — mirrored rules and voice docs as plain text (off-limits list, escalation contacts, tone). Versioning enabled.
S3 · spa-raw-mime — raw inbound MIME from the email lane. Lifecycle to Glacier at 30 days; expiry at 1 year.

Bedrock

Answer model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite: answerer. Prompt is grounded (sections only), citation-required, with an explicit instruction to decline rather than use outside knowledge. temperature: 0 for deterministic answers.
Escalation model. anthropic.claude-sonnet-4-6-... via its Global profile, behind ESCALATE_TO_SONNET (default off). Only fires on multi-part questions Haiku declines and the search still has strong sections — the rare case where the reasoning, not the retrieval, is the bottleneck.
Embeddings. amazon.titan-embed-text-v2:0, 1024-dim, normalized. Used by answerer (query embedding), indexer (section embeddings), and gap-report (clustering). Embedding dim must match the index dim exactly.
Quotas. Default account quotas are plenty at SMB volume. The hot path is one Haiku call plus one Titan embedding per question.

EventBridge Scheduler config

spa-index-sync — rate(5 minutes). Target: indexer Lambda.
spa-gap-report — cron(0 9 ? * MON *) in the SMB’s timezone. Target: gap-report Lambda.
Manual reindex — not a Scheduler rule; the admin rebuild button invokes indexer directly via the Function URL backing the Slack admin action.

Slack app config

The Slack app needs chat:write, im:write, im:history, and app_mentions:read. Event subscriptions point at the spa-intake Function URL: message.im and app_mention. Interactivity (the admin rebuild button and the ask-HR footer actions) also points at spa-intake, which routes admin actions to the indexer. The bot token lives in Secrets Manager under spa/slack/bot-token; the signing secret under spa/slack/signing-secret. The admin channel id and the per-topic HR contacts live in Parameter Store under /spa/config/.

SES inbound and outbound

Set the MX record on a dedicated subdomain (e.g. policy.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
SES inbound rule set spa-inbound-rules: one rule with recipient policy@your-company.com → spam scan → S3 PUT to s3://spa-raw-mime/<message-id> → stop. The S3 PUT triggers intake-email.
SES outbound for email replies: verify a sender identity at policy@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

spa-intake role: secretsmanager:GetSecretValue on the Slack signing secret; lambda:InvokeFunction on answerer and indexer. No Bedrock, no DynamoDB.
answerer role: bedrock:InvokeModel on the Titan and Haiku ARNs (and the Sonnet ARN if escalation is enabled); s3vectors:QueryVectors on spa-handbook-index; lambda:InvokeFunction on reply; s3:GetObject on spa-rules-source.
reply role: secretsmanager:GetSecretValue on the Slack bot token; ses:SendRawEmail from the verified identity; dynamodb:PutItem on spa-log; s3:GetObject on spa-handbook-source (deep-link + citation resolve); outbound network to slack.com.
indexer role: secretsmanager:GetSecretValue on spa/drive/sa; bedrock:InvokeModel on the Titan ARN; s3vectors:PutVectors + DeleteVectors on spa-handbook-index; s3:PutObject on spa-handbook-source and spa-rules-source; dynamodb:PutItem on spa-audit; outbound network to www.googleapis.com.
gap-report role: dynamodb:Query on spa-log; secretsmanager:GetSecretValue on the Slack bot token; outbound network to slack.com.

Retrieval and grounding details

Chunking is heading-aware: each section is a heading plus its body, capped near 1,200 tokens with a small overlap so a rule that spans a page boundary isn’t split mid-sentence. The content_hash per section is what makes the sync incremental — unchanged hashes are skipped, so a one-line edit re-embeds one section, not the whole doc. The query keeps k=8 from S3 Vectors, applies SIM_FLOOR, then trims to the top 3–5 by score before the Haiku call. The grounded prompt requires the model to return JSON: {answer, cited_section_ids, declined}. The citation gate rejects any cited_section_id not in the pulled set; the hedge gate downgrades when sim_top < SIM_SOFT (default 0.70) or the answer contains hedge markers. Every threshold (SIM_FLOOR, SIM_SOFT, top-k, top-n) lives in Parameter Store so tuning needs no deploy.

Observability and cost gates

CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
Alarms: answerer error rate > 1% in 24h; spa-intake signature-verification failures > 5/hour (might mean the Slack secret rotated); indexer failures > 0 in a day (a stale index is a silent correctness bug); ask-HR rate spike > 2× baseline (might mean the handbook export broke).
X-Ray: off by default. Not worth the cost at SMB volume.
AWS Budgets: $20/month threshold, alarm at 80% and 100%, posts to SNS topic spa-cost-alarm subscribed to the on-call admin’s email and Slack.

Config and secrets

Service-account credentials for the Drive and Docs APIs live in Secrets Manager under spa/drive/sa. Slack bot token and signing secret under spa/slack/*. SES sender identity lives in IAM and the verified-domain config. The off-limits topic list, per-topic HR contacts, timezone, retrieval thresholds, and the escalation flag all live in Parameter Store under /spa/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys) and AWS SAM for the stack. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for spa-handbook-source and spa-rules-source so a bad Drive export can be rolled back, and keep the S3 Vectors index dimension pinned to 1024 to match Titan V2 — a dimension mismatch is a silent retrieval failure. Total deployable surface: six Lambdas, one S3 Vectors index, two DynamoDB tables, three S3 buckets, two EventBridge Scheduler rules, one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts