What the staff policy answerer costs
This is a cheap system to run. There’s no always-on server, no database humming at 3am, and no model call unless somebody actually asks a question. The cost scales with how many questions staff ask, not with how big your handbook is, because the handbook is embedded once and only its changed sections are re-embedded afterwards. At typical SMB volume the bill is a couple of dollars a month, and the fixed cost is essentially zero.
Key takeaways
- Around $2/month at a small team’s volume (roughly 150 questions a month).
- Fixed AWS cost is essentially zero. No always-on compute, no NAT Gateway, no API Gateway, no search server.
- The biggest variable cost is Bedrock — one small answer call per question.
- Keeping the index fresh is cheap: only changed sections get re-embedded.
- At ~600 questions/month the bill is around $5. At ~2,400 it’s around $14.
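Cost at three question volumes
- ~150 questions/month: around $2
- ~600 questions/month: around $5
- ~2,400 questions/month: around $14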
Where the dollars actually go
Bedrock (the biggest slice). Each question that clears the confidence floor triggers one Claude Haiku 4.5 call: a few hundred tokens for the prompt and the pulled sections in, a couple of hundred tokens of answer out. That’s a fraction of a cent per question. Questions that fail the confidence floor or hit an off-limits topic skip the model entirely and cost nothing. At 150 questions a month it’s well under a dollar; at 2,400 it’s the dominant line item but still single-digit dollars. Haiku is the right model here — reading a few short policy sections and writing a plain answer doesn’t need a heavier model, and the cheaper call keeps the per-question cost tiny.
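To make the "fraction of a cent" figure concrete, here is a rough sketch of the per-question arithmetic. The token counts (600 in, 200 out) and the per-million-token prices are assumptions chosen to match the description above, not measured values; check the current Bedrock price list before relying on them.

```python
# Assumed on-demand prices in USD per million tokens (verify against current Bedrock pricing).
INPUT_USD_PER_MTOK = 1.00
OUTPUT_USD_PER_MTOK = 5.00


def answer_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single answer call."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def monthly_bedrock_cost(
    questions: int,
    answered_share: float = 1.0,  # share of questions that clear the confidence floor
    input_tokens: int = 600,      # assumed: prompt plus the pulled sections
    output_tokens: int = 200,     # assumed: a short plain answer
) -> float:
    """Monthly model cost; questions that skip the model cost nothing."""
    answered = questions * answered_share
    return answered * answer_call_cost(input_tokens, output_tokens)


if __name__ == "__main__":
    for volume in (150, 600, 2400):
        print(volume, round(monthly_bedrock_cost(volume), 2))
```

Under these assumptions one call costs about $0.0016, which puts 150 questions well under a dollar and 2,400 questions in the low single digits.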
Embeddings (Titan). Two places. One small embedding per question (to turn it into a vector for search) — cents per thousand questions. And the index refresh: re-embedding only the handbook sections that changed. Since a typical handbook changes a few sections a week, the refresh embeddings are negligible. Embedding the whole handbook once on day one is a one-time cost of a few cents.
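One common way to re-embed only what changed is to keep a content hash per section and compare on each sync. The sketch below assumes that approach; the function names and the idea of a stored hash map are illustrative and not taken from the actual indexer.

```python
import hashlib


def section_hash(text: str) -> str:
    """Stable fingerprint of a section's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(
    sections: dict[str, str],       # section id -> current text
    stored_hashes: dict[str, str],  # section id -> hash recorded at the last sync
) -> tuple[list[str], list[str]]:
    """Return (ids to re-embed, ids to delete from the index)."""
    to_embed = sorted(
        sid for sid, text in sections.items()
        if stored_hashes.get(sid) != section_hash(text)
    )
    to_delete = sorted(sid for sid in stored_hashes if sid not in sections)
    return to_embed, to_delete


if __name__ == "__main__":
    stored = {"pto": section_hash("15 days"), "expenses": section_hash("Submit monthly")}
    current = {"pto": "20 days", "expenses": "Submit monthly", "remote": "Two days a week"}
    print(plan_refresh(current, stored))
```

Only the ids in the first list need an embedding call, so the refresh cost follows the number of edited sections rather than the size of the handbook.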
S3 Vectors. Stores the section fingerprints and answers search queries. You pay for the stored vectors (a small handbook is a few hundred to a few thousand vectors — pennies) and per query (one query per question). No always-on search cluster to pay for, which is the whole reason to use it over a hosted vector database. Pennies a month at these volumes.
Lambda runtime. The intake Lambda, the answerer, the indexer sync, and the gap-report job. All event- or schedule-driven, all small. Even at 2,400 questions a month the Lambda total lands under a dollar.
DynamoDB on-demand. Two small tables: the question-and-answer log and the refresh audit log. A handful of writes per question and per refresh. Pennies a month.
SES. Inbound for the email lane and outbound for email replies: $0.10 per thousand messages each way. Negligible unless your whole company is on the email lane.
What doesn’t cost money
- API Gateway. Replaced by Lambda Function URLs for the Slack endpoints.
- NAT Gateway. Nothing is in a VPC. No NAT, no $32/month minimum.
- Always-on compute. No EC2, no Fargate. Nothing runs unless a question comes in or a doc changes.
- A hosted vector database. S3 Vectors means no search cluster sitting idle and billing by the hour.
- Re-reading the whole handbook. The index refresh touches only the sections that changed, so a big handbook isn’t a big bill.
How the cost scales
The bill tracks questions, not handbook size or headcount directly. Bedrock and the per-question embedding grow linearly with how many questions get asked; everything else stays small. So a company asking 5,000 questions a month lands around $28, and 10,000 around $55 — still less than an hour of the HR time those questions would otherwise eat. A bigger handbook barely moves the needle: more sections means slightly more stored vectors and a slightly larger search, both cheap. The thing that grows your bill is your team getting more answers, which is the point.
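For a volume between the figures quoted in this post, a rough estimate can be read off by interpolating between them. The sketch below does only that: the data points are the approximate totals from this post, and the straight-line interpolation between them is an assumption, not a pricing model.

```python
from bisect import bisect_left

# (questions per month, approximate monthly bill in USD), as quoted in this post.
POINTS = [(150, 2.0), (600, 5.0), (2400, 14.0), (5000, 28.0), (10000, 55.0)]


def estimate_monthly_usd(questions: int) -> float:
    """Linearly interpolate the monthly bill between the quoted volumes."""
    xs = [q for q, _ in POINTS]
    if not xs[0] <= questions <= xs[-1]:
        raise ValueError(f"no quoted figures outside {xs[0]}..{xs[-1]} questions")
    i = bisect_left(xs, questions)
    if xs[i] == questions:
        return POINTS[i][1]
    (x0, y0), (x1, y1) = POINTS[i - 1], POINTS[i]
    return y0 + (y1 - y0) * (questions - x0) / (x1 - x0)


if __name__ == "__main__":
    for volume in (300, 1500, 7500):
        print(volume, round(estimate_monthly_usd(volume), 2))
```

The slope between the larger volumes works out to roughly half a cent per additional question, which is consistent with the bill being driven by per-question calls.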
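Set an AWS Budgets alarm at $20/month so anything unusual, such as a runaway loop or a misconfigured retry, alerts you before the bill matters. At a small team’s volume the normal bill stays well under that ceiling; if you expect more than about 2,400 questions a month (around $14), raise the threshold to match.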
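As a sketch of what that alarm could look like with boto3, the function below builds the request for the Budgets `create_budget` call: a monthly cost budget with an email alert once actual spend passes 100% of the limit. The budget name, account id, and email address are placeholders, and alerting on actual rather than forecasted spend is an assumption.

```python
def budget_request(account_id: str, email: str, limit_usd: float = 20.0) -> dict:
    """Build keyword arguments for the AWS Budgets create_budget call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "policy-answerer-monthly",  # placeholder name
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 100.0,  # percent of the budget limit
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    }


def create_budget(account_id: str, email: str, limit_usd: float = 20.0) -> None:
    """Create the budget; requires boto3 and credentials with Budgets permissions."""
    import boto3  # imported here so budget_request works without boto3 installed

    client = boto3.client("budgets", region_name="us-east-1")
    client.create_budget(**budget_request(account_id, email, limit_usd))


if __name__ == "__main__":
    print(budget_request("123456789012", "ops@example.com"))
```

Keeping the request builder separate from the API call makes it easy to review or change the threshold before anything is created in the account.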
Last post in the series: the engineering reference. Same system, drawn for engineers — service names, Lambda inventory, IAM scopes, the S3 Vectors index config, Bedrock model IDs, the Slack app config, and the DynamoDB schemas.