Part 6 of 7 · Refund handler series · ~3 min read

What the refund handler costs

The refund handler only does work when a request comes in. For each request it reads a short email, looks up a few policy lines, drafts a reply, and posts a card to Slack. There’s no daily job grinding through a big list and no always-on server. The biggest line on the bill is the model calls that read and draft, and even those are cheap because most cases run on a small model. At typical SMB volume the whole thing costs a few dollars a month, with essentially zero fixed cost.

Key takeaways

  • About $3/month at typical SMB volume (around 200 requests a month).
  • Fixed AWS cost is essentially zero. No always-on compute, no NAT Gateway, no API Gateway.
  • The model calls are the biggest slice — reading each request and drafting each reply.
  • Most cases run on the cheap model; the heavier model fires only on the few hard ones.
  • At 1,000 requests the bill is around $7. At 2,000 requests it’s around $14.

Cost at three volumes

[Figure: Monthly cost at three request volumes, broken out by component. Stacked bars of monthly cost (US dollars) at 200, 1,000 and 2,000 requests a month, totalling about $3, $7 and $14. Legend: Bedrock (read + draft every request); S3 Vectors (policy index); AWS Budgets + Secrets Manager (fixed); everything else (Lambda, DynamoDB, SQS, SES, S3, CloudWatch).]
Fig 6. Monthly cost at three request volumes. Bedrock is the dominant slice because the model reads and drafts on every request. The policy index and the fixed amounts stay tiny. The bill grows with request count, not with anything always-on.

Where the dollars actually go

Bedrock (the bulk). The common case makes two model calls per request, both on Claude Haiku 4.5: a small read that pulls out who, what, and how much, and a small draft that writes the reply. Each call is a few thousand input tokens (the request plus a handful of policy passages) and a few hundred output tokens, so each costs a fraction of a cent. The few genuinely hard cases escalate the single decision to Claude Sonnet 4.6 at a few cents each; if roughly one request in twenty escalates, that adds only a small amount. Bedrock is the largest line on the bill, yet at typical SMB volume it’s still measured in single-digit dollars.
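To make the "fraction of a cent" arithmetic concrete, here is a minimal Python sketch of the expected Bedrock cost per request. The token counts, the one-in-twenty escalation rate, and especially the per-million-token prices are assumptions for illustration, not figures from this series; check current Bedrock pricing for your region. It also assumes an escalation adds one Sonnet call on top of the two Haiku calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens. ASSUMED list prices; check current Bedrock pricing."""
    input_per_m: float
    output_per_m: float

    def call_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD of one call with the given token counts."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# ASSUMED prices, for illustration only.
HAIKU = ModelPrice(input_per_m=1.00, output_per_m=5.00)
SONNET = ModelPrice(input_per_m=3.00, output_per_m=15.00)


def bedrock_cost_per_request(input_tokens: int = 2_000,
                             output_tokens: int = 200,
                             escalation_rate: float = 1 / 20) -> float:
    """Expected Bedrock cost of one refund request, in USD.

    Every request: two Haiku calls (read + draft).
    Hard cases (ASSUMED to add one call): the decision also goes to Sonnet.
    """
    common = 2 * HAIKU.call_cost(input_tokens, output_tokens)
    escalation = escalation_rate * SONNET.call_cost(input_tokens, output_tokens)
    return common + escalation


per_request = bedrock_cost_per_request()
for volume in (200, 1_000, 2_000):
    print(f"{volume:>5} requests/month: ${per_request * volume:.2f} on Bedrock")
```

Under these assumed inputs it works out to about $0.0065 per request. The useful part is the shape: the total is linear in request count, and the escalation rate moves it only a little because it multiplies a small per-call cost.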

S3 Vectors (the policy index). Your policy is a short doc — a few dozen passages. Storing those as vectors and searching them on each request costs cents a month. It re-indexes only when you edit the policy, which is rare. Effectively a rounding error.
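The series doesn't say how the sync job notices an edit; one cheap way to enforce "re-index only on edit" is to fingerprint the policy text and skip the embedding work when the fingerprint hasn't changed. A minimal sketch, assuming the last fingerprint is persisted somewhere between runs. `normalize`, `split_into_passages`, `sync_policy` and the blank-line chunking rule are hypothetical, and the real embedding and S3 Vectors write is left as a `reindex` callback rather than guessing at that API.

```python
import hashlib
from typing import Callable, List, Optional


def normalize(policy_text: str) -> str:
    """Unify line endings and strip trailing spaces so cosmetic edits don't force a re-index."""
    return "\n".join(line.rstrip() for line in policy_text.splitlines()).strip()


def policy_fingerprint(policy_text: str) -> str:
    """SHA-256 hex digest of the normalized policy text."""
    return hashlib.sha256(normalize(policy_text).encode("utf-8")).hexdigest()


def split_into_passages(policy_text: str) -> List[str]:
    """Hypothetical chunking rule: one passage per blank-line-separated paragraph."""
    return [p.strip() for p in normalize(policy_text).split("\n\n") if p.strip()]


def sync_policy(policy_text: str,
                last_fingerprint: Optional[str],
                reindex: Callable[[List[str]], None]) -> str:
    """Run on each scheduled sync; re-index only if the policy changed.

    Returns the fingerprint the caller should persist for the next run.
    """
    current = policy_fingerprint(policy_text)
    if current != last_fingerprint:
        # Edited (or never indexed): embed and write every passage.
        reindex(split_into_passages(policy_text))
    # Unchanged: no embedding calls and no vector writes this run.
    return current


# Usage: the second sync sees the same text (plus trailing spaces) and skips the re-index.
batches: List[List[str]] = []
policy = "Refunds within 30 days of delivery.\n\nGift cards are non-refundable."
fp = sync_policy(policy, None, batches.append)
fp = sync_policy(policy + "   \n", fp, batches.append)
print(len(batches), batches[0])
```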

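Lambda runtime. Five functions: the intake reader, the checker, the drafter, the approve-handler, and the policy-sync job. The first four fire only on a real request; the policy sync runs every 15 minutes and is tiny. Most runs take milliseconds, and the ones that wait on a model call take a few seconds, since Lambda bills for wall-clock time including that wait. Either way it’s pennies a month at any of these volumes.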
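A quick way to check the "pennies" claim is to count invocations and GB-seconds. A minimal sketch that assumes 30-day months, 256 MB functions, 50 ms for ordinary runs, 3 s for the two functions assumed to wait on a Bedrock call, and x86 list prices of $0.20 per million requests and $0.0000166667 per GB-second; all of these are illustrative, and the free tier is ignored.

```python
# ASSUMED Lambda list prices (x86); check current pricing for your region.
PRICE_PER_MILLION_INVOCATIONS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667

# Policy sync every 15 minutes over a 30-day month: 4 * 24 * 30 = 2,880 runs.
SYNC_RUNS_PER_MONTH = 4 * 24 * 30


def lambda_monthly_cost(requests_per_month: int,
                        memory_gb: float = 0.25,
                        quick_run_s: float = 0.05,
                        model_wait_s: float = 3.0) -> float:
    """Rough monthly Lambda cost in USD, ignoring the free tier.

    ASSUMED per request: the checker and approve-handler are quick runs;
    the intake reader and drafter each wait on one Bedrock call.
    """
    quick_runs = 2 * requests_per_month + SYNC_RUNS_PER_MONTH
    model_runs = 2 * requests_per_month
    invocations = quick_runs + model_runs
    gb_seconds = memory_gb * (quick_runs * quick_run_s + model_runs * model_wait_s)
    return (invocations / 1_000_000 * PRICE_PER_MILLION_INVOCATIONS
            + gb_seconds * PRICE_PER_GB_SECOND)


for volume in (200, 1_000, 2_000):
    print(f"{volume:>5} requests/month: ${lambda_monthly_cost(volume):.3f}")
```

Even the 2,000-request case comes out to a few cents under these assumptions, and most of it is the seconds spent waiting on the model, not the invocation count.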

DynamoDB on-demand. Two small tables: rf-requests (the live state of each request) and rf-audit (the permanent record). A few reads and writes per request. Pennies.

SQS. One queue plus a dead-letter queue. The first million SQS requests a month (API calls such as send, receive, and delete) are free, and an SMB never gets close. Effectively free.

SES. Inbound for the help-inbox lane and outbound for the approved replies: $0.10 per thousand messages each way. A couple of cents a month at SMB volume.

S3 + storage. The raw inbound messages and the mirrored policy doc. A few hundred KB. Effectively free.
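The small line items above can be totted up the same way. A minimal sketch: the SES rate is the $0.10 per thousand quoted above, while the per-request operation counts, the DynamoDB on-demand rates, and the SQS rate are assumptions to check against current pricing. It treats every request as one inbound and one outbound email, which overstates SES for requests that arrive through the form, ignores any empty SQS polls while the queue is idle, and leaves out S3 storage as negligible.

```python
from typing import Dict

# ASSUMED unit prices in USD; check current AWS pricing for your region.
DDB_WRITE_PER_MILLION = 0.625   # DynamoDB on-demand write request units
DDB_READ_PER_MILLION = 0.125    # DynamoDB on-demand read request units
SQS_PER_MILLION = 0.40          # SQS standard queue, beyond the free tier
SQS_FREE_PER_MONTH = 1_000_000  # free SQS requests each month
SES_PER_THOUSAND = 0.10         # per message, each way (rate quoted in the text)

# ASSUMED operations per refund request.
DDB_WRITES = 6   # state changes on rf-requests plus rf-audit entries
DDB_READS = 4
SQS_CALLS = 3    # send, receive, delete for one message
EMAILS = 2       # one inbound, one outbound


def small_services_cost(requests_per_month: int) -> Dict[str, float]:
    """Rough monthly USD per service for part of the 'everything else' bucket."""
    n = requests_per_month
    dynamodb = (n * DDB_WRITES * DDB_WRITE_PER_MILLION
                + n * DDB_READS * DDB_READ_PER_MILLION) / 1_000_000
    sqs_billable = max(0, n * SQS_CALLS - SQS_FREE_PER_MONTH)
    sqs = sqs_billable * SQS_PER_MILLION / 1_000_000
    ses = n * EMAILS * SES_PER_THOUSAND / 1_000
    return {"dynamodb": dynamodb, "sqs": sqs, "ses": ses}


for volume in (200, 2_000):
    costs = small_services_cost(volume)
    print(volume, {k: round(v, 4) for k, v in costs.items()})
```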

What doesn’t cost money

  • API Gateway. Replaced by Lambda Function URLs for the form webhook and the approve button.
  • NAT Gateway. Nothing is in a VPC, so there’s no NAT Gateway and none of its ~$32/month hourly charge.
  • Always-on compute. No EC2, no Fargate. Everything fires only on a real request.
  • A big knowledge base. The policy is a short doc, so the vector index is tiny — cents, not dollars.
  • The heavy model on every request. Claude Sonnet 4.6 runs only on the few hard cases; most requests use the cheap model.
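The NAT figure is plain hourly arithmetic, which is why it matters here: an always-on resource bills every hour of the month even at zero requests. A minimal sketch, assuming a 730-hour month and an hourly NAT Gateway rate of $0.045 (a commonly quoted us-east-1 list price; check your region), before any per-GB data-processing charge.

```python
HOURS_PER_MONTH = 730  # 365 * 24 / 12, the usual monthly-hours convention


def idle_monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of a resource billed per hour whether or not it does any work."""
    return hourly_rate_usd * hours


# ASSUMED NAT Gateway hourly rate; data processing would be billed on top.
nat_idle = idle_monthly_cost(0.045)
print(f"One idle NAT Gateway: ${nat_idle:.2f}/month")
```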

How the cost scales

Bedrock cost grows roughly in proportion to request count, because the model reads and drafts on every request. Everything else grows slowly or not at all: the policy index doesn’t care how many requests come in, and the fixed costs are fixed. So the bill at 5,000 requests a month is around $32, and at 10,000 it’s around $62. If volume climbs, the main lever is the cheap/heavy split: tighten the policy passages so more cases stay on Haiku 4.5, and the per-request cost drops. Past those volumes you’d also batch the read and the draft into a single call, which is an optimization, not a redesign.
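Two bits of arithmetic sit behind this paragraph. The quoted totals imply a marginal cost of roughly half a cent to seven tenths of a cent per extra request, and batching helps mainly because the request and policy passages go in as input once instead of twice. A minimal sketch: the totals come from this post, while the Haiku prices, the token counts, and the assumption that the draft call re-sends the read's output are illustrative, not measured.

```python
from typing import Dict, List, Tuple

# Approximate monthly totals quoted in this post (requests/month -> USD).
QUOTED_TOTALS: Dict[int, float] = {200: 3, 1_000: 7, 2_000: 14, 5_000: 32, 10_000: 62}


def marginal_costs(totals: Dict[int, float]) -> List[Tuple[int, int, float]]:
    """Slope between consecutive quoted points: extra USD per extra request."""
    vols = sorted(totals)
    return [(a, b, (totals[b] - totals[a]) / (b - a)) for a, b in zip(vols, vols[1:])]


# ASSUMED Haiku prices (USD per million tokens) and token counts; illustrative only.
IN_PER_M, OUT_PER_M = 1.00, 5.00
SHARED_INPUT = 2_000          # the request plus policy passages
READ_OUT, DRAFT_OUT = 100, 200


def haiku_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one Haiku call under the assumed prices."""
    return (input_tokens * IN_PER_M + output_tokens * OUT_PER_M) / 1_000_000


# Two calls: the draft re-sends the shared input plus the read's extracted fields.
two_calls = (haiku_cost(SHARED_INPUT, READ_OUT)
             + haiku_cost(SHARED_INPUT + READ_OUT, DRAFT_OUT))
# One batched call: shared input sent once, both outputs produced together.
one_call = haiku_cost(SHARED_INPUT, READ_OUT + DRAFT_OUT)

for a, b, slope in marginal_costs(QUOTED_TOTALS):
    print(f"{a:>6} -> {b:>6}: ${slope:.4f} per extra request")
print(f"two calls ${two_calls:.4f}, one batched call ${one_call:.4f}")
```

Under those assumptions the batched call costs about 37% less per request ($0.0035 against $0.0056); the trade-off is one larger prompt doing two jobs.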

Set an AWS Budgets alarm at $15/month so anything unusual — a runaway loop, a spam flood on the form — pages you before the bill matters. The handler’s normal-volume bill stays under that ceiling for a typical SMB.
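A minimal sketch of that alarm as the request body for the AWS Budgets `CreateBudget` API, built as plain Python dicts so it runs without credentials. The budget name, the email address, and the extra 80% forecast alert are hypothetical choices, not part of the handler as described.

```python
from typing import Any, Dict, List


def budget_alarm_request(limit_usd: float, email: str,
                         name: str = "refund-handler-monthly") -> Dict[str, Any]:
    """Keyword arguments for Budgets CreateBudget, minus AccountId.

    Alerts on actual spend over 100% of the limit and on forecasted spend
    over 80% of it (the forecast threshold is an ASSUMED choice).
    """
    subscribers: List[Dict[str, str]] = [{"SubscriptionType": "EMAIL", "Address": email}]

    def notification(kind: str, percent: float) -> Dict[str, Any]:
        return {
            "Notification": {
                "NotificationType": kind,  # "ACTUAL" or "FORECASTED"
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": percent,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": subscribers,
        }

    return {
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            notification("ACTUAL", 100.0),
            notification("FORECASTED", 80.0),
        ],
    }


request = budget_alarm_request(15, "ops@example.com")  # hypothetical address
print(request["Budget"]["BudgetLimit"])
```

With credentials configured, passing this as `boto3.client("budgets").create_budget(AccountId="<your account id>", **request)` would submit it; an `SNS` subscriber instead of `EMAIL` is the usual route if the alert should actually page someone.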

Last post in the series: the engineering reference. Same system, drawn for engineers — service names, the S3 Vectors index, Lambda inventory, IAM scopes, DynamoDB schemas, and the SES rule set.
