Part 6 of 7 · Tax doc collector series

What the tax doc collector costs

The collector is a cheap system to run. The daily chase tick reads a CSV from S3, does some date arithmetic, writes a few rows to DynamoDB, and sends a handful of emails. It calls no models. The cost that scales with volume is the document-reading lane: every upload gets read by Textract and named by Bedrock. Even so, at typical small-practice volume the bill is a few dollars a month, with essentially zero fixed cost.

Key takeaways

  • Around $2.40/month at typical small-practice volume (around 200 active client files).
  • Fixed AWS cost is essentially zero. No always-on compute, no NAT Gateway, no API Gateway.
  • The daily chase tick costs pennies — no model calls.
  • Textract fires only when a client uploads a document. Bedrock fires once per upload, plus once for the monthly summary.
  • At 500 active files the bill is around $5. At 1,000 files it’s around $9.

Cost at three volumes

[Figure: stacked-bar chart of monthly cost in US dollars at three active-file volumes: 200 files (~$2.40), 500 files (~$5), and 1,000 files (~$9). Each bar has four bands: Bedrock (type-confirm per upload plus the monthly summary), Textract (one read per uploaded document), AWS Budgets and Secrets Manager (fixed), and everything else (Lambda, DynamoDB, S3, EventBridge Scheduler, SES, CloudWatch). The document-reading lane scales with volume; the daily chase tick stays nearly free.]
Fig 6. Monthly cost at three active-file volumes. Textract and Bedrock are real slices because every upload is read, but they only fire when a document arrives. The dominant cost is the everything-else bucket: the daily tick and the dispatch emails.

Where the dollars actually go

Lambda runtime (the bulk). The chase tick runs once a day. Each tick reads the checklist CSV from S3, iterates the rows, works out what’s missing for each, and decides on a move. At 200 files that’s a few hundred milliseconds; at 1,000 it’s a couple of seconds. Add the dispatch Lambda for each send, the upload-page and action Function URLs, and the drive-sync Lambda every fifteen minutes — the Lambda total still lands under a dollar at all three volumes.
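The tick described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the collector's actual code: the column names (`client_id`, `document`, `received`, `last_chased`), the three move labels, and the seven-day gap are all assumptions, and the CSV is read from a string here rather than from S3.

```python
import csv
import io
from datetime import date

# Hypothetical checklist layout: one row per (client file, required document).
# Column names are illustrative, not the collector's real schema.
SAMPLE_CSV = """client_id,document,received,last_chased
c001,W-2,yes,
c001,1099-INT,no,2025-02-01
c002,W-2,no,
"""

def plan_moves(csv_text: str, today: date, min_gap_days: int = 7) -> list[tuple[str, str, str]]:
    """Return (client_id, document, move) for every document still missing."""
    moves = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["received"].strip().lower() == "yes":
            continue  # already uploaded, nothing to chase
        last = row["last_chased"].strip()
        if not last:
            move = "request"  # never asked yet
        elif (today - date.fromisoformat(last)).days >= min_gap_days:
            move = "remind"  # asked long enough ago to nudge again
        else:
            move = "wait"  # asked recently, stay quiet
        moves.append((row["client_id"], row["document"], move))
    return moves
```

The work per row is a string comparison and one date subtraction, which is consistent with the tick staying in the millisecond-to-seconds range at these volumes.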

DynamoDB on-demand. Small tables: td-sends, td-uploads, td-state, td-audit. Reads dominate during the daily tick (one read per file, plus state). Writes are sends, uploads, and audit rows. Pennies a month at any of these volumes.

S3 + storage. The mirrored checklist CSV plus every uploaded document. A typical client file is a handful of PDFs and photos — a few megabytes. Even at 1,000 files that’s low single-digit gigabytes. A dollar or two of storage, and that’s being generous.

EventBridge Scheduler. The daily tick schedule plus the one-time deferred-send schedules created by the quiet-hours and holiday gates. A few invocations a day. Pennies.
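As a rough sketch of how a quiet-hours and holiday gate might pick a deferred send time: the quiet window (20:00 to 08:00) and the function names below are assumptions, since this post does not state them. The `at(...)` string is the one-time schedule expression format that EventBridge Scheduler accepts.

```python
from datetime import datetime, time, timedelta

# Assumed quiet window: 20:00 to 08:00 local time (not stated in the post).
QUIET_START = time(20, 0)
QUIET_END = time(8, 0)

def next_send_time(now: datetime, holidays: frozenset = frozenset()) -> datetime:
    """Return `now` if sending is allowed, otherwise the next allowed moment."""
    candidate = now
    if candidate.time() >= QUIET_START:
        # Evening: push to the end of quiet hours tomorrow morning.
        candidate = datetime.combine(candidate.date() + timedelta(days=1), QUIET_END)
    elif candidate.time() < QUIET_END:
        # Early morning: push to the end of quiet hours today.
        candidate = datetime.combine(candidate.date(), QUIET_END)
    while candidate.date() in holidays:
        # Holiday: skip forward a day at a time.
        candidate = datetime.combine(candidate.date() + timedelta(days=1), QUIET_END)
    return candidate

def schedule_expression(when: datetime) -> str:
    """Format a one-time EventBridge Scheduler expression."""
    return f"at({when:%Y-%m-%dT%H:%M:%S})"
```

A send that passes the gate goes out immediately; one that does not becomes a single scheduled invocation, which is why this line item stays at a few invocations a day.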

SES. Inbound (if you let clients reply or forward): $0.10 per thousand received. Outbound for the requests and reminders: $0.10 per thousand sent. A 200-file practice sends maybe a few hundred emails across a season — cents.
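The arithmetic behind "cents" is short enough to write out. The sketch below uses only the per-thousand rate quoted above and ignores any other SES charges; the function name is illustrative.

```python
SES_RATE_PER_THOUSAND = 0.10  # USD per 1,000 emails, inbound and outbound, as quoted above

def ses_cost(sent: int, received: int = 0) -> float:
    """Approximate SES message charges in USD for a given email count."""
    return round((sent + received) / 1000 * SES_RATE_PER_THOUSAND, 4)
```

At 300 outbound emails across a season, `ses_cost(300)` comes to about three cents.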

Textract (one read per upload). Per-page pricing; a typical tax document is one to three pages. A few cents per document. This is the band that scales with volume: more files means more uploads to read. At 200 files with their documents, it’s under a dollar; at 1,000 files it lands around a couple of dollars across the busy months.
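For illustration, here is one way the read step might flatten a Textract result into plain text and pick up the page count that per-page billing is based on. The post does not say which Textract API the collector calls; this sketch assumes a text-detection style response with `DocumentMetadata.Pages` and `LINE` blocks, and the sample response is made up.

```python
def textract_lines(response: dict) -> tuple[int, str]:
    """Return (page count, text joined line by line) from a Textract response."""
    pages = response.get("DocumentMetadata", {}).get("Pages", 0)
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        # WORD blocks repeat the same text, so keep LINE blocks only.
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return pages, "\n".join(lines)

# Made-up response trimmed to the fields this sketch reads.
SAMPLE_RESPONSE = {
    "DocumentMetadata": {"Pages": 1},
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Form W-2 Wage and Tax Statement"},
        {"BlockType": "WORD", "Text": "Form"},
        {"BlockType": "LINE", "Text": "Employer identification number"},
    ],
}
```

Multiplying the page count by the per-page rate for whichever API is in use gives the cost of that one read.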

Bedrock (only when something fires it). The daily tick uses no Bedrock. The type-confirm fires Haiku 4.5 once per uploaded document: the Textract text in, a short JSON answer out — a small fraction of a cent per call. The monthly summary is one larger call that writes a practice-ready paragraph. Bedrock stays a modest slice even at 1,000 files.
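A type-confirm step of this shape could look roughly like the following. Everything here is hypothetical: the label set, the prompt wording, the 4,000-character cap, and the `call_model` parameter, which stands in for whatever wrapper actually invokes the model on Bedrock. The point of the sketch is the small input, the short JSON output, and the fallback when the answer does not parse.

```python
import json
from typing import Callable

# Hypothetical label set; the real checklist types are not listed in this post.
ALLOWED_TYPES = {"W-2", "1099-INT", "1099-DIV", "1098", "other"}

def build_prompt(document_text: str) -> str:
    """Build a short classification prompt from the Textract text."""
    labels = ", ".join(sorted(ALLOWED_TYPES))
    return (
        "Classify this tax document. Reply with JSON only, "
        f'like {{"type": "<one of: {labels}>"}}.\n\n'
        f"{document_text[:4000]}"  # cap the input to keep token cost small
    )

def confirm_type(document_text: str, call_model: Callable[[str], str]) -> str:
    """Return a validated document type, falling back to "other"."""
    raw = call_model(build_prompt(document_text))
    try:
        answer = json.loads(raw)
    except json.JSONDecodeError:
        return "other"  # the model did not return JSON
    doc_type = answer.get("type") if isinstance(answer, dict) else None
    if isinstance(doc_type, str) and doc_type in ALLOWED_TYPES:
        return doc_type
    return "other"  # unknown or malformed label
```

Passing the model call in as a function keeps the parsing logic testable without credentials, for example `confirm_type(text, lambda prompt: '{"type": "W-2"}')`.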

What doesn’t cost money

  • API Gateway. Replaced by Lambda Function URLs for the upload page, the intake form, and the action endpoints.
  • NAT Gateway. Nothing is in a VPC. No NAT, no $32/month minimum.
  • Always-on compute. No EC2, no Fargate. The collector sleeps until a tick fires or a client uploads.
  • A Knowledge Base. The checklist is structured rows, not free text — deterministic lookup beats vector search here. No embeddings, no Knowledge Base, no S3 Vectors.
  • Models on the tick. The daily decision is plain Python. Bedrock fires only on uploads and the monthly summary.

How the cost scales

Lambda runtime and DynamoDB grow roughly linearly with file count, because every file is evaluated on every tick. Textract and Bedrock grow with upload count, which roughly tracks file count too (each file is a few documents). So the bill at 2,500 active files is around $22; at 5,000 it’s around $42. Past those volumes you’d batch the type-confirm calls and read documents only on the first upload of a session, but those are optimizations for a large practice — not redesigns.
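To estimate a volume between the quoted ones, a simple piecewise-linear interpolation over the figures in this post is enough. This is a back-of-envelope sketch: the data points are the approximate totals quoted here, and the behavior below 200 and above 5,000 files is an assumption (proportional scaling and slope extension, respectively).

```python
# (active files, approximate monthly USD) as quoted in this post
QUOTED = [(200, 2.40), (500, 5.0), (1000, 9.0), (2500, 22.0), (5000, 42.0)]

def estimate_monthly_cost(files: int) -> float:
    """Interpolate linearly between the quoted (files, cost) points."""
    if files <= QUOTED[0][0]:
        # Below the smallest quoted volume: scale the first point proportionally.
        return round(QUOTED[0][1] * files / QUOTED[0][0], 2)
    for (f0, c0), (f1, c1) in zip(QUOTED, QUOTED[1:]):
        if files <= f1:
            return round(c0 + (c1 - c0) * (files - f0) / (f1 - f0), 2)
    # Above the largest quoted volume: extend the slope of the last segment.
    (f0, c0), (f1, c1) = QUOTED[-2], QUOTED[-1]
    return round(c1 + (c1 - c0) * (files - f1) / (f1 - f0), 2)
```

Dividing the quoted totals by file count shows the per-file cost falling from about 1.2 cents at 200 files to about 0.84 cents at 5,000, so growth is close to linear with a small fixed component.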

Set an AWS Budgets alarm at $15/month so anything unusual pages you before the bill matters. A normal-volume small practice stays well under that ceiling, even in the thick of February.
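One way to set that alarm from code is the AWS Budgets `CreateBudget` call. The sketch below only builds the request; the budget name, account ID, and email address are placeholders, and notifying at 100% of actual spend is an assumption about what "alarm at $15" should mean.

```python
def budget_request(account_id: str, email: str, limit_usd: float = 15.0) -> dict:
    """Build keyword arguments for the AWS Budgets CreateBudget call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "tax-doc-collector-monthly",  # hypothetical name
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",  # fire on actual, not forecast, spend
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 100.0,  # percent of the budget limit
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    }

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("budgets").create_budget(
#       **budget_request("123456789012", "you@example.com")
#   )
```

A second notification at a lower threshold, or one on forecast spend, would give earlier warning during the busy season.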

Last post in the series: the engineering reference. Same system, drawn for engineers — service names, Lambda inventory, IAM scopes, DynamoDB schemas, SES rule set, and EventBridge Scheduler config.
