Series · 7 parts · Published June 16, 2026

Tax doc collector

A serverless system that takes the pain out of tax season for an accountant or bookkeeper. For each client it knows the checklist of documents needed, sends a secure request, collects what they upload, checks off what arrives, chases what’s missing, and tells the preparer the moment a file is complete. It reads each upload just enough to confirm the right document type; a human reviews before anything is marked final. Seven posts on the same system — one diagram at a time — with an engineering reference at the end.

  1. 01

    A tax doc collector on AWS for a few dollars a month

    The whole system on one page — a document intake, a tracker, and a dispatch piece, plus the moves they share for every client file.

  2. 02

    How a client file gets set up

    Three ways a client file starts — the Drive sheet itself, a one-tap copy from last year’s file, and a short intake form that picks the right checklist for the client type.

  3. 03

    How a document arrives and gets checked

    A client uploads on a secure page; the file lands in private storage; the collector reads it just enough to confirm the document type, matches it to a checklist item, and checks it off — pending a human review.

  4. 04

    How a client gets chased for missing docs

    A daily tick reads each file, sees what’s still missing, compares against the chase cadence, and picks one of four moves: complete, first request, reminder, escalate. No model on the tick.

  5. 05

    How a finished file reaches the preparer

    Three actions on a complete file: accept (the file moves to done), reject one item (the client gets a fresh request for just that item), and reopen (add a forgotten item back). Every action is logged.

  6. 06

    What the tax doc collector costs

    A few dollars a month at small-practice volume. The chase tick calls no models; Bedrock fires only when a document is uploaded and on the monthly summary.

  7. 07

    Engineering reference: the tax doc collector architecture

    Same system, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, and the DynamoDB schemas.

What is a tax doc collector?
A small serverless system for an accountant or bookkeeper. For each client it knows the checklist of documents needed, sends a secure upload request, collects what the client uploads, checks off what’s received, chases what’s still missing, and tells the preparer the moment a client’s file is complete. It reads each upload just enough to confirm it’s the right kind of document; a human always reviews before anything is marked final.
How much does it cost to run?
About $2.40/month at typical small-practice volume (around 200 active client files). The fixed cost is essentially zero. The variable cost is dominated by the document-reading lane that confirms each upload’s type; the daily tick that chases missing items is plain date arithmetic and costs pennies. At 1,000 active files the bill lands around $9.
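As a quick check on the two figures quoted above, the per-file cost can be worked out directly. This is plain arithmetic on the numbers in this answer, not a pricing model; the helper name `per_file_cents` is made up for illustration.

```python
# Figures quoted in the answer above: active files -> approximate monthly bill (USD).
quoted = {200: 2.40, 1000: 9.00}


def per_file_cents(files: int, bill_usd: float) -> float:
    """Monthly cost per active client file, in cents, rounded to one decimal."""
    return round(100 * bill_usd / files, 1)


for files, bill in quoted.items():
    print(f"{files} files: about {per_file_cents(files, bill)} cents per file per month")
```

On these figures the cost per file is a little over a cent a month at 200 files and slightly lower at 1,000.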
Which AWS services does it use?
Lambda (Python 3.14, arm64) with Function URLs for the secure upload page and the action endpoints, EventBridge Scheduler for the daily chase tick, DynamoDB on-demand, S3 (versioned, private) for uploaded documents, SES inbound + outbound, Secrets Manager, CloudWatch Logs (7-day retention), AWS Budgets, and Bedrock (Claude Haiku 4.5 via Global cross-Region inference) for confirming each upload’s document type and for the monthly summary. No API Gateway, no NAT Gateway, no always-on compute, no Knowledge Base.
Where does the checklist live?
In a Google Sheet in a Drive folder. One row per client with name, client type, contact email, the checklist of documents owed, a due date, and a status per item. A small drive-sync Lambda mirrors the sheet to S3 every 15 minutes; the collector reads from S3 to keep Drive API calls predictable and to get S3 versioning for free.
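A minimal sketch of how one sheet row might be modelled once the mirror has been read from S3. The CSV layout, the column names, the `W-2=received;1099-INT=missing` encoding, and the status strings are all assumptions made for illustration; the post only states which fields a row carries.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ClientFile:
    name: str
    client_type: str
    contact_email: str
    due_date: date
    # Maps each owed document to its status, e.g. {"W-2": "missing"}.
    items: dict = field(default_factory=dict)

    def missing(self) -> list:
        """Documents the client still owes."""
        return [doc for doc, status in self.items.items() if status == "missing"]


def parse_mirror(csv_text: str) -> list:
    """Parse the mirrored sheet. The column layout is hypothetical."""
    files = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical encoding of the checklist cell: "W-2=received;1099-INT=missing"
        items = dict(pair.split("=", 1) for pair in row["checklist"].split(";") if pair)
        files.append(ClientFile(
            name=row["name"],
            client_type=row["client_type"],
            contact_email=row["contact_email"],
            due_date=date.fromisoformat(row["due_date"]),
            items=items,
        ))
    return files


SAMPLE = (
    "name,client_type,contact_email,due_date,checklist\n"
    "Ada Example,individual,ada@example.com,2027-03-15,W-2=received;1099-INT=missing\n"
)
files = parse_mirror(SAMPLE)
print(files[0].missing())
```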
Does the collector use AI?
Sparingly. The daily chase tick uses no AI: it’s plain Python that reads dates and statuses and decides whom to nudge. Claude Haiku 4.5 on Bedrock fires only when a client uploads a document (it reads the page just enough to confirm the type, for example “this looks like a W-2”) and once a month for the practice-summary narrative. Most of the system is deterministic by design, and a human reviews every file before it’s marked final.
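The split between the model call and the deterministic part could look roughly like this. The classifier is passed in as a function so the sketch runs offline; `fake_classifier` is a stand-in, and the status strings and function names are hypothetical rather than taken from the system.

```python
from typing import Callable, Optional

# Hypothetical status values; the post only says items are checked off pending review.
MISSING = "missing"
PENDING_REVIEW = "received-pending-review"


def check_off(items: dict, upload: bytes, classify: Callable[[bytes], str]) -> Optional[str]:
    """Classify an upload and check off the matching checklist item.

    Returns the matched document type, or None when nothing on the checklist matches.
    """
    doc_type = classify(upload)  # in the real system this is where the model would be called
    if items.get(doc_type) == MISSING:
        items[doc_type] = PENDING_REVIEW  # never "final": a human reviews first
        return doc_type
    return None


def fake_classifier(upload: bytes) -> str:
    """Stand-in for the model call so the example needs no AWS access."""
    return "W-2" if b"Wage and Tax Statement" in upload else "unknown"


checklist = {"W-2": MISSING, "1099-INT": MISSING}
matched = check_off(checklist, b"Form W-2 Wage and Tax Statement", fake_classifier)
print(matched, checklist)
```

Keeping the classifier behind a plain callable means everything after the type label is ordinary, testable Python.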
How does a request reach the client without being noisy?
Each client type has its own chase cadence in the rules doc. The first request goes out on setup, followed by reminders at day 7, day 14, and day 21 for anything still missing. Reminders respect quiet hours (default 6pm–8am local) and the holiday calendar. A client who has uploaded everything stops being chased. A client can pause the chase or hand off to a spouse or partner straight from the request.
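A sketch of the tick's decision under the cadence described above. Two things here are assumptions, not the system's documented behaviour: that "escalate" applies once the last reminder day has passed, and the extra `wait` result for days when nothing is due. The holiday calendar and the pause and hand-off options are left out.

```python
from datetime import date, datetime, time

REMINDER_DAYS = (7, 14, 21)  # cadence quoted in the text
QUIET_START = time(18, 0)    # 6pm local
QUIET_END = time(8, 0)       # 8am local


def pick_move(missing: list, setup: date, today: date, requested: bool) -> str:
    """Return complete, first request, reminder, escalate, or wait."""
    if not missing:
        return "complete"
    if not requested:
        return "first request"
    days = (today - setup).days
    if days in REMINDER_DAYS:
        return "reminder"
    if days > REMINDER_DAYS[-1]:
        # Assumption: escalation starts after the final reminder. A real tick
        # would also need to record that it already escalated.
        return "escalate"
    return "wait"  # hypothetical: nothing to send today


def in_quiet_hours(now: datetime) -> bool:
    """True between 6pm and 8am local; the window wraps past midnight."""
    t = now.time()
    return t >= QUIET_START or t < QUIET_END


print(pick_move(["W-2"], date(2026, 1, 1), date(2026, 1, 8), requested=True))
```

Because the decision is pure date arithmetic, it can be tested without any AWS resources or model calls.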
What happens when a file is complete?
The collector marks the file ready-for-review and pings the preparer with a link to the per-client status board. The preparer reviews each uploaded document and either accepts the file (it moves to done) or rejects one item with a note (the client gets a fresh request for just that item). Every action is recorded in the td-audit DynamoDB table with timestamp, client, action, by-user, and a before-and-after snapshot, so the trail is auditable for years.
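One way an audit record with those fields might be assembled before it is written. The attribute names, the JSON-encoded snapshots, and the sample values are assumptions; the post names the fields and the td-audit table but not the schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_item(client: str, action: str, by_user: str,
               before: dict, after: dict,
               now: Optional[datetime] = None) -> dict:
    """Build one audit record. Attribute names are hypothetical."""
    now = now or datetime.now(timezone.utc)
    return {
        "client": client,
        "timestamp": now.isoformat(),
        "action": action,
        "by_user": by_user,
        # Snapshots stored as JSON strings so the record stays flat.
        "before": json.dumps(before, sort_keys=True),
        "after": json.dumps(after, sort_keys=True),
    }


item = audit_item(
    client="client-042",
    action="reject-item",
    by_user="preparer@example.com",
    before={"W-2": "received-pending-review"},
    after={"W-2": "missing"},
    now=datetime(2026, 6, 16, 12, 0, tzinfo=timezone.utc),
)
print(item)
```

A record shaped like this could then be passed to boto3's DynamoDB `Table.put_item(Item=item)`, assuming the table's key schema matches these attribute names.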