Part 1 of 7 · Tax doc collector series · ~5 min read

A tax doc collector on AWS for a few dollars a month

Tax season is mostly chasing paper. A bookkeeper with two hundred clients spends January and February sending the same email over and over: “We still need your W-2, your mortgage interest statement, and last year’s closing statement.” The client uploads two of the three and goes quiet. Somebody has to remember who’s still missing what, send the next nudge, and notice the moment a file is finally complete so the actual work can start. This post walks through the design of a small collector that knows each client’s checklist, takes the uploads, chases the gaps, and tells the preparer when a file is ready.

Key takeaways

  • Three ways a client file starts: the Drive checklist, a copy from last year, and a short intake form.
  • On each tick, every file gets one of four moves: complete, first request, reminder, or escalate.
  • Per-client-type rules: the chase cadence and the checklist of documents owed both live in the rules doc.
  • Requests respect quiet hours and your holiday calendar. A client who uploaded everything stops being chased.
  • Designed on AWS for about $2 a month at typical small-practice volume.

The whole system on one page

Before any code, here’s the shape of what we’re designing.

System shape: three parts outside, three pieces inside AWS.

At the top, three external boxes sit in a row. Far left, "Client files": a Google Drive checklist sheet listing each client, their client type, the documents they owe, a due date, and a status per item, plus a copy-from-last-year lane and a short intake-form lane that start new files. Centre, "Rules and voice": a Drive folder with a rules doc covering per-client-type checklists, chase cadences, and the review owner, plus a voice doc with the request and reminder message templates. Far right, "Clients and preparer": the people who upload documents on a secure page, and the accountant or bookkeeper who reviews a file once it is complete.

Each box connects via an arrow to the AWS account container below. Client files have an outgoing arrow into AWS. Rules and voice feed in to ground every request. Clients receive a secure upload request with the exact list of what is still missing and a link to upload; the preparer receives a ping with a link to the per-client status board the moment a file is complete.

Inside the AWS account are three components in a row, mirroring the layout above. On the left, the Document intake receives uploads on a secure page, stores them privately, reads each one just enough to confirm the document type, matches it to a checklist item, and checks it off pending review. In the middle, the Tracker runs daily, reads each client file, computes what is still missing and how overdue, and picks one of four moves: complete, first request, reminder, or escalate. On the right, the Dispatch sends the secure request or reminder to the client, respects quiet hours and the holiday calendar, and tells the preparer when a file is complete. Internal arrows flow left to right. A note at the bottom reads: a human reviews every file before it is marked final; nothing is sent to the tax return automatically.
Fig 1. Three parts outside, three pieces inside AWS. Files start from a Drive checklist, a copy from last year, or an intake form. The Tracker runs daily and picks one of four moves. Dispatch sends the right request to the right client and tells the preparer when a file is done.

What you set up once (the outside)

  • Client files. A Google Sheet in a Drive folder, one row per client: name, client type (individual, sole trader, rental owner, small company), contact email, the checklist of documents owed for this season, a due date, and a status per item (waiting, received, accepted, rejected). You can fill it in once and forget it; new files can also start via two other lanes covered in Part 2 — a copy-from-last-year lane (one tap clones last season’s file and resets every item to waiting) and a short intake-form lane (a new client fills in a few questions and the right checklist is chosen for their client type).
  • A rules folder. Two short Google Docs in a Drive folder. The rules doc lists the checklist per client type — which documents an individual owes versus a rental owner versus a small company — and the chase cadence: how many days after the first request to send a reminder, and how many times. A typical cadence is a first request on setup, then reminders at day 7, 14, and 21 for anything still missing. The doc also names the review owner (the preparer who signs off), the quiet hours, and any holiday calendars to skip. The voice doc holds one message template per step — what the first request and each reminder actually say.
  • Clients and preparer. Clients upload on a secure page; each request lands with the exact list of what’s still missing, a link to upload, and the option to pause the chase or hand off to a spouse or business partner. The preparer is the accountant or bookkeeper who reviews each file once it’s complete; they get a ping with a link to the per-client status board.
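The copy-from-last-year lane is simple enough to sketch. This is a minimal illustration rather than the series' implementation: the `ClientFile` shape, its field names, and the `copy_from_last_year` function are assumptions made up for this example. Only the sheet columns, the four item statuses, and the reset-to-waiting behaviour come from the description above.

```python
from dataclasses import dataclass, field
from datetime import date

# The four per-item statuses named in the checklist sheet.
STATUSES = ("waiting", "received", "accepted", "rejected")


@dataclass
class ClientFile:
    """One row of the checklist sheet (hypothetical shape, for illustration)."""
    name: str
    client_type: str
    email: str
    due_date: date
    items: dict[str, str] = field(default_factory=dict)  # document -> status


def copy_from_last_year(last: ClientFile, new_due_date: date) -> ClientFile:
    """Clone last season's file and reset every item to 'waiting'."""
    return ClientFile(
        name=last.name,
        client_type=last.client_type,
        email=last.email,
        due_date=new_due_date,
        # Same documents as last season, but nothing has been received yet.
        items={doc: "waiting" for doc in last.items},
    )
```

The clone builds a fresh `items` dict, so last season's file keeps its accepted statuses and stays available for audit.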

What runs on every tick (the inside)

  • The document intake. A client clicks the link in their request and lands on a secure upload page (no login to remember: the link itself carries a signed, time-limited token). They drop in a PDF or a photo of a document. The file is stored privately in S3. The collector reads it just enough to confirm the type: Amazon Textract pulls the text, and Claude Haiku 4.5 on Amazon Bedrock answers one narrow question, “which checklist item does this look like?” (for example, “this looks like a W-2”). It matches the upload to a checklist item and marks that item received, pending a human review. It never reads the numbers off the document for the return; it only confirms the kind of document.
  • The tracker. Runs once a day at 8am local. Reads each client file. For each one, works out what’s still missing and how many days since the first request. Picks one of four moves. Complete: every item received — mark the file ready for review and tell the preparer. First request: the file was just set up — send the client the full list with the upload link. Reminder: a reminder day has passed and items are still missing — re-send, listing only what’s left. Escalate: the due date has passed with items still missing — flag the file to the preparer so a person can call the client. The tracker calls no model on the daily tick — the move logic is plain Python.
  • Dispatch. Reads the voice doc, fills in the request or reminder for the chosen move, and sends it. Requests and reminders go out by email through Amazon SES, each carrying a signed upload link. Both honor quiet hours (no sends between 6pm and 8am local by default) and the holiday calendar. Every send writes a row to DynamoDB so the next day’s tick knows the request already went out. A weekly digest tells the preparer which files are stuck and which are close. A monthly summary writes a practice-ready paragraph: files completed, files still open, longest-waiting clients.
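The tracker's four-move decision can be sketched in a few lines of plain Python. Treat this as one possible reading of the rules above, not the code from the series: the `FileState` fields, the order in which the checks run, counting `rejected` items as still missing, and returning `None` on days when nothing is due are all assumptions made for illustration. The day 7, 14, and 21 cadence is the typical one from the rules doc.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Move(Enum):
    COMPLETE = "complete"
    FIRST_REQUEST = "first_request"
    REMINDER = "reminder"
    ESCALATE = "escalate"


@dataclass
class FileState:
    """What the tracker needs to know about one client file (hypothetical)."""
    items: dict[str, str]                     # document -> status
    due_date: date
    first_request_on: Optional[date] = None   # None until the first send
    reminders_sent: int = 0
    escalated: bool = False


REMINDER_DAYS = (7, 14, 21)  # days after the first request


def missing_items(state: FileState) -> list[str]:
    # Assumption: a rejected upload has to be sent again, so it counts as missing.
    return [d for d, s in state.items.items() if s in ("waiting", "rejected")]


def pick_move(state: FileState, today: date,
              reminder_days: tuple[int, ...] = REMINDER_DAYS) -> Optional[Move]:
    """Return today's move for one file, or None if nothing is due."""
    if not missing_items(state):
        return Move.COMPLETE
    if state.first_request_on is None:
        return Move.FIRST_REQUEST
    if today > state.due_date:
        # Flag an overdue file to the preparer once, then stop emailing.
        return None if state.escalated else Move.ESCALATE
    days_since = (today - state.first_request_on).days
    if (state.reminders_sent < len(reminder_days)
            and days_since >= reminder_days[state.reminders_sent]):
        return Move.REMINDER
    return None
```

In this sketch the caller is expected to record each send (the DynamoDB row mentioned above) and update `first_request_on`, `reminders_sent`, and `escalated` accordingly, and to remember that the preparer has been told about a complete file so that `COMPLETE` is acted on only once.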

In plain words

The Patel family is a returning client. Their checklist this year is a W-2, a mortgage interest statement, a childcare receipt, and last year’s state refund letter. The collector emails them on setup: “To start your return we need four documents. Here’s your secure upload link.” Over the next week they upload the W-2 and the mortgage statement; the collector reads each one, confirms the type, and checks it off. On day 7 the reminder goes out, but now it lists only the two still missing: the childcare receipt and the refund letter. They upload the childcare receipt the same evening. On day 14 the second reminder names just the refund letter; Mrs Patel uploads it the next morning. The collector marks the file complete and pings the preparer: “Patel file is ready for review.” The preparer opens the status board, glances at each document, accepts all four, and starts the return.

The cost of running this is about $2 a month at small-practice volume. The cost of not running it is the week a preparer loses every February to writing the same chase email by hand, and the file that sits half-complete until April because nobody noticed it was waiting on one receipt.

Design rules that shaped every decision

  • Every request ships with the exact list of what’s still missing — never a vague “please send your documents.”
  • Four moves, always. Complete, first request, reminder, escalate. There is no fifth.
  • Quiet hours and holidays are respected. A client who uploaded everything is never chased again.
  • The collector confirms the document type only. It never reads the numbers for the return; a human always reviews.
  • The checklist lives in Drive. Adding a client, changing a checklist, or shifting a due date doesn’t need a deploy.
  • Every upload and every action is logged. Audit a file next year and you can see exactly what came in when.
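The quiet-hours and holiday rule is the kind of check that is easy to get wrong at the boundaries, so here is a minimal sketch. The function names and the idea of passing holidays as a set of dates are assumptions for illustration; the 6pm to 8am default window is from this post. It takes a naive local datetime and leaves time-zone handling out of scope.

```python
from datetime import date, datetime, time


def in_quiet_hours(t: time, start: time = time(18, 0),
                   end: time = time(8, 0)) -> bool:
    """True if t falls in the quiet window [start, end)."""
    if start <= end:
        # Window inside a single day, e.g. 12:00 to 14:00.
        return start <= t < end
    # Window wraps past midnight, e.g. 18:00 to 08:00.
    return t >= start or t < end


def can_send(now_local: datetime, holidays: set[date],
             quiet_start: time = time(18, 0),
             quiet_end: time = time(8, 0)) -> bool:
    """True if a request or reminder may go out at this local time."""
    if now_local.date() in holidays:
        return False
    return not in_quiet_hours(now_local.time(), quiet_start, quiet_end)
```

With the default window, a send at exactly 8:00am is allowed and a send at exactly 6:00pm is not, which keeps the 8am daily tick just outside quiet hours.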

Why this shape

Most small practices chase documents in one of three places: a spreadsheet that nobody updates, an inbox full of half-finished threads, or somebody’s memory. The spreadsheet works until it doesn’t — one missed update and you’re emailing a client for a document they sent last week. The inbox is worse: the request and the reply drift apart, attachments get buried, and you can’t tell at a glance who’s still missing what. And memory, of course, fails the moment the practice gets busy, which is exactly the season this matters.

The setup above keeps the source of truth in a sheet the practice already edits, but adds a small system that looks at that sheet every day and acts only when something needs acting on. Requests go out with the exact list, so the client never has to guess. Uploads are checked off the moment they arrive. Reminders shrink as documents come in, so a client who’s sent three of four sees only the one that’s left. And outside the weekly digest, the preparer hears about a file at most twice: once when it’s complete, and once if it goes overdue. The collector is invisible most days; visible only when a file is ready or stuck.
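The shrinking reminder is worth seeing concretely. The sketch below builds a reminder body from a file's item statuses and lists only what is still missing. The wording is a made-up stand-in for the real templates, which live in the voice doc, and the function names and the example link are hypothetical.

```python
def remaining(items: dict[str, str]) -> list[str]:
    """Documents the client still owes, in checklist order."""
    # Assumption: a rejected upload has to be re-sent, so it counts as missing.
    return [doc for doc, status in items.items()
            if status in ("waiting", "rejected")]


def reminder_body(client_name: str, items: dict[str, str],
                  upload_link: str) -> str:
    """Plain-text reminder that names only the documents still missing."""
    left = remaining(items)
    if not left:
        # A complete file is never chased; the caller should not get here.
        raise ValueError("nothing is missing; no reminder should be sent")
    noun = "document" if len(left) == 1 else "documents"
    lines = [f"Hello {client_name},", "",
             f"We still need {len(left)} {noun}:"]
    lines += [f"  - {doc}" for doc in left]
    lines += ["", f"Upload here: {upload_link}"]
    return "\n".join(lines)


# The Patel file as it stands on day 7 in the walkthrough above.
patel_day_7 = {
    "W-2": "received",
    "Mortgage interest statement": "received",
    "Childcare receipt": "waiting",
    "State refund letter": "waiting",
}
print(reminder_body("Patel family", patel_day_7, "https://example.com/upload"))
```

Because the list is computed from the sheet at send time, the message cannot ask for a document that has already been checked off.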

The next four posts walk through each piece in turn: how a client file gets set up, how a document arrives and gets checked, how a client gets chased for missing docs, and how a finished file reaches the preparer. Each post has one diagram. A cost breakdown and a final engineering reference close the series.
