Part 1 of 7 · Receipt organizer series · ~5 min read

A receipt organizer on AWS for a few dollars a month

Every small business has a shoebox. It might be a literal box of crumpled receipts, a phone camera roll full of blurry photos, or an email folder nobody opens. At tax time it all has to be read, sorted, and added up — usually by the owner, late at night, in a hurry. This post walks through the design of a small organizer that takes a photo of a receipt, reads the vendor, date, total, and tax off it, sorts it into the right expense category, flags anything it isn’t sure about for a quick human check, and files the clean record where the bookkeeper needs it.

Key takeaways

  • Three ways a receipt comes in: forward it by email, snap it on a phone, or upload it on the web.
  • Every receipt ends in one of four results: filed, needs-review, duplicate, or rejected.
  • The system reads vendor, date, total, and tax, then picks a category from your own chart of accounts.
  • Anything unclear is flagged for a quick human check — a wrong number never files itself silently.
  • Designed on AWS for about $3/month at typical small-business volume.

The whole system on one page

Before any code, here’s the shape of what we’re designing.

System architecture: three sources, three pieces inside AWS

At the top, three external boxes sit in a row. Far left, "Receipts in": staff send receipts three ways, by forwarding a receipt by email to a dedicated address, snapping a photo on a phone, or uploading an image or PDF on a simple web page. Centre, "Rules and categories": a Drive folder with a rules doc covering the chart of accounts, the per-vendor category hints, the tax rules, and the confidence threshold for a human check, plus a voice doc holding the wording of the review request. Far right, "Bookkeeper": the person who keeps the books; clean records land in the expense sheet, and only the unclear ones land in their review queue.

Each box connects by an arrow to the AWS account container below. Receipts flow into AWS. Rules and categories feed in to ground every decision. The bookkeeper receives a small daily review queue with the receipt image, the read fields, the proposed category, and an Approve button.

Inside the AWS account are three components in a row, mirroring the layout above. On the left, the Capture receives the receipt from each source, stores the image, and queues it for reading. In the middle, the Reader runs Textract to read the receipt, then Bedrock Haiku 4.5 to pick the category and check the numbers, and picks one of four results: filed, needs-review, duplicate, or rejected. On the right, the Filing writes the clean record into the expense sheet, files the image in the right folder, and posts the unclear ones to the bookkeeper's review queue.

Internal arrows flow left to right. A note at the bottom reads: nothing files silently when the numbers are unclear; the organizer asks a human first.
Fig 1. Three sources outside, three pieces inside AWS. Receipts flow in by email, phone, or upload. The Reader reads each one and picks one of four results. Filing writes the clean record to the expense sheet and sends only the unclear ones to the bookkeeper.

What you set up once (the outside)

  • Receipts in. Three ways to send a receipt, covered in Part 2. Forward it by email to a dedicated address like receipts@your-company.com. Snap a photo on a phone (a saved home-screen shortcut opens the camera and uploads straight away). Or drag an image or PDF onto a simple web page. Whichever way it arrives, the system stores the original and starts reading it.
  • A rules folder. Two short Google Docs in a Drive folder. The rules doc holds your chart of accounts (the list of expense categories your bookkeeper actually uses — “meals,” “fuel,” “software,” “office supplies,” and so on), some per-vendor hints (“anything from Shell goes to fuel”), the tax rules for your region, and the confidence threshold — how sure the system has to be before it files a receipt without asking. The voice doc holds the short wording of the review request the bookkeeper sees.
  • Bookkeeper. The person who keeps the books. Clean, confident records land straight in the expense sheet. Only the unclear ones — a blurry total, an unknown vendor, a category the system isn’t sure about — land in a small review queue with the receipt image, the fields the system read, the category it proposed, and an Approve button. Most days the queue is short.
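To make the rules doc concrete, here is a minimal sketch of what it might look like once it has been loaded into memory. The `Rules` class, its field names, and the 0.90 threshold are all illustrative assumptions, not something this series has defined; the tax rules are left out to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rules:
    """Hypothetical in-memory form of the rules doc (names are assumptions)."""
    chart_of_accounts: tuple[str, ...]
    # Vendor keyword -> category, e.g. "shell" -> "fuel".
    vendor_hints: dict[str, str] = field(default_factory=dict)
    # Assumed default; the real value is whatever the rules doc says.
    confidence_threshold: float = 0.90

    def __post_init__(self) -> None:
        # A hint that points outside the chart of accounts is a typo in the doc.
        unknown = set(self.vendor_hints.values()) - set(self.chart_of_accounts)
        if unknown:
            raise ValueError(f"hints use unknown categories: {sorted(unknown)}")

    def hint_for(self, vendor: str) -> str | None:
        """Return the hinted category for a vendor, or None if no hint matches."""
        name = vendor.casefold()
        for keyword, category in self.vendor_hints.items():
            if keyword.casefold() in name:
                return category
        return None


RULES = Rules(
    chart_of_accounts=("meals", "fuel", "software", "office supplies"),
    vendor_hints={"shell": "fuel"},
)
```

With these rules, `RULES.hint_for("Shell Station #12")` gives "fuel", while a vendor with no hint gives `None` and is left for the category step to decide.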

What runs on every receipt (the inside)

  • The capture. Three sources feed one place. An email-forward lane (forward a receipt to receipts@your-company.com and the system pulls the image or PDF out of the message), a mobile-snap lane (a phone shortcut uploads the photo straight to a private upload link), and a web-upload lane (drag a file onto a page). Each one stores the original image, gives it an id, and drops it on a queue to be read. Nothing is processed in line — the queue lets a busy lunch-hour batch of receipts wait its turn without losing any.
  • The reader. This is where the work happens. First, Amazon Textract reads the receipt image and pulls out the raw text and the obvious fields — vendor name, date, total, tax. Textract is good at receipts specifically; it knows what a total looks like. Then a Bedrock Haiku 4.5 call (a small, cheap AI model) takes that text, picks the right category from your chart of accounts, and double-checks the numbers add up. The reader then picks one of four results. Filed: every field is clear and the total checks out — file it. Needs-review: something is below the confidence threshold — a blurry total, an unknown vendor — so send it to the bookkeeper to confirm. Duplicate: the same vendor, date, and total already exist — flag it so the same lunch isn’t claimed twice. Rejected: the image isn’t a receipt at all (somebody forwarded a newsletter) — set it aside.
  • Filing. For a filed receipt, write one clean row to the expense sheet: date, vendor, total, tax, category, who submitted it, and a link to the stored image. File the image itself in a folder by month so the bookkeeper can find the original in one click. For a needs-review receipt, post it to the review queue instead; once the bookkeeper approves (or fixes the category), it files the same way. A weekly digest lists what was filed and what’s still waiting. A monthly summary writes a short plain-English note: spend by category, the biggest line items, anything still unreviewed.
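The four-result decision in the reader can be sketched in a few lines. This is a rough illustration under stated assumptions: the `Reading` shape, the `decide` function, the per-field confidence scores, and the order of the checks are all hypothetical, and the tax-versus-total comparison is only a stand-in for whatever "the numbers add up" means in the real reader.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from enum import Enum


class Result(Enum):
    FILED = "filed"
    NEEDS_REVIEW = "needs-review"
    DUPLICATE = "duplicate"
    REJECTED = "rejected"


@dataclass
class Reading:
    """Hypothetical merged output of the text-reading and categorizing steps."""
    is_receipt: bool
    vendor: str | None = None
    day: date | None = None
    total: Decimal | None = None
    tax: Decimal | None = None
    category: str | None = None
    # Confidence per field name, each in [0, 1]; a missing entry counts as 0.
    confidence: dict[str, float] = field(default_factory=dict)


def decide(reading: Reading, chart: set[str], seen: set[tuple],
           threshold: float) -> Result:
    """Pick exactly one of the four results; records filed keys in `seen`."""
    if not reading.is_receipt:
        return Result.REJECTED

    fields = {
        "vendor": reading.vendor, "day": reading.day, "total": reading.total,
        "tax": reading.tax, "category": reading.category,
    }
    # Any missing or low-confidence field goes to a human before anything else.
    for name, value in fields.items():
        if value is None or reading.confidence.get(name, 0.0) < threshold:
            return Result.NEEDS_REVIEW

    # Stand-in sanity checks: known category, and tax no larger than the total.
    if reading.category not in chart or not (0 <= reading.tax <= reading.total):
        return Result.NEEDS_REVIEW

    # Same vendor, date, and total already filed: flag instead of filing again.
    key = (reading.vendor.casefold(), reading.day, reading.total)
    if key in seen:
        return Result.DUPLICATE
    seen.add(key)
    return Result.FILED
```

Checking for review before checking for duplicates is a design choice made here, on the reasoning that a duplicate match is only trustworthy once the vendor, date, and total are themselves trusted.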

In plain words

Your sales rep buys lunch for a client. $64.20 at a restaurant, $5.84 of that is tax. On the way out, she forwards the emailed receipt to receipts@your-company.com. Thirty seconds later the organizer has read it: vendor “Harbour Grill,” date today, total $64.20, tax $5.84. It picks the category “meals and entertainment” because the rules doc says restaurant receipts go there, and it’s confident, so it files the row in the expense sheet and tucks the image into this month’s folder. The rep does nothing else. The bookkeeper never touches it.

Now the harder one: the rep snaps a photo of a crumpled fuel receipt and the total is smudged. Textract reads “$8?.40” with low confidence on the second digit. The organizer doesn’t guess. It marks the receipt needs-review and drops it in the bookkeeper’s queue with the image and the fields it could read. The bookkeeper glances at the photo, types “$83.40,” taps Approve, and it files.

The cost of running all of this is about $3 a month. The cost of not running it is the shoebox, and the deductions nobody could prove at tax time.
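The two receipts above differ only in how they reach the sheet, which a short sketch can show. Everything named here is an assumption for illustration: the `Record` shape, the `sheet_row` and `approve` helpers, and the `receipts/YYYY-MM/` path layout are hypothetical, and the date, receipt id, and submitter in the example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Record:
    receipt_id: str
    day: date
    vendor: str
    total: Decimal | None  # None while the total is unreadable
    tax: Decimal
    category: str
    submitted_by: str


def image_key(record: Record) -> str:
    """Storage path for the original image, grouped by month (assumed layout)."""
    return f"receipts/{record.day:%Y-%m}/{record.receipt_id}"


def sheet_row(record: Record) -> list[str]:
    """One expense-sheet row: date, vendor, total, tax, category, who, image."""
    if record.total is None:
        raise ValueError("total not confirmed; this record belongs in review")
    return [
        record.day.isoformat(), record.vendor,
        f"{record.total:.2f}", f"{record.tax:.2f}",
        record.category, record.submitted_by, image_key(record),
    ]


def approve(pending: Record, **corrections) -> list[str]:
    """Apply the bookkeeper's corrections, then file the same way as any other."""
    return sheet_row(replace(pending, **corrections))


# The lunch receipt files directly (date, id, and submitter are placeholders).
lunch = Record("r-001", date(2025, 3, 14), "Harbour Grill", Decimal("64.20"),
               Decimal("5.84"), "meals and entertainment", "rep@example.com")
row = sheet_row(lunch)
```

For the smudged fuel receipt, the pending record would carry `total=None`, and the bookkeeper's fix would arrive as `approve(pending, total=Decimal("83.40"))`, producing a row through the same `sheet_row` path.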

Design rules that shaped every decision

  • A receipt is sent once, any way that’s convenient. Email, phone, or upload — all three feed one pipeline.
  • Four results, always. Filed, needs-review, duplicate, or rejected. There is no fifth.
  • Unclear numbers never file themselves. Below the confidence threshold, a human confirms first.
  • Duplicates are caught, not claimed twice. The same vendor, date, and total raises a flag.
  • The rules and the expense sheet live in Drive. Adding a category or fixing a vendor hint is a doc edit, not a deploy.
  • Every record links back to its original image. Audit a deduction next year and the proof is one click away.
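The duplicate rule depends on deciding when two receipts count as "the same vendor, date, and total". One possible approach, sketched below, is to normalize the three fields and hash them into a single key. The `duplicate_key` function and its normalization choices are assumptions made for illustration, not the series' actual matching logic.

```python
import hashlib
import re
from datetime import date
from decimal import Decimal


def duplicate_key(vendor: str, day: date, total: Decimal) -> str:
    """Hypothetical fingerprint: equal keys mean a likely duplicate receipt."""
    # Collapse case, punctuation, and spacing so "HARBOUR  GRILL." and
    # "Harbour Grill" compare equal.
    name = re.sub(r"[^a-z0-9]+", " ", vendor.casefold()).strip()
    # Compare totals in whole cents so 64.2 and 64.20 match.
    cents = int((total * 100).to_integral_value())
    raw = f"{name}|{day.isoformat()}|{cents}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

A key like this could be stored with each filed record, so a new receipt is flagged when its key already exists. Two genuinely separate purchases with the same vendor, date, and total would also collide, which is why the rule raises a flag rather than discarding anything.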

Why this shape

Most small businesses handle receipts in one of three ways: a shoebox dealt with once a year, a phone camera roll nobody sorts, or a spreadsheet somebody types by hand. The shoebox works until tax time, when a weekend disappears into reading faded thermal paper. The camera roll is worse — the photos are there, but turning two hundred of them into categorized expenses is the same manual job, just digital. And the hand-typed spreadsheet is accurate right up until the week somebody gets busy and stops typing.

The setup above keeps the part humans are good at — deciding the tricky calls — and hands off the part they hate: reading the same fields off hundreds of receipts. The system reads every receipt the moment it arrives, files the clear ones on its own, and only asks for help on the genuinely unclear ones. The bookkeeper’s job shrinks to a short daily queue instead of a yearly mountain. And because every record links back to its image, the proof is there if anyone ever asks.

The next four posts walk through each piece in turn: how a receipt gets captured, how it gets read, how it gets categorized, and how it reaches the books. Each post has one diagram. The series closes with a cost breakdown and a final engineering reference.
