How an invoice gets loaded
The chaser only chases what’s on the list, so the first job is making sure the list actually reflects what your customers owe you. There are three ways an invoice gets onto it: it comes across from your accounting tool into the Drive sheet, somebody forwards an invoice PDF to a dedicated address, or your accounting tool calls a small endpoint the moment the invoice is issued. The first is the backbone. The other two exist because in practice invoices get created in a dozen places, and nobody types a row into a sheet for the one they just sent.
Key takeaways
- Three intake lanes feed one list: the Drive sheet, an inbox-forwarding lane, and a webhook lane.
- Inbound PDFs are parsed by Amazon Textract; Claude Haiku 4.5 on Amazon Bedrock reads the extracted text and proposes a row.
- Every parsed row goes to the bookkeeper’s Slack for one-tap approval before it lands on the list.
- The webhook lane picks up new invoices from your accounting tool the moment they are issued.
- The Drive sheet stays the canonical store. The other lanes are conveniences that write into it.
Three lanes into one list
Lane 1: the Drive sheet from your accounting tool
The backbone lane. Most accounting tools can export an open-invoices report to a Google Sheet, or someone pastes one in on a schedule. The columns are short: number, customer, billing contact email, amount, issue date, due date, terms, a paid flag, and a link to the invoice PDF. A small Lambda called drive-sync runs every fifteen minutes, exports the sheet as plain CSV via the Drive API, and writes it to s3://ic-invoices-source/invoices.csv if the sheet has changed since the last sync. The chaser reads from S3, not from Drive directly. That keeps Drive API calls predictable. With versioning enabled on the bucket, it also keeps every synced copy, so a bad bulk edit can be rolled back by restoring the previous object version.
This lane covers the steady state: your accounting tool is the system of record, and the sheet is just the window the chaser looks through. Most invoices arrive this way.
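The change check in drive-sync could look something like the following minimal sketch. It assumes the sync compares a SHA-256 hash of the exported CSV with the hash stored on the previous upload. That is a swapped-in choice; checking the Drive file’s modification time would also work. `export_csv` and `put_object` are hypothetical stand-ins. You would wire them to the Drive v3 `files().export(fileId=..., mimeType="text/csv")` call and to boto3’s `s3.put_object`, which keeps the logic testable without either service.

```python
import hashlib
from typing import Callable, Dict, Optional

# Bucket and key from the text; the two callables below are hypothetical
# stand-ins for the Drive export and the S3 upload.
SOURCE_BUCKET = "ic-invoices-source"
SOURCE_KEY = "invoices.csv"


def csv_fingerprint(csv_bytes: bytes) -> str:
    """SHA-256 of the exported CSV; identical content gives an identical hash."""
    return hashlib.sha256(csv_bytes).hexdigest()


def sync_once(
    export_csv: Callable[[], bytes],
    put_object: Callable[[str, str, bytes, Dict[str, str]], None],
    last_fingerprint: Optional[str],
) -> Optional[str]:
    """Export the sheet and upload it only if its content changed.

    Returns the new fingerprint after an upload, or None when nothing changed.
    """
    csv_bytes = export_csv()
    fingerprint = csv_fingerprint(csv_bytes)
    if fingerprint == last_fingerprint:
        # Unchanged sheet: skip the write so no new S3 object version is created.
        return None
    # Store the hash as object metadata so the next run can compare against it.
    put_object(SOURCE_BUCKET, SOURCE_KEY, csv_bytes, {"sha256": fingerprint})
    return fingerprint
```

In a real Lambda, the previous hash could be read back from the object’s user metadata with boto3’s `head_object`. That keeps drive-sync stateless between runs.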
Lane 2: inbox forwarding (for invoices that live outside the tool)
Set up a dedicated inbound address, something like invoices@your-company.com, via Amazon SES. Anyone on the team forwards an invoice PDF to that address, and the chaser takes it from there. SES writes the raw MIME to s3://ic-raw-mime/, and the S3 PUT triggers a parser Lambda. The Lambda walks the MIME tree to the PDF attachment, runs Amazon Textract on it, and gets back the extracted text plus any tables. Textract reads PDF, PNG, JPEG, and TIFF natively; if somebody forwards a Word document instead, the parser falls back to python-docx.
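Walking the MIME tree to the attachment needs nothing beyond Python’s standard `email` package. The sketch assumes the raw message has already been read from s3://ic-raw-mime/. It does three things:

- prefers a PDF over other candidates, since an inline signature logo is also an image;
- falls back to the file extension when a mail client labels the attachment `application/octet-stream`;
- reports whether the file should go to Textract or to the python-docx fallback.

`lines_from_textract` joins the `LINE` blocks of a Textract `DetectDocumentText` response into plain text for the model. The function names are hypothetical.

```python
import email
import os
from email import policy
from typing import Optional, Tuple

# Formats Textract reads directly, per the text: PDF, PNG, JPEG, TIFF.
TEXTRACT_TYPES = {"application/pdf", "image/png", "image/jpeg", "image/tiff"}
DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
# Some mail clients label every attachment application/octet-stream.
EXTENSION_TYPES = {
    ".pdf": "application/pdf", ".png": "image/png", ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg", ".tif": "image/tiff", ".tiff": "image/tiff",
    ".docx": DOCX_TYPE,
}


def find_invoice_attachment(raw_mime: bytes) -> Optional[Tuple[str, bytes, str]]:
    """Return (filename, data, route) with route "textract" or "docx", or None."""
    msg = email.message_from_bytes(raw_mime, policy=policy.default)
    candidates = []
    # walk() also descends into attached message/rfc822 parts (forward-as-attachment).
    for part in msg.walk():
        filename = part.get_filename() or ""
        ctype = part.get_content_type()
        if ctype == "application/octet-stream":
            ctype = EXTENSION_TYPES.get(os.path.splitext(filename)[1].lower(), ctype)
        if ctype not in TEXTRACT_TYPES and ctype != DOCX_TYPE:
            continue
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if data:
            route = "docx" if ctype == DOCX_TYPE else "textract"
            candidates.append((ctype != "application/pdf", filename, data, route))
    if not candidates:
        return None
    # Stable sort puts PDFs first; an inline signature logo is also an image.
    candidates.sort(key=lambda c: c[0])
    _, filename, data, route = candidates[0]
    return filename, data, route


def lines_from_textract(response: dict) -> str:
    """Join the LINE blocks of a DetectDocumentText response into plain text."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )
```

With boto3, the response would come from `textract.detect_document_text(Document={"Bytes": data})`. Some cases need a different operation:

- That synchronous call handles single-page documents. Multi-page PDFs go through the asynchronous `start_document_text_detection` operation against the S3 object.
- Tables need `analyze_document` with `FeatureTypes=["TABLES"]`.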
Next, a call to Claude Haiku 4.5 on Amazon Bedrock reads the text and emits a structured row: invoice number, customer, amount, issue date, due date, terms, and a billing-contact guess based on the “To” line of the original message quoted in the forward. The model prompt is short: “Extract a row for the invoice list. Return JSON only. Mark each field with a confidence score. Do not invent an amount or a date that isn’t in the text.” The output becomes a small Slack interactive message that pings the bookkeeper with the proposed row, the confidence per field, and three buttons: approve, edit, discard. On approve, a Lambda writes the row to the Drive sheet via the Sheets API. On edit, the bookkeeper gets a fillable modal pre-populated with the proposal. On discard, the message is logged and the PDF is moved to a discarded prefix in S3 for audit.
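The prompt asks the model not to invent an amount or a date, but nothing enforces that on its own. A cheap guard, run before the Slack message goes out, is to check the proposal against the extracted text. The sketch rests on three assumptions: a hypothetical JSON shape in which every field is `{"value": ..., "confidence": ...}`, a confidence threshold of 0.8, and amounts written in `1,234.56` style. It flags:

- missing or low-confidence fields;
- an amount that does not appear anywhere in the text;
- dates that are not ISO dates, or a due date earlier than the issue date.

It does not prove a row correct. It decides what the bookkeeper should look at first.

```python
import json
import re
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Dict, List

# Hypothetical model output: {"amount": {"value": "1200.00", "confidence": 0.97}, ...}
REQUIRED = ("invoice_number", "customer", "amount", "issue_date", "due_date")
LOW_CONFIDENCE = 0.8  # assumed threshold for highlighting a field
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")  # assumes 1,234.56-style amounts


def amounts_in(text: str) -> set:
    """Every number-like token in the extracted text, as a Decimal."""
    found = set()
    for token in NUMBER_RE.findall(text):
        try:
            found.add(Decimal(token.replace(",", "")))
        except InvalidOperation:
            continue
    return found


def value_of(proposal: dict, name: str):
    """The "value" of one proposed field, or None if the field is absent or malformed."""
    field = proposal.get(name)
    return field.get("value") if isinstance(field, dict) else None


def review_proposal(model_output: str, source_text: str) -> Dict[str, object]:
    """Parse the model's JSON and list what the bookkeeper should check first."""
    proposal = json.loads(model_output)
    problems: List[str] = []
    for name in REQUIRED:
        field = proposal.get(name)
        if not isinstance(field, dict) or field.get("value") in (None, ""):
            problems.append(f"{name}: missing")
        elif field.get("confidence", 0) < LOW_CONFIDENCE:
            problems.append(f"{name}: low confidence")
    # Backstop for "do not invent an amount": it must appear in the source text.
    amount = value_of(proposal, "amount")
    if amount not in (None, ""):
        try:
            if Decimal(str(amount).replace(",", "")) not in amounts_in(source_text):
                problems.append("amount: not found in the invoice text")
        except InvalidOperation:
            problems.append("amount: not a number")
    # Dates must parse as ISO dates, and the due date cannot precede the issue date.
    try:
        issued = date.fromisoformat(value_of(proposal, "issue_date"))
        due = date.fromisoformat(value_of(proposal, "due_date"))
        if due < issued:
            problems.append("due_date: before issue_date")
    except (TypeError, ValueError):
        problems.append("dates: not valid ISO dates")
    row = {name: value_of(proposal, name) for name in proposal}
    return {"row": row, "problems": problems}
```

The problem list can ride along in the Slack message, so a flagged field is the first thing the bookkeeper sees.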
The reason every parsed row goes to a human first is simple: chasing a customer for the wrong amount is far worse than leaving an invoice off the list. A misread amount or a wrong contact turns a polite reminder into a complaint, followed by an awkward apology.
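The approval message from Lane 2 is a standard Slack Block Kit payload: a section with the proposed row, plus an actions block with the three buttons. The sketch builds the body for `chat.postMessage` and reads the clicked button back out of a `block_actions` interaction payload. Two details are assumptions, not part of the original design: the action IDs, and carrying a proposal ID (for example, the raw-MIME object key) in each button’s `value`.

```python
import json
from typing import Dict, Tuple

# Hypothetical action IDs; Slack echoes action_id and value back on click.
BUTTONS = (
    ("Approve", "approve_invoice", "primary"),
    ("Edit", "edit_invoice", None),
    ("Discard", "discard_invoice", "danger"),
)


def build_review_message(channel: str, proposal_id: str, row: Dict[str, dict]) -> dict:
    """Body for chat.postMessage: the proposed row, per-field confidence, three buttons."""
    lines = [
        f"*{name}*: {field.get('value')} ({field.get('confidence', 0):.0%})"
        for name, field in row.items()
    ]
    elements = []
    for label, action_id, style in BUTTONS:
        button = {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": action_id,
            "value": proposal_id,  # ties the click back to this proposal
        }
        if style:  # "primary" renders green, "danger" red; omit for the default
            button["style"] = style
        elements.append(button)
    return {
        "channel": channel,
        "text": f"Invoice proposal {proposal_id} needs review",  # notification fallback
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": "\n".join(lines)}},
            {"type": "actions", "block_id": "invoice_review", "elements": elements},
        ],
    }


def chosen_action(interaction_json: str) -> Tuple[str, str]:
    """Return (action_id, proposal_id) from a block_actions interaction payload."""
    payload = json.loads(interaction_json)
    action = payload["actions"][0]
    return action["action_id"], action["value"]
```

Slack delivers the interaction as a form-encoded `payload` field, so the interactivity Lambda would URL-decode the body before calling `chosen_action`. An `edit_invoice` click would then open the pre-filled modal with `views.open`.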
Lane 3: the webhook lane
Most accounting tools can call a URL when an invoice is created. That is the cleanest lane of all: the invoice is picked up the same minute it is issued, with no export delay and no forwarding. Your accounting tool posts the new invoice to a Lambda Function URL, the same kind of small endpoint used for the pay-links later in the series. The endpoint checks a shared secret, maps the tool’s fields to the chaser’s row shape, and runs the same Slack proposal flow as Lane 2.
For a trusted source you can skip the approval step and let webhook rows land directly, since the data comes straight from your own system of record rather than from a forwarded PDF that had to be read. The webhook lane is the most opt-in of the three. A team whose tool can’t call a URL just leans on Lane 1 and loses nothing. A team whose tool can gets near-instant loading into the sheet, which the chaser picks up on the next drive-sync run.
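The webhook endpoint could be sketched as follows. It assumes:

- a Lambda Function URL event, in which header names arrive lower-cased and the body may be base64-encoded;
- a hypothetical `x-chaser-secret` header carrying the shared secret;
- hypothetical field names on the accounting tool’s side.

`write_row` and `propose_row` stand in for the Sheets write and the Lane 2 Slack proposal. The `trusted` flag is one way the approval skip described above might be switched on.

```python
import base64
import hmac
import json
from typing import Callable, Dict

# Hypothetical field names on the accounting tool's side, mapped to the
# chaser's columns from Lane 1. Replace with whatever your tool sends.
FIELD_MAP = {
    "invoice_no": "number",
    "client_name": "customer",
    "client_email": "billing_contact_email",
    "total": "amount",
    "issued_on": "issue_date",
    "due_on": "due_date",
    "payment_terms": "terms",
    "pdf_url": "pdf_link",
}
SECRET_HEADER = "x-chaser-secret"  # hypothetical header name

Row = Dict[str, str]


def handle_webhook(event: dict, secret: str, trusted: bool,
                   write_row: Callable[[Row], None],
                   propose_row: Callable[[Row], None]) -> dict:
    """Validate a Function URL request, map it to a row, and route it."""
    if not secret:
        return {"statusCode": 500, "body": "webhook secret not configured"}
    # Function URL events carry lower-cased header names.
    supplied = (event.get("headers") or {}).get(SECRET_HEADER, "")
    # Constant-time comparison, so response timing does not leak the secret.
    if not hmac.compare_digest(supplied.encode(), secret.encode()):
        return {"statusCode": 401, "body": "bad secret"}
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        invoice = json.loads(body)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": "body is not JSON"}
    if not isinstance(invoice, dict):
        return {"statusCode": 400, "body": "expected a JSON object"}
    row = {
        ours: "" if invoice.get(theirs) is None else str(invoice[theirs])
        for theirs, ours in FIELD_MAP.items()
    }
    row["paid"] = "FALSE"  # a freshly issued invoice is unpaid
    if not row["number"] or not row["amount"]:
        return {"statusCode": 422, "body": "missing invoice number or amount"}
    # Trusted sources skip the Slack approval step; anything else is proposed.
    (write_row if trusted else propose_row)(row)
    return {"statusCode": 200, "body": json.dumps({"received": row["number"]})}
```

A thin `lambda_handler(event, context)` would read the secret and the trusted flag from environment variables and pass them in.

Many accounting tools sign webhook bodies with an HMAC rather than sending a static secret. If yours does, verify the signature over the raw body in place of the header comparison.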
Why the sheet stays the source of truth
Three lanes in, but only one place where the chaser actually looks. That’s a deliberate constraint. If each lane wrote directly to the chaser’s state, every “why did this reminder go out?” question would mean checking three places. Funneling everything through the Drive sheet gives you one place to keep exactly one row per invoice, and any team member can read or edit any of it without learning a new tool. The convenience lanes are first-class for getting invoices in, but they always pass through the sheet on the way.
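Keeping exactly one row per invoice means every lane writes through the same upsert. Here is a minimal sketch, under two assumptions: rows are dicts keyed by the sheet’s column names, and invoice numbers match case-insensitively after trimming whitespace. An incoming row updates the existing row with the same number and keeps any cell the incoming lane left blank. It is appended only when the number is new. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def normalize(number: str) -> str:
    """Assumed matching rule: ignore surrounding whitespace and letter case."""
    return number.strip().upper()


def upsert_row(rows: List[Row], incoming: Row) -> Tuple[str, int]:
    """Update the row with the same invoice number, or append a new one.

    Returns ("updated" | "appended", index into rows).
    """
    key = normalize(incoming["number"])
    for i, row in enumerate(rows):
        if normalize(row.get("number", "")) == key:
            merged = dict(row)
            # Keep the existing number spelling and any cell the lane left blank,
            # such as a billing contact typed into the sheet by hand.
            merged.update({k: v for k, v in incoming.items() if k != "number" and v != ""})
            rows[i] = merged
            return "updated", i
    rows.append(dict(incoming))
    return "appended", len(rows) - 1
```

Against the real sheet, the returned index maps to a Sheets range for `spreadsheets.values.update`: row index plus two, if row 1 is the header. Appends go through `spreadsheets.values.append`.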
Next post: how the chaser actually reads the list, computes days-past-due, and picks one of four moves.