Part 2 of 7 · Invoice chaser series ~4 min read

How an invoice gets loaded

The chaser only chases what’s on the list, so the first job is making sure the list actually reflects what your customers owe you. There are three ways an invoice gets onto it: it comes across from your accounting tool into the Drive sheet, somebody forwards an invoice PDF to a dedicated address, or your accounting tool calls a small endpoint the moment the invoice is issued. The first one is the backbone. The other two exist because in real life invoices get created in a dozen places, and nobody types a row into a sheet for the one they just sent.

Key takeaways

  • Three intake lanes feed one list: the Drive sheet, an inbox-forwarding lane, and a webhook lane.
  • Inbound PDFs are parsed by Textract; Bedrock Haiku 4.5 reads the text and proposes a row.
  • Every parsed row goes to the bookkeeper’s Slack for one-tap approval before it lands on the list.
  • The webhook lane picks up new invoices from your accounting tool the moment they are issued.
  • The Drive sheet stays the canonical store. The other lanes are conveniences that write into it.

Three lanes into one list

[Figure: three intake lanes funnel into one invoice list.]

  • Lane 1, Drive sheet (accounting export): an export from your accounting tool drops rows into the Google Sheet that holds the invoice list. The drive-sync Lambda mirrors the sheet to S3 every 15 minutes, and the chaser reads from there.
  • Lane 2, Inbox forwarding (SES + Textract): somebody forwards an invoice PDF to the dedicated invoices address. SES writes the raw MIME to S3; a parser Lambda runs Textract on the PDF, then calls Bedrock Haiku 4.5 to extract the invoice number, customer, amount, issue date, due date, and terms. The proposed row is posted to the bookkeeper's Slack with Approve and Edit buttons; on approve, the row is added to the Drive sheet via the Sheets API.
  • Lane 3, Webhook (on issue): your accounting tool posts each newly issued invoice to a Lambda Function URL the moment it is created. The same proposal-and-approval flow runs in Slack before the row lands in the sheet, or the row lands directly if the source is trusted.
  • Drive invoice sheet (source of truth): number · customer · contact · amount · issued · due · terms · paid · link. The chaser reads the S3 mirror on its daily tick.
Fig 2. Three lanes converge on one Drive sheet. The sheet is the source of truth; the inbox lane and the webhook lane are conveniences that propose rows for approval. The drive-sync Lambda mirrors the sheet to S3 so the chaser can read it without hitting Drive on every tick.

Lane 1: the Drive sheet from your accounting tool

The backbone lane. Most accounting tools can export an open-invoices report to a Google Sheet, or you paste one in on a schedule. The column list is short: number, customer, billing contact email, amount, issue date, due date, terms, a paid flag, and a link to the invoice PDF. A small Lambda, drive-sync, runs every fifteen minutes, exports the sheet as plain CSV via the Drive API, and writes it to s3://ic-invoices-source/invoices.csv if the sheet has changed since the last sync. The chaser reads from S3, not from Drive directly. That keeps Drive API calls predictable and, with versioning enabled on the bucket, lets you undo a bad bulk edit by restoring the previous version of the file.
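The "only if the sheet has changed" step can be sketched in a few lines. This is a minimal illustration, not the series' actual drive-sync code: it assumes change is detected by hashing the exported CSV, and the names content_digest, sync_if_changed, and the upload callback are made up for the example.

```python
import hashlib
from typing import Callable, Optional


def content_digest(csv_bytes: bytes) -> str:
    """Return a stable fingerprint of the exported CSV."""
    return hashlib.sha256(csv_bytes).hexdigest()


def sync_if_changed(
    csv_bytes: bytes,
    last_digest: Optional[str],
    upload: Callable[[bytes], None],
) -> tuple[bool, str]:
    """Upload the CSV only when its digest differs from the last synced one.

    Returns (uploaded, digest) so the caller can store the digest for next time.
    `upload` is a hypothetical callback; in a Lambda it would wrap the S3 write.
    """
    digest = content_digest(csv_bytes)
    if digest == last_digest:
        return False, digest  # sheet unchanged: skip the S3 write
    upload(csv_bytes)
    return True, digest
```

In a real Lambda the upload callback would likely wrap a boto3 put_object call against the ic-invoices-source bucket, and the last digest would need to live somewhere durable (object metadata is one option). Both are left out here so the sketch stays self-contained.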

This lane covers the steady state: your accounting tool is the system of record, and the sheet is just the window the chaser looks through. Most invoices arrive this way.

Lane 2: inbox forwarding (for invoices that live outside the tool)

Set up a dedicated inbound address — something like invoices@your-company.com — via Amazon SES. Anyone on the team forwards an invoice PDF to that address and the chaser takes it from there. SES writes the raw MIME to s3://ic-raw-mime/. The S3 PUT triggers a parser Lambda. The Lambda walks the MIME tree to the PDF attachment, runs Amazon Textract on it (Textract reads PDF, PNG, JPEG, and TIFF natively; if somebody forwards a Word document, the parser falls back to python-docx), and gets back the extracted text plus any tables.
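Walking the MIME tree to the PDF needs nothing beyond Python's standard email package. The sketch below shows one way to do it, assuming the first PDF part is the invoice; the function name first_pdf_attachment and the sample message are illustrative only.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Optional


def first_pdf_attachment(raw_mime: bytes) -> Optional[tuple[str, bytes]]:
    """Return (filename, bytes) of the first PDF part, or None if there is none."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    for part in msg.walk():  # walk() also descends into forwarded messages
        if part.is_multipart():
            continue  # containers hold no payload of their own
        filename = part.get_filename() or ""
        is_pdf = (
            part.get_content_type() == "application/pdf"
            or filename.lower().endswith(".pdf")
        )
        if is_pdf:
            # decode=True undoes the base64 transfer encoding
            return filename or "attachment.pdf", part.get_payload(decode=True)
    return None


# Example: build a tiny forwarded message and pull the PDF back out.
sample = EmailMessage()
sample["From"] = "someone@example.com"
sample["To"] = "invoices@example.com"
sample["Subject"] = "Fwd: invoice"
sample.set_content("See attached.")
sample.add_attachment(
    b"%PDF-1.4 placeholder",
    maintype="application",
    subtype="pdf",
    filename="INV-1.pdf",
)
result = first_pdf_attachment(sample.as_bytes())
```

The returned bytes are what the parser would hand to Textract. Checking the filename as well as the content type is a defensive choice, since some mail clients label PDFs as application/octet-stream.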

Then a Bedrock Haiku 4.5 call reads the text and emits a structured row: invoice number, customer, amount, issue date, due date, terms, and a billing-contact guess based on the “To” line of the forwarded email. The model prompt is short: “Extract a row for the invoice list. Return JSON only. Mark each field with a confidence score. Do not invent an amount or a date that isn’t in the text.” The output goes into a small interactive Slack message that pings the bookkeeper with the proposed row, the confidence per field, and three buttons: approve, edit, discard. On approve, a Lambda writes the row to the Drive sheet via the Sheets API. On edit, the bookkeeper gets a fillable modal pre-populated with the proposal. On discard, the message is logged and the PDF is moved to a discarded prefix in S3 for audit.
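To make the proposal step concrete, here is a rough sketch of turning the model's JSON reply into a Slack message with the three buttons. The reply shape (each field as an object with value and confidence), the field names, and the action IDs are all assumptions for illustration; the block structure follows Slack's Block Kit section and actions blocks.

```python
import json

# Hypothetical field names for the proposed row.
FIELDS = ["number", "customer", "amount", "issue_date", "due_date", "terms", "contact"]


def parse_proposal(model_text: str) -> dict:
    """Parse the model's JSON reply; missing fields become value=None, confidence=0."""
    raw = json.loads(model_text)  # raises if the model returned anything but JSON
    proposal = {}
    for name in FIELDS:
        cell = raw.get(name)
        if not isinstance(cell, dict):
            cell = {}
        proposal[name] = {
            "value": cell.get("value"),
            "confidence": float(cell.get("confidence", 0.0)),
        }
    return proposal


def proposal_blocks(proposal: dict, invoice_key: str) -> list:
    """Build Block Kit blocks: one summary section plus three buttons."""
    lines = [
        f"*{name}*: {cell['value']} ({cell['confidence']:.0%})"
        for name, cell in proposal.items()
    ]

    def button(label, action_id, style=None):
        element = {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": action_id,
            "value": invoice_key,  # lets the click handler find the proposal
        }
        if style:
            element["style"] = style
        return element

    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": "\n".join(lines)}},
        {
            "type": "actions",
            "elements": [
                button("Approve", "approve_row", "primary"),
                button("Edit", "edit_row"),
                button("Discard", "discard_row", "danger"),
            ],
        },
    ]
```

Fields the model could not find show up with a confidence of 0%, which gives the bookkeeper an obvious cue to hit Edit instead of Approve.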

The reason every parsed row goes to a human first is simple: chasing a customer for the wrong amount is far worse than an invoice that never made it onto the list. A misread amount or a wrong contact turns a polite reminder into a complaint — and an awkward apology.

Lane 3: the webhook lane

Most accounting tools can call a URL when an invoice is created. That is the cleanest lane of all: the invoice lands on the list the same minute it is issued, with no export delay and no forwarding. Your accounting tool posts the new invoice to a Lambda Function URL — the same kind of small endpoint used for the pay-links later in the series. The endpoint checks a shared secret, maps the tool’s fields to the chaser’s row shape, and runs the same Slack proposal flow as Lane 2.
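A minimal sketch of such an endpoint might look like the following. It assumes the shared secret arrives in an x-webhook-secret header and is stored in a WEBHOOK_SECRET environment variable, and the accounting tool's field names in FIELD_MAP are invented; a real tool will have its own payload and may sign requests differently.

```python
import base64
import hmac
import json
import os

# Hypothetical mapping from the accounting tool's field names to the row shape.
FIELD_MAP = {
    "invoice_number": "number",
    "customer_name": "customer",
    "contact_email": "contact",
    "total": "amount",
    "issued_on": "issued",
    "due_on": "due",
    "payment_terms": "terms",
    "pdf_url": "link",
}


def map_row(payload: dict) -> dict:
    """Translate the tool's payload into the chaser's row shape."""
    row = {ours: payload.get(theirs) for theirs, ours in FIELD_MAP.items()}
    row["paid"] = False  # assumption: a freshly issued invoice is unpaid
    return row


def handler(event: dict, context=None) -> dict:
    """Lambda Function URL handler: check the secret, then map the invoice."""
    expected = os.environ.get("WEBHOOK_SECRET", "")
    supplied = event.get("headers", {}).get("x-webhook-secret", "")
    # compare_digest avoids leaking the secret through timing differences
    if not expected or not hmac.compare_digest(supplied.encode(), expected.encode()):
        return {"statusCode": 401, "body": "unauthorized"}

    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")

    row = map_row(json.loads(body))
    # The real endpoint would hand the row to the Slack proposal flow here.
    return {"statusCode": 202, "body": json.dumps(row)}
```

Refusing requests when no secret is configured is deliberate: an unset environment variable should fail closed, not let every caller through.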

For a trusted source you can skip the approval step and let webhook rows land directly, since the data comes straight from your own system of record rather than from a forwarded PDF that had to be read. The webhook lane is the most optional of the three: a team whose tool can’t call a URL just leans on Lane 1 and loses nothing; a team whose tool can gets near-instant loading for free.

Why the sheet stays the source of truth

Three lanes in, but only one place where the chaser actually looks. That’s a deliberate constraint. If each lane wrote directly to the chaser’s state, every “why did this reminder go out?” question would mean checking three places. Funneling everything through the Drive sheet means there is exactly one row per invoice, and any team member can read or edit any of it without learning a new tool. The convenience lanes are first-class for getting invoices in, but every row they produce passes through the sheet on the way.
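One way to keep "exactly one row per invoice" when several lanes can propose the same invoice is to treat the invoice number as the key and upsert. The post doesn't say how duplicates are handled, so this is purely an illustrative assumption, with a made-up upsert_row helper working on plain dicts.

```python
def upsert_row(rows: list[dict], new_row: dict) -> list[dict]:
    """Return a new list where new_row replaces any row with the same number.

    Fields that are None in new_row leave the existing value untouched,
    so a sparse proposal cannot blank out data already in the sheet.
    """
    number = new_row["number"]
    updates = {k: v for k, v in new_row.items() if v is not None}
    out, replaced = [], False
    for row in rows:
        if row["number"] == number:
            out.append({**row, **updates})  # merge onto the existing row
            replaced = True
        else:
            out.append(row)
    if not replaced:
        out.append(new_row)  # first time this invoice number is seen
    return out
```

Returning a new list instead of editing in place keeps the previous state around, which makes it easier to show the bookkeeper a before-and-after when a proposal would overwrite an existing row.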

Next post: how the chaser actually reads the list, computes days-past-due, and picks one of four moves.
