Part 3 of 7 · Vendor onboarder series · ~5 min read

How a vendor document gets checked

A vendor drops a PDF on their upload page. Now the onboarder has to work out what it is, whether it fills a gap in the checklist, and whether it’s any good — a certificate of insurance that expired last month is present but useless. The reading uses a model; the deciding is plain Python; and a person confirms before the item turns green. This post follows one uploaded file from the moment it lands to the moment it counts.

Key takeaways

  • An upload triggers the checker once — Textract reads the page, Haiku 4.5 reads the fields.
  • Plain Python matches the file to a required item and checks both present and in-date.
  • An expired insurance certificate is present but fails the date check — it stays not-done.
  • A person confirms the read before the item counts; a misread date is caught here, not later.
  • The checker calls a model only on upload — never on the daily chase tick.

The check flow, per uploaded file

[Diagram: check flow for one uploaded document, read top to bottom]

  • Input: File uploaded (vendor, file name, S3 key).
  • Step 1, Read the file: Textract pulls the text and Bedrock Haiku 4.5 proposes the document type and the fields that matter, such as the tax-ID on a tax form or the expiry date on an insurance certificate.
  • Step 2, Matches a required item? If the proposed type doesn't map to anything on this vendor's checklist, route to Unmatched (hold for a person to look at). If it does, continue.
  • Step 3, Is it present? Confirms the file actually contains the document, not a blank or wrong page.
  • Step 4, Is it still in date? For items with an expiry (insurance, some tax forms), compares the expiry date against today. If expired, route to Expired: the item stays not-done with a note asking for a current copy. If in date or no expiry applies, continue.
  • Step 5, Human confirms the read: a person sees the proposed item and fields and taps confirm or fix. On confirm, route to Item done: the checklist row in DynamoDB marks this item present, in date, and confirmed.
  • Outcome boxes: Unmatched (hold for review), Expired (ask for current copy), Item done (present + in date), Fix needed (re-read or re-upload). Each terminal box updates the vendor's checklist state.
  • Edge labels: no match, expired, unclear, confirm, fix.

Note on the diagram: the model only reads; plain Python decides present and in-date; a human confirms before anything counts as done.
Fig 3. The checker’s flow for one uploaded file. Read it once, match it to a required item, check present and in-date, and let a human confirm. Only then does the item turn green.

Reading the file: a model, used narrowly

When a file lands in the vendor’s folder, the S3 write triggers the checker Lambda. It runs Amazon Textract on the file — Textract is a managed service that reads the text and tables out of PDFs and images, so a scanned certificate works as well as a clean PDF. The extracted text then goes to one Bedrock Haiku 4.5 call with a tight prompt: “Here is the text of a document a vendor uploaded. Tell me which of these it is — bank details, tax form, insurance certificate, signed agreement, or unknown — and pull these fields if present: the tax-ID, the policy expiry date, the bank account name. Return JSON only. Do not invent a date that isn’t in the text.”

That’s the only place a model is involved. It reads; it doesn’t decide anything. Everything after this is plain Python comparing what the model pulled against the checklist rules.
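As a rough illustration of "reads, doesn't decide", here is a minimal sketch of the two pieces of plain Python that sit around the model call: one builds the prompt, the other turns the reply into something the later checks can trust. The JSON shape (`doc_type` plus a `fields` object), the field names, and both function names are assumptions made for this example, and the Bedrock call itself is left out.

```python
import json
from datetime import date

# The five answers the prompt allows. Anything else is treated as "unknown".
DOC_TYPES = {"bank details", "tax form", "insurance certificate",
             "signed agreement", "unknown"}
# Hypothetical field names for the three values the prompt asks for.
FIELDS = ("tax_id", "policy_expiry", "bank_account_name")


def build_prompt(document_text: str) -> str:
    """Wrap the Textract output in the narrow instruction described above."""
    return (
        "Here is the text of a document a vendor uploaded. Tell me which of "
        "these it is: bank details, tax form, insurance certificate, signed "
        "agreement, or unknown. Pull these fields if present: the tax-ID, "
        "the policy expiry date, the bank account name. Return JSON only. "
        "Do not invent a date that isn't in the text.\n\n"
        + document_text
    )


def _unknown() -> dict:
    return {"doc_type": "unknown", "fields": {name: None for name in FIELDS}}


def parse_read(raw: str) -> dict:
    """Parse the model's reply; anything malformed degrades to 'unknown'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return _unknown()
    if not isinstance(data, dict):
        return _unknown()

    result = _unknown()
    doc_type = data.get("doc_type")
    if isinstance(doc_type, str) and doc_type in DOC_TYPES:
        result["doc_type"] = doc_type

    fields = data.get("fields")
    if isinstance(fields, dict):
        for name in FIELDS:
            result["fields"][name] = fields.get(name)

    # Assumption: the model is asked for ISO dates (YYYY-MM-DD).
    # A date that does not parse is dropped rather than guessed at.
    expiry = result["fields"]["policy_expiry"]
    try:
        parsed = date.fromisoformat(expiry) if isinstance(expiry, str) else None
    except ValueError:
        parsed = None
    result["fields"]["policy_expiry"] = parsed
    return result
```

The parser never raises on bad model output. A reply that isn't valid JSON, or that names a type outside the list, simply comes back as "unknown", which the matching step then holds for a person.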

Two checks: present, and in date

A document can fail in two different ways, and the checker tests for both.

Present. Does this file actually fill a gap on the checklist? The Python maps the model’s proposed type to a required item for this vendor. If the vendor owes an insurance certificate and the upload reads as one, that item is now a candidate to mark present. If the file reads as something the vendor doesn’t owe — or as “unknown” — it’s held as unmatched for a person to look at, rather than silently dropped.
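The matching step could look something like the sketch below. The `TYPE_TO_ITEM` table, the checklist item names, and the function name are hypothetical, chosen only to show the shape of the check.

```python
# Hypothetical mapping from the model's document type to checklist item names.
TYPE_TO_ITEM = {
    "bank details": "bank_details",
    "tax form": "tax_form",
    "insurance certificate": "insurance_certificate",
    "signed agreement": "signed_agreement",
}


def match_required_item(doc_type, required_items):
    """Return the checklist item this upload could fill, or None if unmatched.

    None covers both cases from the text: the model said "unknown", or the
    file reads as a document this vendor doesn't owe. The caller holds the
    file for a person instead of dropping it.
    """
    item = TYPE_TO_ITEM.get(doc_type)
    if item is None or item not in required_items:
        return None
    return item


# A vendor that owes an insurance certificate and a tax form.
owed = {"insurance_certificate", "tax_form"}
candidate = match_required_item("insurance certificate", owed)
unmatched = match_required_item("bank details", owed)
```

Here `candidate` is the insurance item and `unmatched` is `None`, because bank details are not on this vendor's list.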

In date. Some documents have an expiry; a certificate of insurance is the clearest case, and some tax forms are only valid for the current year. The checklist doc says which items carry a date rule. For those, the Python compares the expiry date the model pulled against today. An insurance certificate that expired last month is present but fails the date check, so the item stays not-done with a note: “Insurance certificate received, but it expired on 2026-04-30 — please upload a current one.” This is the check that catches the most common real-world problem: a supplier who genuinely sent a certificate, just last year’s.
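A minimal sketch of the date check, under two assumptions that are not in the original design: a document counts as valid through its expiry day, and a dated item whose expiry could not be read is treated as not in date. The function name and parameters are illustrative.

```python
from datetime import date


def check_in_date(item_label, has_date_rule, expiry, today):
    """Return (in_date, note). The note is None when the check passes."""
    if not has_date_rule:
        # Items without an expiry rule pass automatically.
        return True, None
    if expiry is None:
        # Assumption: no readable expiry on a dated item is not good enough.
        return False, f"{item_label} received, but no expiry date could be read."
    if expiry < today:
        return False, (
            f"{item_label} received, but it expired on {expiry.isoformat()}. "
            "Please upload a current one."
        )
    # Assumption: a document is still valid on its expiry date.
    return True, None


in_date, note = check_in_date(
    "Insurance certificate", True, date(2026, 4, 30), today=date(2026, 5, 15)
)
```

With those inputs the certificate is present but `in_date` is `False`, and `note` carries the request for a current copy.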

Why a human confirms every read

Even a good model misreads a smudged date or a tax-ID with an extra digit now and then. For a vendor file, a wrong read is dangerous in a quiet way: it can mark an item done when it isn’t, and the system would then stop chasing it. So every read goes to a one-tap confirmation. The owner (or whoever set up the vendor) sees the proposed item, the pulled fields, and a thumbnail, with two buttons: confirm and fix. On confirm, the checklist row in DynamoDB marks the item present, in date, and confirmed. On fix, they correct the field — usually a date — and the corrected value is what’s stored.

The confirmation is deliberately lightweight, because most reads are right and you don’t want to make the common case slow. But the principle holds: a document counts as done only when a person has agreed it’s the right document and the read is correct. The cost of a wrong “done” here is a vendor approved on paperwork that wasn’t really valid.
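One way to picture the confirm and fix buttons is as a small function that builds the entry to be stored. This is a sketch under assumptions: the `proposed` dict shape and the function name are made up for the example, only the expiry field is correctable here, and the date rule is re-run on whatever value is about to be stored so a corrected date can't slip past it.

```python
from datetime import date, datetime, timezone


def apply_review(proposed, action, reviewer, today, corrected_expiry=None):
    """Build the checklist entry stored after a person taps confirm or fix."""
    if action not in ("confirm", "fix"):
        raise ValueError(f"unknown action: {action!r}")

    # On fix, the person's corrected value replaces the model's read.
    expiry = corrected_expiry if action == "fix" else proposed["expiry"]

    # Re-run the date rule on the value that will actually be stored.
    if proposed["has_date_rule"]:
        in_date = expiry is not None and expiry >= today
    else:
        in_date = True

    return {
        "item": proposed["item"],
        "status": "done" if in_date else "not_done",
        "present": True,
        "in_date": in_date,
        "expiry": expiry.isoformat() if expiry else None,
        "confirmed_by": reviewer,
        "confirmed_at": datetime.now(timezone.utc).isoformat(),
    }


proposed = {"item": "insurance_certificate", "has_date_rule": True,
            "expiry": date(2027, 1, 31)}
confirmed = apply_review(proposed, "confirm", "owner", today=date(2026, 5, 15))
# The model misread the year; the reviewer corrects it to the real date.
fixed = apply_review(proposed, "fix", "owner", today=date(2026, 5, 15),
                     corrected_expiry=date(2026, 4, 30))
```

In the second call the corrected date is in the past, so the stored entry stays not-done even though the model's original read would have passed.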

State that makes the checklist trustworthy

The vendor’s checklist lives in one DynamoDB row per vendor, with a small map for each required item: (item, status, present, in_date, expiry, confirmed_by, confirmed_at). Every upload updates exactly one item’s entry. Because the state is explicit, the chase tick in the next post never has to re-read a document — it just looks at which items are still not done. Re-uploading a replacement (a current certificate over an expired one) overwrites that one item’s entry and leaves the rest alone.
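To make the "one entry per item" idea concrete, here is an in-memory stand-in for the checklist row, using plain dicts instead of DynamoDB. The item names, status strings, and helper names are assumptions for the sketch; only the field list comes from the description above.

```python
def record_upload(checklist, entry):
    """Return a new checklist with exactly one item's entry replaced."""
    updated = dict(checklist)
    updated[entry["item"]] = entry
    return updated


def items_not_done(checklist):
    """What a later chase step would need: names of items still open."""
    return sorted(name for name, e in checklist.items() if e["status"] != "done")


def entry(item, status, present, in_date, expiry=None, by=None, at=None):
    """Build one item's map with the fields listed above."""
    return {"item": item, "status": status, "present": present,
            "in_date": in_date, "expiry": expiry,
            "confirmed_by": by, "confirmed_at": at}


checklist = {
    "insurance_certificate": entry("insurance_certificate", "not_done", True,
                                   False, "2026-04-30", "owner",
                                   "2026-05-15T09:00:00+00:00"),
    "tax_form": entry("tax_form", "done", True, True, None, "owner",
                      "2026-05-10T14:30:00+00:00"),
    "bank_details": entry("bank_details", "not_done", False, False),
}

# The vendor re-uploads a current certificate; only that entry changes.
replacement = entry("insurance_certificate", "done", True, True, "2027-04-30",
                    "owner", "2026-05-20T11:00:00+00:00")
after = record_upload(checklist, replacement)
```

After the replacement, `items_not_done(after)` contains only the bank details item, and the tax form entry is untouched. Nothing in this step needs to re-read a document.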

Why the daily tick uses no model

The checker calls Textract and Bedrock only when a file is uploaded — a handful of times per vendor, total. The daily chase tick in Part 4 reads the same DynamoDB row and calls no model at all; it just compares the checklist state against the calendar. That split keeps the cost down (you pay to read a document once, not every day) and keeps the part that runs every day completely predictable.

Next post: how a vendor gets chased for the items still missing — the daily tick, the four moves, and how a reminder lists only what’s left.
