How the AI reads a document

Key takeaways

Two AIs, one job each — a layout specialist for words and tables, a meaning generalist for fields.
The specialist hands a clean map of the page to the generalist; neither does the other’s job.
Splitting the work is more accurate, cheaper, and easier to debug than one big AI doing both.
Output is named fields with values in the right shape and a per-field confidence score.
That confidence score is what the validator uses next to decide trust-or-review.

Fig 3. Layout specialist first, meaning generalist second. Output: structured fields the validator can score.

Two AIs, one job each

You could in theory hand the whole document to a single big AI and ask “please extract the fields.” It works — sometimes. It also costs more, makes more mistakes on tables and signatures, and gets worse the longer the document gets.

The pipeline splits the work in two. A small specialist does the boring half (where are the words?). A small generalist does the interesting half (what do they mean?). Each tool gets used for what it’s good at.

The specialist: layout

The first AI is built specifically for documents. It finds words on the page and tells you exactly where they are. It recognises tables and gives you each cell. It detects signatures, checkmarks, and form fields. It does this consistently — the same scan twice gives you the same result.

What it does not do: decide which words are the “invoice number” and which are the “customer reference.” That’s not its job. It hands the next stage a clean map of the page.
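To make "a clean map of the page" concrete, here is a minimal sketch of what that map might look like. The names (`Word`, `Table`, `PageLayout`, `text_lines`) and the box format are assumptions made for illustration, not the pipeline's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    # Assumed bounding box format: (left, top, right, bottom) in page units.
    box: tuple[float, float, float, float]


@dataclass
class Table:
    # rows[r][c] is the text of the cell in row r, column c.
    rows: list[list[str]]


@dataclass
class PageLayout:
    """Hypothetical 'clean map': positions and structure, no meaning attached."""
    words: list[Word] = field(default_factory=list)
    tables: list[Table] = field(default_factory=list)
    has_signature: bool = False

    def text_lines(self, tolerance: float = 5.0) -> list[str]:
        """Group words into lines by their top edge, then read left to right."""
        lines: list[list[Word]] = []
        for word in sorted(self.words, key=lambda w: (w.box[1], w.box[0])):
            if lines and abs(lines[-1][0].box[1] - word.box[1]) <= tolerance:
                lines[-1].append(word)
            else:
                lines.append([word])
        return [
            " ".join(w.text for w in sorted(line, key=lambda w: w.box[0]))
            for line in lines
        ]


page = PageLayout(
    words=[
        Word("Invoice", (40, 30, 110, 45)),
        Word("INV-1042", (120, 31, 200, 45)),
        Word("Total", (40, 60, 85, 75)),
        Word("1,250.00", (120, 60, 190, 75)),
    ],
    tables=[Table(rows=[["Item", "Qty", "Price"], ["Widget", "5", "250.00"]])],
)
print(page.text_lines())
```

Notice that nothing in the map says which word is the invoice number. It only records what is on the page and where.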

The generalist: meaning

The second AI is the kind that can be told what to look for in plain language. The pipeline tells it: “this is an invoice. Look for these fields: vendor name, invoice number, total amount, due date, line items.” The AI reads the layout from stage one and fills out the form.

Crucially, the generalist isn’t scanning the original document; it’s working from the specialist’s clean map. That makes this step faster, cheaper, and more accurate than feeding it the raw scan.
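As a rough sketch of that hand-off, the instructions could be assembled from the field list and the specialist's output like this. The function name `build_prompt` and the prompt wording are illustrative assumptions, and the call to the generalist itself is deliberately left out.

```python
# Field list taken from the invoice example in the text.
FIELDS = ["vendor name", "invoice number", "total amount", "due date", "line items"]


def build_prompt(
    doc_type: str,
    fields: list[str],
    lines: list[str],
    tables: list[list[list[str]]],
) -> str:
    """Combine plain-language instructions with the layout map (hypothetical format)."""
    parts = [f"Document type: {doc_type}. Look for these fields: {', '.join(fields)}."]
    parts.append("Page text, line by line:")
    parts.extend(lines)
    for index, table in enumerate(tables, start=1):
        parts.append(f"Table {index}:")
        # One table row per line, cells separated by a pipe.
        parts.extend(" | ".join(row) for row in table)
    return "\n".join(parts)


prompt = build_prompt(
    "invoice",
    FIELDS,
    lines=["Invoice INV-1042", "Total 1,250.00"],
    tables=[[["Item", "Qty", "Price"], ["Widget", "5", "250.00"]]],
)
print(prompt)
```

The generalist only ever sees this compact text, never the scanned pixels.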

Why two not one

Three reasons:

Accuracy. Specialist tools beat generalist ones at the parts they specialise in. Asking a generalist to find table cells inside a scanned PDF is a recipe for hallucinated numbers.
Cost. The specialist is cheap per page and predictable. The generalist is also cheap, but only when it’s working on a small clean map — not a giant blob of raw scan.
Trust. When something goes wrong, you can tell which AI got it wrong. The split makes the system debuggable.

What comes out the other end

For each field the generalist extracts, you get three things:

The field name (vendor, total, due date).
The value, in the right shape (a number for “total”, a date for “due date”).
A confidence score — how sure the AI is.
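Those three parts could be modelled roughly as follows. This is a sketch under assumptions: the `ExtractedField` type, the `shape_value` helper, the 0-to-1 confidence scale, and the sample values are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Union

Value = Union[str, Decimal, date]


@dataclass
class ExtractedField:
    name: str
    value: Value
    confidence: float  # assumed scale: 0.0 (a guess) to 1.0 (certain)


def shape_value(name: str, raw: str) -> Value:
    """Put the raw text in the right shape for its field (hypothetical rules)."""
    if name == "total":
        # Drop thousands separators, keep exact decimal places.
        return Decimal(raw.replace(",", ""))
    if name == "due date":
        # Assumes the date arrives as YYYY-MM-DD.
        return date.fromisoformat(raw)
    return raw


fields = [
    ExtractedField("vendor", shape_value("vendor", "Acme Ltd"), 0.98),
    ExtractedField("total", shape_value("total", "1,250.00"), 0.95),
    ExtractedField("due date", shape_value("due date", "2024-07-31"), 0.62),
]
for f in fields:
    print(f.name, repr(f.value), f.confidence)
```

Using `Decimal` rather than a float for money is a design choice in this sketch: it keeps "1,250.00" exact instead of introducing rounding noise.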

That confidence score is the next post’s entire subject — it’s how the validator decides whether a document goes straight through or needs a human to peek at it for two seconds.
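As a preview, the simplest version of that decision is a threshold check. The 0.90 cut-off, the `route` function, and the rule "review if any field is below the cut-off" are assumptions for illustration; the validator's actual logic belongs to the next post.

```python
REVIEW_THRESHOLD = 0.90  # hypothetical cut-off, not a value from the pipeline


def route(
    confidences: dict[str, float],
    threshold: float = REVIEW_THRESHOLD,
) -> tuple[str, list[str]]:
    """Return the decision and the names of any fields that triggered review."""
    low = [name for name, score in confidences.items() if score < threshold]
    return ("review" if low else "straight-through", low)


print(route({"vendor": 0.98, "total": 0.95}))
print(route({"vendor": 0.98, "total": 0.95, "due date": 0.62}))
```

Returning the list of low-confidence fields lets a reviewer look only at what is in doubt rather than rechecking the whole document.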

In plain words

One AI finds the words. Another AI decides what they mean. Together they read a document better, faster, and cheaper than either could alone, and when something goes wrong, the confidence scores flag it and the split shows which AI got it wrong.
