Part 4 of 7 · Document pipeline series ~5 min read

How extraction stays accurate

AI gets things wrong. The trick is knowing when to trust it and when to ask for help. The validator scores each extraction, gates the obvious cases through, and queues the rest for a human review that takes seconds.

The validator: gate the confident extractions, queue the unsure ones. A vertical flow with one branch: at the top, a box labeled “Structured fields” represents what came out of the AI reader. An arrow leads down to the Validator, which checks both rule-based constraints and the confidence scores attached to each field. From the Validator, two paths split. The straight-down path, labeled “all confident”, carries documents where every field passes both rules and confidence thresholds; these continue directly to the next stage. The right-branching path, labeled “any unsure”, goes to a Review queue, where an operator approves or fixes the flagged fields in seconds. Both paths converge at the bottom into a single output box labeled “Forward to Router” with the approved fields. A bottom note reads: most documents flow straight through; the few that need a human are quick to clear.
Fig 4. Confident extractions go straight through. Unsure ones get a 5-second human check, then merge back into the same flow.

Two checks, in order

For every extracted document, the validator runs two checks before letting it move on.

Check 1 — do the rules hold?

The rules file describes what each document type should look like:

  • The required fields for this type (an invoice without a total isn’t an invoice).
  • The shape of each value (a date should look like a date; an amount should be a number).
  • Cross-checks between fields where they apply (line items should add up to the total, give or take rounding).

These rules are written once, in plain language, and live in the same Drive folder as the rest of your config. They’re free to run.
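As a sketch of what those rule checks amount to, here is a minimal Python version for an invoice. The field names, the date format, and the rounding tolerance are all illustrative assumptions, not the product’s actual rules format:

```python
from datetime import datetime

# Hypothetical rule checks for an invoice. Field names, the date
# format, and the 0.01 rounding tolerance are illustrative only.
REQUIRED = ["invoice_number", "date", "total", "line_items"]

def rule_errors(fields: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the rules hold."""
    # Required fields: an invoice without a total isn't an invoice.
    errors = [f"missing field: {name}" for name in REQUIRED if name not in fields]
    # Shape check: a date should look like a date.
    if "date" in fields:
        try:
            datetime.strptime(fields["date"], "%Y-%m-%d")
        except ValueError:
            errors.append("date is not a valid YYYY-MM-DD date")
    # Cross-check: line items should add up to the total, give or take rounding.
    if "total" in fields and "line_items" in fields:
        if abs(sum(fields["line_items"]) - fields["total"]) > 0.01:
            errors.append("line items do not add up to the total")
    return errors
```

Each rule is cheap and deterministic, which is why this check runs first and costs nothing per document.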

Check 2 — is the AI sure?

Every field came out of the reader with a confidence score — how sure the AI was about that specific value. The validator compares each score to a threshold for that field type (you can set strict thresholds for amounts and dates, looser ones for free-text notes).

Any field below its threshold raises a flag.
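The confidence check is just a per-type comparison. A minimal sketch, with made-up threshold numbers (strict for amounts and dates, looser for free text):

```python
# Hypothetical per-field-type thresholds; the numbers are illustrative.
THRESHOLDS = {"amount": 0.95, "date": 0.95, "text": 0.70}

def low_confidence_fields(fields: dict) -> list[str]:
    """Flag every field whose score falls below its type's threshold.

    `fields` maps a field name to (value, field_type, confidence_score),
    a shape assumed here for the sake of the example.
    """
    flagged = []
    for name, (value, field_type, score) in fields.items():
        if score < THRESHOLDS.get(field_type, 0.85):  # assumed default for unlisted types
            flagged.append(name)
    return flagged
```

A field with a 0.99 score on a 0.95 threshold passes silently; a 0.60 note against a 0.70 threshold raises a flag.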

Three possible verdicts

  • Pass. All rules hold, all fields are confident enough. The document moves on to the router untouched. Most documents land here.
  • Review. A rule check is borderline, or a field’s confidence fell below its threshold. The document goes to a small review queue. An operator opens it, sees the flagged field highlighted next to the original document image, approves or fixes, moves on. Five seconds, maybe ten.
  • Reject. The document is so badly extracted that the rules can’t hold at all (e.g. an invoice with no recognisable line items). The document is sent back to the operator with a clear note saying what looked wrong, and nothing automatic goes back to the original sender.
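The mapping from the two checks to the three verdicts can be sketched as one small function. The three boolean inputs are a simplification assumed for the example:

```python
def verdict(fatal_errors: bool, soft_errors: bool, low_confidence: bool) -> str:
    """Map validation results onto the three verdicts (illustrative sketch).

    fatal_errors   -- extraction so broken the rules can't hold at all
    soft_errors    -- a recoverable rule violation
    low_confidence -- any field scored below its threshold
    """
    if fatal_errors:
        return "reject"   # back to the operator with a note
    if soft_errors or low_confidence:
        return "review"   # queued for a quick human check
    return "pass"         # straight on to the router, untouched
```

The ordering matters: a document that is both broken and low-confidence is rejected, not queued, so operators never review hopeless extractions.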

Why the loop matters

This is the part that separates a useful document pipeline from a frustrating one.

Without the validator, the system would either stop trusting the AI entirely (everything goes to a queue, the human is the bottleneck again) or trust it too much (wrong totals quietly land in your books). The validator’s job is to know when to trust the AI for free and when to spend five seconds of a human’s time. Confidence scores plus rule checks make that decision precise.

What the operator actually sees

When a document lands in the review queue, the operator sees:

  • The original document image on one side.
  • The extracted fields on the other side, with the flagged ones highlighted.
  • One button per flagged field: approve as-is, fix and approve, or reject.

No code, no terminal. The whole queue can be cleared between coffee sips.
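Under the hood, each queue entry only needs a small record to render that side-by-side view. A minimal sketch, with hypothetical names throughout:

```python
from dataclasses import dataclass

@dataclass
class ReviewItem:
    """One entry in the review queue (hypothetical shape, for illustration)."""
    document_image: str   # path or URL to the original scan, shown on one side
    fields: dict          # every extracted field, name -> value, shown on the other
    flagged: list         # names of the fields that need a human look

    def resolve(self, fixes: dict) -> dict:
        """Apply the operator's fixes over the extraction and return approved fields."""
        return {**self.fields, **fixes}
```

Approve-as-is is just `resolve({})`; fix-and-approve passes the corrected values.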

In plain words

The AI is allowed to be confident, and the validator decides how to respond when it isn’t. Most documents pass on their own. The few that don’t are obvious to spot and quick to fix. Wrong values never quietly slip into your tools — they always hit a human gate first.
