Document pipeline
A serverless pipeline on AWS that reads documents for you, checks each extraction, and sends the structured data to whichever tools you already use. Seven posts on the same system, one diagram at a time, ending with an engineering reference.
-
01
A document pipeline on AWS for a few dollars a month
The whole system on one page — a reader, a validator, a router, and a small rules file that controls where everything goes.
-
02
How a document enters the pipeline
Three ways in — uploaded, emailed, or dropped into a folder. One shared place after intake, screened for trouble before any AI runs.
-
03
How the AI reads a document
Two AIs in sequence — one specialist for layout, one generalist for meaning. Together they read a document better, faster, and cheaper than either alone.
-
04
How extraction stays accurate
The validator passes the clear-cut cases straight through and queues the uncertain ones for a quick human review. Wrong values never quietly slip into your tools.
-
05
How the data flows out
Each destination is a small pluggable piece. The rules file decides which document type lands where — sheet, accounting software, Slack.
-
06
What the document pipeline costs
About the price of a coffee a month at typical SMB volume. Cents per document, not dollars, plus a look at where the bill actually comes from.
-
07
Engineering reference: the document pipeline architecture
Same system, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, and the actual flow operations.
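
For readers who want a feel for posts 04 and 05 before clicking through, here is a minimal sketch of how a validator gate and a rules file could fit together. Everything in it is an assumption made for illustration: the `RULES` mapping, the `REVIEW_THRESHOLD` value, the `Extraction` fields, and the idea that the validator works from a single confidence score are hypothetical, not taken from the actual pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical rules file content: document type -> destination names.
RULES = {
    "invoice": ["accounting", "sheet"],
    "receipt": ["sheet"],
}

# Assumed cut-off below which an extraction goes to a human.
REVIEW_THRESHOLD = 0.9


@dataclass
class Extraction:
    doc_type: str               # e.g. "invoice"
    confidence: float           # assumed score between 0 and 1
    fields: dict = field(default_factory=dict)


def route(extraction, rules=RULES, threshold=REVIEW_THRESHOLD):
    """Return the destinations for an extraction.

    Unsure or unknown documents are queued for human review
    instead of being sent to any destination.
    """
    if extraction.confidence < threshold:
        return ["human_review"]
    if extraction.doc_type not in rules:
        return ["human_review"]
    return list(rules[extraction.doc_type])


if __name__ == "__main__":
    print(route(Extraction("invoice", 0.97)))
    print(route(Extraction("invoice", 0.55)))
```

In this sketch, adding a destination means editing the mapping rather than the routing code, which is the property the rules file is meant to provide.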