Series · 7 parts · Published April 28, 2026

Document pipeline

A serverless pipeline on AWS that reads documents for you, checks each extraction, and sends the structured data to whichever tools you already use. Seven posts on the same system — one diagram at a time — with an engineering reference at the end.

  1. A document pipeline on AWS for a few dollars a month

    The whole system on one page — a reader, a validator, a router, and a small rules file that controls where everything goes.

  2. How a document enters the pipeline

    Three ways in — uploaded, emailed, or dropped into a folder. One shared place after intake, screened for trouble before any AI runs.

  3. How the AI reads a document

    Two AIs in sequence — one specialist for layout, one generalist for meaning. Together they read a document better, faster, and cheaper than either alone.

  4. How extraction stays accurate

    The validator passes the clear-cut cases straight through and queues the uncertain ones for a quick human review. Wrong values never quietly slip into your tools.

  5. How the data flows out

    Each destination is a small pluggable piece. The rules file decides which document type lands where — sheet, accounting software, Slack.

  6. What the document pipeline costs

    A coffee a month at typical SMB volume. Cents per document, not dollars. Where the bill actually comes from.

  7. Engineering reference: the document pipeline architecture

    Same system, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, and the actual flow operations.
