How a document enters the pipeline
Documents arrive three ways — uploaded, emailed, or dropped into a shared folder. The intake step gets them into one shared place, screens them for trouble, and tells the rest of the pipeline what type each one is.
Three ways in
Different teams have different habits. The pipeline meets people where they are:
- An upload form — a small page on your site where a customer or staff member drags a file in. The simplest path. Good for one-off scans and customer submissions.
- An email forward — a real email address (something like docs@yourdomain). When you get an invoice in your inbox, you forward it. The pipeline picks it up automatically.
- A folder drop — the office scanner or a shared drive folder. Drop a file in; the pipeline takes it from there.
You don’t have to use all three. Most clients pick one and stick with it.
What gets attached at intake
The intake step doesn’t read the document yet. It just saves the file and writes a small label alongside it:
- Where the document came from (upload form, email, folder).
- Who sent it, if known (email sender, signed-in user).
- What time it arrived.
- A best-guess document type, if the source gives a hint (e.g. emailed invoices land in the “invoice” bucket by default).
That label travels with the document through every step that follows. If something goes wrong later, you can always see how it got there.
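For readers who want to see what that label might look like, here is a minimal Python sketch. The names (`Source`, `IntakeLabel`, `make_label`, `DEFAULT_TYPE_HINTS`) are illustrative assumptions, not the pipeline's actual code; only the four fields and the email-to-invoice default come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Source(Enum):
    UPLOAD = "upload form"
    EMAIL = "email"
    FOLDER = "folder"


# Hypothetical per-source defaults; only the email -> "invoice" case
# comes from the post itself.
DEFAULT_TYPE_HINTS = {Source.EMAIL: "invoice"}


@dataclass(frozen=True)
class IntakeLabel:
    source: Source              # where the document came from
    sender: Optional[str]       # email sender or signed-in user, if known
    received_at: datetime       # arrival time, stored in UTC
    type_hint: Optional[str]    # best guess only; nothing has read the file yet


def make_label(source: Source, sender: Optional[str] = None,
               type_hint: Optional[str] = None) -> IntakeLabel:
    """Build the label saved next to a new file.

    An explicit hint from the source wins over the per-source default.
    """
    return IntakeLabel(
        source=source,
        sender=sender,
        received_at=datetime.now(timezone.utc),
        type_hint=type_hint or DEFAULT_TYPE_HINTS.get(source),
    )
```

For example, `make_label(Source.EMAIL, sender="ap@supplier.example")` produces a label whose `type_hint` is `"invoice"`. The dataclass is frozen so later steps can read the label but cannot quietly rewrite where a document came from.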
Why virus scan first
Documents arrive from outside your business. Some of them — not many, but some — arrive carrying things you don’t want. Before the AI reader spends a single cent reading a file, the system runs a quick virus scan.
Two outcomes:
- Pass. Almost everything passes. The document moves on to the next stage.
- Fail. The file is held in a quarantine spot, an alert goes to the operator, and nothing in the pipeline reads its contents. The original sender hears nothing automatic — that’s for the operator to decide.
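The pass/fail split can be sketched in a few lines. This is a hypothetical Python version: `ScanResult`, `Scanner`, `intake_scan` and the `alert_operator` callback are names made up for illustration. It also treats files the scanner cannot read as a failure, in line with the quarantine notes that follow.

```python
import shutil
from enum import Enum
from pathlib import Path
from typing import Callable, Protocol


class ScanResult(Enum):
    CLEAN = "clean"
    INFECTED = "infected"
    UNREADABLE = "unreadable"  # corrupted, password-locked, or unsupported format


class Scanner(Protocol):
    """Anything with a scan() method can be plugged in here."""

    def scan(self, path: Path) -> ScanResult: ...


def intake_scan(
    path: Path,
    scanner: Scanner,
    quarantine_dir: Path,
    alert_operator: Callable[[Path, ScanResult], None],
) -> bool:
    """Return True if the file may move on; otherwise quarantine it and alert."""
    result = scanner.scan(path)
    if result is ScanResult.CLEAN:
        return True  # pass: the next stage can pick the file up
    # Fail: move the file aside so no later stage ever opens it.
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    held = Path(shutil.move(str(path), str(quarantine_dir / path.name)))
    # Only the operator is told; nothing is sent back to the original sender.
    alert_operator(held, result)
    return False
```

Moving the file, rather than flagging it in place, is one simple way to guarantee that "nothing in the pipeline reads its contents": later stages never see it in the folder they watch.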
What gets quarantined (and what happens then)
It’s rare. Most quarantines aren’t actual viruses; they’re corrupted scans, password-locked PDFs the scanner can’t open, or files in formats the system doesn’t support. The operator decides whether to release them, ask the sender to resend, or simply ignore them.
The actual scanner is a small, swappable piece. The simplest version is an open-source scanner the pipeline runs itself. The cloud platform the pipeline runs on also offers a managed version that does the same job without you running anything; it’s the right call when it’s available in your region and the pay-per-scan price fits your volume.
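Because the scanner is swappable, the rest of the pipeline only needs something with a `scan` method. As one example of a self-hosted option, the sketch below wraps ClamAV's `clamscan` command. The post doesn't name the scanner it uses, so treat ClamAV, the `ClamScanScanner` class and its timeout as assumptions. `clamscan` exits with 0 when no virus is found, 1 when one is found, and 2 on errors; anything other than a clean result is treated as a reason to hold the file.

```python
import subprocess
from enum import Enum
from pathlib import Path


class ScanResult(Enum):
    CLEAN = "clean"
    INFECTED = "infected"
    UNREADABLE = "unreadable"  # scanner could not check the file


class ClamScanScanner:
    """Hypothetical self-hosted scanner wrapping ClamAV's clamscan CLI."""

    def __init__(self, binary: str = "clamscan", timeout_s: float = 120.0):
        self.binary = binary
        self.timeout_s = timeout_s

    def scan(self, path: Path) -> ScanResult:
        try:
            proc = subprocess.run(
                [self.binary, "--no-summary", str(path)],
                capture_output=True,
                text=True,
                timeout=self.timeout_s,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # Scanner missing or stuck: hold the file rather than let it through.
            return ScanResult.UNREADABLE
        # clamscan exit codes: 0 = no virus found, 1 = virus found, 2 = error.
        if proc.returncode == 0:
            return ScanResult.CLEAN
        if proc.returncode == 1:
            return ScanResult.INFECTED
        return ScanResult.UNREADABLE
```

A managed cloud scanner would be another class with the same `scan` method, so switching between the two doesn't touch the intake or quarantine logic.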
In plain words
Three ways in, one mailbox, one safety check. Once a document is past the virus scan, the rest of the pipeline can trust what it’s reading. The label that intake attached travels with the document everywhere — so a year from now you can still trace any field on any sheet back to the exact file it came from.