Part 2 of 7 · Contract summarizer series · ~4 min read

How a contract gets read

The summarizer only reads what reaches it. So the first job is making it easy to hand a contract over, in whatever form it already lives in. There are three ways one gets in: somebody drops a file in a Drive folder, somebody forwards a PDF to a dedicated address, or your e-sign tool pings the system the moment an agreement is signed. Once the file is in, Textract turns the pages into text and a small pass splits it into numbered clauses so everything that follows can point at exact lines.

Key takeaways

  • Three intake lanes feed one reader: a Drive folder, an inbox-forwarding lane, and an e-sign webhook.
  • Whatever lane it comes from, the contract lands as a file in S3 and kicks off the same read.
  • Textract turns the pages into text — it reads PDFs, scans, and photos of a contract.
  • A small Python pass splits the text into numbered clauses so later steps can cite exact lines.
  • No model runs here. Reading and splitting is plain, cheap, and the same for every contract.

Three lanes into one reader

[Diagram: three intake lanes feed one reader. Lane 1, Drive folder: drive-watch copies a dropped file to S3. Lane 2, inbox forwarding: SES writes the raw MIME to S3 and a Lambda pulls out the PDF. Lane 3, e-sign webhook: the tool calls a Function URL and a Lambda fetches the signed PDF. All three land in the S3 contracts bucket, one file per contract; Textract turns the pages into text, and a Python pass writes a numbered clause list back to S3 for the reader.]
Fig 2. Three lanes converge on one S3 bucket. From there the read is identical: Textract turns the pages into text and a Python pass splits it into numbered clauses. The reader never has to know whether the contract came from Drive, an email, or your e-sign tool.
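
To make the "pages into text" step a little more concrete, here is a minimal sketch of what happens once Textract has answered. Textract's text-detection responses carry a list of `Blocks`; the ones whose `BlockType` is `LINE` hold a line of recognized text in `Text` and a page number in `Page`. The helper names below (`lines_from_blocks`, `to_plain_text`) are illustrative and not taken from the series, and the sketch assumes the blocks have already been fetched and are kept in the order the response lists them.

```python
from __future__ import annotations

from typing import Iterable


def lines_from_blocks(blocks: Iterable[dict]) -> list[tuple[int, str]]:
    """Return (page, text) pairs for every LINE block, in response order."""
    lines = []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks; a LINE already holds its words
        page = block.get("Page", 1)  # assumption: treat a missing Page as page 1
        lines.append((page, block.get("Text", "")))
    return lines


def to_plain_text(blocks: Iterable[dict]) -> str:
    """Join the recognized lines into one newline-separated string."""
    return "\n".join(text for _page, text in lines_from_blocks(blocks))
```

Keeping the page number next to each line is a design choice for this sketch: it costs nothing here and gives later steps something to point back at.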

Lane 1: the Drive folder

The simplest lane. You have a folder in Drive; call it Contracts to review. Drop a file in, and you’re done. A small Lambda, drive-watch, checks the folder for new files, copies each one to s3://cs-contracts/, and starts a read job. This is the lane for the stack of agreements already sitting on your laptop, or for the renewal a supplier emailed you, which you can save and drag in. It is also the easiest way to feed the system a batch on day one: select twenty old contracts, drop them in the folder, and come back to twenty summaries.
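
The post does not say how drive-watch decides which files are new, so treat the following as one plausible sketch rather than the actual implementation. It assumes the Lambda already has a folder listing (a list of dicts with `id` and `name`) and a set of file ids it has copied before. The function names and the `drive/<id>/<name>` key layout are hypothetical; only the bucket name comes from the text above.

```python
BUCKET = "cs-contracts"  # bucket name from the post


def pick_new_files(listing, seen_ids):
    """Return the files in `listing` whose id has not been copied yet."""
    return [f for f in listing if f["id"] not in seen_ids]


def s3_key_for(file):
    # Hypothetical key layout: lane prefix, then Drive file id, then file name.
    # Including the id keeps two files with the same name from colliding.
    return f"drive/{file['id']}/{file['name']}"


def plan_copies(listing, seen_ids):
    """Return (bucket, key) targets for every file that still needs copying."""
    return [(BUCKET, s3_key_for(f)) for f in pick_new_files(listing, seen_ids)]
```

Tracking ids rather than names means dropping the same twenty contracts in twice would not start forty read jobs, as long as the set of seen ids is stored somewhere between runs.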

Because everything lands in the same S3 bucket no matter the lane, this folder is just a friendly front door. The work the system does on a file dropped here is exactly the work it does on a file that arrived any other way.

Lane 2: inbox forwarding (the lane most owners use)

Set up a dedicated inbound address via Amazon SES, something like review@your-company.com. Anyone on the team forwards a contract to that address and the system takes it from there. SES writes the raw email to s3://cs-raw-mime/. That S3 write triggers a Lambda that walks the email and finds the attachment. Usually that is a PDF, but a scan or a photo works too, since Textract reads images natively; a Word file goes through a small text reader instead. The Lambda then copies the attachment into the contracts bucket and starts a read job.
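
The "walks the email" step can be sketched with Python's standard `email` package. This is a minimal version under stated assumptions: the function name `contract_attachments` and the list of accepted content types are illustrative choices, not something the series specifies, and the raw MIME bytes are assumed to have been read from S3 already.

```python
from __future__ import annotations

from email import message_from_bytes, policy

# Assumed set of types worth handing to the reader: PDF, Word, common images.
CONTRACT_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "image/png",
    "image/jpeg",
    "image/tiff",
}


def contract_attachments(raw_mime: bytes) -> list[tuple[str, str, bytes]]:
    """Return (filename, content_type, data) for each contract-like part."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    found = []
    # walk() visits every part, including parts nested inside a forwarded message.
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in CONTRACT_TYPES:
            continue  # skips the body text, containers, and unrelated files
        name = part.get_filename() or "attachment"
        found.append((name, ctype, part.get_payload(decode=True)))
    return found
```

Filtering on content type rather than file extension is a judgment call; a stricter version might check both, since mail clients do not always label attachments correctly.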

This is the lane that fits how contracts actually arrive: as an attachment in an email, while you’re busy doing something else. You forward and forget. A few minutes later the summary is waiting wherever you asked for it — covered in Part 5. No new tool to learn, no folder to remember; just a forward.

The system always reads the document a person handed it. It never goes looking for contracts on its own, and it never sends anything anywhere off the back of an intake — the only thing an intake produces is a draft summary that waits for you. That boundary matters: the system reads, it does not act.

Lane 3: the e-sign webhook

If your business signs through an e-sign tool — DocuSign, PandaDoc, and the like — the system can read the agreement the moment it’s signed. You set up a webhook in the e-sign tool pointing at a Lambda Function URL (a plain web address that runs a small function; covered in the engineering reference). When an envelope completes, the tool calls that address. The Lambda fetches the signed PDF, copies it to the contracts bucket, and starts a read job — so the plain-English summary of what everyone just agreed to is sitting in your inbox before the celebration email.
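
Here is a rough sketch of what the function behind that address might do. A Lambda Function URL delivers the request body as a string in `event["body"]`, with `event["isBase64Encoded"]` saying whether it needs decoding first. Everything about the payload itself is an assumption: each e-sign tool has its own webhook format, so the `status` and `document_url` fields below are hypothetical stand-ins, and `start_read` is a placeholder for whatever fetches the PDF and starts the read job.

```python
import base64
import json


def parse_webhook(event: dict) -> dict:
    """Decode the Function URL request body into a dict."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


def handler(event, context=None, start_read=None):
    try:
        payload = parse_webhook(event)
    except (ValueError, UnicodeDecodeError):
        # Covers malformed base64, malformed JSON, and non-object payloads.
        return {"statusCode": 400, "body": "invalid payload"}

    # Hypothetical field names; a real tool's webhook will differ.
    if payload.get("status") != "completed":
        return {"statusCode": 200, "body": "ignored"}

    url = payload.get("document_url")
    if not url:
        return {"statusCode": 400, "body": "missing document_url"}

    if start_read is not None:
        start_read(url)  # placeholder: fetch the signed PDF, copy to S3, start the read
    return {"statusCode": 202, "body": "read job started"}
```

A production handler would also verify that the call really came from the e-sign tool, for example by checking the signature header that tool provides; that check is left out here because it is vendor-specific.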

This lane is the most hands-off of the three: nobody has to forward or drop anything, because the signing event itself is the trigger. It is also the one that catches the contract you might otherwise never re-read — the one you signed in a hurry and filed. A summary of it lands automatically, and if a clause should worry you, you find out now rather than at renewal.

Why one reader, not three

Three lanes in, but only one place where the reading actually happens. That’s deliberate. If each lane did its own reading, the same PDF could be parsed differently depending on whether it was forwarded or dropped in a folder, and every “why did this summary look wrong?” question would mean checking three code paths. Funneling everything down to one file in S3 means there is exactly one read step, one way clauses get numbered, and one place to look when something needs fixing. The lanes are just doors; the room behind them is the same.
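
As an illustration of "one way clauses get numbered", here is a minimal sketch of a splitting pass. The series does not describe its actual rules, so this version makes a loud assumption: a clause starts on any line that opens with a number such as `1.`, `2)`, or `2.1`. Anything before the first such line is kept as clause `0`. The function name and the output fields are illustrative.

```python
from __future__ import annotations

import re

# Assumed heading shape: "2.1 Fees" (dotted number) or "1. Term" / "1) Term".
# A bare number followed by a space ("12 months ...") is not treated as a heading.
CLAUSE_START = re.compile(r"^\s*(\d+(?:\.\d+)+|\d+(?=[.)]))[.)]?\s+\S")


def split_clauses(text: str) -> list[dict]:
    """Split plain text into clauses, recording the source line range of each."""
    clauses = []
    current = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines never start or extend a clause
        match = CLAUSE_START.match(line)
        if match or current is None:
            # New clause; text before the first heading becomes clause "0".
            number = match.group(1) if match else "0"
            current = {"number": number, "start_line": line_no,
                       "end_line": line_no, "text": stripped}
            clauses.append(current)
        else:
            current["text"] += " " + stripped
            current["end_line"] = line_no
    return clauses
```

Recording `start_line` and `end_line` alongside each clause is what lets later steps cite exact lines. A pattern this simple will misfire on some contracts (a wrapped line that happens to begin with "3.5 percent", for instance), which is one more reason to have a single place where the rule lives.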

Next post: how the reader takes those numbered clauses and pulls the key terms — parties, deal, money, dates — into a fixed shape, with every field tied back to the clause it came from.
