Part 2 of 7 · Document expiry watcher series ~4 min read

How an item gets tracked

The watcher only watches what’s in the registry. So the first job is making sure the registry actually reflects what your business has. There are three ways an item gets in: somebody types a row in the Drive sheet, somebody forwards a PDF contract to a dedicated address, or somebody puts an event on their Google Calendar with a small tag. The first one is obvious. The other two exist because in real life nobody types a row in a sheet for the contract they signed three minutes ago.

Key takeaways

  • Three intake lanes feed one registry: the Drive sheet, an inbox-forwarding lane, and a calendar import.
  • Inbound PDFs are parsed by Textract; Bedrock Haiku 4.5 reads the text and proposes a row.
  • Every parsed row goes to a rep’s Slack for one-tap approval before it lands in the registry.
  • Calendar events tagged #expires get pulled hourly via the Google Calendar API.
  • The registry stays the canonical store. The other lanes are conveniences that write into it.

Three lanes into one registry

Diagram: three intake lanes funnel into one registry.

  • Lane 1 · manual · Drive sheet: somebody types a row directly into the Google Sheet that holds the registry.
  • Lane 2 · SES + Textract · Inbox forwarding: somebody forwards a contract PDF to the dedicated expires address; SES writes the raw MIME to S3; a parser Lambda runs Textract on the PDF, then calls Bedrock Haiku 4.5 to extract the row; the proposed row is posted to a rep's Slack; on approve, it is added to the Drive sheet via the Sheets API.
  • Lane 3 · hourly poll · Calendar import: a calendar-sync Lambda polls the Calendar API hourly for events tagged #expires in the description; the same proposal-and-approval flow runs in Slack before the row lands in the sheet.
  • Drive registry sheet (source of truth): name · category · owner · vendor · expiry · renewal cost · contract value · link. The drive-sync Lambda mirrors it to S3 every 15 minutes, and the watcher reads from S3 on its daily tick.
Fig 2. Three lanes converge on one Drive sheet. The sheet is the source of truth; the inbox lane and the calendar lane are conveniences that propose rows for human approval. The drive-sync Lambda mirrors the sheet to S3 so the watcher can read it without hitting Drive on every tick.

Lane 1: the Drive sheet itself

The simplest lane. Open the registry sheet in Drive, add a row, save. The columns are short: name, category, owner email, vendor, expiry date, renewal cost, contract value, and a link to the source document. A small Lambda, drive-sync, runs every fifteen minutes, exports the sheet as plain CSV via the Drive API, and writes it to s3://ew-registry-source/registry.csv if the sheet has changed since the last sync. The watcher reads from S3, not Drive directly. That keeps Drive API calls predictable and, with versioning enabled on the bucket, keeps every synced copy, so a bad bulk-edit can be rolled back by restoring the previous version.
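The "only if the sheet has changed" step can be sketched as a content-hash comparison. This is a minimal illustration, not the series' actual code: the post does not say how drive-sync detects a change, so the SHA-256 check, the function name sync_if_changed, and the upload callback (standing in for the S3 write) are all assumptions.

```python
import hashlib
from typing import Callable, Optional


def sync_if_changed(
    csv_bytes: bytes,
    last_digest: Optional[str],
    upload: Callable[[bytes], None],
) -> str:
    """Upload the exported CSV only when its content hash differs from the last sync.

    Returns the digest to remember for the next run. `upload` is a hypothetical
    stand-in for whatever writes the object to S3.
    """
    digest = hashlib.sha256(csv_bytes).hexdigest()
    if digest != last_digest:
        upload(csv_bytes)
    return digest


# Three simulated ticks: first sync, unchanged sheet, edited sheet.
uploads = []
d1 = sync_if_changed(b"name,expiry\nLease,2026-01-31\n", None, uploads.append)
d2 = sync_if_changed(b"name,expiry\nLease,2026-01-31\n", d1, uploads.append)
d3 = sync_if_changed(b"name,expiry\nLease,2026-02-28\n", d2, uploads.append)
print(len(uploads))  # 2: the unchanged tick wrote nothing
```

In a real Lambda the last digest would have to live somewhere durable between invocations, for example as metadata on the S3 object; that choice is left open here.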

This lane covers the cases where you already have a contract, you know when it expires, and you can spend thirty seconds typing it in. Most existing items go in this way during the initial setup.

Lane 2: inbox forwarding (the lane most teams actually use)

Set up a dedicated inbound address — something like expires@your-company.com — via Amazon SES. Anyone on the team forwards a contract PDF to that address and the watcher takes it from there. SES writes the raw MIME to s3://ew-raw-mime/. The S3 PUT triggers a parser Lambda. The Lambda walks the MIME tree to the PDF attachment, runs Amazon Textract on it (Textract reads PDF, PNG, JPEG, and TIFF natively; if somebody forwards a Word document, the parser falls back to python-docx), and gets back the extracted text plus any tables.
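Walking the MIME tree to the PDF can be done with Python's standard email package. The sketch below assumes the raw MIME bytes have already been read from S3; the function name find_pdf_attachment and the fallback filename are illustrative, and the Textract call that would follow is left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Optional, Tuple


def find_pdf_attachment(raw_mime: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (filename, pdf_bytes) for the first PDF part, or None if there is none."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    # walk() visits every part depth-first, including parts nested in a forward.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        filename = part.get_filename() or ""
        is_pdf = (
            part.get_content_type() == "application/pdf"
            or filename.lower().endswith(".pdf")
        )
        if is_pdf:
            # decode=True undoes the base64 transfer encoding
            return filename or "attachment.pdf", part.get_payload(decode=True)
    return None


# Build a small forwarded message to exercise the function.
demo = EmailMessage()
demo["From"] = "rep@example.com"
demo["To"] = "expires@your-company.com"
demo["Subject"] = "Fwd: signed lease"
demo.set_content("Contract attached.")
demo.add_attachment(
    b"%PDF-1.4 demo", maintype="application", subtype="pdf", filename="lease.pdf"
)
print(find_pdf_attachment(demo.as_bytes()))
```

One thing to check before wiring this to Textract: its synchronous operations accept only single-page PDFs, so multi-page contracts would need the asynchronous API.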

Then a Bedrock Haiku 4.5 call reads the text and emits a structured row: name, vendor, category, expiry date, renewal cost (if present in the document), contract value (if present), and an owner-suggestion based on the “To” line of the original forward. The model prompt is short: “Extract a row for the registry. Return JSON only. Mark each field with a confidence score. Do not invent a date that isn’t in the text.” The output goes to a small Slack interactive message that pings the rep who forwarded the email: the proposed row, the confidence per field, and three buttons — approve, edit, discard. On approve, a Lambda writes the row to the Drive sheet via the Sheets API. On edit, the rep gets a fillable modal pre-populated with the proposal. On discard, the message is logged and the PDF moved to a discarded prefix in S3 for audit.
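A rough sketch of the step between the model's JSON and the Slack message follows. The JSON shape (each field as an object with value and confidence), the required-field list, the 0.8 warning threshold, and the action_id names are all assumptions made for illustration; only the Block Kit structure (section and actions blocks, button elements) follows Slack's documented format.

```python
import json
from typing import Any, Dict, List, Optional

REQUIRED_FIELDS = ("name", "vendor", "category", "expiry_date")  # assumed schema
LOW_CONFIDENCE = 0.8  # assumed threshold for flagging a field


def parse_proposal(model_output: str) -> Dict[str, Dict[str, Any]]:
    """Parse the model's JSON; raise ValueError if it is invalid or incomplete."""
    row = json.loads(model_output)  # JSONDecodeError is a ValueError subclass
    missing = [f for f in REQUIRED_FIELDS if f not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return row


def button(label: str, action_id: str, value: str,
           style: Optional[str] = None) -> Dict[str, Any]:
    """Build one Block Kit button element."""
    element: Dict[str, Any] = {
        "type": "button",
        "text": {"type": "plain_text", "text": label},
        "action_id": action_id,
        "value": value,
    }
    if style:
        element["style"] = style  # Slack accepts "primary" or "danger"
    return element


def build_blocks(row: Dict[str, Dict[str, Any]], proposal_id: str) -> List[Dict[str, Any]]:
    """Render the proposed row with per-field confidence and three buttons."""
    lines = []
    for field, item in row.items():
        flag = " :warning:" if item["confidence"] < LOW_CONFIDENCE else ""
        lines.append(f"*{field}*: {item['value']} ({item['confidence']:.0%}){flag}")
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": "\n".join(lines)}},
        {"type": "actions", "elements": [
            button("Approve", "approve", proposal_id, "primary"),
            button("Edit", "edit", proposal_id),
            button("Discard", "discard", proposal_id, "danger"),
        ]},
    ]
```

The proposal_id passed as each button's value is a hypothetical key that lets the handler for the button press look up the stored proposal, rather than trusting data echoed back through Slack.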

The reason every parsed row goes to a human first is simple: a contract expiry the model misread is worse than a contract that never made it into the registry at all. The misread one will quietly tell you everything is fine until the morning the policy lapses.

Lane 3: calendar import

Some teams already track renewals on a calendar. The lease is on Maria’s personal calendar with a yellow flag. The SOC 2 audit window is on the engineering calendar. The cyber-insurance renewal is on the office manager’s calendar. Forcing those teams to also type rows in a sheet is a fight you don’t need to have on day one.

Lane 3 picks up calendar events tagged with #expires in the description. A small calendar-sync Lambda runs hourly, iterates through the configured Google Calendars (using a service-account credential stored in Secrets Manager), and pulls any events with the tag whose start time is in the future. Each pulled event becomes a proposal in the same Slack flow as Lane 2 — one-tap approve to add to the registry. Once approved, the calendar event itself can stay where it is or be deleted; the registry now owns the renewal.
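The filtering step could look roughly like the sketch below. It works on event dictionaries in the shape the Calendar API's events list returns (a description string and a start object holding either dateTime or, for all-day events, date) and leaves out the API call itself. The function names, the case-insensitive tag match, and treating all-day events as starting at midnight UTC are assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List

TAG = "#expires"


def event_start(event: Dict[str, Any]) -> datetime:
    """Return the event's start as a timezone-aware datetime."""
    start = event["start"]
    if "dateTime" in start:
        # Normalize a trailing "Z" so fromisoformat works on older Python versions.
        return datetime.fromisoformat(start["dateTime"].replace("Z", "+00:00"))
    # All-day events carry only a date; midnight UTC is an assumed convention.
    return datetime.fromisoformat(start["date"]).replace(tzinfo=timezone.utc)


def tagged_future_events(
    events: Iterable[Dict[str, Any]], now: datetime
) -> List[Dict[str, Any]]:
    """Keep events whose description contains the tag and that start after `now`."""
    return [
        e for e in events
        if TAG in (e.get("description") or "").lower() and event_start(e) > now
    ]


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
events = [
    {"summary": "Lease renewal", "description": "Office lease #expires",
     "start": {"dateTime": "2026-09-30T09:00:00-07:00"}},
    {"summary": "Old policy", "description": "#expires",
     "start": {"dateTime": "2025-06-01T09:00:00Z"}},
    {"summary": "Team lunch", "start": {"dateTime": "2026-02-01T12:00:00Z"}},
    {"summary": "SOC 2 window", "description": "Audit #expires",
     "start": {"date": "2026-03-15"}},
]
print([e["summary"] for e in tagged_future_events(events, now)])
# ['Lease renewal', 'SOC 2 window']
```

Because the poll runs hourly, the same event will match on every run until it passes; some form of de-duplication (for example, remembering event IDs already proposed) would be needed to avoid repeat Slack messages, and is not shown here.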

Calendar import is the most opt-in of the three lanes. A team that doesn’t use it loses nothing; a team that does avoids retyping things they already typed once.

Why the registry stays the source of truth

Three lanes in, but only one place where the watcher actually looks. That’s a deliberate constraint. If each lane wrote directly to the watcher’s state, every “why did this ping go out?” question would mean checking three places. Funneling everything through the Drive sheet means there is exactly one row per item, and any rep can read or edit any of it without learning a new tool. The convenience lanes are first-class for getting items in, but they always pass through the sheet on the way.

Next post: how the watcher actually reads the registry, computes days-to-expiry, and picks one of four moves.
