Part 3 of 7 · Document expiry watcher series

How the watcher knows when to alert

Once a day, at 8am local time, an EventBridge Scheduler schedule fires the watcher Lambda. The Lambda reads the registry, looks at one row at a time, computes the days remaining, and decides whether to do nothing or to fire an alert and, if so, which kind. The whole decision is plain Python. No model. No vector retrieval. Every threshold lives in the rules doc, where a rep can edit it without a deploy.

Key takeaways

  • The watcher runs once a day via EventBridge Scheduler at 8am local time.
  • Per-category window chains live in the rules doc — legal contracts get 90/60/30/14/3, insurance gets 60/30/7, software gets 30/14/3.
  • Four moves per item, every tick: healthy, first alert, reminder, escalate.
  • DynamoDB tracks last-ping and acknowledgment per item so reminders aren’t duplicate spam.
  • The watcher itself never calls a model. The decision is entirely deterministic.
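The daily 8am trigger from the first takeaway could be set up roughly as follows. This is a minimal sketch, not the series' actual deployment code: the schedule name, ARNs, timezone, and input payload are all placeholders. The parameter names follow the EventBridge Scheduler CreateSchedule API.

```python
import json


def build_schedule_params(lambda_arn: str, role_arn: str, tz_name: str) -> dict:
    """Keyword arguments for an EventBridge Scheduler CreateSchedule call."""
    return {
        "Name": "expiry-watcher-daily",              # hypothetical schedule name
        "ScheduleExpression": "cron(0 8 * * ? *)",   # 08:00 every day
        "ScheduleExpressionTimezone": tz_name,        # makes 08:00 local, not UTC
        "FlexibleTimeWindow": {"Mode": "OFF"},       # fire at the exact time
        "Target": {
            "Arn": lambda_arn,                        # the watcher Lambda
            "RoleArn": role_arn,                      # role Scheduler assumes to invoke it
            "Input": json.dumps({"source": "daily-tick"}),  # placeholder payload
        },
    }


# Placeholder ARNs and timezone, for illustration only.
params = build_schedule_params(
    "arn:aws:lambda:eu-west-1:123456789012:function:expiry-watcher",
    "arn:aws:iam::123456789012:role/expiry-watcher-scheduler",
    "Europe/London",
)

# With AWS credentials configured, the schedule could then be created with:
#   boto3.client("scheduler").create_schedule(**params)
```

Setting the timezone on the schedule itself keeps "8am local" correct across daylight-saving changes without touching the cron expression.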

The decision flow, per item

[Figure: decision flow per item on every daily tick, drawn as a vertical flow diagram.]

  • Input. An item from the registry, with the row's name, category, expiry date, owner, and DynamoDB state.
  • Step 1. Compute days_to_expiry: today's date in the configured timezone subtracted from the expiry date.
  • Step 2. Already acknowledged? Read the ew-ack table. If yes, route to Healthy (do nothing this tick). If no, continue.
  • Step 3. Look up the window chain for the category in the rules doc; for example, legal contracts return [90, 60, 30, 14, 3].
  • Step 4. Find the smallest window crossed: iterate the chain and pick the window where days_to_expiry equals the threshold or fell below it since the last tick. If none, route to Healthy. If it is the final window in the chain, route to Escalate and ping the escalation target named in the rules doc instead of (or in addition to) the owner.
  • Step 5. Already pinged this window? Read the ew-pings table. If no ping has gone out for this window yet, route to First alert, or to Reminder if a previous window has already pinged.
  • Output. Each terminal box (Healthy, First alert, Reminder, Escalate) emits an event to EventBridge with the move and the item context.
Fig 3. The watcher’s decision tree, per item, per daily tick. Five steps decide which of four moves applies. The rules doc holds every threshold; the watcher only enforces them.
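Step 1 is the only date arithmetic in the flow. Here is a minimal sketch, assuming the configured timezone arrives as a tzinfo object. The function name and the injectable `now` parameter are illustrative choices, not the watcher's actual code.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import Optional


def days_to_expiry(expiry: date, tz: tzinfo, now: Optional[datetime] = None) -> int:
    """Whole days from today's local calendar date to the expiry date."""
    # `now` is injectable so the calculation can be tested; it defaults to the clock.
    local_now = now.astimezone(tz) if now is not None else datetime.now(tz)
    return (expiry - local_now.date()).days


# Stand-in for the configured timezone. A deployment would more likely pass
# zoneinfo.ZoneInfo("<IANA name>"); a fixed UTC+13 offset keeps this runnable anywhere.
local_tz = timezone(timedelta(hours=13))
late_evening_utc = datetime(2025, 3, 1, 23, 30, tzinfo=timezone.utc)

print(days_to_expiry(date(2025, 5, 31), local_tz, now=late_evening_utc))      # 90
print(days_to_expiry(date(2025, 5, 31), timezone.utc, now=late_evening_utc))  # 91
```

The two results differ because 23:30 UTC on 1 March is already 2 March in the local zone. That is why the subtraction uses the configured timezone rather than the Lambda's UTC clock.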

Window chains: 90/60/30/14/3 isn’t magic, it’s in the doc

The rules doc has one short section per category. Each section names the chain in plain prose: “Legal contracts: ping at 90, 60, 30, 14, and 3 days. Insurance: 60, 30, 7. Software subscriptions: 30, 14, 3. Domain renewals: 60, 30, 14, 7, 3.” The numbers are days remaining when the alert fires. The first number is the first alert. The last number is the escalation point — if the owner hasn’t acknowledged by then, the escalation target gets pinged too.
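How the watcher turns that prose into numbers isn't spelled out here. One plausible approach, sketched under the assumption that every section follows the "Category: numbers." shape quoted above, is a small regex pass. The function name `parse_chains` and the lower-cased category keys are hypothetical.

```python
import re
from typing import Dict, List

RULES_TEXT = (
    "Legal contracts: ping at 90, 60, 30, 14, and 3 days. "
    "Insurance: 60, 30, 7. "
    "Software subscriptions: 30, 14, 3. "
    "Domain renewals: 60, 30, 14, 7, 3."
)


def parse_chains(text: str) -> Dict[str, List[int]]:
    """Map each category name to its window chain, largest window first."""
    chains: Dict[str, List[int]] = {}
    # Each sentence looks like "<category>: <numbers and filler words>."
    for category, body in re.findall(r"([A-Za-z][A-Za-z ]*):\s*([^.]*)\.", text):
        days = [int(n) for n in re.findall(r"\d+", body)]
        # Descending order: index 0 is the first alert, the last index is escalation.
        chains[category.strip().lower()] = sorted(set(days), reverse=True)
    return chains


chains = parse_chains(RULES_TEXT)
print(chains["legal contracts"])  # [90, 60, 30, 14, 3]
print(chains["insurance"])        # [60, 30, 7]
```

Sorting on the way in means a rep can type the numbers in any order and the first and last windows still mean what the doc says they mean.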

The chains exist for a reason. A 90-day legal-contract ping gives time to negotiate with the vendor or solicit a competing quote. A 30-day insurance ping is when the broker actually has time to bind cover. A 3-day software ping is the last-chance “you forgot, please go pay this” reminder before the SaaS app locks the team out. Different categories have different mechanics; the chains reflect that.

Per-item overrides exist too. The registry sheet has an optional column called chain_override. Type a comma-separated list of days there and the watcher uses your numbers instead of the category default for that one row. This is the right escape hatch for the contract you signed knowing it has a 120-day notice period.
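Resolving the override could look something like this. It is a sketch that assumes the chain_override cell arrives as a string (or None when blank); `resolve_chain` is a hypothetical helper name, and rejecting negative numbers is an added assumption.

```python
from typing import List, Optional


def resolve_chain(category_default: List[int], chain_override: Optional[str]) -> List[int]:
    """Use the row's chain_override if present, else the category default."""
    if chain_override is None or not chain_override.strip():
        return list(category_default)
    # int() tolerates surrounding spaces, and raises ValueError on non-numeric input.
    days = [int(part) for part in chain_override.split(",") if part.strip()]
    if any(d < 0 for d in days):
        raise ValueError("window days must not be negative")
    # Descending order, duplicates removed, same shape as a category chain.
    return sorted(set(days), reverse=True)


legal_default = [90, 60, 30, 14, 3]
print(resolve_chain(legal_default, "120, 90, 60, 30, 14"))  # [120, 90, 60, 30, 14]
print(resolve_chain(legal_default, ""))                     # [90, 60, 30, 14, 3]
```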

Four moves, always

Every item, every tick, lands in exactly one of four buckets. The names are simple on purpose.

  • Healthy. The expiry is more than the first window away, no new window has been crossed since the last ping, or the item has been acknowledged for the current chain. Do nothing. Most items, most days, are healthy.
  • First alert. The expiry just crossed the first window threshold and there’s no acknowledgment yet. Send a fresh alert with full context. Write a row to the ew-pings DynamoDB table marking that the first window has fired.
  • Reminder. A subsequent window crossed without acknowledgment. Send a follow-up that names the previous ping’s date so the owner doesn’t feel like they’re seeing the alert for the first time. Write the new ping to ew-pings.
  • Escalate. The final window in the chain crossed without acknowledgment. Send to the escalation target named in the rules doc, usually the owner’s manager, in addition to the owner. Mark the item as escalated in DynamoDB; every later tick escalates again, daily, until somebody acknowledges or renews. Pings are normally spent sparingly; an escalated item is one of the few cases where daily noise is the right answer.
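The four moves above fit in one small function. This sketch is one reading of the rules, not the watcher's actual source. It assumes the chain is in descending order, that ping history arrives as (chain_index, ping_date) pairs, and that an escalation fires at most once per calendar day. The move names are illustrative strings.

```python
from datetime import date
from typing import List, Tuple


def decide_move(
    days_left: int,
    chain: List[int],
    acknowledged: bool,
    pings: List[Tuple[int, date]],
    today: date,
) -> str:
    """Return one of: healthy, first_alert, reminder, escalate."""
    if acknowledged:
        return "healthy"
    # Indices of every window the item has reached (chain is descending).
    crossed = [i for i, threshold in enumerate(chain) if days_left <= threshold]
    if not crossed:
        return "healthy"              # still further out than the first window
    current = crossed[-1]             # deepest window reached so far
    final = len(chain) - 1
    if current == final:
        # Escalation repeats daily, but never twice on the same day.
        sent_today = any(i == final and d == today for i, d in pings)
        return "healthy" if sent_today else "escalate"
    if any(i == current for i, _ in pings):
        return "healthy"              # this window has already pinged
    return "reminder" if pings else "first_alert"


legal = [90, 60, 30, 14, 3]
today = date(2025, 3, 2)
earlier = date(2025, 2, 1)

print(decide_move(120, legal, False, [], today))             # healthy
print(decide_move(90, legal, False, [], today))              # first_alert
print(decide_move(60, legal, False, [(0, earlier)], today))  # reminder
print(decide_move(3, legal, False, [(0, earlier)], today))   # escalate
```

Nothing in the function reads a clock or a table, so the same inputs always give the same move.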

State that makes the decision deterministic

The watcher reads two DynamoDB tables every tick. ew-pings records every alert that’s gone out: (item_id, chain_index, ping_date, dispatched_via). ew-ack records every acknowledgment: (item_id, ack_date, by_user). With those two tables, the move-decision logic is a few dozen lines of Python and zero magic. A given item with a given expiry, a given window chain, and a given ack/ping history always produces the same move. Re-running the tick produces no extra pings (because the state in DDB shows what already fired).
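The no-extra-pings guarantee comes down to how ping rows are keyed. Below is an in-memory stand-in for ew-pings, assuming a row is unique per (item_id, chain_index, ping_date); the real table's key schema isn't given here, so treat that choice as a guess. Against DynamoDB itself, the same effect would typically come from a conditional put using attribute_not_exists.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PingRow:
    item_id: str
    chain_index: int
    ping_date: date
    dispatched_via: str


class PingStore:
    """In-memory stand-in for the ew-pings table."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, int, date], PingRow] = {}

    def record(self, row: PingRow) -> bool:
        """Store the row; return False if the same ping was already recorded."""
        key = (row.item_id, row.chain_index, row.ping_date)
        if key in self._rows:
            return False
        self._rows[key] = row
        return True

    def for_item(self, item_id: str) -> List[PingRow]:
        return [r for r in self._rows.values() if r.item_id == item_id]


store = PingStore()
row = PingRow("contract-42", 0, date(2025, 3, 2), "email")  # hypothetical item
print(store.record(row))                   # True  (first run of the tick)
print(store.record(row))                   # False (re-run, nothing new written)
print(len(store.for_item("contract-42")))  # 1
```

Including ping_date in the key lets daily escalations for the final window each get their own row, while a same-day re-run still collapses to a no-op.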

Renewing an item explicitly resets its state in both tables: rows for the old chain are kept for audit, but a new chain starts fresh against the new expiry date. Part 5 covers the renewal flow in detail.

Why the daily tick uses no model

The watcher could call a model on the tick to write a smarter alert message, or to decide whether to ping at all. It doesn’t. Two reasons. First, the daily tick should be the one part of the system that is utterly predictable: if the rules doc says ping at 60 days and there’s no ack, the ping fires. A model in that loop introduces variance the team can’t reason about. Second, model calls cost money, and most days most items are healthy, so most of those calls would be wasted.

Bedrock fires elsewhere — on the inbound parsing lane in Part 2, and on the monthly summary mentioned in Part 6. Not on the daily tick. The watcher itself is plain Python that reads a doc and writes events.
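"Writes events" here means one EventBridge entry per item carrying the move and the item context. A sketch of what such an entry might look like follows. The Source and DetailType values and the item fields are placeholders; the four keys are the standard PutEvents entry fields.

```python
import json
from datetime import date


def build_event_entry(move: str, item: dict) -> dict:
    """Shape one entry for an EventBridge PutEvents call."""
    return {
        "Source": "expiry-watcher",               # hypothetical source name
        "DetailType": f"expiry-watcher.{move}",   # hypothetical naming scheme
        # Detail must be a JSON string; default=str serialises the date field.
        "Detail": json.dumps({"move": move, **item}, default=str),
        "EventBusName": "default",
    }


entry = build_event_entry(
    "reminder",
    {
        "item_id": "contract-42",       # hypothetical registry row
        "category": "legal contracts",
        "expiry": date(2025, 5, 31),
        "days_to_expiry": 60,
    },
)

# With AWS credentials configured, the entry could then be sent with:
#   boto3.client("events").put_events(Entries=[entry])
print(entry["DetailType"])  # expiry-watcher.reminder
```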

Next post: how an alert finds the right person, how quiet hours and holidays are honored, and what an acknowledgment actually does to the chain.
