How the watcher knows when to alert
Once a day, at 8am local time, an EventBridge Scheduler rule fires the watcher Lambda. The Lambda reads the registry, looks at one row at a time, computes the days remaining, and decides whether to do nothing or to fire an alert — and if so, which kind. The whole decision is plain Python. No model. No vector retrieval. Every threshold lives in the rules doc, where a rep can edit it without a deploy.
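That loop is small enough to sketch whole. A minimal version in plain Python (names like `days_remaining` and `tick` are illustrative, not taken from the actual codebase):

```python
from datetime import date

def days_remaining(expiry: date, today: date) -> int:
    """Whole days until expiry; negative once the date has passed."""
    return (expiry - today).days

def tick(registry: list[dict], today: date) -> list[tuple[str, int]]:
    """One daily pass: read every row, compute days remaining.
    The move decision (healthy / alert / reminder / escalate) hangs
    off the remaining-days number computed here."""
    return [(row["item_id"], days_remaining(row["expiry"], today))
            for row in registry]
```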
Key takeaways
- The watcher runs once a day via EventBridge Scheduler at 8am local time.
- Per-category window chains live in the rules doc — legal contracts get 90/60/30/14/3, insurance gets 60/30/7, software gets 30/14/3.
- Four moves per item, every tick: healthy, first alert, reminder, escalate.
- DynamoDB tracks last-ping and acknowledgment per item so reminders aren’t duplicate spam.
- The watcher itself never calls a model. The decision is entirely deterministic.
The decision flow, per item
Window chains: 90/60/30/14/3 isn’t magic, it’s in the doc
The rules doc has one short section per category. Each section names the chain in plain prose: “Legal contracts: ping at 90, 60, 30, 14, and 3 days. Insurance: 60, 30, 7. Software subscriptions: 30, 14, 3. Domain renewals: 60, 30, 14, 7, 3.” The numbers are days remaining when the alert fires. The first number is the first alert. The last number is the escalation point — if the owner hasn’t acknowledged by then, the escalation target gets pinged too.
The chains exist for a reason. A 90-day legal-contract ping gives time to negotiate with the vendor or solicit a competing quote. A 30-day insurance ping is when the broker actually has time to bind cover. A 3-day software ping is the last-chance “you forgot, please go pay this” reminder before the SaaS app locks the team out. Different categories have different mechanics; the chains reflect that.
Per-item overrides exist too. The registry sheet has an optional column called chain_override. Type a comma-separated list of days there and the watcher uses your numbers instead of the category default for that one row. This is the right escape hatch for the contract you signed knowing it has a 120-day notice period.
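The override logic is a few lines. Here is a sketch of how the watcher might resolve the effective chain for a row, assuming the registry carries a category column and the optional chain_override column described above (`DEFAULT_CHAINS` and `resolve_chain` are assumed names; the category keys mirror the rules doc):

```python
# Category defaults, mirroring the rules doc. Keys are illustrative.
DEFAULT_CHAINS = {
    "legal":     [90, 60, 30, 14, 3],
    "insurance": [60, 30, 7],
    "software":  [30, 14, 3],
    "domain":    [60, 30, 14, 7, 3],
}

def resolve_chain(row: dict) -> list[int]:
    """Per-item chain_override beats the category default."""
    override = (row.get("chain_override") or "").strip()
    if override:
        # Comma-separated days, e.g. "120, 60, 30" for a long notice period.
        return [int(d) for d in override.split(",")]
    return DEFAULT_CHAINS[row["category"]]
```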
Four moves, always
Every item, every tick, lands in exactly one of four buckets. The names are simple on purpose.
- Healthy. The expiry is more than the first window away, or the item has been acknowledged for the current chain. Do nothing. Most items, most days, are healthy.
- First alert. The expiry just crossed the first window threshold and there’s no acknowledgment yet. Send a fresh alert with full context. Write a row to the ew-pings DynamoDB table marking that the first window has fired.
- Reminder. A subsequent window crossed without acknowledgment. Send a follow-up that names the previous ping’s date so the owner doesn’t feel like they’re seeing the alert for the first time. Write the new ping to ew-pings.
- Escalate. The final window in the chain crossed without acknowledgment. Send to the escalation target named in the rules doc — usually the owner’s manager — in addition to the owner. Mark the item as escalated in DynamoDB; the next tick will keep escalating daily until somebody acknowledges or renews. Bad timing burns pings; an escalated item is one of the few cases where daily noise is the right answer.
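The four buckets reduce to one pure function over the item's remaining days, its chain, and its ping/ack history. A hedged sketch, not the production code (`decide`, `fired`, and `acked` are assumed names):

```python
def decide(remaining: int, chain: list[int], fired: set[int], acked: bool) -> str:
    """Pick one of the four moves for a single item.

    chain  -- windows in days remaining, descending, e.g. [90, 60, 30, 14, 3]
    fired  -- chain indices already pinged (derived from ew-pings)
    acked  -- True if the owner acknowledged the current chain
    """
    if acked or remaining > chain[0]:
        return "healthy"
    if remaining <= chain[-1]:
        return "escalate"   # final window crossed without ack: ping daily
    crossed = {i for i, w in enumerate(chain) if remaining <= w}
    if not crossed - fired:
        return "healthy"    # every crossed window already fired; nothing new today
    return "first_alert" if not fired else "reminder"
```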
State that makes the decision deterministic
The watcher reads two DynamoDB tables every tick. ew-pings records every alert that’s gone out: (item_id, chain_index, ping_date, dispatched_via). ew-ack records every acknowledgment: (item_id, ack_date, by_user). With those two tables, the move-decision logic is a few dozen lines of Python and zero magic. A given item with a given expiry, a given window chain, and a given ack/ping history always produces the same move. Re-running the tick produces no extra pings (because the state in DDB shows what already fired).
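Folding the two tables into decision inputs is equally mechanical. A minimal sketch, assuming rows shaped like the tuples above (`state_for` is a hypothetical helper; in practice these would be DynamoDB queries rather than in-memory lists):

```python
def state_for(item_id: str, pings: list[dict], acks: list[dict]) -> tuple[set[int], bool]:
    """Derive (fired windows, acked?) for one item from the two tables.
    Same tables in, same state out -- which is what makes re-running
    a tick a no-op."""
    fired = {p["chain_index"] for p in pings if p["item_id"] == item_id}
    acked = any(a["item_id"] == item_id for a in acks)
    return fired, acked
```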
Renewing an item is an explicit reset of both tables for that item: rows for the old chain are kept for audit, but a new chain starts fresh against the new expiry date. Part 5 covers the renewal flow in detail.
Why the daily tick uses no model
The watcher could call a model on the tick to write a smarter alert message, or to decide whether to ping at all. It doesn’t. Two reasons. First, the daily tick should be the one part of the system that is utterly predictable — if the rules doc says ping at 60 days and there’s no ack, the ping fires. A model in that loop introduces variance the team can’t reason about. Second, model calls cost money, and most days most items are healthy, so the call would be wasted nine days out of ten.
Bedrock fires elsewhere — on the inbound parsing lane in Part 2, and on the monthly summary mentioned in Part 6. Not on the daily tick. The watcher itself is plain Python that reads a doc and writes events.
Next post: how an alert finds the right person, how quiet hours and holidays are honored, and what an acknowledgment actually does to the chain.