How a number that looks off gets flagged

Key takeaways

The checks run on the gathered figures before the report is written, all in plain Python.
Three kinds of flag: a stale source, a figure out of its normal range, and a total that doesn’t reconcile.
A flagged figure is shown but clearly marked unverified — it never appears as a plain fact.
Everything flagged is collected into a Needs a look section at the top of the report.
Every flag is written to wr-flags so you can see how often a source misbehaves.

Three kinds of flag

Fig 5. Three kinds of check, three ways a figure can look off. A stale source, a figure out of its normal range, and two numbers that don’t reconcile. Each flag is shown in the report, written to wr-flags, and never stated as plain fact.

Check 1: a source that didn’t update (the most common)

The quietest failure is a source that simply didn’t refresh. The bank export didn’t land this week, so the cash figures are last week’s. The point-of-sale was offline Saturday, so the weekend is missing. The danger isn’t a loud error — it’s that a missing source reads as a zero or a stale repeat, and a zero looks like a real number. Check 1 compares each source’s last sync time against the week the report covers. If a source hasn’t synced through the end of the week, or a figure comes back blank where this business never has a true zero (a shop that always sells something can’t have a $0 sales day), the figure is marked unverified and the source is named in the flag.
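As a rough illustration, Check 1 might look something like the sketch below. The function name, the flag's dictionary shape, and the `zero_is_impossible` switch are assumptions made for this example, not the pipeline's actual code.

```python
from datetime import datetime


def check_stale(source, figure, value, last_sync, week_end, zero_is_impossible=True):
    """Return a flag dict if the source looks stale or blank, else None.

    All names here are hypothetical; the flag shape is an assumption.
    """
    # A source that never synced, or stopped before the week ended, is stale.
    if last_sync is None or last_sync < week_end:
        synced = last_sync.isoformat() if last_sync else "never"
        return {
            "figure": figure,
            "check": "stale_source",
            "expected": f"{source} synced through {week_end.isoformat()}",
            "actual": f"last sync {synced}",
        }
    # A blank or zero where the business never has a true zero is treated
    # as missing data rather than as a real number.
    if zero_is_impossible and (value is None or value == 0):
        return {
            "figure": figure,
            "check": "stale_source",
            "expected": "a non-zero value",
            "actual": f"{value!r} from {source}",
        }
    return None
```

The flag names the source either way, so the report can say which export to chase rather than only that something is missing.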

This is the check that pairs with the completeness gate from Part 4. The gate tries to hold the send until everything is in; if the data still isn’t complete by the last retry, this check makes sure the partial figure is labelled rather than reported straight.

Check 2: a figure far outside its normal range

Some bad numbers are present, not missing — they’re just wrong. A tool double-counts a batch and sales come back at ten times a normal week. A currency column gets read in cents and every figure is off by a hundred. Check 2 catches these by comparing each figure against its own recent history: the trailing several weeks set a normal range, and anything more than the configured number of standard deviations outside it — or a multiple of last week that no real week ever reaches — trips the flag. The figure is still shown, but next to it the report says what was expected: “$184,000 this week — the four-week average is $18,000; this looks off, please check the source.”

The threshold is in the config doc, so a business with genuinely spiky weeks (seasonal, event-driven) can widen the range and not get flagged every busy week. The goal is to catch the impossible, not to second-guess a real good week.
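A minimal sketch of the range check, assuming the history arrives as a plain list of weekly values with the most recent week last. The parameter names `max_sigma` and `max_multiple` and their defaults are placeholders for whatever the config doc actually holds.

```python
from statistics import mean, stdev


def check_range(figure, value, history, max_sigma=3.0, max_multiple=5.0):
    """Flag a value far outside its trailing history, else return None.

    history: recent weekly values, oldest first (assumed shape).
    max_sigma / max_multiple: hypothetical config thresholds.
    """
    if len(history) < 2:
        return None  # not enough history to define a normal range

    avg = mean(history)
    sd = stdev(history)
    last_week = history[-1]

    # With a perfectly flat history the deviation test is skipped,
    # and only the multiple-of-last-week test applies.
    too_many_sigmas = sd > 0 and abs(value - avg) > max_sigma * sd
    too_big_a_jump = last_week > 0 and value > max_multiple * last_week

    if too_many_sigmas or too_big_a_jump:
        return {
            "figure": figure,
            "check": "out_of_range",
            "expected": f"about {avg:,.0f} (trailing {len(history)}-week average)",
            "actual": f"{value:,.0f}",
        }
    return None
```

A business with spiky weeks would raise `max_sigma` in this sketch, which widens the band without turning the check off.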

Check 3: numbers that should agree but don’t

The strongest check is one number against another. Sales recorded in the sales sheet should roughly match the cash that came in through Stripe and the bank. The parts of a total — sales by category, or by location — should sum to the total. Check 3 compares the pairs that ought to agree and flags any gap bigger than a small tolerance. When sales say $20,000 but cash-in says $14,000, that’s not necessarily an error — it could be timing, refunds, or an unpaid invoice — but it’s exactly the kind of thing the owner should see. Both figures are shown side by side with the gap named, and the builder makes no attempt to pick which one is “right.”
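The comparison itself is simple arithmetic. The sketch below assumes a relative tolerance of 2% and hypothetical function names; it reports the gap and deliberately does not decide which side is correct.

```python
def check_pair(label_a, a, label_b, b, tolerance=0.02):
    """Flag two figures that should agree but differ by more than tolerance.

    tolerance is relative to the larger figure (an assumed convention).
    """
    gap = abs(a - b)
    base = max(abs(a), abs(b))
    if base == 0 or gap / base <= tolerance:
        return None
    return {
        "figure": f"{label_a} vs {label_b}",
        "check": "reconcile",
        "expected": f"within {tolerance:.0%} of each other",
        # Both figures are reported side by side with the gap named.
        "actual": f"{label_a} {a:,.0f}, {label_b} {b:,.0f}, gap {gap:,.0f}",
    }


def check_parts(label, total, parts, tolerance=0.02):
    """Flag a total whose parts (by category or location) don't sum to it."""
    return check_pair(label, total, f"sum of {label} parts", sum(parts.values()), tolerance)
```

With the example above, sales of $20,000 against cash-in of $14,000 gives a gap of $6,000, or 30% of the larger figure, well past any small tolerance.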

Reconciliation is the check that most often surfaces something real rather than a glitch — a refund batch nobody logged, an invoice that was sent but not paid, a category that stopped reporting. It’s the difference between a report that lists numbers and one that notices when they disagree.

What a flag does to the report

Every flag from all three checks is collected into a Needs a look section that sits at the very top of the report, above the summary. Each entry names the figure, the check that tripped, what was expected, and what was actually there. The flagged figure still appears in the numbers table below, but marked — a small “unverified” tag — so it’s never mistaken for a clean fact. Crucially, a flagged figure is also withheld from the writer’s facts list in Part 3: the model is never handed a number that failed a check, so the summary paragraph can’t accidentally state a bad figure as if the week really went that way.
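One way to picture the split between what the model sees and what the table shows is sketched here. It assumes figures arrive as a name-to-value dict and that each flag names a single figure; a reconciliation flag covering two figures would need both names withheld, which this sketch does not handle.

```python
def split_figures(figures, flags):
    """Return (facts for the writer, rows for the numbers table).

    figures: dict of figure name -> value (assumed shape).
    flags: list of dicts with figure/check/expected/actual keys (assumed shape).
    """
    flagged = {f["figure"] for f in flags}
    # The writer never receives a figure that failed a check.
    facts = {name: value for name, value in figures.items() if name not in flagged}
    # The table still shows every figure, with a tag on the flagged ones.
    table = [
        (name, value, "unverified" if name in flagged else "")
        for name, value in figures.items()
    ]
    return facts, table


def needs_a_look(flags):
    """Render the section that sits at the top of the report."""
    lines = ["Needs a look"]
    for f in flags:
        lines.append(
            f"- {f['figure']} ({f['check']}): expected {f['expected']}, got {f['actual']}"
        )
    return "\n".join(lines)
```

Filtering before the model call, rather than asking the model to ignore a figure, is what makes the guarantee hold: a number that is not in the prompt cannot end up in the summary.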

Every flag is also written to the wr-flags DynamoDB table with the run date, the figure, the check, the expected value, and the actual value. Over a few months that table tells its own story: the bank export that’s late one Monday in three, the point-of-sale that drops every long weekend. A source that keeps tripping the same flag needs fixing upstream, and the table is the evidence for that conversation.
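The write could be sketched as below, assuming boto3 and numeric expected and actual values. The attribute names are guesses, and the table's real key schema is not described here, so the item would need whatever key attributes wr-flags actually defines.

```python
from decimal import Decimal


def build_flag_item(run_date, flag):
    """Shape one flag as a DynamoDB item (attribute names are assumptions)."""
    return {
        "run_date": run_date,
        "figure": flag["figure"],
        "check": flag["check"],
        # boto3's resource layer rejects Python floats, so numbers go in as Decimal.
        "expected": Decimal(str(flag["expected"])),
        "actual": Decimal(str(flag["actual"])),
    }


def write_flags(run_date, flags, table_name="wr-flags"):
    """Write each flag to the table. Needs boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the builder above stays stdlib-only

    table = boto3.resource("dynamodb").Table(table_name)
    for flag in flags:
        table.put_item(Item=build_flag_item(run_date, flag))
```

Keeping the item builder separate from the write makes the shape easy to test without touching AWS.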

Next post: the cost breakdown. The whole pipeline above runs in coffee-money territory at SMB volume; Part 6 explains exactly where the dollars go and why the one model call a week is the biggest single line.
