How a comment gets checked

Key takeaways

One webhook brings in every item — comments, reviews, and posts arrive the same way.
Each item is cleaned and stored once, so there is exactly one record per item.
A fast rule pass in plain Python checks banned words, links, length, and known-good authors.
Most items are settled by the rule pass with no model call at all.
Only the borderline middle moves on to the checker in Part 3.

Three steps before the checker

Fig 2. Three steps converge on one saved item. The webhook brings it in, the clean-and-store step makes exactly one record, and the rule pass settles the easy calls. Only the borderline middle moves on to the checker.

Step 1: the webhook brings it in

Every platform that lets people post (your community page, the comment plugin under your blog, your review pages) can fire a webhook: a small message sent to a web address you control whenever something new appears. You point each platform's webhook at one Lambda Function URL. A Function URL is just a plain web address that runs a small piece of code. There's no heavy gateway in front of it, and Function URLs carry no extra charge, so you pay only for the Lambda invocations themselves. The first thing that code does is check the signature on the message, so only your real platforms can post items, not a random script that found the address. The raw payload is written to S3 exactly as received, so there's always an untouched copy to fall back on.
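Here is a minimal sketch of that signature check. It assumes the common HMAC-SHA256 scheme: the platform signs the raw request body with a shared secret and sends the hex digest in a header, sometimes prefixed with "sha256=". The function name and the prefix handling are assumptions, and each platform documents its own header name and format.

```python
import hashlib
import hmac


def verify_signature(body: bytes, signature_header: str, secret: bytes) -> bool:
    """Return True if signature_header matches HMAC-SHA256(secret, body).

    Assumes a hex digest, optionally prefixed with "sha256=".
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_header.removeprefix("sha256=")
    # compare_digest takes constant time, so response timing
    # does not reveal how much of a forged signature matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Run the check against the raw bytes before any JSON parsing. Re-serializing the body can change whitespace or key order and break the match.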

One item, one message. A comment, a review, and a post all look the same once they’re in: an author, some text, maybe a link or two, and a note about which area of your site they belong to.
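One way to make "they all look the same once they're in" concrete is a single item type plus one small adapter per platform. This is a sketch only. The platform and item_id fields, and every payload key the adapters read, are hypothetical. The links field starts empty here because link extraction happens in the cleaning step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    platform: str          # which source sent it (assumed field)
    item_id: str           # the platform's own stable id
    author: str
    text: str
    area: str              # which part of the site it belongs to
    links: tuple[str, ...] = ()   # filled in later, during cleaning


def from_comment(payload: dict) -> Item:
    # Hypothetical comment-plugin payload shape.
    return Item("comments", str(payload["id"]), payload["user"],
                payload["body"], payload["page"])


def from_review(payload: dict) -> Item:
    # Hypothetical review-page payload shape.
    return Item("reviews", str(payload["review_id"]), payload["reviewer"],
                payload["content"], "reviews/" + payload["product"])
```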

Step 2: clean and store, exactly once

Raw webhook payloads are messy — HTML tags, tracking junk, odd spacing, the same item sometimes delivered twice. A small Lambda strips the HTML, normalizes the text, and pulls out the parts that matter: the author, the list of links, the length, and the area. It writes one record to the cm-items table in DynamoDB, keyed by a stable item id from the platform. If the same item arrives twice (platforms sometimes retry), the second write lands on the same key and nothing is duplicated. The original payload stays in S3, so if anyone ever asks “what exactly did this person post?” the answer is one click away.
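The cleaning half of that Lambda can be sketched with Python's standard-library HTML parser. Several choices here are assumptions, not the post's exact rules: dropping script and style contents as "tracking junk", treating p, br, div, and li as word breaks, using NFKC Unicode normalization, and also picking up bare URLs typed as plain text.

```python
import re
import unicodedata
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "br", "div", "li"}   # tags that imply a word break
SKIP_TAGS = {"script", "style"}         # contents are never user text
URL_RE = re.compile(r"https?://[^\s<>\"']+")


class _TextAndLinks(HTMLParser):
    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.parts: list[str] = []
        self.links: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip == 0:
            self.parts.append(data)


def clean(raw_html: str) -> dict:
    parser = _TextAndLinks()
    parser.feed(raw_html)
    parser.close()
    text = unicodedata.normalize("NFKC", "".join(parser.parts))
    text = " ".join(text.split())               # collapse odd spacing
    links = parser.links + URL_RE.findall(text)
    unique_links = list(dict.fromkeys(links))   # dedupe, keep first-seen order
    return {"text": text, "links": unique_links, "length": len(text)}
```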

From here on, the rest of the system works off that one clean record. There is exactly one row per item, and any moderator can read it without learning a new tool.

Step 3: the fast rule pass

Before any model is asked to read anything, a fast check runs in plain Python against the house-rules lists. Is the author on the known-good allow list — a long-time member, a verified customer, a teammate? Then pass it; trusted people don’t need a model reading their every word. Does the text trip a banned word from your list? Does it carry a link from a domain you’ve blocked? Is it empty, or a wall of ten thousand characters that’s obviously a paste-bomb? Each of those is a clear signal.
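Each of those signals is a few lines of plain Python. This is a sketch with hypothetical function names. It assumes the banned list holds lowercase single words and the blocked list holds bare domain names. Whole-word matching means an innocent word that merely contains a banned one is not flagged. Checking domain suffixes catches subdomains of a blocked domain without catching look-alike names.

```python
import re
from urllib.parse import urlsplit


def has_banned_word(text: str, banned: set[str]) -> bool:
    # Compare whole words, so "spills" does not trip a ban on "pills".
    words = re.findall(r"[a-z0-9']+", text.lower())
    return any(w in banned for w in words)


def has_blocked_link(links: list[str], blocked_domains: set[str]) -> bool:
    for link in links:
        host = urlsplit(link).hostname or ""   # hostname is already lowercased
        # Match the domain itself and any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            return True
    return False


def bad_length(text: str, max_chars: int = 10_000) -> bool:
    # Empty, or long enough to look like a paste-bomb.
    return not text.strip() or len(text) > max_chars
```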

The rule pass ends with one of three labels. Pass — clearly safe, publish it. Hold — obvious spam or a blocked link, keep it out of view and queue it (but never delete it). Borderline — the rules can’t settle it, so hand it to the checker. The whole pass is a few dozen lines of code and costs effectively nothing. At typical volume it settles most items right here, which is the whole point: the model should only ever read the items a simple rule genuinely can’t.
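Given those signals, choosing the label is a short, ordered decision. This sketch follows the order in the text above: the allow list is checked first, then clear violations become Hold, and anything left over becomes Borderline. Treating a banned word or a bad length as Hold is an assumption. So is passing only allow-listed authors; where the "clearly safe" line sits for everyone else is a policy choice the post leaves open. The Signals type is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    PASS = "pass"               # clearly safe, publish it
    HOLD = "hold"               # keep out of view and queue it, never delete
    BORDERLINE = "borderline"   # hand it to the checker


@dataclass(frozen=True)
class Signals:
    author_allowed: bool
    banned_word: bool
    blocked_link: bool
    bad_length: bool


def rule_pass(s: Signals) -> Label:
    if s.author_allowed:
        return Label.PASS
    if s.blocked_link or s.banned_word or s.bad_length:
        return Label.HOLD
    return Label.BORDERLINE     # the rules can't settle it
```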

Why everything funnels to one record

Three steps in, but only one place the rest of the system looks: the saved item record. That's a deliberate constraint. If the webhook code, the cleaner, and the rule pass each kept their own copy of the truth, every "why was this held?" question would mean checking three places. Funneling everything through one record means each item's rule-pass result travels with it, and the checker in Part 3 reads that record and nothing else.
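As a rough sketch, the record carries its rule-pass label forward, and picking the checker's work is just a filter over those records. The field names, including raw_s3_key as a pointer back to the untouched payload, are hypothetical and do not describe the actual cm-items schema.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    item_key: str
    author: str
    text: str
    area: str
    rule_label: str                 # "pass", "hold", or "borderline"
    raw_s3_key: str                 # where the untouched payload lives
    links: list[str] = field(default_factory=list)


def items_for_checker(records: list[ItemRecord]) -> list[ItemRecord]:
    # The checker reads only saved records, and only the borderline ones.
    return [r for r in records if r.rule_label == "borderline"]
```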

Next post: how the borderline middle gets a verdict — how the checker asks the model, what it gets back, and how it lands on one of three calls.
