Part 2 of 7 · Content moderator series ~4 min read

How a comment gets checked

The moderator only checks what reaches it. So the first job is making sure every new comment, review, or post actually arrives — and that the easy calls get made before any model is asked to look. There are three steps an item passes through before the checker ever sees it: it comes in through a webhook, it gets cleaned and stored, and it runs through a fast rule pass. The rule pass settles most items on its own. Only what it can’t settle moves on.

Key takeaways

  • One webhook endpoint brings in every item: comments, reviews, and posts all arrive the same way.
  • Each item is cleaned and stored under its stable item id, so there is exactly one record per item, even when a platform retries.
  • A fast rule pass in plain Python checks banned words, links, length, and known-good authors.
  • Most items are settled by the rule pass with no model call at all.
  • Only the borderline middle moves on to the checker in Part 3.

Three steps before the checker

[Figure: Three intake steps funnel into one saved item. Step 1 · Webhook in; Step 2 · Clean and store; Step 3 · Rule pass. All three feed one saved item record in cm-items (id · author · text · links · area · platform · rule-pass result · S3 link), and only borderline items go on to the checker.]
Fig 2. Three steps converge on one saved item. The webhook brings it in, the clean-and-store step makes exactly one record, and the rule pass settles the easy calls. Only the borderline middle moves on to the checker.

Step 1: the webhook brings it in

Every platform that lets people post — your community page, the comment plugin under your blog, your review pages — can fire a webhook: a small message sent to a web address you control whenever something new appears. You point each platform’s webhook at one Lambda Function URL. A Function URL is just a plain web address that runs a small piece of code; there’s no heavy gateway in front of it, which keeps the cost near zero. The first thing that code does is check the signature on the message, so only your real platforms can post items — not a random script that found the address. The raw payload is written to S3 exactly as received, so there’s always an untouched copy to fall back on.
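How the signature is made depends on the platform, so treat this as a sketch under an assumption: many platforms compute an HMAC-SHA256 of the raw request body with a secret you both know, and send it hex-encoded in a request header. The function names and the optional `sha256=` prefix below are illustrative, not any specific platform's format. The two details that matter are that the check runs on the raw bytes before anything is parsed, and that the comparison doesn't short-circuit on the first differing byte.

```python
import hashlib
import hmac


def expected_signature(secret: bytes, raw_body: bytes) -> str:
    """HMAC-SHA256 of the exact bytes received, as lowercase hex."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def signature_is_valid(secret: bytes, raw_body: bytes, received: str) -> bool:
    """True only if the signature sent with the webhook matches our own.

    Assumption: the platform sends a hex digest, possibly prefixed with
    "sha256=" (a common but not universal convention).
    """
    # Check the raw bytes as delivered; parsing and re-serializing the JSON
    # first could change the bytes and break the match.
    expected = expected_signature(secret, raw_body)
    received = received.strip().removeprefix("sha256=")
    # compare_digest avoids stopping at the first mismatch, so response
    # timing doesn't leak how much of a forged signature was right.
    return hmac.compare_digest(expected.encode(), received.encode())


# Usage with a hypothetical secret and body.
secret = b"shared-secret-from-platform-settings"
body = b'{"id": "c-123", "text": "Nice post!"}'
good = expected_signature(secret, body)
print(signature_is_valid(secret, body, good))         # True
print(signature_is_valid(secret, body + b" ", good))  # False: body changed
```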

One item, one message. A comment, a review, and a post all look the same once they’re in: an author, some text, maybe a link or two, and a note about which area of your site they belong to.

Step 2: clean and store, exactly once

Raw webhook payloads are messy — HTML tags, tracking junk, odd spacing, the same item sometimes delivered twice. A small Lambda strips the HTML, normalizes the text, and pulls out the parts that matter: the author, the list of links, the length, and the area. It writes one record to the cm-items table in DynamoDB, keyed by a stable item id from the platform. If the same item arrives twice (platforms sometimes retry), the second write lands on the same key and nothing is duplicated. The original payload stays in S3, so if anyone ever asks “what exactly did this person post?” the answer is one click away.
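A minimal sketch of the cleaning step, using only the standard library. It assumes the item body arrives as an HTML fragment; pulling out the author and the area is left out, because those come from platform-specific payload fields. Counting both `<a href>` targets and bare URLs in the text as "the list of links" is also an assumption, and the helper names are hypothetical.

```python
import re
import unicodedata
from html.parser import HTMLParser

URL_RE = re.compile(r"https?://[^\s<>\"']+")
BREAK_TAGS = {"br", "p", "div", "li"}  # tags that imply a word break


class _TextAndLinks(HTMLParser):
    """Collects visible text and <a href> targets from an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; -> &, &nbsp; -> U+00A0
        self.parts: list[str] = []
        self.links: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]
        elif tag in BREAK_TAGS:
            self.parts.append(" ")  # keep words on either side from gluing together

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean(raw_html: str) -> dict:
    """Strip HTML, normalize the text, and list the links it carries."""
    parser = _TextAndLinks()
    parser.feed(raw_html)
    parser.close()
    # NFKC folds compatibility characters (full-width letters, non-breaking
    # spaces) into their plain forms; then collapse runs of whitespace.
    text = unicodedata.normalize("NFKC", "".join(parser.parts))
    text = re.sub(r"\s+", " ", text).strip()
    # Bare URLs in the text, minus trailing sentence punctuation.
    bare = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    links = list(dict.fromkeys(parser.links + bare))  # dedupe, keep order
    return {"text": text, "links": links, "length": len(text)}
```

The result is the clean, platform-neutral part of the record; the untouched original stays in S3.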

From here on, the rest of the system works off that one clean record. There is exactly one row per item, and any moderator can read it without learning a new tool.
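The "one row per item" property comes from the key, not from extra bookkeeping. In DynamoDB, a PutItem with a primary key that already exists replaces that item rather than adding another, so a retried webhook simply rewrites the same row. A small sketch of that, assuming the table's partition key is a string attribute named `id`; `to_put_request` and the in-memory `FakeTable` are hypothetical helpers so the example runs without AWS. If you would rather keep the first copy and ignore retries, PutItem also accepts a condition expression using `attribute_not_exists` on the key, which makes the second write fail instead.

```python
from typing import Any

TABLE_NAME = "cm-items"


def to_put_request(record: dict[str, Any]) -> dict[str, Any]:
    """Arguments for a low-level DynamoDB PutItem call.

    With boto3 this would be sent as boto3.client("dynamodb").put_item(**request).
    The sketch handles strings, bools, ints, and lists of strings; other
    types are skipped.
    """
    item: dict[str, Any] = {}
    for name, value in record.items():
        if isinstance(value, str):
            item[name] = {"S": value}
        elif isinstance(value, bool):  # check before int: bool is an int subclass
            item[name] = {"BOOL": value}
        elif isinstance(value, int):
            item[name] = {"N": str(value)}  # DynamoDB sends numbers as strings
        elif isinstance(value, list):
            item[name] = {"L": [{"S": str(v)} for v in value]}
    return {"TableName": TABLE_NAME, "Item": item}


class FakeTable:
    """In-memory stand-in with PutItem's replace-on-same-key behavior."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}

    def put_item(self, TableName: str, Item: dict[str, Any]) -> None:
        self.rows[Item["id"]["S"]] = Item  # same id -> same row, replaced


table = FakeTable()
record = {"id": "c-123", "author": "sam", "text": "Nice post!", "links": [], "length": 10}
table.put_item(**to_put_request(record))
table.put_item(**to_put_request(record))  # the platform retried
print(len(table.rows))  # 1
```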

Step 3: the fast rule pass

Before any model is asked to read anything, a fast check runs in plain Python against the house-rules lists. Is the author on the known-good allow list — a long-time member, a verified customer, a teammate? Then pass it; trusted people don’t need a model reading their every word. Does the text trip a banned word from your list? Does it carry a link from a domain you’ve blocked? Is it empty, or a wall of ten thousand characters that’s obviously a paste-bomb? Each of those is a clear signal.
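Each of those checks can be a short function. A minimal sketch, with made-up list contents standing in for your house-rules lists (the names `KNOWN_GOOD_AUTHORS`, `BANNED_WORDS`, `BLOCKED_DOMAINS`, and `rule_signals` are hypothetical). Two details are easy to get wrong: banned words should match whole words, so a banned word buried inside an innocent longer word doesn't trip; and a blocked domain should also catch its subdomains without catching unrelated domains that merely end in the same letters.

```python
import re
from urllib.parse import urlsplit

# Hypothetical house-rules lists; real ones would be loaded from config.
KNOWN_GOOD_AUTHORS = {"long-time-member", "verified-customer-42", "teammate-ana"}
BANNED_WORDS = {"casino", "jackpot"}
BLOCKED_DOMAINS = {"spam.example", "cheap-pills.example"}
MAX_CHARS = 10_000  # beyond this, treat it as a paste-bomb

WORD_RE = re.compile(r"\w+")


def is_known_good(author: str) -> bool:
    return author in KNOWN_GOOD_AUTHORS


def has_banned_word(text: str) -> bool:
    # Whole words only, case-insensitive: "JACKPOT!" trips, "jackpots"
    # does not (add variants to the list if you want them caught).
    words = {w.casefold() for w in WORD_RE.findall(text)}
    return not words.isdisjoint(BANNED_WORDS)


def has_blocked_link(links: list[str]) -> bool:
    for link in links:
        host = urlsplit(link).hostname or ""  # hostname comes back lowercased
        # The domain itself or any subdomain, but not look-alikes:
        # "www.spam.example" is blocked, "notspam.example" is not.
        if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
            return True
    return False


def bad_length(text: str) -> bool:
    return not text.strip() or len(text) > MAX_CHARS


def rule_signals(author: str, text: str, links: list[str]) -> set[str]:
    """Names of every rule this item trips."""
    checks = {
        "known_good": is_known_good(author),
        "banned_word": has_banned_word(text),
        "blocked_link": has_blocked_link(links),
        "bad_length": bad_length(text),
    }
    return {name for name, hit in checks.items() if hit}
```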

The rule pass ends with one of three labels. Pass: clearly safe, publish it. Hold: obvious spam or a blocked link; keep it out of view and queue it, but never delete it. Borderline: the rules can't settle it, so hand it to the checker. The whole pass is a few dozen lines of code and costs effectively nothing. On a typical mix of traffic it settles most items right here, which is the whole point: the model should only ever read the items a simple rule genuinely can't.
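Turning the rule results into one of the three labels is a short ordered decision. A sketch, assuming the rule pass reports the rules an item trips as a set of names (the signal names and `decide` are hypothetical). It follows the order the checks are described in: the allow list is consulted first, so a known-good author passes even if a banned word slips in; swap the first two checks if your house rules say otherwise. Treating only allow-listed authors as clearly safe, and holding empty or oversized items, are also assumptions of this sketch.

```python
from enum import Enum


class Label(str, Enum):
    PASS = "pass"              # clearly safe: publish it
    HOLD = "hold"              # obvious spam: hide and queue it, never delete
    BORDERLINE = "borderline"  # rules can't settle it: send to the checker


# Hypothetical signal names that each mean "obvious spam".
HOLD_SIGNALS = {"banned_word", "blocked_link", "bad_length"}


def decide(signals: set[str]) -> Label:
    """Map the rules an item tripped to exactly one label."""
    if "known_good" in signals:
        return Label.PASS
    if signals & HOLD_SIGNALS:
        return Label.HOLD
    return Label.BORDERLINE


print(decide({"known_good"}).value)    # pass
print(decide({"blocked_link"}).value)  # hold
print(decide(set()).value)             # borderline
```

Because a hold only sets a label, a mistaken hold can always be reversed: nothing has been deleted.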

Why everything funnels to one record

Three steps in, but only one place the rest of the system looks: the saved item record. That’s a deliberate constraint. If the webhook code, the cleaner, and the rule pass each kept their own copy of the truth, every “why was this held?” question would mean checking three places. Funneling everything through one record means there is exactly one row per item, carrying its rule-pass result forward, and the checker in Part 3 reads from that and nothing else.
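One way to pin that record down in code is a single type that every step reads and writes. A minimal sketch using the fields listed in Fig 2; the Python field names, the `s3://cm-raw/...` path, and the `platform` value are hypothetical. Freezing the dataclass means a step can't quietly edit the record in place: it produces an updated copy with `dataclasses.replace`, and that copy is what gets written back under the same id.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SavedItem:
    """One row in cm-items, one per item."""
    id: str                          # stable item id from the platform
    author: str
    text: str                        # cleaned, normalized text
    area: str                        # which part of the site it belongs to
    platform: str                    # which platform sent it
    s3_link: str                     # untouched raw payload, kept for audit
    links: tuple[str, ...] = ()
    rule_pass: Optional[str] = None  # "pass", "hold", or "borderline"


item = SavedItem(
    id="c-123",
    author="sam",
    text="Great guide, thanks!",
    area="blog",
    platform="comment-plugin",
    s3_link="s3://cm-raw/c-123.json",
)
checked = replace(item, rule_pass="borderline")
# The checker reads only these records, and only the borderline ones.
to_checker = [r for r in [checked] if r.rule_pass == "borderline"]
print(len(to_checker))  # 1
```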

Next post: how the borderline middle gets a verdict — how the checker asks the model, what it gets back, and how it lands on one of three calls.
