Part 3 of 7 · Content moderator series · ~5 min read

How a comment gets a verdict

The rule pass from Part 2 already settled the easy items. What’s left is the borderline middle — the things a simple word list can’t judge. Is “this is garbage” a rule-breaking attack or just a blunt opinion? Is a link to a real article spam, or useful? For those, the checker asks Bedrock Haiku 4.5 to read the item against your house rules and return a verdict. The decision is a few clear steps, and the model always says why and how sure it is.

Key takeaways

  • Only borderline items reach the model — the rule pass already settled the rest.
  • House rules live in the Drive doc; a rep can edit them without a deploy.
  • The model returns a verdict, a confidence score, and the exact rule the content may break.
  • Low confidence never auto-acts — it routes the item to a human instead.
  • The model never deletes anything. Its strongest call is “hold for review.”

The decision flow, per item

[Figure: a vertical decision flow for each borderline item at the checker.]

  • Input: a borderline item, with the author, text, links, and area carried from the rule pass.
  • Step 1, Load the house rules: read the rules doc mirrored to S3, including the per-area rules and the worked examples from past moderator corrections.
  • Step 2, Trusted author? If the author is on the known-good list for this area, route to Pass without a model call. If not, continue.
  • Step 3, Ask Bedrock Haiku 4.5: send the item text, the house rules, and a few worked examples, asking for a verdict as JSON only: one of pass, hold, or send-to-human, plus a confidence score from zero to one and the exact rule cited.
  • Step 4, Which verdict came back? A confident pass routes to Pass; a confident hold for a clear break routes to Hold; a severe high-confidence break routes to Hold + notify, so a moderator is paged at once rather than batched.
  • Step 5, Confidence over threshold? The score is compared against the per-area cutoff from the rules doc. If confidence is below the threshold, or the model itself returned send-to-human, route to Send to human and queue the item with the reason and the rule for a person to decide.
  • Terminals: Pass (publish it), Send to human (borderline, with reason), Hold (clear break, queued), Hold + notify (severe, paged now). Each terminal emits an event to EventBridge with the verdict and the item context.

Note on the diagram: the rules doc holds every rule and threshold, and the model only applies them. Change a rule in the doc and the next item uses the new wording.
Fig 3. The checker’s decision tree, per borderline item. A trusted author short-circuits to pass; everyone else gets a model verdict with confidence and a rule. Low confidence routes to a human. The rules doc holds every rule; the model only applies them.
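The trusted-author short-circuit and the event emitted at each terminal can be sketched roughly as below. This is a minimal illustration, not the series' actual code: `decide`, the `allow_list` shape, the `ask_model` and `emit` callables, and every dictionary key are hypothetical names chosen for the example. In a real deployment `emit` would wrap whatever sends the event to EventBridge.

```python
from typing import Callable


def decide(
    item: dict,
    allow_list: dict,
    ask_model: Callable[[dict], dict],
    emit: Callable[[dict], None],
) -> str:
    """Return the terminal for one borderline item and emit one event for it.

    item       -- hypothetical shape: {"author", "text", "links", "area"}
    allow_list -- hypothetical shape: {area: set of trusted author names}
    ask_model  -- stand-in for the model step; returns {"terminal", "reason"}
    emit       -- stand-in for the event publisher
    """
    trusted = allow_list.get(item["area"], set())
    if item["author"] in trusted:
        # Short-circuit: a trusted author passes without a model call.
        outcome = {"terminal": "pass", "reason": "trusted author", "model_called": False}
    else:
        outcome = dict(ask_model(item), model_called=True)

    # Every terminal emits an event carrying the verdict and the item context.
    emit({"item": item, **outcome})
    return outcome["terminal"]
```

Passing the model step in as a callable keeps the sketch runnable without any cloud credentials, and makes it easy to check that trusted authors never trigger a model call.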

House rules: “no spam, no attacks” isn’t magic, it’s in the doc

The rules doc has one short section per area of your site. Each section names the rules in plain prose: “Comments: no promotional links from unknown sites, no personal attacks on other members, no off-topic reselling. Reviews: opinions are fine even if harsh, but no naming staff, no made-up claims. Posts: no recruiting, no spam.” Each rule says what it means and whether breaking it is a hold (clear) or a send-to-human (judgment call). The model is handed this doc verbatim; it doesn’t carry its own idea of what your community allows.

Rules differ by area for a reason. A harsh review is a customer being honest — you want it. The same words aimed at another member under a post are an attack — you don’t. The doc lets you say so, in your own words, and a rep can change the wording any time without a developer. Each section also sets a confidence threshold — how sure the model has to be before its call is acted on automatically rather than sent to a person.
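Once the doc is parsed, each section might end up in a structure like the one below. This is only a sketch: the class names, the `on_break` field, which rules are marked hold versus send-to-human, and the threshold numbers are all assumptions made for illustration. The rule wording is taken from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    text: str      # rule wording, as written in the doc
    on_break: str  # "hold" (clear break) or "send-to-human" (judgment call)


@dataclass(frozen=True)
class AreaRules:
    area: str
    threshold: float  # how sure the model must be before its call is auto-acted on
    rules: tuple


# Illustrative values only: the thresholds and on_break choices are made up.
RULES = {
    "comments": AreaRules("comments", 0.85, (
        Rule("no promotional links from unknown sites", "hold"),
        Rule("no personal attacks on other members", "send-to-human"),
        Rule("no off-topic reselling", "hold"),
    )),
    "reviews": AreaRules("reviews", 0.90, (
        Rule("no naming staff", "hold"),
        Rule("no made-up claims", "send-to-human"),
    )),
}


def rules_for(area: str) -> AreaRules:
    """Look up the section for an area; an unknown area raises KeyError."""
    return RULES[area]
```

Because the rules are data rather than code, editing the doc changes what `rules_for` returns on the next load, with no deploy involved.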

What the model actually returns

The prompt to Bedrock Haiku 4.5 is short and strict: “Here is a comment, here are the house rules for this area, and here are a few past calls a human corrected. Return JSON only. Give a verdict of pass, hold, or send-to-human; a confidence score from 0 to 1; and the exact rule the content may break, quoted from the doc. Do not invent a rule that isn’t there. If you are unsure, return send-to-human.”
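A rough sketch of how the prompt could be assembled and the reply checked is shown here. The function names and the JSON keys `verdict`, `confidence`, and `rule` are assumptions, since the post does not spell out the field names. The sketch treats anything malformed, or any cited rule that is not in the doc, as send-to-human, which matches the spirit of the prompt but is a design choice of this example.

```python
import json

VERDICTS = {"pass", "hold", "send-to-human"}


def build_prompt(item_text: str, area: str, rules: list, examples: list) -> str:
    """Assemble the short, strict prompt from the item, the rules, and past corrections."""
    rule_lines = "\n".join(f"- {r}" for r in rules)
    example_lines = "\n".join(f"- {e}" for e in examples)
    return (
        f"Here is a comment:\n{item_text}\n\n"
        f"Here are the house rules for {area}:\n{rule_lines}\n\n"
        f"Here are a few past calls a human corrected:\n{example_lines}\n\n"
        "Return JSON only. Give a verdict of pass, hold, or send-to-human; "
        "a confidence score from 0 to 1; and the exact rule the content may "
        "break, quoted from the doc. Do not invent a rule that isn't there. "
        "If you are unsure, return send-to-human."
    )


def parse_verdict(raw: str, rules: list) -> dict:
    """Validate the model's reply; fall back to send-to-human on any problem."""
    fallback = {"verdict": "send-to-human", "confidence": 0.0, "rule": None}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback

    verdict = data.get("verdict")
    confidence = data.get("confidence")
    rule = data.get("rule")

    if not isinstance(verdict, str) or verdict not in VERDICTS:
        return fallback
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return fallback
    if not 0.0 <= confidence <= 1.0:
        return fallback
    # A non-pass verdict must cite a rule that really is in the doc.
    if verdict != "pass" and rule not in rules:
        return fallback
    return {"verdict": verdict, "confidence": float(confidence), "rule": rule}
```

Validating the reply on the way in means a stray sentence around the JSON, or an invented rule, ends up in front of a person instead of being acted on.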

That last line matters. The model is told that “I’m not sure” is a perfectly good answer, and unsure means a person looks. There is no pressure to force a confident call. The confidence score is then checked against the per-area threshold: a confident pass publishes, a confident hold queues the item, and anything below the line goes to a human regardless of which way it leaned. The model citing the exact rule is what makes the review card in Part 4 useful — the moderator sees not just “held” but “held because of this rule.”

The calls, and why “hold + notify” exists

Most items land in one of three calls: pass (publish), hold (a clear break: keep it out of view and add it to the review queue for the next batch), or send to a human (borderline: queue it with the reason). There’s a fourth, rarer terminal for severity: hold + notify. A clear, severe break, such as a credible threat, a doxxing post, or hate speech, is still held rather than deleted, but a moderator is paged at once instead of waiting for the next batched digest. Holding the item keeps the decision reversible; the immediate page just gets a human looking sooner. Even here, nothing is deleted without a person making the call.
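The four terminals can be expressed as one small routing function. This is a sketch under stated assumptions: the post does not say how severity is determined, so the `severe` flag here is hypothetical (for instance, it might come from a marker on the cited rule in the doc), and treating a score exactly at the threshold as confident is also an assumption.

```python
def route(verdict: str, confidence: float, threshold: float, severe: bool = False) -> str:
    """Map a model verdict to one of the four terminals.

    Returns "pass", "hold", "hold+notify", or "send-to-human".
    """
    if verdict not in ("pass", "hold", "send-to-human"):
        raise ValueError(f"unknown verdict: {verdict!r}")

    # Low confidence never auto-acts, regardless of which way the model leaned.
    if verdict == "send-to-human" or confidence < threshold:
        return "send-to-human"
    if verdict == "pass":
        return "pass"
    # A confident hold: severe breaks page a moderator at once, others wait for the batch.
    return "hold+notify" if severe else "hold"
```

Note that no branch deletes anything: the strongest outcome is a hold with an immediate page.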

Why the model doesn’t read everything

The checker could send every item to the model and skip the rule pass entirely. It doesn’t, for two reasons. First, the rule pass is free and instant, and most items are clearly safe or clearly spam, so spending a model call on them is a waste. Second, the items a simple rule can settle are exactly the items where a model adds nothing; the model earns its place only on the genuinely ambiguous middle. Keeping the cheap check in front means the bill stays a couple of dollars a month even as the page gets busy.
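The cheap-check-first idea looks roughly like this in code. The rule pass itself is covered in Part 2, so everything below is a stand-in: the phrase lists, the function names, and the three return labels are invented for illustration and are not the real rules.

```python
from typing import Callable

# Made-up lists for the example; the real rule pass is described in Part 2.
BLOCKED_PHRASES = ("buy followers", "free crypto")
WATCH_WORDS = ("garbage", "idiot")


def rule_pass(text: str) -> str:
    """Cheap first check: returns "hold", "pass", or "borderline"."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return "hold"        # clearly spam, settled without the model
    if "http" in lowered or any(word in lowered for word in WATCH_WORDS):
        return "borderline"  # a word list can't judge this one
    return "pass"            # clearly safe, settled without the model


def check(text: str, ask_model: Callable[[str], str]) -> str:
    """Only borderline items cost a model call."""
    settled = rule_pass(text)
    if settled != "borderline":
        return settled
    return ask_model(text)
```

With a gate like this in front, the number of model calls tracks the size of the ambiguous middle rather than total traffic.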

Next post: how a held or flagged item actually reaches a moderator — who gets which area, how quiet hours and batching keep the pings sane, and the four guardrails on every review card.
