Part 4 of 7 · Photo tagger series ~5 min read

How a bad photo gets flagged

Not every file that lands in the folder is a good product photo. Somebody drops a screenshot by mistake. A photo comes out too dark to use. A shot of an empty box gets uploaded with the real ones. The tagger’s job here is simple to say and important to get right: catch those before they become a tidy-looking draft that sails through review. It does that in two layers: a plain quality gate that runs before any model, and the model’s own check on whether the photo even looks like a product.

Key takeaways

  • Layer one is plain code: too dark, too blurry, too small, or odd shape — rejected before any model.
  • Layer two is the model’s not-a-product check: screenshots, receipts, people, empty boxes.
  • Low confidence on the fields is treated as a flag, not a draft to clean up later.
  • A flagged photo goes to a flagged folder with a reason; a person decides what to do.
  • Nothing flagged ever reaches the store — flagging is a stop, not a slow-down.

Two layers between a photo and a draft

[Figure: horizontal flow diagram. A resized photo enters on the left and passes four gates in order. Gate 1, quality gate: plain code checks brightness, sharpness, size, and shape against the thresholds in the rules doc, with no model call. Gate 2, not a product: the vision model checks whether the image is a real item for sale rather than a screenshot, receipt, person, or empty box. Gate 3, confidence check: per-field scores are compared to the threshold, and unsure means flag. Gate 4, route the flag: the photo moves to a flagged folder in S3, the reason is recorded, and the owner sees a short notice. A photo that passes every gate becomes a clean draft for review; a photo that fails any gate is flagged for a human. Every flag is logged to DDB pt-audit.]
Fig 4. Two layers, four gates, between a photo and a clean draft. Plain code catches the unusable ones; the model catches the wrong-looking ones; low confidence is treated as a flag; the flag is routed to a human with a reason. Nothing flagged reaches the store.

Gate 1: the quality gate (plain code, no model)

This is the same gate introduced in Part 2, and it runs first because it’s the cheapest. Plain code measures a few things about the photo and compares them to the thresholds in the rules doc. Brightness: is the average light level reasonable, or is the photo nearly black or blown-out white? Sharpness: does the image have real detail, or is it a smear? Size: is it big enough to be a real product shot, or a tiny thumbnail? Shape: is it a sensible rectangle, or a long thin banner that’s obviously not a product photo? A photo that fails any of these is rejected here, with the exact reason recorded, and no model is ever called on it. The thresholds are numbers in a doc, so a shop with a darker product style can loosen the brightness floor without a deploy.
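As a rough illustration of the four measurements above, here is a minimal sketch in plain Python. It assumes the photo has already been decoded into rows of 0-255 grayscale values. The threshold numbers, the `QualityThresholds` name, the reason strings, and the use of Laplacian variance as the sharpness measure are all assumptions made for this example, not the values or method in the rules doc.

```python
from dataclasses import dataclass
from typing import List, Optional

Gray = List[List[int]]  # rows of 0-255 grayscale values


@dataclass(frozen=True)
class QualityThresholds:
    # Hypothetical numbers; the real ones live in the rules doc.
    min_brightness: float = 40.0
    max_brightness: float = 230.0
    min_sharpness: float = 50.0
    min_side_px: int = 8
    max_aspect_ratio: float = 3.0


def mean_brightness(img: Gray) -> float:
    """Average pixel value across the whole image."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


def laplacian_variance(img: Gray) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels.

    A flat or smeared image gives a value near zero.
    """
    h, w = len(img), len(img[0])
    vals = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
        - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def quality_gate(
    img: Gray, t: QualityThresholds = QualityThresholds()
) -> Optional[str]:
    """Return the rejection reason, or None if the photo passes."""
    if not img or not img[0]:
        return "too small"
    h, w = len(img), len(img[0])
    if min(h, w) < t.min_side_px:
        return "too small"
    if max(h, w) / min(h, w) > t.max_aspect_ratio:
        return "odd shape"
    brightness = mean_brightness(img)
    if brightness < t.min_brightness:
        return "too dark"
    if brightness > t.max_brightness:
        return "too bright"
    if laplacian_variance(img) < t.min_sharpness:
        return "too blurry"
    return None
```

Because the thresholds are passed in as data, a shop with a darker product style could call `quality_gate(img, QualityThresholds(min_brightness=25.0))` without touching the checks themselves.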

Gate 2: is this even a product?

A photo can be perfectly bright and sharp and still be the wrong thing. A screenshot of an order confirmation is a clean, crisp image. So is a photo of a receipt, a photo of a person, or a photo of an empty box. The quality gate has no way to know those are wrong — they pass every measurement. So the reader’s prompt (from Part 3) explicitly asks the model: is this a clean product shot of an item for sale? If not, set the not-a-product flag and say what it looks like instead. When that flag comes back set, the photo is treated as flagged no matter how confident the other fields look.

This is the one place the model is used as a check rather than a drafter, and it’s worth it: spotting “this is a screenshot, not a mug” is exactly the kind of judgement plain code can’t make and a vision model can.
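The rule that a set flag wins over everything else can be sketched in a few lines. The reply shape here (`not_a_product`, `looks_like`, `confidence`) is a hypothetical stand-in for whatever the Part 3 prompt actually returns, and `Flag` is a name invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Flag:
    gate: str
    reason: str


def not_a_product_gate(model_reply: Dict[str, Any]) -> Optional[Flag]:
    """Flag the photo if the model says it is not a product shot.

    The confidence scores are deliberately not consulted: a set flag
    stops the photo no matter how sure the other fields look.
    """
    # Hypothetical field names; a missing flag is treated as not set here.
    if model_reply.get("not_a_product", False):
        looks_like = model_reply.get("looks_like") or "unknown"
        return Flag(gate="not-a-product", reason=looks_like)
    return None


# A crisp screenshot with confident-looking fields is still flagged.
reply = {
    "not_a_product": True,
    "looks_like": "screenshot",
    "confidence": {"item": 0.97, "colour": 0.95, "category": 0.93},
}
flag = not_a_product_gate(reply)
```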

Gate 3: low confidence is a flag, not a fixer-upper

Part 3 showed that the model returns a confidence score on each field. Gate 3 uses those scores as a safety check, not just a hint for the reviewer. If the model wasn’t sure what the item is, couldn’t tell the colour, or couldn’t place it in any of your categories, the photo is flagged for a human rather than presented as a clean draft with a few weak spots. The line between “needs a closer look” and “flag it” is a threshold in the rules doc, so a cautious shop can set it strict and a high-volume shop can set it looser.

The reason this matters: a draft that looks finished is the one a busy owner approves without reading. By turning genuine uncertainty into a flag instead of a tidy-looking draft, the system makes sure the photos that need human eyes actually get them.
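A minimal sketch of the confidence check, assuming the scores arrive as a field-to-score mapping between 0 and 1. The `0.6` default and the function names are illustrative assumptions; the real threshold is whatever the rules doc says.

```python
from typing import Dict, List, Optional

FLAG_THRESHOLD = 0.6  # hypothetical default; the real value is in the rules doc


def low_confidence_fields(
    confidence: Dict[str, float], threshold: float = FLAG_THRESHOLD
) -> List[str]:
    """Names of the fields whose score falls below the threshold."""
    return sorted(name for name, score in confidence.items() if score < threshold)


def confidence_gate(
    confidence: Dict[str, float], threshold: float = FLAG_THRESHOLD
) -> Optional[str]:
    """Return a flag reason if any field is too uncertain, else None."""
    weak = low_confidence_fields(confidence, threshold)
    if weak:
        return "low confidence on " + ", ".join(weak)
    return None


scores = {"item": 0.95, "colour": 0.40, "category": 0.90}
reason = confidence_gate(scores)                   # flagged at the default
strict = confidence_gate(scores, threshold=0.97)   # a cautious shop
loose = confidence_gate(scores, threshold=0.30)    # a high-volume shop
```

With these example scores the default threshold flags only the colour field, the strict setting flags all three, and the loose setting lets the photo through as a clean draft.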

Gate 4: route the flag to a person

A flagged photo doesn’t just disappear. Gate 4 moves it to a flagged folder in S3, writes a row with the reason (“too dark,” “screenshot,” “low confidence on colour”), and notifies the owner with a short notice — not a full review card, just “three photos were flagged today, here’s why.” A person then decides: retake the photo, delete it, or override the flag and tag it by hand. Nothing about a flag is silent, and nothing flagged is ever written to the store.
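The bookkeeping side of routing might look like the sketch below. It only builds the destination key, the row, and the notice text. The actual S3 move (a copy followed by a delete, since S3 has no move operation), the table write, and the delivery of the notice are left out. The `flagged/` prefix, the row fields, and the notice wording are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, List

FLAGGED_PREFIX = "flagged/"  # hypothetical folder name


def flagged_key(source_key: str) -> str:
    """Destination key for a flagged photo: same file name, flagged folder."""
    filename = source_key.rsplit("/", 1)[-1]
    return FLAGGED_PREFIX + filename


def flag_row(source_key: str, reason: str, now: datetime) -> Dict[str, str]:
    """The row recording what was flagged, where it went, and why."""
    return {
        "photo": source_key,
        "moved_to": flagged_key(source_key),
        "reason": reason,
        "flagged_at": now.isoformat(),
    }


def owner_notice(rows: List[Dict[str, str]]) -> str:
    """One short line for the owner, grouped by reason."""
    counts = Counter(row["reason"] for row in rows)
    noun = "photo was" if len(rows) == 1 else "photos were"
    reasons = ", ".join(f"{r} ({n})" for r, n in sorted(counts.items()))
    return f"{len(rows)} {noun} flagged today: {reasons}"


now = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
rows = [
    flag_row("incoming/mug-01.jpg", "too dark", now),
    flag_row("incoming/mug-02.jpg", "too dark", now),
    flag_row("incoming/order.png", "screenshot", now),
]
notice = owner_notice(rows)
```

Keeping the reason on the row is what lets the notice say why each photo stopped, and what lets a person decide between retaking, deleting, or overriding.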

Why a flag is a stop, not a slow-down

It would be easy to let a borderline photo through with a “maybe” label and hope the reviewer catches it. The design says no on purpose. The whole value of the tagger is that the owner can trust the drafts — glance, approve, move on. The moment a few bad drafts slip through, the owner has to start reading every one carefully again, and the time savings evaporate. Treating a flag as a hard stop keeps the clean drafts genuinely clean, so the fast path stays fast. The cost of a flag is one photo a human has to look at; the cost of a false “looks fine” is the owner’s trust in the whole system.

Next post: how a listing gets approved — the three actions on every clean draft card, and how each one updates the store, the draft state, and the audit trail.
