Part 1 of 7 · Photo tagger series ~5 min read

A product photo tagger on AWS for a few dollars a month

An online shop takes a hundred photos of new stock and then sits down to the part nobody enjoys: typing a title for each one, writing alt text so the listing is readable for shoppers using a screen reader, picking tags and a category, and writing a short description. It is hours of repetitive work, and the results drift — one person writes “blue mug,” the next writes “Ceramic Coffee Cup — Navy,” and the catalog gets harder to search. This post walks through the design of a small tagger that looks at each new photo, drafts those details in one consistent voice, and hands them to the owner to approve. It never publishes on its own, and it flags any photo that looks wrong.

Key takeaways

  • Two ways a photo gets in: a Drive folder and a direct S3 drop. A new file starts the work.
  • A cheap deterministic step resizes the photo and runs quality checks before any model looks at it.
  • One Bedrock vision call drafts five fields: title, alt text, tags, category, and a short description.
  • Every draft waits for a human. Approve writes it to your store; nothing goes live on its own.
  • Designed on AWS for about $2.40/month at typical small-business volume.

The whole system on one page

Before any code, here’s the shape of what we’re designing.

System architecture: two sources, three pieces inside AWS At the top, three external boxes in a row. Far left, "New photos" — a Google Drive folder where new product photos land, plus a direct S3 drop folder; either one adds a photo to be tagged. Centre, "Style and rules" — a Drive folder with a style doc covering title format, tag list, category list, and the words to prefer or avoid, plus the quality thresholds for rejecting a bad photo. Far right, "Owner" — the shop owner or merchandiser who reviews each draft; review cards land in their email or a simple web page. Each connects via an arrow to the AWS account container below. New photos have an outgoing arrow into AWS. Style and rules feed in to ground every draft. The owner receives a review card with the photo, the drafted title, alt text, tags, category, description, and Approve, Edit, and Reject buttons. Inside the AWS account are three components in a row, mirroring the layout above. On the left, the Photo intake — receives a photo from either source, resizes it, runs plain quality checks, and rejects anything too dark, blurry, or small. In the middle, the Reader — calls Bedrock vision once to read the photo and draft the five listing fields, each with a confidence score. On the right, the Review — shows the draft to the owner, writes the approved fields to the store or an export sheet, and flags anything wrong for a human. Internal arrows flow left to right. A note at the bottom reads: nothing reaches the store without a human approval — the tagger drafts, the owner decides. New photos Drive folder or S3 drop Style and rules format, tags, categories Owner where drafts land photos in grounds draft to review AWS account Photo intake resize, quality check, reject bad images Reader drafts five fields: title, alt, tags, cat, desc Review owner approves, writes to the store photo draft Nothing reaches the store without a human approval — the tagger drafts, the owner decides.
Fig 1. Two sources outside, three pieces inside AWS. Photos flow in from a Drive folder or a direct S3 drop. The Reader drafts five listing fields. Review shows the draft to the owner and writes the approved fields to the store.

What you set up once (the outside)

  • New photos. A Google Drive folder where your team drops new product shots, or a direct S3 drop folder if you already upload to AWS. Either way, a new file is the trigger — the moment a photo lands, the tagger picks it up. You don’t fill in a form or press a button; you just put the photo where it goes. Part 2 covers both lanes and the resize-and-check step that runs before anything else.
  • A style and rules folder. Two short Google Docs in a Drive folder. The style doc covers how a title should read (for example, “Material + Product + Colour”), the list of tags and categories you actually use, and the words to prefer or avoid (your brand says “navy,” not “dark blue”). The rules doc holds the quality thresholds — how dark, how blurry, or how small a photo can be before it’s rejected — and the list of things that mean “this isn’t a product shot,” like a screenshot or a receipt. A rep can edit either doc without a deploy.
  • Owner. The person who reviews each draft — usually the shop owner or whoever runs the catalog. Each draft lands as a card with the photo and the five drafted fields, plus three buttons: Approve, Edit, and Reject. The card arrives by email and on a simple web page, so the owner can clear a batch in a few minutes.

What runs on every photo (the inside)

  • The photo intake. When a photo lands, a small Lambda makes a smaller copy of it (a full-size photo is far bigger than the model needs), then runs plain quality checks: is it bright enough, sharp enough, and large enough to be a real product shot? Too dark, too blurry, or too small, and the photo is rejected straight away with a reason — no model is called, because there’s no point reading an image that’s clearly unusable. Good photos move on to the reader.
  • The reader. One call to Bedrock Claude Haiku 4.5 with vision (a model that can look at a picture, not just text). It reads the photo and drafts five fields: a clear title, alt text that describes the item for a shopper using a screen reader, suggested tags, a category, and a short description — all in the voice your style doc sets. The model also marks how confident it is in each field and says if the photo doesn’t look like a clean product shot. The reader never invents a detail it can’t see; if it can’t tell the colour, it says so rather than guessing.
  • Review. The draft is stored and shown to the owner as a card. Approve writes the fields to your store (or an export sheet you import later) and archives the draft. Edit opens a form pre-filled with the draft so the owner can fix a word and then approve. Reject sends the photo to a flagged folder with a reason. Anything the reader marked low-confidence or wrong-looking is held back for a human instead of slipping through. Every action is logged, so a mistake can be undone and the trail is clear.

In plain words

Your team photographs a new ceramic mug in navy. The photo lands in the Drive folder at 2pm. The intake shrinks it and checks it — bright, sharp, big enough — so it passes. The reader looks at it and drafts: title “Ceramic Mug — Navy,” alt text “Navy ceramic coffee mug with a curved handle, shown on a white background,” tags “mug, ceramic, navy, kitchen, drinkware,” category “Drinkware,” and a two-line description in your house voice. The owner gets a card at 2:01pm, glances at it, fixes “navy” to “midnight” because that’s the actual product name, and taps Approve. The fields are written to the store. The whole thing took the owner ten seconds instead of three minutes, and the next twenty mugs read the same way.

The cost of running this is about $2.40 a month at SMB volume. The cost of not running it is the afternoon someone loses typing listings by hand, the inconsistent titles that hurt search, and the listings that quietly ship with no alt text at all — which both shoppers using screen readers and search engines notice.

Design rules that shaped every decision

  • Nothing reaches the store without a human approval. The tagger drafts; the owner decides.
  • Cheap checks first. A bad photo is rejected by plain code before any model is paid to read it.
  • The model never invents what it can’t see. Unsure means “say so,” not “guess.”
  • Wrong-looking images are flagged for a human, not drafted as if they were fine.
  • The style lives in a doc. Changing a tag list or a title format doesn’t need a deploy.
  • Every action is logged. Approve, edit, or reject — the trail is there to audit later.

Why this shape

Most shops handle photo listings in one of two ways: somebody types every field by hand, or somebody pastes the same template and barely edits it. The hand-typing is slow and drifts — ten people produce ten styles, and the catalog gets harder to search every week. The pasted template is fast and useless — every mug gets the same generic blurb and no real alt text, so the listing is worse than if someone had thought about it for thirty seconds.

The setup above puts a careful first draft in front of the owner for every photo, written in one consistent voice from the shop’s own style doc. The owner is still in charge — they read, fix, and approve — but they start from something good instead of a blank field. The boring 90% is done; the human spends their time on the 10% that actually needs judgement. And because a wrong or low-quality photo is caught before it ever becomes a draft, the owner isn’t wading through nonsense to find the real work.

The next four posts walk through each piece in turn: how a product photo gets read, how the details get drafted, how a bad photo gets flagged, and how a listing gets approved. One diagram per post. A cost breakdown and a final engineering reference at the end.

All posts