How a product photo gets read

Key takeaways

Two intake lanes feed one queue: a Drive folder and a direct S3 drop.
A drive-sync Lambda mirrors new Drive files to S3 every few minutes.
Every photo is resized to a small copy before anything else — the model doesn’t need the full file.
Plain quality checks (too dark, too blurry, too small) reject bad photos with no model call.
Only a photo that passes the checks moves on to the reader in Part 3.

Two lanes, then one resize-and-check

Fig 2. Two lanes converge on one resize-and-check step. The Drive lane and the S3-drop lane both end in the same S3 PUT event; the intake Lambda shrinks the photo and runs plain quality checks. Only a photo that passes becomes a ready-photo record for the reader.

Lane 1: the Drive folder

The simplest lane for most teams. Drop a photo into the shared Drive folder and walk away. A small Lambda — drive-sync — runs every few minutes, looks for files it hasn’t seen before, and copies each one to s3://pt-photo-drop/ using the Google Drive API (its credentials live in Secrets Manager). The copy into S3 fires an S3 PUT event, and that event is what actually starts the tagger. The Drive folder stays the place humans interact with; S3 is where the work happens.
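The core of that loop can be sketched in a few lines. This is a minimal illustration, not the actual drive-sync code: the three callables are hypothetical stand-ins for the Google Drive listing, the Drive download, and the S3 upload, the `seen` set stands in for whatever store the Lambda uses to remember files, and the `drive/<id>/<name>` key layout is an assumption.

```python
from typing import Callable, Iterable


def sync_new_files(
    list_drive_files: Callable[[], Iterable[dict]],
    download: Callable[[str], bytes],
    put_object: Callable[[str, bytes], None],
    seen: set,
) -> list:
    """Copy every Drive file not seen before; return the S3 keys written."""
    written = []
    for f in list_drive_files():
        file_id = f["id"]
        if file_id in seen:
            continue  # already mirrored on an earlier run
        key = f"drive/{file_id}/{f['name']}"  # hypothetical key layout
        put_object(key, download(file_id))    # in S3, this upload fires the PUT event
        seen.add(file_id)                     # mark as seen only after the copy succeeds
        written.append(key)
    return written
```

Marking a file as seen only after the upload returns means a failed copy is simply retried on the next run.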

This lane covers the common case: a team that photographs products on a phone or camera, dumps the shots into a shared Drive folder, and doesn’t want to learn anything new. They already use Drive; nothing changes for them.

Lane 2: the direct S3 drop

Some shops already upload images to AWS — their store pulls product images from an S3 bucket, or they have a build step that puts photos there. For them, the Drive hop is pointless. Lane 2 lets a photo be uploaded straight to the same s3://pt-photo-drop/ bucket (or a prefix in it), and the exact same S3 PUT event starts the work. There’s no second code path to maintain — both lanes end at the same event, so everything downstream is identical.
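Because both lanes end in the same event, the intake side only needs one way to find out which photo arrived. A minimal sketch of reading an S3 event notification follows; the function name `photos_from_event` is hypothetical, while the `Records`, `eventName`, and `s3.bucket.name` / `s3.object.key` fields are the standard S3 notification shape.

```python
from urllib.parse import unquote_plus


def photos_from_event(event: dict) -> list:
    """Return (bucket, key) pairs for each object-created record in an S3 event."""
    photos = []
    for record in event.get("Records", []):
        # Skip anything that is not an upload (for example, delete notifications).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(s3["object"]["key"])
        photos.append((s3["bucket"]["name"], key))
    return photos
```

Decoding the key matters in practice: a file named "blue mug(2).jpg" arrives encoded, and fetching it with the raw key would fail.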

A shop can use one lane, the other, or both. A team might drop phone photos in Drive and have their web build push studio shots straight to S3; both kinds of photo flow through the same checks and the same reader.

Intake: resize, then check, before any model

This is the step that earns its keep. When the S3 PUT event fires, a small intake Lambda loads the photo and does two things. First, it makes a smaller copy — a product photo straight off a phone can be several megabytes and many millions of pixels, and the model reads it just as well at a fraction of that size. The smaller copy is cheaper to send, faster to process, and is what every later step uses; the original is kept untouched.
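The arithmetic behind the smaller copy is short. The sketch below only computes the target dimensions; the pixel work itself would be done by an imaging library. The 1024-pixel limit and the name `fit_within` are assumptions for illustration, not values from the rules doc.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple:
    """Scale (width, height) down so the longer side is at most max_side.

    Keeps the aspect ratio and never enlarges a photo that is already small.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

Under that assumed limit, a 4032 x 3024 phone photo (about 12 million pixels) becomes 1024 x 768, which is under 1 million pixels.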

Second, it runs plain quality checks against the thresholds in the rules doc. Is the photo bright enough, or is it nearly black? Is it sharp, or is it a blur? Is it large enough to be a real product shot, or a tiny thumbnail? Is the shape sensible, or is it a long thin banner that’s clearly not a product? These are simple measurements — no model, no AI, just arithmetic on the pixels. A photo that fails is rejected right here, with a reason written to its record (“too dark”), and it never reaches a model. Part 4 goes deeper on flagging; the point for now is that the obvious rejects are caught before anyone pays to read them.
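To make "arithmetic on the pixels" concrete, here is one way such checks could look, written in plain Python on a grayscale image given as rows of 0-255 values. Everything specific is an assumption: the four threshold values are placeholders rather than the real rules-doc numbers, the "bad shape" label is invented for the example, and variance of a Laplacian is just one common sharpness measure, not necessarily the one the tagger uses.

```python
def check_quality(gray, min_side=400, min_brightness=30.0,
                  min_sharpness=50.0, max_aspect=3.0):
    """Return (passed, reason) for a grayscale image given as rows of 0-255 ints.

    All four thresholds are placeholder values for illustration.
    """
    height, width = len(gray), len(gray[0])
    if min(width, height) < min_side:
        return False, "too small"
    if max(width, height) / min(width, height) > max_aspect:
        return False, "bad shape"

    # Brightness: the mean pixel value.
    brightness = sum(map(sum, gray)) / (width * height)
    if brightness < min_brightness:
        return False, "too dark"

    # Sharpness: variance of a 4-neighbour Laplacian over interior pixels.
    # Blurry photos have weak edges, so this variance is low.
    lap = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    mean = sum(lap) / len(lap)
    variance = sum((v - mean) ** 2 for v in lap) / len(lap)
    if variance < min_sharpness:
        return False, "too blurry"
    return True, "ok"
```

The checks run cheapest first, so a tiny thumbnail is rejected on its dimensions alone without touching a single pixel value.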

A photo that passes becomes a ready-photo record: the small copy in S3, plus a DynamoDB row with the photo id, where it came from, its size, and its quality result. That record is what the reader picks up in the next post.
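A ready-photo record might be shaped like the sketch below. The field names are hypothetical, and "size" is assumed here to mean pixel dimensions; the post only says the row holds the photo id, where it came from, its size, and its quality result.

```python
from dataclasses import dataclass, asdict


@dataclass
class ReadyPhoto:
    photo_id: str
    source: str          # which lane it arrived by, e.g. "drive" or "s3-drop"
    small_copy_key: str  # S3 key of the resized copy
    width: int           # dimensions of the original, in pixels (assumed meaning of "size")
    height: int
    quality: str         # result of the quality checks, e.g. "ok"


def to_item(record: ReadyPhoto) -> dict:
    """Flatten the record into a plain dict, ready to be written as a table row."""
    return asdict(record)
```

With boto3, a dict like this could be passed to a table's `put_item(Item=...)` call; the table name and key schema are not given in the post.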

Why a record, not an immediate model call

The intake step doesn’t call the model itself — it writes a record and lets the next step pick it up. That’s deliberate. Photos arrive in bursts: a team uploads forty shots at once after a photoshoot. If each upload tried to call the model the instant it landed, a big batch could hit rate limits or pile up. Instead, ready-photo records go onto a queue, and the reader pulls them at a steady pace. A burst of forty photos becomes forty calm records that get read one after another, and a failure on one photo never blocks the rest.
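The "one failure never blocks the rest" property can be sketched with an in-memory queue. This is an illustration only: a `deque` stands in for the real queue, which the post does not name, `read_photo` and `on_failure` are hypothetical callables, and pacing between reads is left out.

```python
from collections import deque


def drain(queue: deque, read_photo, on_failure) -> int:
    """Read queued records one at a time; a failure is recorded, not fatal."""
    done = 0
    while queue:
        record = queue.popleft()
        try:
            read_photo(record)
            done += 1
        except Exception as err:
            # Hand the failed record aside and keep going with the next one.
            on_failure(record, err)
    return done
```

If the third of five records raises, the other four are still read in order and only the third is handed to `on_failure`.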

Next post: how the reader takes a ready photo, calls Bedrock vision once, and drafts the five listing fields — title, alt text, tags, category, and description — each with a confidence score.
