Part 7 of 7 · Photo tagger series ~8 min read

Engineering reference: the photo tagger architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the S3 and SQS event wiring, the resize step, the DynamoDB schemas, and the Function URL review flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). Bedrock cross-Region inference and S3 event notifications are all in good shape there, and it keeps data close for an Asia-Pacific SMB. A second region for resilience isn’t worth the extra setup at this volume — the failure mode for a shop is a draft arriving a few minutes late, not a regional outage. One AWS account dedicated to the tagger (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the photo tagger A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show how a photo enters and is prepared — a drive-sync Lambda triggered every few minutes by EventBridge Scheduler that mirrors new Drive files to s3://pt-photo-drop/, the same drop bucket reached by a direct S3 upload, and an intake Lambda triggered by the S3 PUT event that resizes the photo with Pillow, runs deterministic quality checks, writes a small copy to s3://pt-resized/, and enqueues a ready-photo message on SQS. Middle region: processing. The reader Lambda consumes from the SQS queue; it loads the small copy and the style doc from s3://pt-rules-source/, calls Bedrock Claude Haiku 4.5 with vision once to draft five fields with confidence scores and a not-a-product flag, then writes a draft row to DynamoDB pt-drafts and routes the outcome — draft ready, needs review, or flagged. Failures land on an SQS dead-letter queue. Bottom region: review and approval. A notify Lambda sends the owner a review card via SES outbound and a simple web page; the owner's Approve, Edit, and Reject button clicks hit a Function URL Lambda ack-handler that, on approve or edit, writes the listing fields to the store API or an export sheet, archives the draft, and on reject moves the photo to s3://pt-flagged/. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $25 monthly threshold, posting to SNS topic pt-cost-alarm. A note at the bottom: nothing reaches the store without a human approval — and every interaction is logged to pt-audit. Ingress Lambda · drive-sync every few min Drive API → s3://pt-photo-drop/ new files only S3 · pt-photo-drop also direct upload event: S3 PUT notify: intake versioning on Lambda · intake Pillow resize copy quality checks, s3://pt-resized/ → SQS ready queue SQS · pt-ready one message per ready photo · DLQ Processing SQS event source batch size 1 max concurrency cap target: reader Lambda + dead-letter queue Lambda · reader loads small copy + style doc from S3 Bedrock vision, drafts five fields DynamoDB · pt-drafts draft_ready needs_review flagged (per-photo outcome) Review & approval Lambda · notify builds review card, SES outbound email + simple web page flag notice too Review card photo + five fields [Approve] [Edit] [Reject] clicks → Function URL Lambda · ack-handler writes pt-audit, on approve/edit writes to store API or export sheet Nothing reaches the store without a human approval — and every interaction is logged to pt-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (two lanes plus the resize-and-check step), processing (the reader draws from SQS and drafts via Bedrock vision), review and approval (the owner’s decision writes to the store and is recorded). Every Lambda is event- or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets, Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • drive-sync — EventBridge Scheduler target, fires every few minutes (rate(5 minutes)). Uses the Google Drive API (service-account credentials in Secrets Manager under pt/drive/sa) to list the watched folder, diff against a small state object, and copy any new image to s3://pt-photo-drop/<file-id>. The same pattern syncs the style and rules docs to s3://pt-rules-source/. Memory: 256 MB. Timeout: 30 s.
  • intake — S3 PUT trigger on s3://pt-photo-drop/. Loads the image, resizes it with Pillow to a bounded max edge (e.g. 1024 px) and writes the copy to s3://pt-resized/. Runs deterministic quality checks — mean luminance, a Laplacian-variance sharpness estimate, pixel dimensions, and aspect ratio — against thresholds from s3://pt-rules-source/rules.json. On pass, enqueues a ready-photo message on the pt-ready SQS queue; on fail, writes a flagged row to pt-drafts with the reason and moves the original to s3://pt-flagged/. Pillow is the standard, stable image library in 2026 and well-maintained; if HEIC inputs from newer phones become common, add pillow-heif as a decoder shim rather than swapping the library. Memory: 1024 MB (image work). Timeout: 60 s.
  • reader — SQS event source on pt-ready, batch size 1, with a reserved/maximum-concurrency cap so a burst upload can’t fan out into a Bedrock throttle. Loads the resized copy and style.json from S3, calls Bedrock Claude Haiku 4.5 with vision (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) using the Converse API with an image content block, and requests structured output for the five fields plus per-field confidence and a not_a_product flag. Writes a pt-drafts row with the outcome (draft_ready, needs_review, or flagged). On any unhandled error the message is retried, then lands on the DLQ. Memory: 512 MB. Timeout: 60 s. The only Bedrock callsite in the system.
  • notify — DynamoDB Streams trigger on pt-drafts (or a small EventBridge rule on the reader’s completion). For a draft_ready or needs_review row, formats a review card and emails it via SES SendRawEmail with links to the approve/edit/reject Function URL endpoints; for a flagged row, batches a short daily flag notice instead of one email per flag. Memory: 256 MB. Timeout: 30 s.
  • ack-handler — Lambda Function URL, AuthType: NONE, with a signed-token check on every request (the token is minted into the review-card links by notify and verified here). Handles Approve, Edit, and Reject. On approve or edit, writes the listing fields to the store API (or appends a row to the export sheet via the Sheets API) and archives the draft; on reject, moves the photo to s3://pt-flagged/ with the chosen reason. Writes an audit row for every action. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly. Reads pt-drafts and pt-audit for the past week; emails a short summary — photos tagged, approved, edited, rejected, and flagged — to a configured address. No Bedrock; a plain summary table. Memory: 256 MB.

Storage

  • DynamoDB · pt-drafts — one row per photo. PK photo_id; attributes: source (drive/s3), resized_key, outcome (draft_ready/needs_review/flagged), the five drafted fields, per-field confidence, not_a_product, flag_reason. On-demand. Streams enabled for notify.
  • DynamoDB · pt-ack — one row per review action. PK photo_id; sort key ack_ts; attributes: action (approved/edited/rejected), by_user, reject_reason, store_target (api/sheet). On-demand.
  • DynamoDB · pt-audit — one row per write action of any kind. PK (photo_id, ts); attributes: action, by_user, before, after. On-demand. No TTL — this is the long-term audit trail.
  • S3 · pt-photo-drop — original uploads from the Drive lane and direct upload. Versioning enabled. Lifecycle to a cheaper storage class at 30 days; expiry at 2 years.
  • S3 · pt-resized — the small bounded copies the reader actually sends to Bedrock. Lifecycle expiry at 30 days — they’re cheap to regenerate from the original if ever needed.
  • S3 · pt-rules-source — mirrored style.json and rules.json from the Drive docs. Versioning enabled.
  • S3 · pt-flagged — photos rejected by the quality gate, the not-a-product check, or a human Reject. Kept for review and possible re-queue.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite: reader, with a single vision request per photo via the Converse API. Claude Sonnet 4.6 (anthropic.claude-sonnet-4-6-...) is available as a per-photo escalation if a shop’s catalog has genuinely hard images (fine print on packaging, near-identical variants), gated behind a config flag — but Haiku 4.5 handles the common case and is the default.
  • Embeddings. Not used. The tagger reads a photo and writes fields; there’s nothing to retrieve. No Knowledge Base, no S3 Vectors, no Titan embeddings.
  • Quotas. Default account quotas are more than enough at SMB volume. The SQS concurrency cap on reader keeps a burst upload from spiking Bedrock requests past the per-minute limit.

Queue and event wiring

  • pt-photo-drop S3 notifications3:ObjectCreated:* on the bucket (or a prefix), target: intake Lambda. Suffix filter on common image extensions so non-image uploads are ignored.
  • pt-ready SQS queue — standard queue, visibility timeout > the reader timeout, redrive policy to pt-ready-dlq after 3 receives. The reader event-source mapping uses batch size 1 and a maximum-concurrency setting.
  • pt-ready-dlq — dead-letter queue. A CloudWatch alarm on ApproximateNumberOfMessagesVisible > 0 pages the admin; messages are re-drivable after a fix.
  • pt-drafts DynamoDB Stream — new-image view, target: notify Lambda, so a freshly written draft triggers the review card without polling.
  • Scheduler rulespt-drive-sync at rate(5 minutes)drive-sync; pt-weekly-digest at cron(0 18 ? * SUN *) in TZ → digest.

SES and the review surface

  • SES outbound for review cards and flag notices: verify a sender identity at tagger@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.
  • The review card links carry a short-lived signed token; clicking Approve/Edit/Reject hits the ack-handler Function URL, which verifies the token before doing anything. The same Function URL backs a minimal web review page for clearing a batch in one place.
  • No inbound SES is needed — photos arrive via Drive or S3, not email — which keeps the mail setup to a single verified outbound identity.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • intake role: s3:GetObject on pt-photo-drop; s3:PutObject on pt-resized and pt-flagged; s3:GetObject on the rules.json key; sqs:SendMessage on pt-ready; dynamodb:PutItem on pt-drafts. No bedrock:*.
  • reader role: sqs:ReceiveMessage + DeleteMessage on pt-ready; s3:GetObject on pt-resized and the style/rules keys; bedrock:InvokeModel on the Haiku ARN (and the Sonnet ARN if the escalation flag is enabled); dynamodb:PutItem on pt-drafts.
  • notify role: dynamodb:GetItem on pt-drafts; stream read permissions; ses:SendRawEmail from the verified sender; secretsmanager:GetSecretValue on the token-signing secret.
  • ack-handler role: dynamodb:PutItem on pt-ack and pt-audit; dynamodb:UpdateItem on pt-drafts; s3:CopyObject + DeleteObject for moving to pt-flagged; secretsmanager:GetSecretValue on the store-API and Sheets-API secrets; outbound network to the store API host and sheets.googleapis.com.
  • drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on pt-photo-drop and pt-rules-source; outbound network to www.googleapis.com.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
  • Alarms: pt-ready-dlq depth > 0; reader Bedrock throttle count > 0 in 5 min (lower the concurrency cap if it fires); ack-handler token-verification failures > 5/hour (might mean the signing secret rotated).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic pt-cost-alarm subscribed to the on-call admin’s email.

Config and secrets

Service-account credentials for the Drive and Sheets APIs live in Secrets Manager under pt/drive/sa. The store-API key lives under pt/store/api; the review-link token-signing secret under pt/token/signing. The resized max edge, the quality thresholds, the confidence threshold, the store target (api or sheet), and the admin notify address all live in Parameter Store under /pt/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role — no long-lived keys — building and deploying with AWS SAM. The opinionated bits: turn on S3 versioning for pt-photo-drop and pt-rules-source so a bad upload or a bad style-doc edit can be rolled back in one click, set the reader maximum concurrency conservatively and raise it once you’ve watched real burst behaviour, and keep the resize step in its own Lambda with more memory so the image work doesn’t bloat the cheaper functions. Total deployable surface: around six Lambdas, three DynamoDB tables, four S3 buckets, one SQS queue plus its DLQ, a couple of Scheduler rules, one SES sender identity, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your shop, see Work with me.

All posts