Engineering reference: the photo tagger architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the S3 and SQS event wiring, the resize step, the DynamoDB schemas, and the Function URL review flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). Bedrock cross-Region inference and S3 event notifications are all in good shape there, and it keeps data close for an Asia-Pacific SMB. A second region for resilience isn’t worth the extra setup at this volume — the failure mode for a shop is a draft arriving a few minutes late, not a regional outage. One AWS account dedicated to the tagger (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets, Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
drive-sync— EventBridge Scheduler target, fires every few minutes (rate(5 minutes)). Uses the Google Drive API (service-account credentials in Secrets Manager underpt/drive/sa) to list the watched folder, diff against a small state object, and copy any new image tos3://pt-photo-drop/<file-id>. The same pattern syncs the style and rules docs tos3://pt-rules-source/. Memory: 256 MB. Timeout: 30 s.intake— S3 PUT trigger ons3://pt-photo-drop/. Loads the image, resizes it with Pillow to a bounded max edge (e.g. 1024 px) and writes the copy tos3://pt-resized/. Runs deterministic quality checks — mean luminance, a Laplacian-variance sharpness estimate, pixel dimensions, and aspect ratio — against thresholds froms3://pt-rules-source/rules.json. On pass, enqueues a ready-photo message on thept-readySQS queue; on fail, writes aflaggedrow topt-draftswith the reason and moves the original tos3://pt-flagged/. Pillow is the standard, stable image library in 2026 and well-maintained; if HEIC inputs from newer phones become common, addpillow-heifas a decoder shim rather than swapping the library. Memory: 1024 MB (image work). Timeout: 60 s.reader— SQS event source onpt-ready, batch size 1, with a reserved/maximum-concurrency cap so a burst upload can’t fan out into a Bedrock throttle. Loads the resized copy andstyle.jsonfrom S3, calls Bedrock Claude Haiku 4.5 with vision (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) using the Converse API with an image content block, and requests structured output for the five fields plus per-field confidence and anot_a_productflag. Writes apt-draftsrow with the outcome (draft_ready,needs_review, orflagged). On any unhandled error the message is retried, then lands on the DLQ. Memory: 512 MB. Timeout: 60 s. The only Bedrock callsite in the system.notify— DynamoDB Streams trigger onpt-drafts(or a small EventBridge rule on the reader’s completion). For adraft_readyorneeds_reviewrow, formats a review card and emails it via SESSendRawEmailwith links to the approve/edit/reject Function URL endpoints; for aflaggedrow, batches a short daily flag notice instead of one email per flag. Memory: 256 MB. Timeout: 30 s.ack-handler— Lambda Function URL,AuthType: NONE, with a signed-token check on every request (the token is minted into the review-card links bynotifyand verified here). Handles Approve, Edit, and Reject. On approve or edit, writes the listing fields to the store API (or appends a row to the export sheet via the Sheets API) and archives the draft; on reject, moves the photo tos3://pt-flagged/with the chosen reason. Writes an audit row for every action. Memory: 256 MB. Timeout: 15 s.digest— EventBridge Scheduler target, weekly. Readspt-draftsandpt-auditfor the past week; emails a short summary — photos tagged, approved, edited, rejected, and flagged — to a configured address. No Bedrock; a plain summary table. Memory: 256 MB.
Storage
- DynamoDB ·
pt-drafts— one row per photo. PKphoto_id; attributes:source(drive/s3),resized_key,outcome(draft_ready/needs_review/flagged), the five drafted fields, per-fieldconfidence,not_a_product,flag_reason. On-demand. Streams enabled fornotify. - DynamoDB ·
pt-ack— one row per review action. PKphoto_id; sort keyack_ts; attributes:action(approved/edited/rejected),by_user,reject_reason,store_target(api/sheet). On-demand. - DynamoDB ·
pt-audit— one row per write action of any kind. PK(photo_id, ts); attributes:action,by_user,before,after. On-demand. No TTL — this is the long-term audit trail. - S3 ·
pt-photo-drop— original uploads from the Drive lane and direct upload. Versioning enabled. Lifecycle to a cheaper storage class at 30 days; expiry at 2 years. - S3 ·
pt-resized— the small bounded copies the reader actually sends to Bedrock. Lifecycle expiry at 30 days — they’re cheap to regenerate from the original if ever needed. - S3 ·
pt-rules-source— mirroredstyle.jsonandrules.jsonfrom the Drive docs. Versioning enabled. - S3 ·
pt-flagged— photos rejected by the quality gate, the not-a-product check, or a human Reject. Kept for review and possible re-queue.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite:reader, with a single vision request per photo via the Converse API. Claude Sonnet 4.6 (anthropic.claude-sonnet-4-6-...) is available as a per-photo escalation if a shop’s catalog has genuinely hard images (fine print on packaging, near-identical variants), gated behind a config flag — but Haiku 4.5 handles the common case and is the default. - Embeddings. Not used. The tagger reads a photo and writes fields; there’s nothing to retrieve. No Knowledge Base, no S3 Vectors, no Titan embeddings.
- Quotas. Default account quotas are more than enough at SMB volume. The SQS concurrency cap on
readerkeeps a burst upload from spiking Bedrock requests past the per-minute limit.
Queue and event wiring
pt-photo-dropS3 notification —s3:ObjectCreated:*on the bucket (or a prefix), target:intakeLambda. Suffix filter on common image extensions so non-image uploads are ignored.pt-readySQS queue — standard queue, visibility timeout > the reader timeout, redrive policy topt-ready-dlqafter 3 receives. Thereaderevent-source mapping uses batch size 1 and a maximum-concurrency setting.pt-ready-dlq— dead-letter queue. A CloudWatch alarm onApproximateNumberOfMessagesVisible > 0pages the admin; messages are re-drivable after a fix.pt-draftsDynamoDB Stream — new-image view, target:notifyLambda, so a freshly written draft triggers the review card without polling.- Scheduler rules —
pt-drive-syncatrate(5 minutes)→drive-sync;pt-weekly-digestatcron(0 18 ? * SUN *)in TZ →digest.
SES and the review surface
- SES outbound for review cards and flag notices: verify a sender identity at
tagger@your-company.comwith DKIM and SPF on the parent domain. Out of sandbox by request. - The review card links carry a short-lived signed token; clicking Approve/Edit/Reject hits the
ack-handlerFunction URL, which verifies the token before doing anything. The same Function URL backs a minimal web review page for clearing a batch in one place. - No inbound SES is needed — photos arrive via Drive or S3, not email — which keeps the mail setup to a single verified outbound identity.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- intake role:
s3:GetObjectonpt-photo-drop;s3:PutObjectonpt-resizedandpt-flagged;s3:GetObjecton therules.jsonkey;sqs:SendMessageonpt-ready;dynamodb:PutItemonpt-drafts. Nobedrock:*. - reader role:
sqs:ReceiveMessage+DeleteMessageonpt-ready;s3:GetObjectonpt-resizedand the style/rules keys;bedrock:InvokeModelon the Haiku ARN (and the Sonnet ARN if the escalation flag is enabled);dynamodb:PutItemonpt-drafts. - notify role:
dynamodb:GetItemonpt-drafts; stream read permissions;ses:SendRawEmailfrom the verified sender;secretsmanager:GetSecretValueon the token-signing secret. - ack-handler role:
dynamodb:PutItemonpt-ackandpt-audit;dynamodb:UpdateItemonpt-drafts;s3:CopyObject+DeleteObjectfor moving topt-flagged;secretsmanager:GetSecretValueon the store-API and Sheets-API secrets; outbound network to the store API host andsheets.googleapis.com. - drive-sync role:
secretsmanager:GetSecretValueon the Google service-account secret;s3:PutObjectonpt-photo-dropandpt-rules-source; outbound network towww.googleapis.com.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms:
pt-ready-dlqdepth > 0; reader Bedrock throttle count > 0 in 5 min (lower the concurrency cap if it fires); ack-handler token-verification failures > 5/hour (might mean the signing secret rotated). - X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic
pt-cost-alarmsubscribed to the on-call admin’s email.
Config and secrets
Service-account credentials for the Drive and Sheets APIs live in Secrets Manager under pt/drive/sa. The store-API key lives under pt/store/api; the review-link token-signing secret under pt/token/signing. The resized max edge, the quality thresholds, the confidence threshold, the store target (api or sheet), and the admin notify address all live in Parameter Store under /pt/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role — no long-lived keys — building and deploying with AWS SAM. The opinionated bits: turn on S3 versioning for pt-photo-drop and pt-rules-source so a bad upload or a bad style-doc edit can be rolled back in one click, set the reader maximum concurrency conservatively and raise it once you’ve watched real burst behaviour, and keep the resize step in its own Lambda with more memory so the image work doesn’t bloat the cheaper functions. Total deployable surface: around six Lambdas, three DynamoDB tables, four S3 buckets, one SQS queue plus its DLQ, a couple of Scheduler rules, one SES sender identity, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your shop, see Work with me.
All posts