Engineering reference: the translation relay architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, the SQS lanes, the DynamoDB schemas, and the masking pipeline. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and Lambda Function URLs are all in good shape there, and it keeps round-trip latency low for an Asia-Pacific customer base; swap to whichever region is closest to your buyers. Bedrock calls go through the Global cross-Region inference profile, so capacity isn’t pinned to one region. One AWS account dedicated to the relay keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system. No VPC; every Lambda runs with default networking and reaches AWS service endpoints and the Google APIs over the public internet.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB, 512 MB for the model-calling functions), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
- intake-ses: S3 PUT trigger on s3://tr-raw-mime/. Parses the MIME, extracts the plain-text body (falls back to HTML stripped to text), strips the signature, quoted history, and footers, resolves the thread by sender + normalized subject, and writes the cleaned message to tr-threads. Then invokes detect asynchronously. Memory: 256 MB. Timeout: 30 s.
- intake-web: Lambda Function URL, AuthType: NONE with a per-site signed widget token verified in-handler and CORS locked to your domains. Accepts {thread_id, text} from the chat widget, applies the same cleanup, writes to tr-threads, and invokes detect. Memory: 256 MB. Timeout: 15 s.
- drive-sync: EventBridge Scheduler target, every 15 minutes. Uses the Google Drive + Sheets API (service-account credentials in Secrets Manager under tr/drive/sa) to export the glossary sheet as CSV and the voice note as text, writing to s3://tr-glossary-source/ only when changed. Memory: 256 MB. Timeout: 30 s.
- detect: invoked by the intake Lambdas. Runs a cheap script-and-statistics language check; for short, mixed, or code-heavy text, calls Bedrock Haiku 4.5 as a fallback. Writes language and language_confidence to the thread. If confidence stays below threshold, marks the turn language_unclear for human review instead of enqueuing. Otherwise sends a message to the tr-translate SQS queue. Memory: 512 MB. Timeout: 30 s.
- translate-in: SQS event source on tr-translate (batch size 1, partial-batch responses on). Loads the glossary from s3://tr-glossary-source/, masks protected terms, numbers, prices, and IDs (Part 5), calls Bedrock Haiku 4.5 for the per-sentence translation + confidence, re-runs sub-threshold passages on Bedrock Sonnet 4.6, restores the masked spans, and writes the staff-facing translation and per-passage confidence to tr-threads. Logs each mask/restore swap to tr-audit. Memory: 512 MB. Timeout: 60 s.
- translate-back: Lambda Function URL, invoked by the staff prepare action (Slack-style signed request or the internal review UI session). Masks the reply, translates into the thread’s customer language with Haiku 4.5 + Sonnet 4.6, runs the round-trip check (translate the result back to the staff language), restores terms, and stores the prepared reply + read-back on the thread. Does not send. Memory: 512 MB. Timeout: 60 s.
- send: Lambda Function URL, invoked only by the human approve action. Verifies the prepared-reply hash matches what the agent approved (so an edit-after-prepare can’t slip through unreviewed), then delivers: SES SendRawEmail for an email thread, or returns the reply to the widget poll for a chat thread. Writes the final reply (both languages) to tr-threads and an action: sent row to tr-audit. Memory: 256 MB. Timeout: 15 s.
- summary: EventBridge Scheduler target, weekly. Reads the week’s tr-threads and tr-audit; calls Bedrock Haiku 4.5 to write a short report (volume by language, share of passages that needed Sonnet, count of human-flagged messages); emails it via SES to the configured stakeholder list. Memory: 512 MB.
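The hash check in send is the piece that keeps an edit-after-prepare from going out unreviewed, so here is a minimal sketch of it. The post doesn't say which hash is used; SHA-256 over the UTF-8 text is an assumption, and the function names reply_hash and is_approved are hypothetical.

```python
import hashlib
import hmac


def reply_hash(prepared_text: str) -> str:
    """Hex digest of the prepared reply. SHA-256 is an assumption, not from the post."""
    return hashlib.sha256(prepared_text.encode("utf-8")).hexdigest()


def is_approved(stored_text: str, approved_hash: str) -> bool:
    """True only if the reply stored on the thread is identical to the one approved.

    compare_digest avoids leaking how many leading characters matched.
    """
    return hmac.compare_digest(reply_hash(stored_text), approved_hash)
```

In this sketch the review UI would send reply_hash(text) along with the approve action, and send would refuse to deliver when is_approved returns False.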
Storage
- DynamoDB · tr-threads: one item per conversation turn. PK thread_id; sort key turn_ts; attributes: channel (email/web), direction (in/out), customer_lang, team_lang, original_text, translated_text, passage_confidence (list), status. On-demand.
- DynamoDB · tr-swaps: one row per masked span. PK (thread_id, turn_ts); sort key placeholder; attributes: kind (glossary/number/id), original, restored. On-demand. Proves no figure changed in translation.
- DynamoDB · tr-audit: one row per action of any kind (translate, prepare, edit, approve, send, flag). PK (thread_id, ts); attributes: action, by_user, model, before, after. On-demand. No TTL; this is the long-term audit trail.
- S3 · tr-glossary-source: mirrored glossary CSV and voice note as plain text. Versioning enabled, so a bad glossary edit rolls back in one click.
- S3 · tr-raw-mime: raw inbound email MIME. Lifecycle to Glacier at 30 days; expiry at 2 years.
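To make the tr-threads schema concrete, here is a sketch of a helper that builds one turn item in the shape listed above. The helper name build_turn_item, the ISO-8601 format for turn_ts, and the validation rules are assumptions. The Decimal conversion is there because the boto3 DynamoDB resource API rejects Python floats.

```python
from datetime import datetime, timezone
from decimal import Decimal


def build_turn_item(thread_id, channel, direction, customer_lang, team_lang,
                    original_text, translated_text, passage_confidence, status,
                    now=None):
    """Build one tr-threads item (hypothetical helper, attribute names from the schema)."""
    if channel not in ("email", "web"):
        raise ValueError(f"unknown channel: {channel!r}")
    if direction not in ("in", "out"):
        raise ValueError(f"unknown direction: {direction!r}")
    now = now or datetime.now(timezone.utc)
    return {
        "thread_id": thread_id,               # partition key
        "turn_ts": now.isoformat(),           # sort key; ISO-8601 is an assumption
        "channel": channel,
        "direction": direction,
        "customer_lang": customer_lang,
        "team_lang": team_lang,
        "original_text": original_text,
        "translated_text": translated_text,
        # Floats go through str() first so 0.93 stays exactly 0.93 as a Decimal.
        "passage_confidence": [Decimal(str(c)) for c in passage_confidence],
        "status": status,
    }
```

The returned dict could be passed as Item to a boto3 Table.put_item call; that call is left out so the sketch stays runnable without AWS credentials.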
Bedrock
- Cheap path. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Callsites: detect fallback, translate-in, translate-back, the round-trip check, and summary.
- Heavy path. anthropic.claude-sonnet-4-6-20250115-v1:0 via global.anthropic.claude-sonnet-4-6-20250115-v1:0, called only on sub-threshold passages from translate-in and translate-back, never on a whole message.
- Embeddings. Not used. The relay translates; it doesn’t retrieve. No Knowledge Base, no S3 Vectors. (Titan Text Embeddings V2 would be the choice if a future phrase-memory cache needed semantic lookup, but exact-match caching covers the common case at lower cost.)
- Prompts. Strict and short: preserve placeholders verbatim, translate faithfully and plainly, return per-sentence confidence as JSON. Temperature near zero for determinism.
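The cheap-path / heavy-path split comes down to a per-passage routing rule. A minimal sketch follows, assuming each model call is wrapped in a callable that returns (translation, confidence). The names cheap and heavy, the 0.8 default threshold, and the result shape are all hypothetical; in the real system the threshold comes from config and the callables would wrap the Bedrock calls.

```python
def translate_with_escalation(sentences, cheap, heavy, threshold=0.8):
    """Translate each sentence on the cheap model; retry only weak ones on the heavy model.

    cheap and heavy are callables: sentence -> (translation, confidence).
    """
    results = []
    for sentence in sentences:
        translation, confidence = cheap(sentence)
        model = "haiku"
        if confidence < threshold:
            # Escalate this passage only, never the whole message.
            translation, confidence = heavy(sentence)
            model = "sonnet"
        results.append({
            "source": sentence,
            "translation": translation,
            "confidence": confidence,
            "model": model,
        })
    return results
```

Recording which model produced each passage is what lets the weekly summary report the share of passages that needed Sonnet.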
SQS lanes
- tr-translate: standard queue between detect and translate-in. Decouples a burst of inbound messages from the rate of model calls, so a spike never throttles Bedrock or drops a message. Visibility timeout 90 s (six times the consumer’s typical runtime). Max receive count 3.
- tr-translate-dlq: dead-letter queue for tr-translate. Anything that fails three times lands here with the full context for inspection; a CloudWatch alarm on queue depth > 0 pages the on-call admin. Most DLQ entries are a malformed message or a transient Bedrock throttle, both safe to replay.
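With partial-batch responses on, the consumer tells SQS which messages failed by returning their IDs under batchItemFailures; anything not listed is deleted from the queue. A sketch of that handler shape, where handle_batch and the injected process callable are hypothetical names and the message body is assumed to be JSON:

```python
import json


def handle_batch(event, process):
    """Run process() on each SQS record and report failures individually.

    Failed records are retried by SQS and, after the queue's max receive
    count, moved to the dead-letter queue.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # A malformed body or a model error both count as a failed item.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

At batch size 1 this returns either an empty list or a single failure, but the same shape keeps working if the batch size is raised later.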
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g. support.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
- SES inbound rule set tr-inbound-rules: one rule with recipient support@your-company.com → spam scan → S3 PUT to s3://tr-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses.
- SES outbound for replies: verify a sender identity at support@your-company.com with DKIM and SPF on the parent domain, and set a custom Reply-To so the customer’s next message threads back in. Out of sandbox by request.
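Once the raw MIME lands in S3, intake-ses has to pull a usable body out of it. Here is a sketch of that first step using only the standard library's email package. The function name extract_text and the returned dict shape are assumptions; HTML stripping and signature removal are left out.

```python
from email import message_from_bytes, policy


def extract_text(raw_mime: bytes) -> dict:
    """Parse raw MIME and return sender, subject, and the preferred text body.

    Prefers text/plain and falls back to text/html; is_html tells the
    caller whether tag stripping is still needed.
    """
    msg = message_from_bytes(raw_mime, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    is_html = body is not None and body.get_content_type() == "text/html"
    return {
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "text": text,
        "is_html": is_html,
    }
```

Using policy.default matters here: it enables get_body and get_content, and it decodes encoded headers and body charsets for you.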
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- detect role: dynamodb:UpdateItem on tr-threads; sqs:SendMessage on tr-translate; bedrock:InvokeModel on the Haiku ARN only. No Sonnet, no SES.
- translate-in role: sqs:ReceiveMessage + DeleteMessage on tr-translate; s3:GetObject on the glossary bucket; bedrock:InvokeModel on the Haiku and Sonnet ARNs; dynamodb:PutItem on tr-threads, tr-swaps, tr-audit.
- translate-back role: same Bedrock + glossary access as translate-in; dynamodb:PutItem on tr-threads, tr-swaps, tr-audit. No SES; it cannot send, only prepare.
- send role: ses:SendRawEmail from the verified sender identity; dynamodb:PutItem on tr-threads and tr-audit; dynamodb:GetItem to verify the approved-reply hash. No bedrock:*; the send path never calls a model.
- drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the glossary bucket; outbound network to www.googleapis.com.
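As an illustration of what "scoped to exact ARNs" means, here is the detect role's policy document built as a Python dict. The function name detect_role_policy is hypothetical, and the ARNs are passed in rather than hard-coded because the exact set of Bedrock resource ARNs needed for a cross-Region inference profile depends on your setup.

```python
def detect_role_policy(threads_table_arn, translate_queue_arn, haiku_arns):
    """IAM policy document for the detect Lambda: three actions, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": "dynamodb:UpdateItem",
             "Resource": threads_table_arn},
            {"Effect": "Allow",
             "Action": "sqs:SendMessage",
             "Resource": translate_queue_arn},
            {"Effect": "Allow",
             "Action": "bedrock:InvokeModel",
             # Haiku only; no Sonnet ARN ever appears in this role.
             "Resource": list(haiku_arns)},
        ],
    }
```

Serialized with json.dumps, the result can be dropped into a SAM template or attached inline; the other roles follow the same pattern with their own action lists.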
Masking pipeline
The mask/restore code is a shared library imported by translate-in and translate-back, so both directions behave identically. Order matters: IDs and codes are matched first (most specific), then currency and number patterns, then glossary terms (longest match first to avoid partial hits). Each match is replaced by a typed placeholder — [[ID_1]], [[NUM_2]], [[TERM_3]] — and recorded in an in-memory map. After the model returns, the restorer validates that every placeholder it emitted still exists exactly once (a dropped or duplicated placeholder fails the turn and routes to human review), then swaps each back to the original value and writes the swap to tr-swaps. Number and ID patterns are unit-tested against a fixture of locale edge cases (comma/period decimals, hash-prefixed orders, alphanumeric SKUs) so a regex change can’t silently weaken the guardrail.
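The mask/restore flow described above can be sketched in a few dozen lines. The regexes here are simplified stand-ins (hash-prefixed orders, SKUs like AB-1234, currency-prefixed numbers), not the production patterns, and the names mask, restore, and PlaceholderMismatch are hypothetical. Placeholders are numbered by position in the text, and writing to tr-swaps is left out.

```python
import re

ID_RE = re.compile(r"#\d+|\b[A-Z]{2,}-\d+\b")          # illustrative only
NUM_RE = re.compile(r"[$€£¥]?\d+(?:[.,]\d+)*%?")       # illustrative only
PLACEHOLDER_RE = re.compile(r"\[\[(?:ID|NUM|TERM)_\d+\]\]")


class PlaceholderMismatch(Exception):
    """A placeholder was dropped, duplicated, or invented by the model."""


def mask(text, glossary):
    """Replace protected spans with typed placeholders; return (masked, mapping)."""
    claimed = []  # (start, end, kind), never overlapping

    def claim(pattern, kind):
        for m in pattern.finditer(text):
            if all(m.end() <= s or m.start() >= e for s, e, _ in claimed):
                claimed.append((m.start(), m.end(), kind))

    claim(ID_RE, "ID")                                   # most specific first
    claim(NUM_RE, "NUM")
    for term in sorted(glossary, key=len, reverse=True):  # longest match first
        if term:
            claim(re.compile(re.escape(term), re.IGNORECASE), "TERM")

    mapping, out, pos = {}, [], 0
    for n, (start, end, kind) in enumerate(sorted(claimed), start=1):
        placeholder = f"[[{kind}_{n}]]"
        mapping[placeholder] = text[start:end]
        out += [text[pos:start], placeholder]
        pos = end
    out.append(text[pos:])
    return "".join(out), mapping


def restore(translated, mapping):
    """Check every placeholder appears exactly once, then swap originals back in."""
    found = PLACEHOLDER_RE.findall(translated)
    if sorted(found) != sorted(mapping):
        raise PlaceholderMismatch(f"expected {sorted(mapping)}, got {sorted(found)}")
    return PLACEHOLDER_RE.sub(lambda m: mapping[m.group(0)], translated)
```

Claiming spans against the original text, rather than running successive substitutions, keeps the number pattern from matching the digits inside an already-inserted placeholder.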
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Metric filters on "error", "throttle", and "placeholder_mismatch" turn matching log lines into CloudWatch metrics for alerting.
- Alarms: tr-translate-dlq depth > 0; translate-in error rate > 1% in 24h; placeholder_mismatch > 0 (a masking bug is a money bug); Bedrock throttle count rising.
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic tr-cost-alarm subscribed to the on-call admin’s email.
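Log-based alerting only works if every Lambda emits the same field names. A small sketch of a structured logger follows; the function name log_event and the field names ts, level, and event are assumptions, not the post's actual schema.

```python
import json
import time


def log_event(level, event, **fields):
    """Print one JSON log line (Lambda forwards stdout to CloudWatch Logs)."""
    line = json.dumps(
        {"ts": time.time(), "level": level, "event": event, **fields},
        ensure_ascii=False,
    )
    print(line)
    return line
```

A call such as log_event("error", "placeholder_mismatch", thread_id="t-1") produces a line that a filter on "placeholder_mismatch" would count.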
Config and secrets
Google service-account credentials for Drive and Sheets live in Secrets Manager under tr/drive/sa. The widget signing key is under tr/widget/key; the SES sender identity lives in IAM and the verified-domain config. The team’s working language, the confidence thresholds for Sonnet escalation and for human-flagging, the list of glossary categories, and the customer-facing “from” name all live in Parameter Store under /tr/config/. Lambdas fetch config on cold start and cache it for the lifetime of the execution environment.
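"Fetch on cold start, cache for the lifetime of the execution environment" maps to a module-level cache. A sketch, assuming the caller passes in an SSM client (for example boto3.client("ssm")) so the function can be exercised without AWS. The function name load_config and the key-stripping convention are hypothetical.

```python
_CONFIG_CACHE = None  # module scope survives across warm invocations


def load_config(ssm_client, prefix="/tr/config/"):
    """Return all parameters under prefix as a dict, fetching them at most once."""
    global _CONFIG_CACHE
    if _CONFIG_CACHE is None:
        values = {}
        kwargs = {"Path": prefix, "Recursive": True, "WithDecryption": True}
        while True:
            page = ssm_client.get_parameters_by_path(**kwargs)
            for param in page["Parameters"]:
                # "/tr/config/team_lang" becomes "team_lang"
                values[param["Name"][len(prefix):]] = param["Value"]
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
        _CONFIG_CACHE = values
    return _CONFIG_CACHE
```

The trade-off is that a threshold change in Parameter Store only takes effect once warm execution environments are recycled.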
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) and AWS SAM for the stack. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for tr-glossary-source so a bad glossary edit can be rolled back in one click, and gate deploys on the masking library’s unit tests passing — that test suite is the thing standing between a regex tweak and a changed price in a customer’s inbox. Total deployable surface: around eight Lambdas, three DynamoDB tables, two S3 buckets, one SQS queue plus its DLQ, one SES rule set, a few Function URLs, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.