Engineering reference: the translation relay architecture

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and Lambda Function URLs are all in good shape there, and it keeps round-trip latency low for an Asia-Pacific customer base; swap to whichever region is closest to your buyers. Bedrock calls go through the Global cross-Region inference profile, so capacity isn’t pinned to one region. One AWS account dedicated to the relay keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system. No VPC; every Lambda runs with default networking and reaches AWS service endpoints and the Google APIs over the public internet.

Topology

Fig 7. AWS topology, in three regions of the diagram: ingress (two channels in plus the glossary sync), detect and translate-in (the queue decouples bursts from model calls), translate-back and send (the human approves and the reply ships). Every Lambda is event-, queue-, or request-driven; nothing is synchronous-chained across model calls.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB, 512 MB for the model-calling functions), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

intake-ses — S3 PUT trigger on s3://tr-raw-mime/. Parses the MIME, extracts the plain-text body (falls back to HTML stripped to text), strips the signature, quoted history, and footers, resolves the thread by sender + normalized subject, and writes the cleaned message to tr-threads. Then invokes detect asynchronously. Memory: 256 MB. Timeout: 30 s.
intake-web — Lambda Function URL, AuthType: NONE with a per-site signed widget token verified in-handler and CORS locked to your domains. Accepts {thread_id, text} from the chat widget, applies the same cleanup, writes to tr-threads, and invokes detect. Memory: 256 MB. Timeout: 15 s.
drive-sync — EventBridge Scheduler target, every 15 minutes. Uses the Google Drive + Sheets API (service-account credentials in Secrets Manager under tr/drive/sa) to export the glossary sheet as CSV and the voice note as text, writing to s3://tr-glossary-source/ only when changed. Memory: 256 MB. Timeout: 30 s.
detect — invoked by the intake Lambdas. Runs a cheap script-and-statistics language check; for short, mixed, or code-heavy text, calls Bedrock Haiku 4.5 as a fallback. Writes language and language_confidence to the thread. If confidence stays below threshold, marks the turn language_unclear for human review instead of enqueuing. Otherwise sends a message to the tr-translate SQS queue. Memory: 512 MB. Timeout: 30 s.
translate-in — SQS event source on tr-translate (batch size 1, partial-batch responses on). Loads the glossary from s3://tr-glossary-source/, masks protected terms, numbers, prices, and IDs (Part 5), calls Bedrock Haiku 4.5 for the per-sentence translation + confidence, re-runs sub-threshold passages on Bedrock Sonnet 4.6, restores the masked spans, and writes the staff-facing translation and per-passage confidence to tr-threads. Logs each mask/restore swap to tr-audit. Memory: 512 MB. Timeout: 60 s.
translate-back — Lambda Function URL, invoked by the staff prepare action (Slack-style signed request or the internal review UI session). Masks the reply, translates into the thread’s customer language with Haiku 4.5 + Sonnet 4.6, runs the round-trip check (translate the result back to the staff language), restores terms, and stores the prepared reply + read-back on the thread. Does not send. Memory: 512 MB. Timeout: 60 s.
send — Lambda Function URL, invoked only by the human approve action. Verifies the prepared-reply hash matches what the agent approved (so an edit-after-prepare can’t slip through unreviewed), then delivers: SES SendRawEmail for an email thread, or returns the reply to the widget poll for a chat thread. Writes the final reply (both languages) to tr-threads and an action: sent row to tr-audit. Memory: 256 MB. Timeout: 15 s.
summary — EventBridge Scheduler target, weekly. Reads the week’s tr-threads and tr-audit; calls Bedrock Haiku 4.5 to write a short report (volume by language, share of passages that needed Sonnet, count of human-flagged messages); emails it via SES to the configured stakeholder list. Memory: 512 MB.

Storage

DynamoDB · tr-threads — one item per conversation turn. PK thread_id; sort key turn_ts; attributes: channel (email/web), direction (in/out), customer_lang, team_lang, original_text, translated_text, passage_confidence (list), status. On-demand.
DynamoDB · tr-swaps — one row per masked span. PK (thread_id, turn_ts); sort key placeholder; attributes: kind (glossary/number/id), original, restored. On-demand. Proves no figure changed in translation.
DynamoDB · tr-audit — one row per action of any kind (translate, prepare, edit, approve, send, flag). PK (thread_id, ts); attributes: action, by_user, model, before, after. On-demand. No TTL — long-term audit trail.
S3 · tr-glossary-source — mirrored glossary CSV and voice note as plain text. Versioning enabled, so a bad glossary edit rolls back in one click.
S3 · tr-raw-mime — raw inbound email MIME. Lifecycle to Glacier at 30 days; expiry at 2 years.

Bedrock

Cheap path. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Callsites: detect fallback, translate-in, translate-back, the round-trip check, and summary.
Heavy path. anthropic.claude-sonnet-4-6-20250115-v1:0 via global.anthropic.claude-sonnet-4-6-20250115-v1:0, called only on sub-threshold passages from translate-in and translate-back — never on a whole message.
Embeddings. Not used. The relay translates; it doesn’t retrieve. No Knowledge Base, no S3 Vectors. (Titan Text Embeddings V2 would be the choice if a future phrase-memory cache needed semantic lookup, but exact-match caching covers the common case at lower cost.)
Prompts. Strict and short: preserve placeholders verbatim, translate faithfully and plainly, return per-sentence confidence as JSON. Temperature near zero for determinism.

SQS lanes

tr-translate — standard queue between detect and translate-in. Decouples a burst of inbound messages from the rate of model calls, so a spike never throttles Bedrock or drops a message. Visibility timeout 90 s (six times the consumer’s typical runtime). Max receive count 3.
tr-translate-dlq — dead-letter queue for tr-translate. Anything that fails three times lands here with the full context for inspection; a CloudWatch alarm on queue depth > 0 pages the on-call admin. Most DLQ entries are a malformed message or a transient Bedrock throttle, both safe to replay.

SES inbound and outbound

Set the MX record on a dedicated subdomain (e.g. support.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
SES inbound rule set tr-inbound-rules: one rule with recipient support@your-company.com → spam scan → S3 PUT to s3://tr-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses.
SES outbound for replies: verify a sender identity at support@your-company.com with DKIM and SPF on the parent domain, and set a custom Reply-To so the customer’s next message threads back in. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

detect role: dynamodb:UpdateItem on tr-threads; sqs:SendMessage on tr-translate; bedrock:InvokeModel on the Haiku ARN only. No Sonnet, no SES.
translate-in role: sqs:ReceiveMessage + DeleteMessage on tr-translate; s3:GetObject on the glossary bucket; bedrock:InvokeModel on the Haiku and Sonnet ARNs; dynamodb:PutItem on tr-threads, tr-swaps, tr-audit.
translate-back role: same Bedrock + glossary access as translate-in; dynamodb:PutItem on tr-threads, tr-swaps, tr-audit. No SES — it cannot send, only prepare.
send role: ses:SendRawEmail from the verified sender identity; dynamodb:PutItem on tr-threads and tr-audit; dynamodb:GetItem to verify the approved-reply hash. No bedrock:* — the send path never calls a model.
drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the glossary bucket; outbound network to www.googleapis.com.

Masking pipeline

The mask/restore code is a shared library imported by translate-in and translate-back, so both directions behave identically. Order matters: IDs and codes are matched first (most specific), then currency and number patterns, then glossary terms (longest match first to avoid partial hits). Each match is replaced by a typed placeholder — [[ID_1]], [[NUM_2]], [[TERM_3]] — and recorded in an in-memory map. After the model returns, the restorer validates that every placeholder it emitted still exists exactly once (a dropped or duplicated placeholder fails the turn and routes to human review), then swaps each back to the original value and writes the swap to tr-swaps. Number and ID patterns are unit-tested against a fixture of locale edge cases (comma/period decimals, hash-prefixed orders, alphanumeric SKUs) so a regex change can’t silently weaken the guardrail.

Observability and cost gates

CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "placeholder_mismatch" to a CloudWatch metric for alerting.
Alarms: tr-translate-dlq depth > 0; translate-in error rate > 1% in 24h; placeholder_mismatch > 0 (a masking bug is a money bug); Bedrock throttle count rising.
X-Ray: off by default. Not worth the cost at SMB volume.
AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic tr-cost-alarm subscribed to the on-call admin’s email.

Config and secrets

Google service-account credentials for Drive and Sheets live in Secrets Manager under tr/drive/sa. The widget signing key is under tr/widget/key; the SES sender identity lives in IAM and the verified-domain config. The team’s working language, the confidence thresholds for Sonnet escalation and for human-flagging, the list of glossary categories, and the customer-facing “from” name all live in Parameter Store under /tr/config/. Lambdas fetch config on cold start and cache it for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys) and AWS SAM for the stack. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for tr-glossary-source so a bad glossary edit can be rolled back in one click, and gate deploys on the masking library’s unit tests passing — that test suite is the thing standing between a regex tweak and a changed price in a customer’s inbox. Total deployable surface: around eight Lambdas, three DynamoDB tables, two S3 buckets, one SQS queue plus its DLQ, one SES rule set, a few Function URLs, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts