Engineering reference: the menu sync architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge config, the DynamoDB schemas, the channel-adapter contract, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and EventBridge are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for a restaurant is one channel showing a stale price for an hour, not a regional outage. One AWS account dedicated to the sync (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
menu-sync— triggered on sheet change (Drive push notification to a Function URL, with a fallback EventBridge Scheduler poll every 5 minutes). Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager underms/drive/sa) to export the master menu sheet as JSON and write tos3://ms-menu-source/menu.jsononly if the sheet has changed since the last sync. The same pattern mirrors the rules and voice docs tos3://ms-rules-source/. Memory: 256 MB. Timeout: 30 s.quick-edit— Lambda Function URL serving a small static phone page plus its write endpoint. Gated behind a staff PIN from Parameter Store. Writes sold-out toggles and in-limit price changes straight to the Drive sheet via the Sheets API, then lets the normalmenu-syncmirror + planner flow run. Memory: 256 MB. Timeout: 15 s.intake-ses-parser— S3 PUT trigger ons3://ms-raw-mime/. Parses MIME, extracts the price-list attachment, runs Textract viaStartDocumentTextDetection+StartDocumentAnalysis(asynchronously, to handle multi-page lists and tables). On Textract completion (via SNS notification), reads the structured text and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) to match each line to a menu item by name and propose updated prices. Posts the proposal to Slack via the incoming webhook with Approve/Edit/Discard buttons. For DOCX attachments (Textract doesn’t accept them), falls back topython-docx; XLSX usesopenpyxl. Both packages are stable and widely used in 2026, though their maintenance velocity is light — for a price-list path that only runs a few times a month, that’s acceptable. If extraction precision becomes a concern, the active community forkpython-docx-ossis a drop-in alternative. Memory: 512 MB. Timeout: 60 s.planner— S3 event trigger onmenu.jsonPUT (plus a daily EventBridge Scheduler audit run that re-diffs every channel to catch drift). Readss3://ms-menu-source/menu.jsonand the rules and voice docs. For each item-and-channel pair, computes the diff againstms-state, checks the change against the auto-sync limits, and decides on a move. Emits one event per pair that needs action:ms.push,ms.hold, orms.flag, with the item-and-channel context as the event payload. In-sync pairs emit nothing. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.publisher— EventBridge rule on the three move events. Resolves the channel list, runs the approval gate (push passes through; hold sends a Slack approval card and waits), formats the change from the voice template, attaches an idempotency key built from(item_id, channel, value_hash), and calls the channel adapter. Writes the result toms-stateafter the adapter responds. On a hold, the approval card’s Approve button re-invokespublisherwith the same payload. Memory: 256 MB. Timeout: 30 s.fix-handler— Lambda Function URL, public withAuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Retry/Edit/Skip) on flagged rejections. Writes toms-stateandms-audit; on retry, re-invokes the adapter with the same idempotency key; on edit, updates the Drive sheet (or the channel’s voice template) via the Sheets API and resends; on skip, marks the pair skipped. Memory: 256 MB. Timeout: 15 s.digest— EventBridge Scheduler target, weekly Sunday 6pm. Readsms-stateandms-auditfor the past week; sends a digest to a configured Slack channel listing changes pushed, held items awaiting approval, and every channel currently out of step. No Bedrock; the message is a plain summary table. Memory: 256 MB.summary— EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’sms-stateandms-audit; calls Bedrock Haiku 4.5 to write a one-paragraph owner narrative (how many changes flowed automatically, how many needed a tap, which channel rejects most); emails it via SES. Memory: 512 MB.
Storage
- DynamoDB ·
ms-state— one row per item per channel. PK(item_id, channel); attributes:last_value,last_push_status(pushed/rejected/skipped),last_push_at,reject_reason,idempotency_key. On-demand. No TTL — this is the live picture of what each channel shows. - DynamoDB ·
ms-audit— one row per write action of any kind. PK(item_id, ts); attributes:channel,action(push/hold-approved/retry/edit/skip),by_user,before,after. On-demand. No TTL — this is the long-term audit trail. - S3 ·
ms-menu-source— mirrored JSON from the master menu sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years. - S3 ·
ms-rules-source— mirrored rules and voice docs as plain text. Versioning enabled. - S3 ·
ms-raw-mime— raw inbound MIME from forwarded price lists. Lifecycle to Glacier at 30 days; expiry at 7 years. - S3 ·
ms-pdf-out— generated printable menu PDFs, one per regenerate, with the latest under a stable key the website links to.
Channel adapters
An adapter is a small module implementing one contract: apply(item, formatted_value, idempotency_key) -> {status, reason} and read(item) -> current_value. Each adapter encapsulates one channel’s auth, units, and field set.
adapter-website— writes to the site’s content store (a headless CMS API or a JSON file ins3://ms-menu-source/site/the site reads at build/render). Accepts long descriptions and dollar prices.adapter-platform-*— one per online-order platform. Calls that platform’s menu API (OAuth token in Secrets Manager underms/channel/<name>), converts prices to integer cents, enforces the platform’s name/description length caps, and maps categories to the platform’s taxonomy. These are the adapters most likely to return a rejection.adapter-pdf— regenerates the printable menu (a small HTML-to-PDF render in Lambda) and writes it tos3://ms-pdf-out/. Always accepts; the “rejection” case here is a render error, which is flagged the same way.adapter-qr— updates the QR-code landing page (the same JSON the website reads, or a dedicated key). Accepts the full field set.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites:intake-ses-parserfor the supplier-price matching, andsummaryfor the monthly owner narrative. The heavieranthropic.claude-sonnet-4-6profile is wired but unused at SMB volume; it’s the upgrade path only if price-list matching across a very large menu needs deeper reasoning. - Embeddings. Not used. The menu is structured rows; deterministic comparison beats vector retrieval here. No Knowledge Base, no S3 Vectors. (If supplier lists ever need fuzzy item matching at scale, Amazon Titan Text Embeddings V2 at 1024-dim over S3 Vectors is the path — not needed today.)
- Quotas. Default account quotas are more than enough at SMB volume. The planner doesn’t call Bedrock; the parsing lane fires a few times a month at most.
EventBridge config
ms-move-rule— rule on the default bus matchingms.push,ms.hold,ms.flag. Target:publisherLambda.ms-menu-sync-poll— EventBridge Scheduler,rate(5 minutes), fallback if a Drive push notification is missed. Target:menu-syncLambda.ms-daily-audit—cron(0 4 * * ? *)in TZ. Target:plannerLambda, forced full re-diff of every channel to catch silent drift.ms-weekly-digest—cron(0 18 ? * SUN *)in TZ. Target:digestLambda.ms-monthly-summary—cron(0 9 ? * 2#1 *)(first Monday at 9am) in TZ. Target:summaryLambda.
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g.
prices.your-restaurant.com) toinbound-smtp.ap-southeast-1.amazonaws.com. - SES inbound rule set
ms-inbound-rules: one rule with recipientprices@your-restaurant.com→ spam scan → S3 PUT tos3://ms-raw-mime/<message-id>→ stop. The S3 PUT triggersintake-ses-parser. - SES outbound for the digest and summary emails: verify a sender identity at
menu@your-restaurant.comwith DKIM and SPF on the parent domain. Out of sandbox by request.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- planner role:
s3:GetObjecton the menu, rules, and voice keys;dynamodb:Query+GetItemonms-state;events:PutEventson the default bus. Nobedrock:*. - publisher role:
secretsmanager:GetSecretValueon the channel and Slack secrets;dynamodb:PutItemonms-state;s3:PutObjectonms-pdf-outand the site key; outbound network access tohooks.slack.comand each channel’s API host. - fix-handler role:
dynamodb:PutItemonms-stateandms-audit;secretsmanager:GetSecretValueon the Sheets-API and channel secrets; outbound network access tosheets.googleapis.comand channel hosts;dynamodb:Queryfor state lookup. - intake-ses-parser role:
s3:GetObjectonms-raw-mime;textract:StartDocumentTextDetection+StartDocumentAnalysis;bedrock:InvokeModelon the Haiku ARN;secretsmanager:GetSecretValueon the Slack webhook. - menu-sync and quick-edit roles:
secretsmanager:GetSecretValueon the Google service-account secret;s3:PutObjecton the menu and rules buckets; outbound network towww.googleapis.com.
Slack interactive flow
The Slack incoming webhook is the simplest delivery surface but doesn’t support interactive button responses. So the approval and flag messages are posted via the chat.postMessage Web API instead, with Block Kit blocks containing the action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the fix-handler Function URL (held approvals route to the publisher via the same handler). fix-handler verifies the Slack signing secret on the inbound request, parses the action_id (retry, edit, skip, approve, reject), opens a modal if needed (Edit opens a modal; Retry/Skip/Approve are one-tap), and processes the response when the modal is submitted.
The Slack app needs chat:write, im:write, and the Interactivity URL configured. The bot token lives in Secrets Manager under ms/slack/bot-token. The signing secret is ms/slack/signing-secret.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms: publisher rejection rate > 5% in 24h (a channel might have changed its API); planner failures > 0 in a day; fix-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic
ms-cost-alarmsubscribed to the on-call admin’s email and Slack.
Config and secrets
Service-account credentials for Drive and Sheets APIs live in Secrets Manager under ms/drive/sa. Each channel’s API token lives under ms/channel/<name>. Slack bot token, signing secret, and webhook URL all under ms/slack/*. SES sender identity lives in IAM and the verified-domain config. The configured timezone, auto-sync limits, approval thresholds, staff PIN, and channel list all live in Parameter Store under /ms/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions + OIDC + AWS SAM, no long-lived keys — the CI role is assumed via an OIDC trust policy scoped to the repo. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both ms-menu-source and ms-rules-source so a bad Drive edit can be rolled back in one click, and keep each channel adapter behind a feature flag so a misbehaving platform can be paused without a redeploy. Total deployable surface: around nine Lambdas, two DDB tables, four S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts