Part 7 of 7 · Menu sync series ~8 min read

Engineering reference: the menu sync architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge config, the DynamoDB schemas, the channel-adapter contract, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and EventBridge are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for a restaurant is one channel showing a stale price for an hour, not a regional outage. One AWS account dedicated to the sync (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the menu sync A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three intake lanes — a Drive sheet sync via the menu-sync Lambda triggered on change that mirrors the menu JSON to s3://ms-menu-source/, an SES inbound rule set with action S3 PUT to s3://ms-raw-mime/ plus the parser Lambda intake-ses-parser that runs Textract on price lists and Bedrock Haiku 4.5 to match items and propose prices for Slack approval, and a quick-edit Function URL Lambda that writes sold-out and small price changes straight to the sheet. Middle region: change processing. The planner Lambda is triggered when the menu mirror lands in S3; it reads s3://ms-menu-source/menu.json, computes the diff against DynamoDB ms-state, looks up limits in s3://ms-rules-source/rules.txt, and emits one of three events to the EventBridge default bus per item-and-channel that needs an action: ms.push, ms.hold, or ms.flag. Bottom region: publish and fix. The publisher Lambda is triggered by an EventBridge rule on those three event types; it resolves the channels, runs the approval gate for holds, formats via s3://ms-rules-source/voice.txt, calls the channel adapter with an idempotency key, and writes the result to DynamoDB ms-state. Slack interactive button clicks land on a Function URL Lambda fix-handler that updates ms-state and ms-audit and, on edit, updates the menu sheet via the Google Sheets API. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $25 monthly threshold, posting to SNS topic ms-cost-alarm. A note at the bottom: one master menu, every channel in step — and every interaction is logged to ms-audit. Ingress Lambda · menu-sync on sheet change Sheets API → s3://ms-menu-source/ menu.json SES inbound rule set ms-inbound-rules action: S3 PUT s3://ms-raw-mime/ trigger: intake-ses-parser Lambda · quick-edit Function URL phone page writes sold-out + small price → Sheets API Master menu sheet canonical store · mirrored to S3 Change processing S3 event trigger PUT menu.json in ms-menu-source target: planner Lambda + scheduled audit run Lambda · planner reads JSON from S3 + rules.txt + voice.txt diffs vs ms-state, picks one of four moves EventBridge default bus ms.push ms.hold ms.flag (in sync → no event) Publish & fix Lambda · publisher resolves channels, approval, format; calls channel adapter with idempotency key Slack interactive cards with [Retry] [Edit] [Skip] button clicks → Function URL Lambda · fix-handler writes ms-state, ms-audit, and on edit updates the Sheet via Sheets API One master menu, every channel in step — and every interaction is logged to ms-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the menu), change processing (the planner diffing and emitting events), publish and fix (the update lands and a rejection’s fix is recorded). Every Lambda is event- or trigger-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • menu-sync — triggered on sheet change (Drive push notification to a Function URL, with a fallback EventBridge Scheduler poll every 5 minutes). Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under ms/drive/sa) to export the master menu sheet as JSON and write to s3://ms-menu-source/menu.json only if the sheet has changed since the last sync. The same pattern mirrors the rules and voice docs to s3://ms-rules-source/. Memory: 256 MB. Timeout: 30 s.
  • quick-edit — Lambda Function URL serving a small static phone page plus its write endpoint. Gated behind a staff PIN from Parameter Store. Writes sold-out toggles and in-limit price changes straight to the Drive sheet via the Sheets API, then lets the normal menu-sync mirror + planner flow run. Memory: 256 MB. Timeout: 15 s.
  • intake-ses-parser — S3 PUT trigger on s3://ms-raw-mime/. Parses MIME, extracts the price-list attachment, runs Textract via StartDocumentTextDetection + StartDocumentAnalysis (asynchronously, to handle multi-page lists and tables). On Textract completion (via SNS notification), reads the structured text and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) to match each line to a menu item by name and propose updated prices. Posts the proposal to Slack via the incoming webhook with Approve/Edit/Discard buttons. For DOCX attachments (Textract doesn’t accept them), falls back to python-docx; XLSX uses openpyxl. Both packages are stable and widely used in 2026, though their maintenance velocity is light — for a price-list path that only runs a few times a month, that’s acceptable. If extraction precision becomes a concern, the active community fork python-docx-oss is a drop-in alternative. Memory: 512 MB. Timeout: 60 s.
  • planner — S3 event trigger on menu.json PUT (plus a daily EventBridge Scheduler audit run that re-diffs every channel to catch drift). Reads s3://ms-menu-source/menu.json and the rules and voice docs. For each item-and-channel pair, computes the diff against ms-state, checks the change against the auto-sync limits, and decides on a move. Emits one event per pair that needs action: ms.push, ms.hold, or ms.flag, with the item-and-channel context as the event payload. In-sync pairs emit nothing. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.
  • publisher — EventBridge rule on the three move events. Resolves the channel list, runs the approval gate (push passes through; hold sends a Slack approval card and waits), formats the change from the voice template, attaches an idempotency key built from (item_id, channel, value_hash), and calls the channel adapter. Writes the result to ms-state after the adapter responds. On a hold, the approval card’s Approve button re-invokes publisher with the same payload. Memory: 256 MB. Timeout: 30 s.
  • fix-handler — Lambda Function URL, public with AuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Retry/Edit/Skip) on flagged rejections. Writes to ms-state and ms-audit; on retry, re-invokes the adapter with the same idempotency key; on edit, updates the Drive sheet (or the channel’s voice template) via the Sheets API and resends; on skip, marks the pair skipped. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads ms-state and ms-audit for the past week; sends a digest to a configured Slack channel listing changes pushed, held items awaiting approval, and every channel currently out of step. No Bedrock; the message is a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s ms-state and ms-audit; calls Bedrock Haiku 4.5 to write a one-paragraph owner narrative (how many changes flowed automatically, how many needed a tap, which channel rejects most); emails it via SES. Memory: 512 MB.

Storage

  • DynamoDB · ms-state — one row per item per channel. PK (item_id, channel); attributes: last_value, last_push_status (pushed/rejected/skipped), last_push_at, reject_reason, idempotency_key. On-demand. No TTL — this is the live picture of what each channel shows.
  • DynamoDB · ms-audit — one row per write action of any kind. PK (item_id, ts); attributes: channel, action (push/hold-approved/retry/edit/skip), by_user, before, after. On-demand. No TTL — this is the long-term audit trail.
  • S3 · ms-menu-source — mirrored JSON from the master menu sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
  • S3 · ms-rules-source — mirrored rules and voice docs as plain text. Versioning enabled.
  • S3 · ms-raw-mime — raw inbound MIME from forwarded price lists. Lifecycle to Glacier at 30 days; expiry at 7 years.
  • S3 · ms-pdf-out — generated printable menu PDFs, one per regenerate, with the latest under a stable key the website links to.

Channel adapters

An adapter is a small module implementing one contract: apply(item, formatted_value, idempotency_key) -> {status, reason} and read(item) -> current_value. Each adapter encapsulates one channel’s auth, units, and field set.

  • adapter-website — writes to the site’s content store (a headless CMS API or a JSON file in s3://ms-menu-source/site/ the site reads at build/render). Accepts long descriptions and dollar prices.
  • adapter-platform-* — one per online-order platform. Calls that platform’s menu API (OAuth token in Secrets Manager under ms/channel/<name>), converts prices to integer cents, enforces the platform’s name/description length caps, and maps categories to the platform’s taxonomy. These are the adapters most likely to return a rejection.
  • adapter-pdf — regenerates the printable menu (a small HTML-to-PDF render in Lambda) and writes it to s3://ms-pdf-out/. Always accepts; the “rejection” case here is a render error, which is flagged the same way.
  • adapter-qr — updates the QR-code landing page (the same JSON the website reads, or a dedicated key). Accepts the full field set.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: intake-ses-parser for the supplier-price matching, and summary for the monthly owner narrative. The heavier anthropic.claude-sonnet-4-6 profile is wired but unused at SMB volume; it’s the upgrade path only if price-list matching across a very large menu needs deeper reasoning.
  • Embeddings. Not used. The menu is structured rows; deterministic comparison beats vector retrieval here. No Knowledge Base, no S3 Vectors. (If supplier lists ever need fuzzy item matching at scale, Amazon Titan Text Embeddings V2 at 1024-dim over S3 Vectors is the path — not needed today.)
  • Quotas. Default account quotas are more than enough at SMB volume. The planner doesn’t call Bedrock; the parsing lane fires a few times a month at most.

EventBridge config

  • ms-move-rule — rule on the default bus matching ms.push, ms.hold, ms.flag. Target: publisher Lambda.
  • ms-menu-sync-poll — EventBridge Scheduler, rate(5 minutes), fallback if a Drive push notification is missed. Target: menu-sync Lambda.
  • ms-daily-auditcron(0 4 * * ? *) in TZ. Target: planner Lambda, forced full re-diff of every channel to catch silent drift.
  • ms-weekly-digestcron(0 18 ? * SUN *) in TZ. Target: digest Lambda.
  • ms-monthly-summarycron(0 9 ? * 2#1 *) (first Monday at 9am) in TZ. Target: summary Lambda.

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. prices.your-restaurant.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set ms-inbound-rules: one rule with recipient prices@your-restaurant.com → spam scan → S3 PUT to s3://ms-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses-parser.
  • SES outbound for the digest and summary emails: verify a sender identity at menu@your-restaurant.com with DKIM and SPF on the parent domain. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • planner role: s3:GetObject on the menu, rules, and voice keys; dynamodb:Query + GetItem on ms-state; events:PutEvents on the default bus. No bedrock:*.
  • publisher role: secretsmanager:GetSecretValue on the channel and Slack secrets; dynamodb:PutItem on ms-state; s3:PutObject on ms-pdf-out and the site key; outbound network access to hooks.slack.com and each channel’s API host.
  • fix-handler role: dynamodb:PutItem on ms-state and ms-audit; secretsmanager:GetSecretValue on the Sheets-API and channel secrets; outbound network access to sheets.googleapis.com and channel hosts; dynamodb:Query for state lookup.
  • intake-ses-parser role: s3:GetObject on ms-raw-mime; textract:StartDocumentTextDetection + StartDocumentAnalysis; bedrock:InvokeModel on the Haiku ARN; secretsmanager:GetSecretValue on the Slack webhook.
  • menu-sync and quick-edit roles: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the menu and rules buckets; outbound network to www.googleapis.com.

Slack interactive flow

The Slack incoming webhook is the simplest delivery surface but doesn’t support interactive button responses. So the approval and flag messages are posted via the chat.postMessage Web API instead, with Block Kit blocks containing the action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the fix-handler Function URL (held approvals route to the publisher via the same handler). fix-handler verifies the Slack signing secret on the inbound request, parses the action_id (retry, edit, skip, approve, reject), opens a modal if needed (Edit opens a modal; Retry/Skip/Approve are one-tap), and processes the response when the modal is submitted.

The Slack app needs chat:write, im:write, and the Interactivity URL configured. The bot token lives in Secrets Manager under ms/slack/bot-token. The signing secret is ms/slack/signing-secret.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
  • Alarms: publisher rejection rate > 5% in 24h (a channel might have changed its API); planner failures > 0 in a day; fix-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic ms-cost-alarm subscribed to the on-call admin’s email and Slack.

Config and secrets

Service-account credentials for Drive and Sheets APIs live in Secrets Manager under ms/drive/sa. Each channel’s API token lives under ms/channel/<name>. Slack bot token, signing secret, and webhook URL all under ms/slack/*. SES sender identity lives in IAM and the verified-domain config. The configured timezone, auto-sync limits, approval thresholds, staff PIN, and channel list all live in Parameter Store under /ms/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions + OIDC + AWS SAM, no long-lived keys — the CI role is assumed via an OIDC trust policy scoped to the repo. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both ms-menu-source and ms-rules-source so a bad Drive edit can be rolled back in one click, and keep each channel adapter behind a feature flag so a misbehaving platform can be paused without a redeploy. Total deployable surface: around nine Lambdas, two DDB tables, four S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts