Engineering reference: the shipping notifier architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the carrier webhook flow, the SES inbound rule set, EventBridge Scheduler config, and the DynamoDB schemas. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound and outbound, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a customer missing a shipping update, not a regional outage. One AWS account dedicated to the notifier (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
drive-sync— EventBridge Scheduler target, fires every few minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager undersn/drive/sa) to export the order list sheet as CSV and write tos3://sn-orders-source/orders.csvonly if the sheet has changed since the last sync. Same pattern syncs the rules and voice docs tos3://sn-rules-source/. Memory: 256 MB. Timeout: 30 s.webhook-handler— Lambda Function URL, public withAuthType: NONE; verifies a carrier-specific HMAC signature or shared secret on the request body (secret in Secrets Manager undersn/carrier/secret). Parses the carrier’s tracking-update payload, matches the tracking number to an order via a GSI on the order list cache, and updates the status field in the Drive sheet via the Sheets API. Unmatched tracking numbers are written to ansn-unmatchedlist for the weekly digest. Memory: 256 MB. Timeout: 15 s.intake-ses-parser— S3 PUT trigger ons3://sn-raw-mime/. Parses MIME, extracts the text body of the forwarded carrier email, and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) to extract the tracking number and the new status. Posts the proposal to Slack viachat.postMessagewith Approve/Edit/Discard buttons. Memory: 512 MB. Timeout: 30 s.notifier— EventBridge Scheduler target, every 30 minutes during the day (the schedule expression runs inTZ_NAMEset to the SMB’s timezone, e.g.Asia/Singapore). Readss3://sn-orders-source/orders.csvand the rules and voice docs. For each row, compares current status to last-sent, reads state fromsn-sendsandsn-prefs, decides on a move. Emits one event per row that needs an update:sn.shipped,sn.out_for_delivery,sn.delivered, orsn.delayed, with the order context as the event payload. Orders with nothing new emit nothing. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.sender— EventBridge rule on the four move events. Resolves contact, checks quiet hours and the unsubscribe flag, formats the update from the voice template, and ships via SESSendRawEmailfrom the verified sending identity. On a quiet-hours defer, creates a one-off EventBridge Scheduler rule that re-invokessenderat the start of the next sending window. On asn.delayedevent, adds the owner as a recipient. Writes a row tosn-sendsafter a successful send. Memory: 256 MB. Timeout: 30 s.unsub-handler— Lambda Function URL, public withAuthType: NONE; the unsubscribe link carries a signed token so only the real recipient can opt out their own order. Writes the opt-out tosn-prefsand an audit row tosn-audit. Returns a small confirmation page. Memory: 256 MB. Timeout: 15 s.digest— EventBridge Scheduler target, weekly Sunday 6pm. Readssn-sendsand thesn-unmatchedlist for the past week; sends a digest message to a configured Slack channel summarizing updates sent, orders in flight, and any unmatched tracking numbers. No Bedrock; the message is a plain summary table. Memory: 256 MB.summary— EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’ssn-sendsandsn-audit; calls Bedrock Haiku 4.5 to write a one-paragraph narrative (orders shipped, delivered, average days in transit, how many ran late); emails it via SES to the configured stakeholder list. Memory: 512 MB.
Storage
- DynamoDB ·
sn-sends— one row per update sent. PK(order_id, status); attributes:sent_date,sent_via(customer/owner),recipient,move(shipped/out_for_delivery/delivered/delayed). On-demand. No TTL. - DynamoDB ·
sn-prefs— one row per order’s notification preference. PKorder_id; attributes:unsubscribed(bool),mute_until(date, optional),updated_by. On-demand. - DynamoDB ·
sn-audit— one row per write action of any kind. PK(order_id, ts); attributes:action,days_late(if delay),before,after. On-demand. No TTL — this is the long-term audit trail. - DynamoDB ·
sn-unmatched— webhook tracking updates that matched no order. PKtracking_no; attributes:status,received_at,raw. On-demand. TTL 30 days. - S3 ·
sn-orders-source— mirrored CSV from the Drive order list sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 3 years. - S3 ·
sn-rules-source— mirrored rules and voice docs as plain text. Versioning enabled. - S3 ·
sn-raw-mime— raw inbound MIME from forwarded carrier emails. Lifecycle to Glacier at 30 days; expiry at 1 year.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites:intake-ses-parserfor reading forwarded carrier emails, andsummaryfor the monthly narrative. Claude Sonnet 4.6 isn’t used here — neither task needs the heavier reasoning, so Haiku 4.5 is the right cost-for-quality fit. - Embeddings. Not used. The order list is structured rows; deterministic lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
- Quotas. Default account quotas are more than enough at SMB volume. The notifier itself doesn’t call Bedrock; the parsing lane fires a few times a month at most.
EventBridge Scheduler config
sn-status-check—rate(30 minutes)with a FlexibleTimeWindow; target:notifierLambda. A daytime-only window is enforced in code, not the schedule, so a late-night delivered scan still gets deferred rather than dropped.sn-drive-sync—rate(5 minutes). Target:drive-syncLambda.sn-weekly-digest—cron(0 18 ? * SUN *)in TZ. Target:digestLambda.sn-monthly-summary—cron(0 9 ? * 2#1 *)(first Monday at 9am) in TZ. Target:summaryLambda.- One-off rules — created on the fly by
senderwhen a quiet-hours defer is needed. Useat(YYYY-MM-DDTHH:MM:SS)expressions with--action-after-completion DELETEso the rule self-cleans.
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g.
tracking.your-company.com) toinbound-smtp.ap-southeast-1.amazonaws.com. - SES inbound rule set
sn-inbound-rules: one rule with recipienttracking@your-company.com→ spam scan → S3 PUT tos3://sn-raw-mime/<message-id>→ stop. The S3 PUT triggersintake-ses-parser. - SES outbound for the customer updates: verify a sending identity at
orders@your-company.comwith DKIM and SPF on the parent domain, and a list-unsubscribe header on every message. Out of sandbox by request.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- notifier role:
s3:GetObjecton the orders, rules, and voice keys;dynamodb:Query+GetItemonsn-sends,sn-prefs;events:PutEventson the default bus. Nobedrock:*. - sender role:
events:CreateSchedulefor the deferred-send one-offs;ses:SendRawEmailfrom the verified sending identity;dynamodb:PutItemonsn-sends;s3:GetObjecton the voice template;secretsmanager:GetSecretValueonly if a per-tenant sender config is used. - webhook-handler role:
secretsmanager:GetSecretValueon the carrier secret;secretsmanager:GetSecretValueon the Sheets-API service-account secret; outbound network access tosheets.googleapis.com;dynamodb:PutItemonsn-unmatched. - intake-ses-parser role:
s3:GetObjectonsn-raw-mime;bedrock:InvokeModelon the Haiku ARN;secretsmanager:GetSecretValueon the Slack bot token. - unsub-handler role:
dynamodb:PutItemonsn-prefsandsn-audit;secretsmanager:GetSecretValueon the token-signing secret. - drive-sync role:
secretsmanager:GetSecretValueon the Google service-account secret;s3:PutObjecton the orders and rules buckets; outbound network towww.googleapis.com.
Carrier webhook flow
Most carriers (and aggregators like a tracking API) can POST a tracking-update payload to a URL on each scan event. The webhook-handler Function URL is that URL. On each request it verifies the carrier’s signature (an HMAC over the body with a shared secret, or a bearer token, depending on the carrier), parses the tracking number and the new status, normalizes the carrier’s status vocabulary into the system’s five stages (a small mapping table in the rules doc handles carrier-specific status names), and updates the matching order’s status in the Drive sheet via the Sheets API.
Because the webhook writes the carrier’s authoritative status with no human in the loop, the only safety checks are the signature verification and the tracking-number match. A request that fails the signature is rejected with a 401 and logged. A request whose tracking number matches no order is written to sn-unmatched so a person can reconcile it from the weekly digest, rather than being silently dropped.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms: notifier Lambda failures > 0 in an hour (the check is the one piece that has to run); sender failure rate > 1% in 24h; webhook signature-verification failures > 5/hour (might mean the carrier secret rotated).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $20/month threshold, alarm at 80% and 100%, posts to SNS topic
sn-cost-alarmsubscribed to the on-call admin’s email and Slack.
Config and secrets
Service-account credentials for Drive and Sheets APIs live in Secrets Manager under sn/drive/sa (one service account with scopes for both APIs). Carrier webhook secrets live under sn/carrier/*. Slack bot token lives under sn/slack/bot-token. The unsubscribe token-signing secret is under sn/unsub/signing-secret. The configured timezone, quiet-hours window, expected-delivery windows per carrier, owner contact, and grace setting all live in Parameter Store under /sn/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions + OIDC + AWS SAM, no long-lived keys. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both sn-orders-source and sn-rules-source so a bad Drive edit can be rolled back in one click, and version the EventBridge Scheduler timezone setting so you don’t accidentally start running the check in UTC after a CI rotation. Total deployable surface: around eight Lambdas, four DDB tables, three S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts