Part 7 of 7 · Shipping notifier series ~8 min read

Engineering reference: the shipping notifier architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the carrier webhook flow, the SES inbound rule set, EventBridge Scheduler config, and the DynamoDB schemas. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound and outbound, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a customer missing a shipping update, not a regional outage. One AWS account dedicated to the notifier (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the shipping notifier A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three intake lanes — a Drive sheet sync via the drive-sync Lambda triggered every few minutes by EventBridge Scheduler that mirrors the order CSV to s3://sn-orders-source/, a carrier webhook handled by the webhook-handler Lambda Function URL that verifies a shared secret, matches the tracking number to an order, and updates the status directly, and an SES inbound rule set with action S3 PUT to s3://sn-raw-mime/ plus the parser Lambda intake-ses-parser that runs Bedrock Haiku 4.5 to read a forwarded carrier email and propose a status change for Slack approval. Middle region: scheduled processing. The notifier Lambda is triggered every 30 minutes during the day by EventBridge Scheduler; it reads s3://sn-orders-source/orders.csv, iterates rows, compares current status to last-sent, reads send and preference state from DynamoDB, and emits one of four events to the EventBridge default bus per order that needs an update: sn.shipped, sn.out_for_delivery, sn.delivered, or sn.delayed. Bottom region: dispatch and acknowledgment. The sender Lambda is triggered by an EventBridge rule on those four event types; it resolves the contact, checks quiet hours and the unsubscribe flag, fetches the update template from s3://sn-rules-source/voice.txt, sends the email via SES outbound, and writes a row to DynamoDB sn-sends. Unsubscribe link clicks land on a Function URL Lambda unsub-handler that writes the opt-out to sn-prefs. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $20 monthly threshold, posting to SNS topic sn-cost-alarm. A note at the bottom: every update is sent once — and every send and delay is logged to sn-audit. Ingress Lambda · drive-sync every few min Sheets API → s3://sn-orders-source/ orders.csv webhook-handler URL Function URL verify shared secret match tracking no. update status direct SES inbound · Haiku S3 PUT sn-raw-mime intake-ses-parser reads carrier email → Slack proposal Drive order list canonical store · mirrored to S3 Scheduled processing EventBridge Scheduler rate(30 minutes) in TZ_NAME target: notifier Lambda + deferred one-offs Lambda · notifier reads CSV from S3 + rules.txt + voice.txt compares status, picks one of five moves EventBridge default bus sn.shipped sn.out_for_delivery sn.delivered · sn.delayed (nothing → no event) Dispatch & acknowledgment Lambda · sender resolves contact, quiet hours, unsub; SES outbound email, owner cc on delays Customer email order, status, tracking + unsub link unsub click → Function URL Lambda · unsub-handler writes sn-prefs, sn-audit; the next check skips this customer’s sends Every update is sent once — and every send and delay is logged to sn-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the order list), scheduled processing (the recurring check emitting events), dispatch and acknowledgment (the update ships and the customer’s unsubscribe is recorded). Every Lambda is event- or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • drive-sync — EventBridge Scheduler target, fires every few minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under sn/drive/sa) to export the order list sheet as CSV and write to s3://sn-orders-source/orders.csv only if the sheet has changed since the last sync. Same pattern syncs the rules and voice docs to s3://sn-rules-source/. Memory: 256 MB. Timeout: 30 s.
  • webhook-handler — Lambda Function URL, public with AuthType: NONE; verifies a carrier-specific HMAC signature or shared secret on the request body (secret in Secrets Manager under sn/carrier/secret). Parses the carrier’s tracking-update payload, matches the tracking number to an order via a GSI on the order list cache, and updates the status field in the Drive sheet via the Sheets API. Unmatched tracking numbers are written to an sn-unmatched list for the weekly digest. Memory: 256 MB. Timeout: 15 s.
  • intake-ses-parser — S3 PUT trigger on s3://sn-raw-mime/. Parses MIME, extracts the text body of the forwarded carrier email, and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) to extract the tracking number and the new status. Posts the proposal to Slack via chat.postMessage with Approve/Edit/Discard buttons. Memory: 512 MB. Timeout: 30 s.
  • notifier — EventBridge Scheduler target, every 30 minutes during the day (the schedule expression runs in TZ_NAME set to the SMB’s timezone, e.g. Asia/Singapore). Reads s3://sn-orders-source/orders.csv and the rules and voice docs. For each row, compares current status to last-sent, reads state from sn-sends and sn-prefs, decides on a move. Emits one event per row that needs an update: sn.shipped, sn.out_for_delivery, sn.delivered, or sn.delayed, with the order context as the event payload. Orders with nothing new emit nothing. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.
  • sender — EventBridge rule on the four move events. Resolves contact, checks quiet hours and the unsubscribe flag, formats the update from the voice template, and ships via SES SendRawEmail from the verified sending identity. On a quiet-hours defer, creates a one-off EventBridge Scheduler rule that re-invokes sender at the start of the next sending window. On a sn.delayed event, adds the owner as a recipient. Writes a row to sn-sends after a successful send. Memory: 256 MB. Timeout: 30 s.
  • unsub-handler — Lambda Function URL, public with AuthType: NONE; the unsubscribe link carries a signed token so only the real recipient can opt out their own order. Writes the opt-out to sn-prefs and an audit row to sn-audit. Returns a small confirmation page. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads sn-sends and the sn-unmatched list for the past week; sends a digest message to a configured Slack channel summarizing updates sent, orders in flight, and any unmatched tracking numbers. No Bedrock; the message is a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s sn-sends and sn-audit; calls Bedrock Haiku 4.5 to write a one-paragraph narrative (orders shipped, delivered, average days in transit, how many ran late); emails it via SES to the configured stakeholder list. Memory: 512 MB.

Storage

  • DynamoDB · sn-sends — one row per update sent. PK (order_id, status); attributes: sent_date, sent_via (customer/owner), recipient, move (shipped/out_for_delivery/delivered/delayed). On-demand. No TTL.
  • DynamoDB · sn-prefs — one row per order’s notification preference. PK order_id; attributes: unsubscribed (bool), mute_until (date, optional), updated_by. On-demand.
  • DynamoDB · sn-audit — one row per write action of any kind. PK (order_id, ts); attributes: action, days_late (if delay), before, after. On-demand. No TTL — this is the long-term audit trail.
  • DynamoDB · sn-unmatched — webhook tracking updates that matched no order. PK tracking_no; attributes: status, received_at, raw. On-demand. TTL 30 days.
  • S3 · sn-orders-source — mirrored CSV from the Drive order list sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 3 years.
  • S3 · sn-rules-source — mirrored rules and voice docs as plain text. Versioning enabled.
  • S3 · sn-raw-mime — raw inbound MIME from forwarded carrier emails. Lifecycle to Glacier at 30 days; expiry at 1 year.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: intake-ses-parser for reading forwarded carrier emails, and summary for the monthly narrative. Claude Sonnet 4.6 isn’t used here — neither task needs the heavier reasoning, so Haiku 4.5 is the right cost-for-quality fit.
  • Embeddings. Not used. The order list is structured rows; deterministic lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas are more than enough at SMB volume. The notifier itself doesn’t call Bedrock; the parsing lane fires a few times a month at most.

EventBridge Scheduler config

  • sn-status-checkrate(30 minutes) with a FlexibleTimeWindow; target: notifier Lambda. A daytime-only window is enforced in code, not the schedule, so a late-night delivered scan still gets deferred rather than dropped.
  • sn-drive-syncrate(5 minutes). Target: drive-sync Lambda.
  • sn-weekly-digestcron(0 18 ? * SUN *) in TZ. Target: digest Lambda.
  • sn-monthly-summarycron(0 9 ? * 2#1 *) (first Monday at 9am) in TZ. Target: summary Lambda.
  • One-off rules — created on the fly by sender when a quiet-hours defer is needed. Use at(YYYY-MM-DDTHH:MM:SS) expressions with --action-after-completion DELETE so the rule self-cleans.

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. tracking.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set sn-inbound-rules: one rule with recipient tracking@your-company.com → spam scan → S3 PUT to s3://sn-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses-parser.
  • SES outbound for the customer updates: verify a sending identity at orders@your-company.com with DKIM and SPF on the parent domain, and a list-unsubscribe header on every message. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • notifier role: s3:GetObject on the orders, rules, and voice keys; dynamodb:Query + GetItem on sn-sends, sn-prefs; events:PutEvents on the default bus. No bedrock:*.
  • sender role: events:CreateSchedule for the deferred-send one-offs; ses:SendRawEmail from the verified sending identity; dynamodb:PutItem on sn-sends; s3:GetObject on the voice template; secretsmanager:GetSecretValue only if a per-tenant sender config is used.
  • webhook-handler role: secretsmanager:GetSecretValue on the carrier secret; secretsmanager:GetSecretValue on the Sheets-API service-account secret; outbound network access to sheets.googleapis.com; dynamodb:PutItem on sn-unmatched.
  • intake-ses-parser role: s3:GetObject on sn-raw-mime; bedrock:InvokeModel on the Haiku ARN; secretsmanager:GetSecretValue on the Slack bot token.
  • unsub-handler role: dynamodb:PutItem on sn-prefs and sn-audit; secretsmanager:GetSecretValue on the token-signing secret.
  • drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the orders and rules buckets; outbound network to www.googleapis.com.

Carrier webhook flow

Most carriers (and aggregators like a tracking API) can POST a tracking-update payload to a URL on each scan event. The webhook-handler Function URL is that URL. On each request it verifies the carrier’s signature (an HMAC over the body with a shared secret, or a bearer token, depending on the carrier), parses the tracking number and the new status, normalizes the carrier’s status vocabulary into the system’s five stages (a small mapping table in the rules doc handles carrier-specific status names), and updates the matching order’s status in the Drive sheet via the Sheets API.

Because the webhook writes the carrier’s authoritative status with no human in the loop, the only safety checks are the signature verification and the tracking-number match. A request that fails the signature is rejected with a 401 and logged. A request whose tracking number matches no order is written to sn-unmatched so a person can reconcile it from the weekly digest, rather than being silently dropped.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
  • Alarms: notifier Lambda failures > 0 in an hour (the check is the one piece that has to run); sender failure rate > 1% in 24h; webhook signature-verification failures > 5/hour (might mean the carrier secret rotated).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $20/month threshold, alarm at 80% and 100%, posts to SNS topic sn-cost-alarm subscribed to the on-call admin’s email and Slack.

Config and secrets

Service-account credentials for Drive and Sheets APIs live in Secrets Manager under sn/drive/sa (one service account with scopes for both APIs). Carrier webhook secrets live under sn/carrier/*. Slack bot token lives under sn/slack/bot-token. The unsubscribe token-signing secret is under sn/unsub/signing-secret. The configured timezone, quiet-hours window, expected-delivery windows per carrier, owner contact, and grace setting all live in Parameter Store under /sn/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions + OIDC + AWS SAM, no long-lived keys. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both sn-orders-source and sn-rules-source so a bad Drive edit can be rolled back in one click, and version the EventBridge Scheduler timezone setting so you don’t accidentally start running the check in UTC after a CI rotation. Total deployable surface: around eight Lambdas, four DDB tables, three S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts