Part 7 of 7 · Loyalty tracker series ~8 min read

Engineering reference: the loyalty tracker architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES outbound config, EventBridge Scheduler config, the DynamoDB schemas, and the conditional-write pattern that keeps balances exact. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES outbound, Bedrock Global cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for a shop is a balance email that goes out a few minutes late, not a regional outage. One AWS account dedicated to the tracker (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the loyalty tracker A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three intake surfaces — a sign-up Function URL that writes new members to the member store and sends an SES confirm email, a POS webhook Function URL that receives each completed sale from the till and invokes the earn function, and a drive-sync Lambda triggered every 15 minutes by EventBridge Scheduler that mirrors the rules and voice docs to s3://loy-rules-source/. Middle region: per-sale and scheduled processing. The earn function is invoked by the POS webhook; it finds or creates the member in DynamoDB, reads rules.txt from S3, computes base and bonus points, applies a conditional UpdateItem to the balance, writes a row to loy-ledger, and on crossing a tier emits a loy.reward_earned event to the EventBridge default bus; healthy adds emit nothing. Scheduled jobs sit alongside: a nightly balance-email job, a weekly lapsing sweep, and a monthly summary, all on EventBridge Scheduler. Bottom region: dispatch and redemption. The email-dispatch Lambda is triggered by loy.reward_earned and by the scheduled email jobs; it reads the voice template from s3://loy-rules-source/voice.txt, sends via SES outbound respecting quiet hours and the unsubscribe list, and logs to loy-ledger. The redeem-handler Function URL backs the staff redeem desk: it reads the balance, applies a conditional UpdateItem that only subtracts when the balance is still high enough, writes a redemption row to loy-ledger, and supports a one-tap reverse. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $15 monthly threshold, posting to SNS topic loy-cost-alarm. A note at the bottom: every points change is logged to loy-ledger and is reversible. Ingress FnURL · sign-up form + quick-add writes member, loy-members SES confirm email FnURL · POS webhook loy-pos-webhook sale from the till phone, total, items invokes earn function Lambda · drive-sync every 15 min rules + voice docs → s3://loy-rules-source/ rules.txt, voice.txt DynamoDB member store loy-members · keyed by phone Per-sale & scheduled processing EventBridge Scheduler nightly balance email weekly lapsing sweep monthly summary + drive-sync rate Lambda · earn reads rules.txt + member from DDB conditional UpdateItem, picks one of four moves EventBridge default bus loy.reward_earned loy.held_for_review loy.nightly_email (add points → no event) Dispatch & redemption Lambda · email-dispatch reads voice.txt, quiet hours, opt-out; SES outbound; logs loy-ledger Redeem desk staff page with [Confirm] button taps → Function URL Lambda · redeem-handler conditional subtract, writes loy-ledger, one-tap reverse, loy-audit on admin acts Every points change is logged to loy-ledger — and is reversible.
Fig 7. AWS topology, in three regions of the diagram: ingress (sign-up, the POS webhook, and the rules sync), per-sale and scheduled processing (the earn function and the timed jobs emitting events), dispatch and redemption (emails go out and staff redeem with a confirm). Every Lambda is event-, request-, or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • loy-pos-webhook — Lambda Function URL, AuthType: NONE; verifies a shared-secret HMAC header that the POS includes on each request. Receives one completed sale (phone or member id, basket total, line items), validates the payload, and invokes earn synchronously. If the till can only export a file, a small companion script reads the daily sales export and posts each row to this URL. Memory: 256 MB. Timeout: 15 s.
  • earn — the per-sale points engine. Finds or creates the member in loy-members; reads rules.txt from s3://loy-rules-source/ (cached for the lifetime of the execution environment); computes base + bonus points; runs the four-move decision from Part 3. On add and add + reward, applies a conditional UpdateItem with ADD balance :pts so concurrent sales never lose points; writes a loy-ledger row; and on a tier crossing emits loy.reward_earned to the default bus. Hold for review writes to loy-review and emits loy.held_for_review. Memory: 256 MB. Timeout: 15 s. No Bedrock calls.
  • signup — Lambda Function URL backing both the web form and the staff quick-add. Writes a member to loy-members with PutItem + a attribute_not_exists(phone) condition so a re-submit of the same phone updates rather than clobbers; on a supplied email, sends an SES double opt-in confirm and only sets opted_in = true when the confirm link is tapped (a second tiny Function URL path). Memory: 256 MB. Timeout: 15 s.
  • redeem-handler — Lambda Function URL, AuthType: NONE; verifies a staff session token. Backs the redeem desk: GetItem the member, return claimable rewards; on Confirm, apply a conditional UpdateItem (ADD balance :neg with ConditionExpression balance >= :cost) so the subtract only succeeds when funds remain, which also blocks double-redeem; write the redemption to loy-ledger. A reverse action adds the points back and writes a matching reversed ledger row plus a loy-audit row. Memory: 256 MB. Timeout: 15 s.
  • email-dispatch — EventBridge rule on loy.reward_earned and the scheduled email events. Reads the matching template from voice.txt, fills placeholders (balance, points to next reward, reward name), checks the quiet-hours window and the opt-out flag, and sends via SES SendEmail. Writes a send record. Memory: 256 MB. Timeout: 30 s.
  • nightly-balance — EventBridge Scheduler target, nightly. Queries loy-ledger for members whose balance changed today, and for each emits a loy.nightly_email event that email-dispatch turns into a short balance email. Batches recipients to stay within SES rate limits. Memory: 256 MB.
  • lapsing-sweep — EventBridge Scheduler target, weekly. Scans loy-members for active members whose last_seen is older than the lapsing window in rules.txt and whose visit count clears the regular threshold; writes them to loy-lapsing, stamps each as lapsing, and applies the configured default action (auto-nudge or owner-review). Bounded by max_nudges. No Bedrock. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s loy-ledger and loy-audit; calls Bedrock Haiku 4.5 to write a one-paragraph plain-English owner narrative (joins, rewards claimed, win-back results); emails it via SES to the owner. Memory: 512 MB.
  • drive-sync — EventBridge Scheduler target, every 15 minutes. Uses the Google Drive API (service-account credentials in Secrets Manager under loy/drive/sa) to export the rules and voice docs as plain text and write to s3://loy-rules-source/ only if they changed since the last sync. Also writes a daily backup of loy-members to S3. Memory: 256 MB. Timeout: 30 s.

Storage

  • DynamoDB · loy-members — one row per member. PK phone; attributes: name, email, opted_in, balance (number), tier, visit_count, last_seen, joined, status (active/rested). On-demand. No TTL.
  • DynamoDB · loy-ledger — one row per points change of any kind. PK (phone, ts); attributes: points_change (signed), reason (earn/bonus-item/reward-redeem/win-back-bonus/reversed), sale_id or reward, by_staff. On-demand. No TTL — this is the long-term audit of every point.
  • DynamoDB · loy-audit — one row per admin or lifecycle action (reverse, rest, bonus-grant, override). PK (phone, ts); attributes: action, by_user, before, after. On-demand. No TTL.
  • DynamoDB · loy-review — held-for-review sales awaiting a staff decision. PK (phone, sale_id); attributes: total, computed_points, flag_reason, status. On-demand.
  • DynamoDB · loy-lapsing — the current lapsing list. PK phone; attributes: weeks_away, nudge_count, last_nudge. On-demand.
  • S3 · loy-rules-source — mirrored rules and voice docs as plain text. Versioning enabled, so a bad Drive edit rolls back in one click.
  • S3 · loy-backups — daily export of loy-members and loy-ledger. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite: summary for the monthly owner narrative. Claude Sonnet 4.6 (anthropic.claude-sonnet-4-6-20250930-v1:0) is wired but unused — the summary is a short, low-stakes paragraph that Haiku handles well, so the heavier model isn’t justified here.
  • Embeddings. Not used. Balances are structured numbers; an exact lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas are more than enough at SMB volume. The earn and redeem paths don’t call Bedrock at all; summary fires once a month.

EventBridge Scheduler config

  • loy-nightly-balancecron(0 19 * * ? *) in the shop’s timezone. Target: nightly-balance Lambda.
  • loy-drive-syncrate(15 minutes). Target: drive-sync Lambda.
  • loy-lapsing-sweepcron(0 9 ? * MON *) in TZ. Target: lapsing-sweep Lambda.
  • loy-monthly-summarycron(0 9 ? * 2#1 *) (first Monday at 9am) in TZ. Target: summary Lambda.
  • Quiet hoursemail-dispatch enforces the configured window; events that arrive inside it are deferred with a one-off at(...) schedule that re-invokes dispatch at the next allowed minute, created with --action-after-completion DELETE so the rule self-cleans.

SES outbound

  • Verify a sender identity at rewards@your-shop.com with DKIM and SPF on the parent domain. Out of the SES sandbox by request before launch.
  • One configuration set with event publishing for bounces and complaints to an SNS topic; a small handler flips opted_in = false on a hard bounce or a complaint so the shop stops mailing bad or unhappy addresses.
  • Every marketing-style email (nightly balance, win-back nudge) carries a one-tap unsubscribe link backed by a tiny Function URL path; the reward-earned email is transactional and always sends.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • earn role: s3:GetObject on the rules key; dynamodb:GetItem + UpdateItem + PutItem on loy-members, loy-ledger, and loy-review; events:PutEvents on the default bus. No bedrock:*.
  • redeem-handler role: dynamodb:GetItem + UpdateItem on loy-members (the conditional subtract); dynamodb:PutItem on loy-ledger and loy-audit; secretsmanager:GetSecretValue on the staff-session signing key. No ses:*, no bedrock:*.
  • email-dispatch role: s3:GetObject on the voice key; ses:SendEmail from the verified sender identity; dynamodb:GetItem on loy-members for the opt-out check; dynamodb:PutItem on loy-ledger; scheduler:CreateSchedule for quiet-hours defers.
  • signup role: dynamodb:PutItem + UpdateItem on loy-members; ses:SendEmail for the double opt-in confirm only.
  • drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on loy-rules-source and loy-backups; dynamodb:Scan on loy-members for the backup; outbound network to www.googleapis.com.

Concurrency and correctness

The one rule that matters most: a balance is never read-modified-written in application code. Every change is a conditional UpdateItem with an atomic ADD on the balance attribute, so two sales, or a sale and a redeem, applied in the same instant both land without losing each other. Redeems carry a ConditionExpression balance >= :cost; if the condition fails, DynamoDB rejects the write and the handler returns “not enough points” rather than overdrawing. The same condition makes double-redeem impossible: the second identical subtract fails its condition. Idempotency keys on the POS webhook (the till’s sale_id) make a retried sale a no-op, so a flaky network can’t double-credit a customer.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "condition_failed" to a metric for alerting.
  • Alarms: earn-function errors > 0 in an hour (a dropped sale is a lost point); redeem condition-failures spiking (might mean a stale staff screen or a bug); SES bounce rate > 2% (protect sender reputation).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic loy-cost-alarm subscribed to the owner’s email.

Config and secrets

Service-account credentials for the Drive API live in Secrets Manager under loy/drive/sa. The POS shared secret, the staff-session signing key, and the SES configuration live under loy/*. The configured timezone, quiet-hours window, lapsing window, max_nudges, and the regular threshold all live in Parameter Store under /loy/config/ — the operational knobs that don’t belong in the owner-editable rules doc. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC (no long-lived keys) and AWS SAM. The opinionated bits: turn on S3 versioning for loy-rules-source so a bad Drive edit can be rolled back in one click, version the EventBridge Scheduler timezone setting so you don’t accidentally start running the nightly job in UTC after a CI rotation, and keep the POS webhook’s shared secret out of the repo (it lives only in Secrets Manager, injected at deploy). Total deployable surface: around nine Lambdas, five DDB tables, two S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES configuration set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts