Engineering reference: the loyalty tracker architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES outbound config, EventBridge Scheduler config, the DynamoDB schemas, and the conditional-write pattern that keeps balances exact. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES outbound, Bedrock Global cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for a shop is a balance email that goes out a few minutes late, not a regional outage. One AWS account dedicated to the tracker (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
loy-pos-webhook— Lambda Function URL,AuthType: NONE; verifies a shared-secret HMAC header that the POS includes on each request. Receives one completed sale (phone or member id, basket total, line items), validates the payload, and invokesearnsynchronously. If the till can only export a file, a small companion script reads the daily sales export and posts each row to this URL. Memory: 256 MB. Timeout: 15 s.earn— the per-sale points engine. Finds or creates the member inloy-members; readsrules.txtfroms3://loy-rules-source/(cached for the lifetime of the execution environment); computes base + bonus points; runs the four-move decision from Part 3. On add and add + reward, applies a conditionalUpdateItemwithADD balance :ptsso concurrent sales never lose points; writes aloy-ledgerrow; and on a tier crossing emitsloy.reward_earnedto the default bus. Hold for review writes toloy-reviewand emitsloy.held_for_review. Memory: 256 MB. Timeout: 15 s. No Bedrock calls.signup— Lambda Function URL backing both the web form and the staff quick-add. Writes a member toloy-memberswithPutItem+ aattribute_not_exists(phone)condition so a re-submit of the same phone updates rather than clobbers; on a supplied email, sends an SES double opt-in confirm and only setsopted_in = truewhen the confirm link is tapped (a second tiny Function URL path). Memory: 256 MB. Timeout: 15 s.redeem-handler— Lambda Function URL,AuthType: NONE; verifies a staff session token. Backs the redeem desk:GetItemthe member, return claimable rewards; on Confirm, apply a conditionalUpdateItem(ADD balance :negwithConditionExpression balance >= :cost) so the subtract only succeeds when funds remain, which also blocks double-redeem; write the redemption toloy-ledger. Areverseaction adds the points back and writes a matchingreversedledger row plus aloy-auditrow. Memory: 256 MB. Timeout: 15 s.email-dispatch— EventBridge rule onloy.reward_earnedand the scheduled email events. Reads the matching template fromvoice.txt, fills placeholders (balance, points to next reward, reward name), checks the quiet-hours window and the opt-out flag, and sends via SESSendEmail. Writes a send record. Memory: 256 MB. Timeout: 30 s.nightly-balance— EventBridge Scheduler target, nightly. Queriesloy-ledgerfor members whose balance changed today, and for each emits aloy.nightly_emailevent thatemail-dispatchturns into a short balance email. Batches recipients to stay within SES rate limits. Memory: 256 MB.lapsing-sweep— EventBridge Scheduler target, weekly. Scansloy-membersfor active members whoselast_seenis older than the lapsing window inrules.txtand whose visit count clears the regular threshold; writes them toloy-lapsing, stamps each as lapsing, and applies the configured default action (auto-nudge or owner-review). Bounded bymax_nudges. No Bedrock. Memory: 256 MB.summary— EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’sloy-ledgerandloy-audit; calls Bedrock Haiku 4.5 to write a one-paragraph plain-English owner narrative (joins, rewards claimed, win-back results); emails it via SES to the owner. Memory: 512 MB.drive-sync— EventBridge Scheduler target, every 15 minutes. Uses the Google Drive API (service-account credentials in Secrets Manager underloy/drive/sa) to export the rules and voice docs as plain text and write tos3://loy-rules-source/only if they changed since the last sync. Also writes a daily backup ofloy-membersto S3. Memory: 256 MB. Timeout: 30 s.
Storage
- DynamoDB ·
loy-members— one row per member. PKphone; attributes:name,email,opted_in,balance(number),tier,visit_count,last_seen,joined,status(active/rested). On-demand. No TTL. - DynamoDB ·
loy-ledger— one row per points change of any kind. PK(phone, ts); attributes:points_change(signed),reason(earn/bonus-item/reward-redeem/win-back-bonus/reversed),sale_idorreward,by_staff. On-demand. No TTL — this is the long-term audit of every point. - DynamoDB ·
loy-audit— one row per admin or lifecycle action (reverse, rest, bonus-grant, override). PK(phone, ts); attributes:action,by_user,before,after. On-demand. No TTL. - DynamoDB ·
loy-review— held-for-review sales awaiting a staff decision. PK(phone, sale_id); attributes:total,computed_points,flag_reason,status. On-demand. - DynamoDB ·
loy-lapsing— the current lapsing list. PKphone; attributes:weeks_away,nudge_count,last_nudge. On-demand. - S3 ·
loy-rules-source— mirrored rules and voice docs as plain text. Versioning enabled, so a bad Drive edit rolls back in one click. - S3 ·
loy-backups— daily export ofloy-membersandloy-ledger. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite:summaryfor the monthly owner narrative. Claude Sonnet 4.6 (anthropic.claude-sonnet-4-6-20250930-v1:0) is wired but unused — the summary is a short, low-stakes paragraph that Haiku handles well, so the heavier model isn’t justified here. - Embeddings. Not used. Balances are structured numbers; an exact lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
- Quotas. Default account quotas are more than enough at SMB volume. The earn and redeem paths don’t call Bedrock at all;
summaryfires once a month.
EventBridge Scheduler config
loy-nightly-balance—cron(0 19 * * ? *)in the shop’s timezone. Target:nightly-balanceLambda.loy-drive-sync—rate(15 minutes). Target:drive-syncLambda.loy-lapsing-sweep—cron(0 9 ? * MON *)in TZ. Target:lapsing-sweepLambda.loy-monthly-summary—cron(0 9 ? * 2#1 *)(first Monday at 9am) in TZ. Target:summaryLambda.- Quiet hours —
email-dispatchenforces the configured window; events that arrive inside it are deferred with a one-offat(...)schedule that re-invokes dispatch at the next allowed minute, created with--action-after-completion DELETEso the rule self-cleans.
SES outbound
- Verify a sender identity at
rewards@your-shop.comwith DKIM and SPF on the parent domain. Out of the SES sandbox by request before launch. - One configuration set with event publishing for bounces and complaints to an SNS topic; a small handler flips
opted_in = falseon a hard bounce or a complaint so the shop stops mailing bad or unhappy addresses. - Every marketing-style email (nightly balance, win-back nudge) carries a one-tap unsubscribe link backed by a tiny Function URL path; the reward-earned email is transactional and always sends.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- earn role:
s3:GetObjecton the rules key;dynamodb:GetItem+UpdateItem+PutItemonloy-members,loy-ledger, andloy-review;events:PutEventson the default bus. Nobedrock:*. - redeem-handler role:
dynamodb:GetItem+UpdateItemonloy-members(the conditional subtract);dynamodb:PutItemonloy-ledgerandloy-audit;secretsmanager:GetSecretValueon the staff-session signing key. Noses:*, nobedrock:*. - email-dispatch role:
s3:GetObjecton the voice key;ses:SendEmailfrom the verified sender identity;dynamodb:GetItemonloy-membersfor the opt-out check;dynamodb:PutItemonloy-ledger;scheduler:CreateSchedulefor quiet-hours defers. - signup role:
dynamodb:PutItem+UpdateItemonloy-members;ses:SendEmailfor the double opt-in confirm only. - drive-sync role:
secretsmanager:GetSecretValueon the Google service-account secret;s3:PutObjectonloy-rules-sourceandloy-backups;dynamodb:Scanonloy-membersfor the backup; outbound network towww.googleapis.com.
Concurrency and correctness
The one rule that matters most: a balance is never read-modified-written in application code. Every change is a conditional UpdateItem with an atomic ADD on the balance attribute, so two sales, or a sale and a redeem, applied in the same instant both land without losing each other. Redeems carry a ConditionExpression balance >= :cost; if the condition fails, DynamoDB rejects the write and the handler returns “not enough points” rather than overdrawing. The same condition makes double-redeem impossible: the second identical subtract fails its condition. Idempotency keys on the POS webhook (the till’s sale_id) make a retried sale a no-op, so a flaky network can’t double-credit a customer.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"condition_failed"to a metric for alerting. - Alarms: earn-function errors > 0 in an hour (a dropped sale is a lost point); redeem condition-failures spiking (might mean a stale staff screen or a bug); SES bounce rate > 2% (protect sender reputation).
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic
loy-cost-alarmsubscribed to the owner’s email.
Config and secrets
Service-account credentials for the Drive API live in Secrets Manager under loy/drive/sa. The POS shared secret, the staff-session signing key, and the SES configuration live under loy/*. The configured timezone, quiet-hours window, lapsing window, max_nudges, and the regular threshold all live in Parameter Store under /loy/config/ — the operational knobs that don’t belong in the owner-editable rules doc. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC (no long-lived keys) and AWS SAM. The opinionated bits: turn on S3 versioning for loy-rules-source so a bad Drive edit can be rolled back in one click, version the EventBridge Scheduler timezone setting so you don’t accidentally start running the nightly job in UTC after a CI rotation, and keep the POS webhook’s shared secret out of the repo (it lives only in Secrets Manager, injected at deploy). Total deployable surface: around nine Lambdas, five DDB tables, two S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES configuration set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts