Engineering reference: the weekly report builder architecture

Region and account shape

Default region: ap-southeast-1 (Singapore). Bedrock cross-Region inference, SES outbound, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a report landing an hour late, not a regional outage. One AWS account dedicated to the builder (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

Fig 7. AWS topology, in three regions of the diagram: ingress (three source kinds into one bucket), scheduled processing (the weekly builder run computing, checking, and writing), dispatch and lookup (the report ships and the full rows stay one link away). Every Lambda is schedule- or request-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

source-sync — EventBridge Scheduler target, fires every 30 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under wr/google/sa) to export the hand-kept registry sheet as CSV and mirror any tool CSVs in the configured Drive folder to s3://wr-source-data/, writing only if a source has changed since the last sync. The same pattern syncs the config and voice docs to s3://wr-config-source/. Memory: 256 MB. Timeout: 30 s.
pos-sync — EventBridge Scheduler target, scheduled per the POS’s roll-up cadence (typically nightly plus a morning catch-up). Pulls the point-of-sale daily summary from wherever it lands (an SFTP drop, a vendor API, or a Drive file) and writes it to s3://wr-source-data/pos/. Kept separate from source-sync because POS integrations vary the most and benefit from their own retry and timeout tuning. Memory: 256 MB. Timeout: 60 s.
builder — EventBridge Scheduler target, weekly Monday 7am in the owner’s timezone (the schedule expression runs in TZ_NAME, e.g. Asia/Singapore). Reads every source from s3://wr-source-data/ and the config and voice docs. Normalizes into one figure set; computes this week, last week, and the four-week average per figure; runs the three look-off checks; builds the facts list (flagged figures withheld); calls Bedrock Haiku 4.5 once for the summary; verifies every number in the draft against the figure set, dropping any unmatched sentence; assembles the report. Hands the assembled report to the send step in the same invocation. Memory: 512 MB. Timeout: 120 s. Exactly one Bedrock call per run.
send step — runs inside the builder invocation (not a separate function) once the report is assembled. Resolves recipients from the config doc, confirms the owner timezone from Parameter Store, runs the completeness check, composes the HTML email, and ships via SES SendRawEmail from the verified sender identity. On an incomplete week, instead of sending it creates a one-off EventBridge Scheduler rule that re-invokes builder in retry mode a couple of hours later. Writes a row to wr-runs after a successful send and any flags to wr-flags.
lookup-handler — Lambda Function URL, public with AuthType: NONE; verifies a short-lived signed token embedded in the email footer link. Triggered when the owner clicks “show the rows behind this figure.” Reads the relevant source slice from s3://wr-source-data/ for the reported week and returns it as a simple HTML table. Read-only; writes nothing. Memory: 256 MB. Timeout: 15 s.

Storage

DynamoDB · wr-runs — one row per weekly send. PK (owner_id, week_start); attributes: sent_at, recipients, figures (the reported figure set as a map), summary_text, dropped_sentences count. On-demand. No TTL — this is what next week’s comparison reads.
DynamoDB · wr-flags — one row per flagged figure. PK (owner_id, week_start); sort key figure_check; attributes: check (stale/out_of_range/reconcile), figure, expected, actual, source. On-demand. No TTL — the long-term record of which sources misbehave.
S3 · wr-source-data — mirrored source CSVs and the POS roll-ups. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
S3 · wr-config-source — mirrored config and voice docs as plain text. Versioning enabled.
S3 · wr-reports — the assembled HTML report for each week, kept for reference and for the lookup link. Versioning enabled. Lifecycle to Glacier at 90 days.

Bedrock

Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite: builder, for the weekly summary paragraph. The heavier claude-sonnet-4-6 isn’t used — turning a short facts list into a paragraph is well within Haiku’s range and doesn’t justify the cost.
Grounding contract. The prompt hands the model only the computed facts list (statements with numbers attached) and instructs it to introduce no figure not in that list. The output is then number-checked in code against the figure set; any sentence whose number doesn’t match the set is dropped before send. The model never sees raw source data and never sources a number.
Embeddings. Not used. The numbers are structured rows; deterministic arithmetic beats vector retrieval here. No Knowledge Base, no S3 Vectors.
Quotas. Default account quotas are far more than enough — one call a week per owner.

EventBridge Scheduler config

wr-weekly-run — cron(0 7 ? * 2 *) (Monday 7am) in the owner’s timezone. Target: builder Lambda.
wr-source-sync — rate(30 minutes). Target: source-sync Lambda.
wr-pos-sync — per the POS cadence, e.g. cron(30 1 * * ? *) plus a morning catch-up. Target: pos-sync Lambda.
One-off retry rules — created on the fly by the send step when the completeness check holds a send. Use at(YYYY-MM-DDTHH:MM:SS) expressions in TZ with --action-after-completion DELETE so the rule self-cleans.

SES outbound

Verify a sender identity at reports@your-company.com with DKIM and SPF on the parent domain so the weekly email lands in the inbox, not spam.
The send step uses SendRawEmail so the report can be a full multipart HTML message with the numbers table inline.
Out of the SES sandbox by request before go-live; the recipient list is small and static, so this is a one-time step.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

builder role: s3:GetObject on wr-source-data, wr-config-source; s3:PutObject on wr-reports; dynamodb:Query + GetItem + PutItem on wr-runs and wr-flags; bedrock:InvokeModel on the Haiku ARN; ses:SendRawEmail from the verified sender; scheduler:CreateSchedule for retry one-offs; ssm:GetParameter on /wr/config/*.
source-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on wr-source-data and wr-config-source; outbound network to www.googleapis.com.
pos-sync role: secretsmanager:GetSecretValue on the POS credential secret; s3:PutObject on wr-source-data; outbound network to the POS endpoint only.
lookup-handler role: s3:GetObject on wr-source-data and wr-reports; ssm:GetParameter on the link-signing key. Read-only — no write permissions, no Bedrock, no SES.

The grounding flow, in code

The contract that makes the report trustworthy is enforced in three places, not one. First, the gather step is the only thing that ever computes a figure; everything downstream reads from its output, never from raw sources. Second, the facts list handed to Bedrock is a closed set — flagged figures are excluded, so the model can’t even mention a number that failed a check. Third, the post-model number-check scans the draft, extracts every numeric token, and matches it against the figure set within a small rounding tolerance; an unmatched token drops its whole sentence and increments the dropped_sentences counter on the run.

The counter matters operationally: a run with a non-zero drop count is logged at WARN, and a sustained pattern of drops means the prompt or the model is drifting and should be reviewed. In steady state the count is zero — Haiku 4.5 handed a tight facts list and told to describe only it rarely strays — but the system is built so that “rarely” never reaches the owner as a wrong number.

Observability and cost gates

CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" + "dropped_sentences" to a CloudWatch metric for alerting.
Alarms: builder failures > 0 on a Monday (the weekly run is the one piece that has to succeed); send failures > 0; dropped_sentences > 0 for two consecutive weeks (the model may be drifting); a source that trips the stale check three weeks running (fix it at the source).
X-Ray: off by default. Not worth the cost at SMB volume.
AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic wr-cost-alarm subscribed to the admin’s email.

Config and secrets

Service-account credentials for the Drive, Sheets, and Calendar APIs live in Secrets Manager under wr/google/sa (one service account, read-only scopes). POS credentials live under wr/pos/*. The configured timezone, the recipient list, the notable-change thresholds, the look-off check thresholds, and the link-signing key all live in Parameter Store under /wr/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment. Nothing in the system has write access to any source — every Google and POS scope is read-only, which is the single most important guardrail: the builder can never alter the numbers it reports on.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys), and AWS SAM for the stack. The opinionated bits: turn on S3 versioning for wr-source-data and wr-config-source so a bad Drive edit can be rolled back in one click; version the EventBridge Scheduler timezone setting so a CI rotation can’t silently start running the weekly job in UTC; and keep every Google and POS credential scoped read-only so the builder is structurally incapable of writing to a source. Total deployable surface: around five Lambdas, two DDB tables, three S3 buckets, a handful of Scheduler rules, one verified SES identity, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts