Part 7 of 7 · Weekly report builder series ~8 min read

Engineering reference: the weekly report builder architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, EventBridge Scheduler config, the DynamoDB schemas, and the grounding contract that keeps the model from ever sourcing a number. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). Bedrock cross-Region inference, SES outbound, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a report landing an hour late, not a regional outage. One AWS account dedicated to the builder (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the weekly report builder A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three source kinds — a hand-kept Google Sheet synced via the source-sync Lambda triggered every 30 minutes by EventBridge Scheduler that mirrors the sheet CSV to s3://wr-source-data/, a tool CSV (Stripe or bank) mirrored to the same bucket on change, and a point-of-sale daily roll-up pulled to the same bucket on a schedule. All three converge on the wr-source-data bucket. Middle region: scheduled processing. The builder Lambda is triggered weekly on Monday at 7am in the owner's timezone by EventBridge Scheduler; it reads all sources from s3://wr-source-data/, computes this week, last week, and the four-week average per figure, runs the look-off checks, builds the facts list, calls Bedrock Haiku 4.5 once to write the summary paragraph, verifies every number in the draft against the figure set, and assembles the report. Bottom region: dispatch and lookup. The builder hands the assembled report to the send step, which resolves recipients, confirms the owner timezone, checks completeness, composes the HTML email with the summary, the numbers table, and any flags, and sends via SES outbound; it writes a row to DynamoDB wr-runs and any flags to wr-flags. A footer link in the email hits a Function URL Lambda lookup-handler that returns the full source rows behind any figure on demand. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $15 monthly threshold, posting to SNS topic wr-cost-alarm. A note at the bottom: every number in the report comes from the data — and every run is logged to wr-runs. Ingress Lambda · source-sync every 30 min Sheets API → s3://wr-source-data/ sales-sheet.csv Tool CSV mirror Stripe / bank export mirror on change s3://wr-source-data/ via source-sync Lambda · pos-sync scheduled pull POS daily roll-up totals + top items → wr-source-data wr-source-data bucket mirrored sources · versioned Scheduled processing EventBridge Scheduler cron(0 7 ? * 2 *) in TZ_NAME target: builder Lambda + deferred retries Lambda · builder reads sources from S3 computes + checks, one Haiku call, verifies every number Bedrock Haiku 4.5 facts list in paragraph out one call per run (no number sourcing) Dispatch & lookup Send step resolves readers, timezone, complete?; composes HTML, SES outbound Report email summary + table + Needs a look footer link → Function URL Lambda · lookup-handler returns the full source rows behind a figure; writes wr-runs and wr-flags Every number in the report comes from the data — and every run is logged to wr-runs.
Fig 7. AWS topology, in three regions of the diagram: ingress (three source kinds into one bucket), scheduled processing (the weekly builder run computing, checking, and writing), dispatch and lookup (the report ships and the full rows stay one link away). Every Lambda is schedule- or request-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • source-sync — EventBridge Scheduler target, fires every 30 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under wr/google/sa) to export the hand-kept registry sheet as CSV and mirror any tool CSVs in the configured Drive folder to s3://wr-source-data/, writing only if a source has changed since the last sync. The same pattern syncs the config and voice docs to s3://wr-config-source/. Memory: 256 MB. Timeout: 30 s.
  • pos-sync — EventBridge Scheduler target, scheduled per the POS’s roll-up cadence (typically nightly plus a morning catch-up). Pulls the point-of-sale daily summary from wherever it lands (an SFTP drop, a vendor API, or a Drive file) and writes it to s3://wr-source-data/pos/. Kept separate from source-sync because POS integrations vary the most and benefit from their own retry and timeout tuning. Memory: 256 MB. Timeout: 60 s.
  • builder — EventBridge Scheduler target, weekly Monday 7am in the owner’s timezone (the schedule expression runs in TZ_NAME, e.g. Asia/Singapore). Reads every source from s3://wr-source-data/ and the config and voice docs. Normalizes into one figure set; computes this week, last week, and the four-week average per figure; runs the three look-off checks; builds the facts list (flagged figures withheld); calls Bedrock Haiku 4.5 once for the summary; verifies every number in the draft against the figure set, dropping any unmatched sentence; assembles the report. Hands the assembled report to the send step in the same invocation. Memory: 512 MB. Timeout: 120 s. Exactly one Bedrock call per run.
  • send step — runs inside the builder invocation (not a separate function) once the report is assembled. Resolves recipients from the config doc, confirms the owner timezone from Parameter Store, runs the completeness check, composes the HTML email, and ships via SES SendRawEmail from the verified sender identity. On an incomplete week, instead of sending it creates a one-off EventBridge Scheduler rule that re-invokes builder in retry mode a couple of hours later. Writes a row to wr-runs after a successful send and any flags to wr-flags.
  • lookup-handler — Lambda Function URL, public with AuthType: NONE; verifies a short-lived signed token embedded in the email footer link. Triggered when the owner clicks “show the rows behind this figure.” Reads the relevant source slice from s3://wr-source-data/ for the reported week and returns it as a simple HTML table. Read-only; writes nothing. Memory: 256 MB. Timeout: 15 s.

Storage

  • DynamoDB · wr-runs — one row per weekly send. PK (owner_id, week_start); attributes: sent_at, recipients, figures (the reported figure set as a map), summary_text, dropped_sentences count. On-demand. No TTL — this is what next week’s comparison reads.
  • DynamoDB · wr-flags — one row per flagged figure. PK (owner_id, week_start); sort key figure_check; attributes: check (stale/out_of_range/reconcile), figure, expected, actual, source. On-demand. No TTL — the long-term record of which sources misbehave.
  • S3 · wr-source-data — mirrored source CSVs and the POS roll-ups. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
  • S3 · wr-config-source — mirrored config and voice docs as plain text. Versioning enabled.
  • S3 · wr-reports — the assembled HTML report for each week, kept for reference and for the lookup link. Versioning enabled. Lifecycle to Glacier at 90 days.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. One callsite: builder, for the weekly summary paragraph. The heavier claude-sonnet-4-6 isn’t used — turning a short facts list into a paragraph is well within Haiku’s range and doesn’t justify the cost.
  • Grounding contract. The prompt hands the model only the computed facts list (statements with numbers attached) and instructs it to introduce no figure not in that list. The output is then number-checked in code against the figure set; any sentence whose number doesn’t match the set is dropped before send. The model never sees raw source data and never sources a number.
  • Embeddings. Not used. The numbers are structured rows; deterministic arithmetic beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas are far more than enough — one call a week per owner.

EventBridge Scheduler config

  • wr-weekly-runcron(0 7 ? * 2 *) (Monday 7am) in the owner’s timezone. Target: builder Lambda.
  • wr-source-syncrate(30 minutes). Target: source-sync Lambda.
  • wr-pos-sync — per the POS cadence, e.g. cron(30 1 * * ? *) plus a morning catch-up. Target: pos-sync Lambda.
  • One-off retry rules — created on the fly by the send step when the completeness check holds a send. Use at(YYYY-MM-DDTHH:MM:SS) expressions in TZ with --action-after-completion DELETE so the rule self-cleans.

SES outbound

  • Verify a sender identity at reports@your-company.com with DKIM and SPF on the parent domain so the weekly email lands in the inbox, not spam.
  • The send step uses SendRawEmail so the report can be a full multipart HTML message with the numbers table inline.
  • Out of the SES sandbox by request before go-live; the recipient list is small and static, so this is a one-time step.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • builder role: s3:GetObject on wr-source-data, wr-config-source; s3:PutObject on wr-reports; dynamodb:Query + GetItem + PutItem on wr-runs and wr-flags; bedrock:InvokeModel on the Haiku ARN; ses:SendRawEmail from the verified sender; scheduler:CreateSchedule for retry one-offs; ssm:GetParameter on /wr/config/*.
  • source-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on wr-source-data and wr-config-source; outbound network to www.googleapis.com.
  • pos-sync role: secretsmanager:GetSecretValue on the POS credential secret; s3:PutObject on wr-source-data; outbound network to the POS endpoint only.
  • lookup-handler role: s3:GetObject on wr-source-data and wr-reports; ssm:GetParameter on the link-signing key. Read-only — no write permissions, no Bedrock, no SES.

The grounding flow, in code

The contract that makes the report trustworthy is enforced in three places, not one. First, the gather step is the only thing that ever computes a figure; everything downstream reads from its output, never from raw sources. Second, the facts list handed to Bedrock is a closed set — flagged figures are excluded, so the model can’t even mention a number that failed a check. Third, the post-model number-check scans the draft, extracts every numeric token, and matches it against the figure set within a small rounding tolerance; an unmatched token drops its whole sentence and increments the dropped_sentences counter on the run.

The counter matters operationally: a run with a non-zero drop count is logged at WARN, and a sustained pattern of drops means the prompt or the model is drifting and should be reviewed. In steady state the count is zero — Haiku 4.5 handed a tight facts list and told to describe only it rarely strays — but the system is built so that “rarely” never reaches the owner as a wrong number.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" + "dropped_sentences" to a CloudWatch metric for alerting.
  • Alarms: builder failures > 0 on a Monday (the weekly run is the one piece that has to succeed); send failures > 0; dropped_sentences > 0 for two consecutive weeks (the model may be drifting); a source that trips the stale check three weeks running (fix it at the source).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic wr-cost-alarm subscribed to the admin’s email.

Config and secrets

Service-account credentials for the Drive, Sheets, and Calendar APIs live in Secrets Manager under wr/google/sa (one service account, read-only scopes). POS credentials live under wr/pos/*. The configured timezone, the recipient list, the notable-change thresholds, the look-off check thresholds, and the link-signing key all live in Parameter Store under /wr/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment. Nothing in the system has write access to any source — every Google and POS scope is read-only, which is the single most important guardrail: the builder can never alter the numbers it reports on.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys), and AWS SAM for the stack. The opinionated bits: turn on S3 versioning for wr-source-data and wr-config-source so a bad Drive edit can be rolled back in one click; version the EventBridge Scheduler timezone setting so a CI rotation can’t silently start running the weekly job in UTC; and keep every Google and POS credential scoped read-only so the builder is structurally incapable of writing to a source. Total deployable surface: around five Lambdas, two DDB tables, three S3 buckets, a handful of Scheduler rules, one verified SES identity, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts