Part 7 of 7 · Compliance tracker series ~8 min read

Engineering reference: the compliance tracker architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is somebody missing a recurring check, not a regional outage. One AWS account dedicated to the tracker (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the compliance tracker A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three intake lanes — a Drive sheet sync via the drive-sync Lambda triggered every 15 minutes by EventBridge Scheduler that mirrors the task CSV to s3://ct-tasklist-source/, a starter-pack loader that drops ready-made control rows into the sheet via the Sheets API, and an SES inbound rule set with action S3 PUT to s3://ct-raw-mime/ plus the parser Lambda intake-ses-parser that runs Textract on forwarded policies and Bedrock Haiku 4.5 to propose a task for Slack approval. Middle region: scheduled processing. The scheduler Lambda is triggered daily at 8am local by EventBridge Scheduler; it reads s3://ct-tasklist-source/tasks.csv, iterates rows, computes the next due date per task, looks up the chain in s3://ct-rules-source/rules.txt, reads done and reminder state from DynamoDB, and emits one of three events to the EventBridge default bus per task that needs an action: ct.due_now, ct.overdue, or ct.escalate. Bottom region: dispatch and completion. The dispatch Lambda is triggered by an EventBridge rule on those three event types; it resolves the owner, checks quiet hours and the holiday calendar, fetches the reminder template from s3://ct-rules-source/voice.txt, posts the message to Slack via chat.postMessage with a Done button or sends an email via SES outbound, and writes a row to DynamoDB ct-reminders. Slack interactive button clicks land on a Function URL Lambda done-handler that updates ct-done with the action (done, attach, snooze) and, on done, updates the task sheet via the Google Sheets API and stores evidence in s3://ct-evidence/. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $15 monthly threshold, posting to SNS topic ct-cost-alarm. A note at the bottom: every reminder leaves with full context — and every interaction is logged to ct-audit. Ingress Lambda · drive-sync every 15 min Sheets API → s3://ct-tasklist-source/ tasks.csv Starter-pack loader ct-starter-packs ready-made rows Sheets API append trigger: on-demand SES inbound rule set ct-inbound-rules action: S3 PUT s3://ct-raw-mime/ trigger: intake-ses-parser Drive task list canonical store · mirrored to S3 Scheduled processing EventBridge Scheduler cron(0 8 * * ? *) in TZ_NAME target: scheduler Lambda + deferred one-offs Lambda · scheduler reads CSV from S3 + rules.txt + voice.txt computes due dates, picks one of four moves EventBridge default bus ct.due_now ct.overdue ct.escalate (on-track → no event) Dispatch & completion Lambda · dispatch resolves owner, quiet hours, holidays; Slack chat.postMessage or SES outbound Slack interactive DM with [Done] [Attach] [Snooze] button clicks → Function URL Lambda · done-handler writes ct-done, ct-audit, evidence to S3; on done updates the Sheet via Sheets API Every reminder leaves with full context — and every interaction is logged to ct-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the task list), scheduled processing (the daily scheduler tick emitting events), dispatch and completion (the reminder ships and the owner’s response is recorded). Every Lambda is event- or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • drive-sync — EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under ct/drive/sa) to export the task sheet as CSV and write to s3://ct-tasklist-source/tasks.csv only if the sheet has changed since the last sync. Same pattern syncs the rules and voice docs to s3://ct-rules-source/. Memory: 256 MB. Timeout: 30 s.
  • starter-pack-loader — Function URL (admin-only, IAM-authenticated) invoked from a small internal admin page. Reads a chosen pack template from s3://ct-rules-source/packs/<pack>.json and appends its rows to the Drive task sheet via the Sheets API, pre-filling repeat rules, proof types, and a placeholder owner. Idempotent per pack (skips rows already present by name). Memory: 256 MB. Timeout: 30 s.
  • intake-ses-parser — S3 PUT trigger on s3://ct-raw-mime/. Parses MIME, extracts the attachment, runs Textract via StartDocumentTextDetection + StartDocumentAnalysis (asynchronously to handle multi-page policies). On Textract completion (via SNS notification), reads the structured text and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) to propose a task row (name, control area, repeat rule, proof type). Posts the proposal to Slack via chat.postMessage with Approve/Edit/Discard buttons. For DOCX attachments (Textract doesn’t accept them), falls back to python-docx; XLSX uses openpyxl. Both packages are stable and widely used in 2026, though their maintenance velocity is light — for a policy-parsing path that only runs a few times a month, that’s acceptable. If extraction precision becomes a concern, the active community fork python-docx-oss is a drop-in alternative. Memory: 512 MB. Timeout: 60 s.
  • scheduler — EventBridge Scheduler target, daily at 8am local time (the schedule expression runs in TZ_NAME set to the SMB’s timezone, e.g. Asia/Singapore). Reads s3://ct-tasklist-source/tasks.csv and the rules and voice docs. For each row, computes next_due_date from the repeat rule and last-done date, reads cycle state from ct-reminders and ct-done, decides on a move. Emits one event per row that needs action: ct.due_now, ct.overdue, or ct.escalate, with the task context as the event payload. On-track tasks emit nothing. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.
  • dispatch — EventBridge rule on the three move events. Resolves owner, checks quiet hours and holiday calendar, formats the reminder from the voice template, and ships via Slack chat.postMessage (bot token ct/slack/bot-token in Secrets Manager) or SES SendRawEmail. On quiet-hours or holiday defer, creates a one-off EventBridge Scheduler rule that re-invokes dispatch at the next available business minute. Writes a row to ct-reminders after a successful send. Memory: 256 MB. Timeout: 30 s.
  • done-handler — Lambda Function URL, public with AuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Done/Attach/Snooze) and by email-link clicks. Writes to ct-done and ct-audit; on done, updates the Drive sheet via the Sheets API and archives the old cycle in ct-reminders-archive. On attach, stores the uploaded file in s3://ct-evidence/ and calls Textract + Bedrock Haiku 4.5 for a one-line summary. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads ct-reminders and ct-done for the past week and the task list; sends a digest message to a configured Slack channel summarizing tasks done and items coming up. No Bedrock; the message is a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s ct-reminders, ct-done, and ct-audit; calls Bedrock Haiku 4.5 to write a one-paragraph board narrative; emails it via SES to the configured stakeholder list. Memory: 512 MB.

Storage

  • DynamoDB · ct-reminders — one row per dispatch. PK (task_id, chain_index); attributes: reminded_date, dispatched_via (slack/email), recipient, move (due_now/overdue/escalate). On-demand. No TTL.
  • DynamoDB · ct-done — one row per completion. PK task_id; sort key done_date; attributes: action (done/attach/snooze), by_user, snooze_until (if action = snooze), cycle_due_date, next_due_date, evidence_key (if action = attach). On-demand.
  • DynamoDB · ct-audit — one row per write action of any kind. PK (task_id, ts); attributes: action, by_user, before, after. On-demand. No TTL — this is the long-term audit trail.
  • DynamoDB · ct-reminders-archive — archived cycles after a completion. Same shape as ct-reminders; PK (task_id, cycle_id, chain_index). On-demand.
  • S3 · ct-tasklist-source — mirrored CSV from the Drive task sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
  • S3 · ct-rules-source — mirrored rules and voice docs as plain text, plus the starter-pack templates. Versioning enabled.
  • S3 · ct-raw-mime — raw inbound MIME from forwarded policies. Lifecycle to Glacier at 30 days; expiry at 7 years.
  • S3 · ct-evidence — uploaded proof files (photos, signed forms, PDFs), keyed by <task_id>/<cycle>/. Versioning enabled; this is the durable evidence store, kept for 7 years.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: intake-ses-parser for the forwarded-policy parsing and done-handler/summary for the evidence summary and monthly board narrative. Claude Sonnet 4.6 (anthropic.claude-sonnet-4-6-20250930-v1:0) is wired but unused by default — reserved for the rare case where a forwarded policy is long and ambiguous enough that Haiku’s proposal needs a heavier second pass.
  • Embeddings. Not used. The task list is structured rows; deterministic lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas are more than enough at SMB volume. The scheduler itself doesn’t call Bedrock; the parsing and evidence lanes fire a few times a month at most.

EventBridge Scheduler config

  • ct-daily-tickcron(0 8 * * ? *) in the SMB’s timezone. Target: scheduler Lambda.
  • ct-drive-syncrate(15 minutes). Target: drive-sync Lambda.
  • ct-weekly-digestcron(0 18 ? * SUN *) in TZ. Target: digest Lambda.
  • ct-monthly-summarycron(0 9 ? * 2#1 *) (first Monday at 9am) in TZ. Target: summary Lambda.
  • One-off rules — created on the fly by dispatch when a quiet-hours or holiday defer is needed. Use at(YYYY-MM-DDTHH:MM:SS) expressions with --action-after-completion DELETE so the rule self-cleans.

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. controls.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set ct-inbound-rules: one rule with recipient controls@your-company.com → spam scan → S3 PUT to s3://ct-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses-parser.
  • SES outbound for the email-fallback reminders: verify a sender identity at tracker@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • scheduler role: s3:GetObject on the task list, rules, and voice keys; dynamodb:Query + GetItem on ct-reminders, ct-done; events:PutEvents on the default bus. No bedrock:*.
  • dispatch role: scheduler:CreateSchedule for the deferred-reminder one-offs; secretsmanager:GetSecretValue on the Slack bot-token secret; ses:SendRawEmail from the verified sender identity; dynamodb:PutItem on ct-reminders; outbound network access to slack.com.
  • done-handler role: dynamodb:PutItem on ct-done and ct-audit; s3:PutObject on ct-evidence; textract:* + bedrock:InvokeModel for the evidence summary; secretsmanager:GetSecretValue on the Sheets-API service-account secret; outbound network access to sheets.googleapis.com; on done, dynamodb:BatchWriteItem for archiving the old cycle to ct-reminders-archive.
  • intake-ses-parser role: s3:GetObject on ct-raw-mime; textract:StartDocumentTextDetection + StartDocumentAnalysis; bedrock:InvokeModel on the Haiku ARN; secretsmanager:GetSecretValue on the Slack bot token.
  • drive-sync and starter-pack-loader roles: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject and s3:GetObject on the task list and rules buckets; outbound network to www.googleapis.com.

Slack interactive flow

The reminder messages are posted via the chat.postMessage Web API with Block Kit blocks containing the action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the done-handler Function URL. done-handler verifies the Slack signing secret on the inbound request, parses the action_id (done, attach, snooze), opens a modal if needed (Attach and Snooze open modals; Done is one-tap), and processes the response when the modal is submitted. The Attach modal uses Slack’s files.upload flow so the proof file lands in S3 rather than only in Slack.

The Slack app needs chat:write, im:write, files:read, and the Interactivity URL configured. The bot token lives in Secrets Manager under ct/slack/bot-token. The signing secret is ct/slack/signing-secret.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
  • Alarms: scheduler Lambda failures > 0 in a day (the daily tick is the one piece that has to run); dispatch failure rate > 1% in 24h; done-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic ct-cost-alarm subscribed to the on-call admin’s email and Slack.

Config and secrets

Service-account credentials for Drive, Sheets, and Calendar APIs all live in Secrets Manager under ct/drive/sa (one service account with scopes for all three APIs). Slack bot token and signing secret under ct/slack/*. SES sender identity lives in IAM and the verified-domain config. The configured timezone, holiday list reference, quiet-hours window, max_snoozes_per_cycle, and admin fallback owner all live in Parameter Store under /ct/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys), building with AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for ct-tasklist-source, ct-rules-source, and ct-evidence so a bad Drive edit can be rolled back in one click and proof files are never silently overwritten, and version the EventBridge Scheduler timezone setting so you don’t accidentally start running the daily tick in UTC after a CI rotation. Total deployable surface: around eight Lambdas, four DDB tables, four S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts