Part 7 of 7 · Expense approver series ~8 min read

Engineering reference: the expense approver architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the Textract flow, EventBridge config, the DynamoDB schemas, and the chat interactive flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Textract, Bedrock cross-Region inference, and EventBridge are all in good shape there. A second region for multi-region resilience isn't worth the extra setup work at SMB volume: the failure an SMB actually feels is a claim waiting an hour for approval, not a regional outage. One AWS account dedicated to the approver (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

[Figure: AWS topology of the expense approver, drawn in three bands inside one AWS account boundary: Ingress (form-handler, SES inbound with intake-email-parser, and chat-intake, all feeding the claim record in ea-claims), Policy check (checker emitting ea.clear, ea.confirm, ea.review, or ea.reject to the default bus), and Routing & decision (routing, the interactive chat card, and decision-handler). Diagram note: a human approves every payment, and every interaction is logged to ea-audit.]
Fig 7. AWS topology, drawn in three bands: ingress (three lanes into the claim record), policy check (the checker emitting an outcome event), and routing and decision (the card ships and the approver’s response is recorded). Every Lambda is event-driven; nothing is chained synchronously.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), the Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • form-handler — Lambda Function URL, behind sign-in (the form posts the claimant’s identity token). Validates the form fields, writes the uploaded receipt to s3://ea-receipts/<claim-id>, creates a draft row in ea-claims, and starts the read step by invoking intake-read. Memory: 256 MB. Timeout: 15 s.
  • intake-email-parser — S3 PUT trigger on s3://ea-raw-mime/. Parses the MIME tree, extracts the receipt (PDF, image, or body text), writes it to s3://ea-receipts/, matches the claimant from the forwarding “From” address against the directory, and starts the read step. If the claimant can’t be matched, holds the claim and emails the sender to confirm. Memory: 512 MB. Timeout: 60 s.
  • chat-intake — triggered by a chat file-upload event in the configured expenses channel (events delivered to a Function URL; the handler verifies the chat signing secret). Fetches the uploaded file to s3://ea-receipts/, sets the claimant to the posting user, and starts the read step. Memory: 256 MB. Timeout: 30 s.
  • intake-read — the shared read step. Runs Textract via AnalyzeExpense (the receipt-specialized API that returns total, date, and vendor as typed fields) on the receipt in S3. Then calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) to sort the receipt into a policy category. Writes the read-back values to ea-claims and surfaces them to the claimant for confirmation. On confirm/submit, emits claim.submitted to EventBridge. Memory: 512 MB. Timeout: 60 s.
  • checker — EventBridge rule on claim.submitted. Reads s3://ea-policy-source/policy.txt and voice.txt, reads the claimant’s same-day category total from ea-claims, computes the outcome, and emits one event per claim: ea.clear, ea.confirm, ea.review, or ea.reject, with the claim context as the payload. Memory: 256 MB. Timeout: 30 s. No Bedrock calls.
  • routing — EventBridge rule on the four outcome events. Resolves the approver, checks quiet hours, picks the card shape, formats the message from the voice template, and ships it either as an interactive card through the chat platform’s post-message Web API (bot token ea/chat/bot-token in Secrets Manager) or by SES SendRawEmail. On a quiet-hours defer, creates a one-off EventBridge Scheduler schedule that re-invokes routing at the next available business minute. Updates the claim in ea-claims after a successful send. Memory: 256 MB. Timeout: 30 s.
  • decision-handler — Lambda Function URL, public with AuthType: NONE; verifies a chat signature on the request body. Triggered by chat button clicks (Approve/Reject/Ask) and by email-link clicks. Writes to ea-claims and ea-audit; on approve, appends a row to the payable sheet via the Sheets API; on ask, parks the claim in waiting and messages the claimant. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly on Sunday at 6pm. Reads ea-claims for the past week; sends a digest to a configured chat channel summarizing what was approved, rejected, and still waiting. No Bedrock; the message is a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s ea-claims and ea-audit; calls Bedrock Haiku 4.5 to write a one-paragraph board narrative on spend by category; emails it via SES to the configured stakeholder list. Memory: 512 MB.
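The routing bullet's quiet-hours defer comes down to one calculation: given the current time in the configured timezone, when is the next minute a card may go out? A minimal sketch, assuming quiet hours are everything outside a hypothetical 09:00 to 18:00 Monday-to-Friday window (the real window lives in Parameter Store under /ea/config/); the function names are illustrative.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

# Hypothetical send window; the real one comes from /ea/config/ in Parameter Store.
WINDOW_START = time(9, 0)
WINDOW_END = time(18, 0)


def is_business_day(moment: datetime) -> bool:
    """Monday (0) to Friday (4); weekends are treated as quiet all day."""
    return moment.weekday() < 5


def at_window_start(moment: datetime) -> datetime:
    """Same calendar day as `moment`, at the opening of the send window."""
    return moment.replace(hour=WINDOW_START.hour, minute=WINDOW_START.minute,
                          second=0, microsecond=0)


def next_send_time(now: datetime, tz: str = "Asia/Singapore") -> datetime:
    """Return the earliest moment a card may be sent, in the configured timezone.

    `now` must be timezone-aware. Inside the window the answer is `now` itself;
    otherwise it is the start of the next window."""
    local = now.astimezone(ZoneInfo(tz))
    if is_business_day(local):
        if WINDOW_START <= local.time() < WINDOW_END:
            return local
        if local.time() < WINDOW_START:
            return at_window_start(local)
    # After hours or on a weekend: move forward to the next business day.
    candidate = local + timedelta(days=1)
    while not is_business_day(candidate):
        candidate += timedelta(days=1)
    return at_window_start(candidate)
```

When the result differs from the current time, it is the wall-clock moment to put into the at() expression of the one-off schedule, in the same timezone.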

Storage

  • DynamoDB · ea-claims — one row per claim. PK claim_id; GSI on (claimant, category, date) for the same-day total query. Attributes: amount, vendor, category, status (draft/submitted/waiting/approved/rejected), outcome, approver, reason, receipt_key. On-demand.
  • DynamoDB · ea-audit — one row per write action of any kind. Partition key claim_id, sort key ts; attributes: action (approve/reject/ask/undo), by_user, before, after. On-demand. No TTL, because this is the long-term audit trail.
  • S3 · ea-receipts — receipt images and PDFs, one prefix per claim. Versioning enabled. Lifecycle to a cheaper storage class at 90 days; expiry at 7 years (tax-retention friendly).
  • S3 · ea-policy-source — mirrored policy and voice docs as plain text. Versioning enabled so a bad policy edit can be rolled back in one click.
  • S3 · ea-raw-mime — raw inbound MIME from forwarded receipts. Lifecycle to a cheaper class at 30 days; expiry at 7 years.
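The lifecycle rules above map onto the LifecycleConfiguration structure that boto3's put_bucket_lifecycle_configuration accepts, e.g. boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="ea-receipts", LifecycleConfiguration=RECEIPTS_LIFECYCLE). A minimal sketch, assuming STANDARD_IA as the "cheaper storage class" (the post doesn't name one) and approximating 7 years as 7 × 365 days; the helper name lifecycle_rules is hypothetical. Because ea-receipts is versioned, the sketch also expires noncurrent versions, since on a versioned bucket a plain Expiration only adds a delete marker.

```python
def lifecycle_rules(transition_days: int, expiry_days: int,
                    storage_class: str = "STANDARD_IA") -> dict:
    """Build a LifecycleConfiguration dict for put_bucket_lifecycle_configuration."""
    return {
        "Rules": [
            {
                "ID": f"transition-{transition_days}d-expire-{expiry_days}d",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # applies to the whole bucket
                "Transitions": [
                    {"Days": transition_days, "StorageClass": storage_class}
                ],
                # Current versions: a delete marker is added after expiry_days.
                "Expiration": {"Days": expiry_days},
                # Older versions: removed expiry_days after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": expiry_days},
            }
        ]
    }


SEVEN_YEARS_DAYS = 7 * 365  # approximation; check against your own tax rules

RECEIPTS_LIFECYCLE = lifecycle_rules(90, SEVEN_YEARS_DAYS)
RAW_MIME_LIFECYCLE = lifecycle_rules(30, SEVEN_YEARS_DAYS)
```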

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: intake-read for the category sort, and summary for the monthly board narrative. The heavier anthropic.claude-sonnet-4-6-20250930-v1:0 is not used — sorting a receipt into one of a dozen categories doesn’t justify it; Haiku 4.5 handles it cheaply.
  • Embeddings. Not used. The policy is short structured rules; deterministic lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas are more than enough at SMB volume. The checker doesn’t call Bedrock; the category sort is one small call per claim.

Textract

  • API. AnalyzeExpense — the receipt-and-invoice specialized call that returns typed summary fields (total, tax, date, vendor) plus line items, which is exactly the shape a receipt needs. Synchronous for single-page receipts; the async StartExpenseAnalysis path is used only for multi-page PDFs.
  • Fallback. If AnalyzeExpense returns low confidence on the total, the read step falls back to plain DetectDocumentText and a Bedrock pass to pull the amount, then always surfaces the value to the claimant to confirm before the claim is filed.
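The confidence check that decides between the AnalyzeExpense result and the fallback can be sketched against the response shape AnalyzeExpense returns: ExpenseDocuments, each with SummaryFields carrying a Type and a ValueDetection with Text and Confidence (0 to 100). The 90% threshold and the helper name read_summary are assumptions; either way, the total is still shown to the claimant before the claim is filed.

```python
# Hypothetical threshold; tune it against your own receipts.
MIN_TOTAL_CONFIDENCE = 90.0

# AnalyzeExpense summary field types mapped to the claim's attribute names.
WANTED = {"TOTAL": "total", "TAX": "tax",
          "INVOICE_RECEIPT_DATE": "date", "VENDOR_NAME": "vendor"}


def read_summary(response: dict) -> tuple[dict, bool]:
    """Return (fields, needs_fallback) from an AnalyzeExpense response.

    fields maps total/tax/date/vendor to (text, confidence), keeping the
    highest-confidence value when a type appears more than once."""
    fields: dict[str, tuple[str, float]] = {}
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            key = WANTED.get(field.get("Type", {}).get("Text", ""))
            value = field.get("ValueDetection", {})
            if key is None or "Text" not in value:
                continue
            conf = value.get("Confidence", 0.0)
            if key not in fields or conf > fields[key][1]:
                fields[key] = (value["Text"], conf)
    total = fields.get("total")
    # Fall back when there is no total at all, or it was read with low confidence.
    needs_fallback = total is None or total[1] < MIN_TOTAL_CONFIDENCE
    return fields, needs_fallback
```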

EventBridge config

  • ea-claim-submitted — rule on claim.submitted on the default bus. Target: checker Lambda.
  • ea-outcome-routing — rule matching ea.clear, ea.confirm, ea.review, ea.reject. Target: routing Lambda.
  • ea-weekly-digest — Scheduler, cron(0 18 ? * SUN *) in the configured timezone (set as the schedule’s ScheduleExpressionTimezone). Target: digest Lambda.
  • ea-monthly-summary — Scheduler, cron(0 9 ? * 2#1 *) (first Monday at 9am) in the configured timezone. Target: summary Lambda.
  • One-off schedules — created on the fly by routing when a quiet-hours defer is needed. They use at(YYYY-MM-DDTHH:MM:SS) expressions with --action-after-completion DELETE so each schedule deletes itself after it fires.
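A one-off defer translates into a single EventBridge Scheduler create_schedule call. A minimal sketch that builds the request for boto3.client("scheduler").create_schedule(**request); the schedule-name format, the JSON payload shape, and the placeholder ARNs are hypothetical. The RoleArn is the role Scheduler assumes to invoke routing, so the caller also needs iam:PassRole on it.

```python
import json
import re
from datetime import datetime


def defer_schedule_request(claim_id: str, send_at: datetime, tz: str,
                           routing_arn: str, scheduler_role_arn: str) -> dict:
    """Keyword arguments for a Scheduler create_schedule(**request) call.

    send_at is wall-clock time in `tz`; Scheduler evaluates the at()
    expression in ScheduleExpressionTimezone."""
    # Schedule names allow letters, digits, '-', '_' and '.', up to 64 characters.
    safe_id = re.sub(r"[^0-9A-Za-z_.-]", "-", claim_id)
    return {
        "Name": f"ea-defer-{safe_id}"[:64],
        "ScheduleExpression": f"at({send_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": tz,
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the exact minute
        "Target": {
            "Arn": routing_arn,
            "RoleArn": scheduler_role_arn,  # role Scheduler assumes to invoke routing
            "Input": json.dumps({"claim_id": claim_id, "deferred": True}),
        },
        "ActionAfterCompletion": "DELETE",  # schedule removes itself after firing
    }
```

If a claim can be deferred more than once, add a timestamp to the name: create_schedule rejects a name that already exists in the group.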

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. expenses.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set ea-inbound-rules: one rule whose recipient is the expenses.your-company.com subdomain the MX record points at (mail sent to expenses@your-company.com reaches SES only if your main mail system forwards it there) → spam scan → S3 PUT to s3://ea-raw-mime/<message-id> → stop. The S3 PUT triggers intake-email-parser.
  • SES outbound for the email-fallback approvals and the claimant notices: verify a sender identity at expenses@your-company.com with DKIM and SPF on the parent domain, and request production access to move the account out of the SES sandbox.
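The core of intake-email-parser, pulling the receipt and the sender out of the raw MIME object SES writes to s3://ea-raw-mime/, fits in Python's standard email package. A minimal sketch, assuming a receipt arrives as a PDF, JPEG, or PNG attachment (that content-type set is hypothetical) and otherwise falling back to the plain-text body, as the intake bullet describes. Receipts nested inside a forwarded message/rfc822 part are not unwrapped here.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default
from email.utils import parseaddr

# Hypothetical: the attachment types intake treats as a receipt.
RECEIPT_TYPES = {"application/pdf", "image/jpeg", "image/png"}


def sender_address(msg: EmailMessage) -> str:
    """Bare, lower-cased address from the From header, for the directory match."""
    return parseaddr(str(msg.get("From", "")))[1].lower()


def extract_receipt(raw: bytes) -> tuple[str, str, bytes]:
    """Return (sender, content_type, payload) from a raw MIME message.

    Takes the first PDF/JPEG/PNG attachment; if there is none, falls back to
    the plain-text body so a pasted e-receipt can still be read."""
    msg = message_from_bytes(raw, policy=default)  # policy=default yields EmailMessage
    sender = sender_address(msg)
    for part in msg.iter_attachments():
        content_type = part.get_content_type()
        if content_type in RECEIPT_TYPES:
            return sender, content_type, part.get_content()  # bytes for non-text parts
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return sender, "text/plain", text.encode("utf-8")
```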

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • checker role: s3:GetObject on the policy and voice keys; dynamodb:Query + GetItem on ea-claims and its GSI; events:PutEvents on the default bus. No bedrock:*.
  • routing role: scheduler:CreateSchedule for the deferred one-offs, plus iam:PassRole on the role Scheduler assumes to invoke routing; secretsmanager:GetSecretValue on the chat bot-token secret; ses:SendRawEmail from the verified sender identity; dynamodb:UpdateItem on ea-claims. Outbound HTTPS to the chat host needs no IAM grant, since the function runs outside a VPC.
  • decision-handler role: dynamodb:PutItem on ea-audit and dynamodb:UpdateItem on ea-claims; secretsmanager:GetSecretValue on the chat signing secret, the chat bot token (for modals and claimant messages), and the Sheets-API service-account secret. Outbound HTTPS to sheets.googleapis.com for the payable-sheet write likewise needs no VPC setup.
  • intake-read role: s3:GetObject on ea-receipts; textract:AnalyzeExpense, StartExpenseAnalysis, GetExpenseAnalysis (to collect async results), and DetectDocumentText (for the fallback); bedrock:InvokeModel on the Haiku global inference profile and the foundation-model ARNs it routes to; dynamodb:UpdateItem on ea-claims; events:PutEvents on the default bus.
  • intake-email-parser and chat-intake roles: s3:GetObject/PutObject on the raw-MIME and receipts buckets; secretsmanager:GetSecretValue on the chat signing secret; lambda:InvokeFunction on intake-read.
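As one concrete instance of the exact-ARN rule, here is the checker role's policy document built in Python, ready to json.dumps into a SAM template or an IAM call. A sketch: the account ID is a placeholder, and the bucket, key, and table names are the ones used throughout this post. A Query against the (claimant, category, date) GSI is authorized on the index ARN, not only the table ARN, which is easy to miss.

```python
def checker_policy(account_id: str, region: str = "ap-southeast-1") -> dict:
    """Least-privilege IAM policy document for the checker role."""
    table = f"arn:aws:dynamodb:{region}:{account_id}:table/ea-claims"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadPolicyDocs",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [
                    "arn:aws:s3:::ea-policy-source/policy.txt",
                    "arn:aws:s3:::ea-policy-source/voice.txt",
                ],
            },
            {
                "Sid": "ReadClaims",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                # Querying the GSI needs the index ARN as well as the table ARN.
                "Resource": [table, f"{table}/index/*"],
            },
            {
                "Sid": "EmitOutcome",
                "Effect": "Allow",
                "Action": "events:PutEvents",
                "Resource": f"arn:aws:events:{region}:{account_id}:event-bus/default",
            },
        ],
    }
```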

Chat interactive flow

The chat incoming webhook is the simplest delivery surface but doesn’t support interactive button responses. So the approval cards are posted via the chat platform’s post-message Web API instead, with interactive blocks containing the Approve/Reject/Ask buttons. Button clicks are sent by the platform to the configured interactivity request URL, which is the decision-handler Function URL. decision-handler verifies the chat signing secret on the inbound request, parses the action id (approve, reject, ask), opens a note modal if needed (Reject and Ask open modals; Approve is one-tap), and processes the response when the modal is submitted.

The chat app needs message-write and direct-message scopes and the interactivity URL configured. The bot token lives in Secrets Manager under ea/chat/bot-token. The signing secret is ea/chat/signing-secret.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. A metric filter matching "error", "throttle", or "timeout" feeds a CloudWatch metric for alerting.
  • Alarms: checker Lambda failures > 0 (a claim that never gets an outcome is a claim that silently stalls); decision-handler signature-verification failures > 5/hour (might mean the chat secret rotated); a claim sitting in submitted for > 3 business days (an approval nobody acted on).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic ea-cost-alarm subscribed to the on-call admin’s email and chat.
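The "sitting in submitted for more than 3 business days" alarm needs a business-day count rather than a plain date difference; otherwise a claim filed on a Friday would trip it early. A minimal sketch, assuming Monday-to-Friday business days with no public-holiday calendar (a simplification); a scheduled job could run is_stalled over ea-claims rows in submitted status and publish the count as a custom metric for the alarm.

```python
from datetime import date, timedelta


def business_days_between(start: date, end: date) -> int:
    """Count Monday-to-Friday days after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:
            days += 1
    return days


def is_stalled(submitted_on: date, today: date, limit: int = 3) -> bool:
    """True when a claim has sat in `submitted` for more than `limit` business days."""
    return business_days_between(submitted_on, today) > limit
```

For example, a claim submitted on Friday 6 June 2025 has been waiting 3 business days by Wednesday 11 June, so it trips the alarm only on Thursday 12 June.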

Config and secrets

Service-account credentials for the Sheets API (the payable sheet write) and the Drive API (the policy/voice doc sync) live in Secrets Manager under ea/google/sa. The chat bot token and signing secret live under ea/chat/*. The SES sender identity lives in SES as the verified-domain configuration, and each sending role is granted ses:SendRawEmail on it through IAM. The configured timezone, quiet-hours window, per-team default approvers, and admin fallback all live in Parameter Store under /ea/config/. Lambdas fetch config on cold start and cache it for the lifetime of the execution environment. A small drive-sync Lambda (Scheduler, every 15 minutes) mirrors the policy and voice docs to s3://ea-policy-source/ so the checker reads from S3, not Drive, on every claim.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived AWS keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for ea-receipts and ea-policy-source so a bad edit can be rolled back in one click, and keep the payable-sheet write idempotent (key the row on claim_id) so a retried approve can never double-pay. Apart from the SES stack, a single SAM template covers the rest of the surface: ten Lambdas (the nine listed above plus drive-sync), two DynamoDB tables, three S3 buckets, two EventBridge rules on the default bus (plus the Scheduler schedules), and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
