Engineering reference: the expense approver architecture

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Textract, Bedrock cross-Region inference, and EventBridge are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a claim waiting an hour for approval, not a regional outage. One AWS account dedicated to the approver (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the claim record), policy check (the checker emitting an outcome event), routing and decision (the card ships and the approver’s response is recorded). Every Lambda is event-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

form-handler — Lambda Function URL, behind sign-in (the form posts the claimant’s identity token). Validates the form fields, writes the uploaded receipt to s3://ea-receipts/<claim-id>, creates a draft row in ea-claims, and starts the read step by invoking intake-read. Memory: 256 MB. Timeout: 15 s.
intake-email-parser — S3 PUT trigger on s3://ea-raw-mime/. Parses the MIME tree, extracts the receipt (PDF, image, or body text), writes it to s3://ea-receipts/, matches the claimant from the forwarding “From” address against the directory, and starts the read step. If the claimant can’t be matched, holds the claim and emails the sender to confirm. Memory: 512 MB. Timeout: 60 s.
chat-intake — triggered by a chat file-upload event in the configured expenses channel (events delivered to a Function URL; the handler verifies the chat signing secret). Fetches the uploaded file to s3://ea-receipts/, sets the claimant to the posting user, and starts the read step. Memory: 256 MB. Timeout: 30 s.
intake-read — the shared read step. Runs Textract via AnalyzeExpense (the receipt-specialized API that returns total, date, and vendor as typed fields) on the receipt in S3. Then calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) to sort the receipt into a policy category. Writes the read-back values to ea-claims and surfaces them to the claimant for confirmation. On confirm/submit, emits claim.submitted to EventBridge. Memory: 512 MB. Timeout: 60 s.
checker — EventBridge rule on claim.submitted. Reads s3://ea-policy-source/policy.txt and voice.txt, reads the claimant’s same-day category total from ea-claims, computes the outcome, and emits one event per claim: ea.clear, ea.confirm, ea.review, or ea.reject, with the claim context as the payload. Memory: 256 MB. Timeout: 30 s. No Bedrock calls.
routing — EventBridge rule on the four outcome events. Resolves the approver, checks quiet hours, picks the card shape, formats from the voice template, and ships via chat webhook (ea/chat/webhook in Secrets Manager) or SES SendRawEmail. On a quiet-hours defer, creates a one-off EventBridge Scheduler rule that re-invokes routing at the next available business minute. Updates the claim in ea-claims after a successful send. Memory: 256 MB. Timeout: 30 s.
decision-handler — Lambda Function URL, public with AuthType: NONE; verifies a chat signature on the request body. Triggered by chat button clicks (Approve/Reject/Ask) and by email-link clicks. Writes to ea-claims and ea-audit; on approve, appends a row to the payable sheet via the Sheets API; on ask, parks the claim in waiting and messages the claimant. Memory: 256 MB. Timeout: 15 s.
digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads ea-claims for the past week; sends a digest to a configured chat channel summarizing what was approved, rejected, and still waiting. No Bedrock; the message is a plain summary table. Memory: 256 MB.
summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s ea-claims and ea-audit; calls Bedrock Haiku 4.5 to write a one-paragraph board narrative on spend by category; emails it via SES to the configured stakeholder list. Memory: 512 MB.

Storage

DynamoDB · ea-claims — one row per claim. PK claim_id; GSI on (claimant, category, date) for the same-day total query. Attributes: amount, vendor, category, status (draft/submitted/waiting/approved/rejected), outcome, approver, reason, receipt_key. On-demand.
DynamoDB · ea-audit — one row per write action of any kind. PK (claim_id, ts); attributes: action (approve/reject/ask/undo), by_user, before, after. On-demand. No TTL — this is the long-term audit trail.
S3 · ea-receipts — receipt images and PDFs, one prefix per claim. Versioning enabled. Lifecycle to a cheaper storage class at 90 days; expiry at 7 years (tax-retention friendly).
S3 · ea-policy-source — mirrored policy and voice docs as plain text. Versioning enabled so a bad policy edit can be rolled back in one click.
S3 · ea-raw-mime — raw inbound MIME from forwarded receipts. Lifecycle to a cheaper class at 30 days; expiry at 7 years.

Bedrock

Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: intake-read for the category sort, and summary for the monthly board narrative. The heavier anthropic.claude-sonnet-4-6-20250930-v1:0 is not used — sorting a receipt into one of a dozen categories doesn’t justify it; Haiku 4.5 handles it cheaply.
Embeddings. Not used. The policy is short structured rules; deterministic lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
Quotas. Default account quotas are more than enough at SMB volume. The checker doesn’t call Bedrock; the category sort is one small call per claim.

Textract

API. AnalyzeExpense — the receipt-and-invoice specialized call that returns typed summary fields (total, tax, date, vendor) plus line items, which is exactly the shape a receipt needs. Synchronous for single-page receipts; the async StartExpenseAnalysis path is used only for multi-page PDFs.
Fallback. If AnalyzeExpense returns low confidence on the total, the read step falls back to plain DetectDocumentText and a Bedrock pass to pull the amount, then always surfaces the value to the claimant to confirm before the claim is filed.

EventBridge config

ea-claim-submitted — rule on claim.submitted on the default bus. Target: checker Lambda.
ea-outcome-routing — rule matching ea.clear, ea.confirm, ea.review, ea.reject. Target: routing Lambda.
ea-weekly-digest — Scheduler, cron(0 18 ? * SUN *) in TZ. Target: digest Lambda.
ea-monthly-summary — Scheduler, cron(0 9 ? * 2#1 *) (first Monday at 9am) in TZ. Target: summary Lambda.
One-off rules — created on the fly by routing when a quiet-hours defer is needed. Use at(YYYY-MM-DDTHH:MM:SS) expressions with --action-after-completion DELETE so the rule self-cleans.

SES inbound and outbound

Set the MX record on a dedicated subdomain (e.g. expenses.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
SES inbound rule set ea-inbound-rules: one rule with recipient expenses@your-company.com → spam scan → S3 PUT to s3://ea-raw-mime/<message-id> → stop. The S3 PUT triggers intake-email-parser.
SES outbound for the email-fallback approvals and the claimant notices: verify a sender identity at expenses@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

checker role: s3:GetObject on the policy and voice keys; dynamodb:Query + GetItem on ea-claims; events:PutEvents on the default bus. No bedrock:*.
routing role: events:CreateSchedule for the deferred one-offs; secretsmanager:GetSecretValue on the chat webhook secret; ses:SendRawEmail from the verified sender identity; dynamodb:UpdateItem on ea-claims; outbound network access to the chat host.
decision-handler role: dynamodb:PutItem on ea-audit and dynamodb:UpdateItem on ea-claims; secretsmanager:GetSecretValue on the Sheets-API service-account secret; outbound network access to sheets.googleapis.com for the payable-sheet write.
intake-read role: s3:GetObject on ea-receipts; textract:AnalyzeExpense + StartExpenseAnalysis; bedrock:InvokeModel on the Haiku ARN; dynamodb:UpdateItem on ea-claims; events:PutEvents.
intake-email-parser and chat-intake roles: s3:GetObject/PutObject on the raw-MIME and receipts buckets; secretsmanager:GetSecretValue on the chat signing secret; permission to invoke intake-read.

Chat interactive flow

The chat incoming webhook is the simplest delivery surface but doesn’t support interactive button responses. So the approval cards are posted via the chat platform’s post-message Web API instead, with interactive blocks containing the Approve/Reject/Ask buttons. Button clicks are sent by the platform to the configured interactivity request URL, which is the decision-handler Function URL. decision-handler verifies the chat signing secret on the inbound request, parses the action id (approve, reject, ask), opens a note modal if needed (Reject and Ask open modals; Approve is one-tap), and processes the response when the modal is submitted.

The chat app needs message-write and direct-message scopes and the interactivity URL configured. The bot token lives in Secrets Manager under ea/chat/bot-token. The signing secret is ea/chat/signing-secret.

Observability and cost gates

CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
Alarms: checker Lambda failures > 0 (a claim that never gets an outcome is a claim that silently stalls); decision-handler signature-verification failures > 5/hour (might mean the chat secret rotated); a claim sitting in submitted for > 3 business days (an approval nobody acted on).
X-Ray: off by default. Not worth the cost at SMB volume.
AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic ea-cost-alarm subscribed to the on-call admin’s email and chat.

Config and secrets

Service-account credentials for the Sheets API (the payable sheet write) and the Drive API (the policy/voice doc sync) live in Secrets Manager under ea/google/sa. Chat bot token and signing secret live under ea/chat/*. SES sender identity lives in IAM and the verified-domain config. The configured timezone, quiet-hours window, per-team default approvers, and admin fallback all live in Parameter Store under /ea/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment. A small drive-sync Lambda (Scheduler, every 15 minutes) mirrors the policy and voice docs to s3://ea-policy-source/ so the checker reads from S3, not Drive, on every claim.

Deploy

GitHub Actions with OIDC into a deploy role — no long-lived AWS keys — running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for ea-receipts and ea-policy-source so a bad edit can be rolled back in one click, and keep the payable-sheet write idempotent (key the row on claim_id) so a retried approve can never double-pay. SAM with a single template fits the whole surface: around nine Lambdas, two DynamoDB tables, three S3 buckets, two EventBridge rules on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts