Engineering reference: the bill matcher architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the email approval flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Textract, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a bill sitting unmatched for an hour, not a regional outage, and the matcher never moves money so there’s no in-flight payment to protect. One AWS account dedicated to the matcher (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
intake-reader— S3 PUT trigger ons3://bm-raw-mime/ands3://bm-uploads/. Parses MIME (for the email lane), extracts the bill PDF, runs Textract viaStartDocumentAnalysiswith theTABLESfeature (asynchronously, to handle multi-page bills). On Textract completion (via SNS notification), reads the structured tables and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) to emit clean lines (supplier, bill number, PO reference, per-line item/quantity/unit price) with a confidence score per field. Writes the bill tobm-billsand enqueues a match message tobm-match-queue; low-confidence reads are written in aneeds_reviewstate and skip the queue until a human confirms. For DOCX or XLSX bills (rare), falls back topython-docx/openpyxl. Memory: 512 MB. Timeout: 60 s.portal-poll— EventBridge Scheduler target, three times a day. Signs in to each configured supplier portal (credentials in Secrets Manager underbm/portals/*), downloads any new bill PDFs, and writes them tos3://bm-uploads/whereintake-readerpicks them up. Portal sign-in flows are brittle; each portal is its own small adapter, and a failure on one portal is isolated from the others. Memory: 512 MB. Timeout: 120 s.matcher— SQS trigger onbm-match-queue. For each bill, reads the clean lines frombm-bills, pulls the matching purchase order and goods-received lines froms3://bm-orders-source/(mirrored from the Drive sheet), and the tolerances froms3://bm-rules-source/rules.txt. Resolves the PO by PO number, falling back to a supplier-plus-item match. Checks each line for item, received-quantity, and unit-price against tolerance; picks one of four outcomes; writesbm-resultsand emitsbm.matched,bm.price_variance,bm.quantity_variance, orbm.no_powith the bill and failing-line context. Memory: 512 MB. Timeout: 30 s. No Bedrock calls.approval-desk— EventBridge rule on the four outcome events. Resolves the approver (per-supplier → per-category → admin fallback), checks the spend threshold (adds a second approver above it), checks quiet hours, formats the email from the template for the outcome, and ships via SESSendRawEmailwith Approve/Query/Reject links to theack-handlerFunction URL. On a quiet-hours defer, creates a one-off EventBridge Scheduler rule that re-invokesapproval-deskat the next business minute. Writes a row tobm-queue. Memory: 256 MB. Timeout: 30 s.ack-handler— Lambda Function URL, public withAuthType: NONE; verifies a signed, single-use token embedded in the button link (HMAC keyed from a secret) so a leaked email link can’t be replayed. Triggered by Approve/Query/Reject clicks. On approve: marks the billready_for_paymentinbm-bills, closes the matched PO line, and (if dual sign-off) waits for the second approver. On query: emails the supplier via SES and setswaiting_on_supplier. On reject: marksrejectedand notifies the supplier. Always writesbm-audit. Memory: 256 MB. Timeout: 15 s.drive-sync— EventBridge Scheduler target, every 15 minutes. Uses the Google Drive + Sheets APIs (service-account credentials in Secrets Manager underbm/drive/sa) to export the orders-and-receipts sheet as CSV tos3://bm-orders-source/and the rules and templates docs tos3://bm-rules-source/, only if changed since the last sync. Memory: 256 MB. Timeout: 30 s.sweep— EventBridge Scheduler target, daily at 9am local. Readsbm-queuefor bills still unapproved past their due date (or waiting on a second sign-off) and re-surfaces them to the approver; also re-pingswaiting_on_supplierbills that have gone quiet. No Bedrock; a plain summary. Memory: 256 MB.summary— EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’sbm-resultsandbm-audit; calls Bedrock Haiku 4.5 to write a one-paragraph narrative (bills matched clean, value caught in variances, top suppliers by mismatch); emails it via SES to the stakeholder list. Memory: 512 MB.
Storage
- DynamoDB ·
bm-bills— one row per bill. PKbill_id; attributes:supplier,bill_number,po_ref,lines(item/qty/unit_price),state(needs_review/queued/flagged/ready_for_payment/waiting_on_supplier/rejected),total. On-demand. - DynamoDB ·
bm-results— one row per match. PKbill_id; attributes:outcome(matched/price_variance/quantity_variance/no_po),failing_lines,po_number,checked_at. On-demand. - DynamoDB ·
bm-queue— one row per pending approval. PKbill_id; sort keyapprover; attributes:sent_at,due_date,second_approver(if dual sign-off),status. On-demand. - DynamoDB ·
bm-audit— one row per write action of any kind. PK(bill_id, ts); attributes:action(approved/queried/rejected/override),by_user,reason,before,after. On-demand. No TTL — this is the long-term audit trail. - S3 ·
bm-orders-source— mirrored CSV of the purchase-order and goods-received tabs. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years. - S3 ·
bm-rules-source— mirrored rules and email templates as plain text. Versioning enabled. - S3 ·
bm-raw-mime— raw inbound MIME from emailed bills. Lifecycle to Glacier at 30 days; expiry at 7 years. - S3 ·
bm-uploads— bill PDFs from the portal poll and manual upload, plus the parsed source bills kept for reference by the registry row.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites:intake-readerfor turning the Textract read into clean lines, andsummaryfor the monthly narrative. Thematchernever calls Bedrock; the three-way match is deterministic Python. - Heavier model.
anthropic.claude-sonnet-4-6-...is configured but off by default. It’s only worth switching the reader to Sonnet for suppliers with pathological multi-page layouts where Haiku’s line extraction needs a confidence bump; the per-supplier config flag lets you route just those. - Embeddings. Not used. POs and goods-received notes are structured rows; deterministic lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
EventBridge Scheduler config
bm-drive-sync—rate(15 minutes). Target:drive-syncLambda.bm-portal-poll—cron(0 8,12,16 * * ? *)in TZ. Target:portal-pollLambda.bm-daily-sweep—cron(0 9 * * ? *)in TZ. Target:sweepLambda.bm-monthly-summary—cron(0 9 ? * 2#1 *)(first Monday at 9am) in TZ. Target:summaryLambda.- One-off rules — created on the fly by
approval-deskwhen a quiet-hours defer is needed. Useat(YYYY-MM-DDTHH:MM:SS)expressions with--action-after-completion DELETEso the rule self-cleans.
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g.
bills.your-company.com) toinbound-smtp.ap-southeast-1.amazonaws.com. - SES inbound rule set
bm-inbound-rules: one rule with recipientbills@your-company.com→ spam scan → S3 PUT tos3://bm-raw-mime/<message-id>→ stop. The S3 PUT triggersintake-reader. - SES outbound for approver emails and supplier queries: verify a sender identity at
ap@your-company.comwith DKIM and SPF on the parent domain. Out of sandbox by request.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- matcher role:
s3:GetObjecton the orders and rules keys;dynamodb:GetItemonbm-bills;dynamodb:PutItemonbm-results;events:PutEventson the default bus;sqs:ReceiveMessage+DeleteMessageonbm-match-queue. Nobedrock:*. - intake-reader role:
s3:GetObjectonbm-raw-mimeandbm-uploads;textract:StartDocumentAnalysis+GetDocumentAnalysis;bedrock:InvokeModelon the Haiku ARN;dynamodb:PutItemonbm-bills;sqs:SendMessageonbm-match-queue. - approval-desk role:
events:CreateSchedulefor the deferred one-offs;ses:SendRawEmailfrom the verified sender;dynamodb:PutItemonbm-queue;secretsmanager:GetSecretValueon the link-signing secret. - ack-handler role:
dynamodb:PutItem/UpdateItemonbm-billsandbm-audit;ses:SendRawEmailfor supplier query/reject notices;secretsmanager:GetSecretValueon the link-signing secret;dynamodb:Queryfor dual-sign-off state. - drive-sync and portal-poll roles:
secretsmanager:GetSecretValueon the relevant Google or portal secret;s3:PutObjecton the orders, rules, and uploads buckets; outbound network towww.googleapis.comand the configured portal hosts.
Email approval flow
The approval email is a small HTML email with three buttons rendered as links to the ack-handler Function URL. Each link carries a signed, single-use token: an HMAC over (bill_id, action, approver, nonce, expiry) keyed from bm/links/secret in Secrets Manager. ack-handler verifies the HMAC, checks the nonce hasn’t been spent (a conditional write to a small bm-nonce attribute), and checks the token hasn’t expired before acting. Approve on a flagged bill and Query both render a follow-up form (the override reason, or the supplier-query text) posted back to the same Function URL; Approve on a clean match and Reject are one click plus a confirm.
This keeps the whole approval surface to one public Function URL with no API Gateway, while staying safe against a forwarded or leaked email link — the token is single-use and time-boxed, so the worst case from a leaked link is a no-op on an already-decided bill.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - SQS DLQ:
bm-match-queuehas a dead-letter queue; a message that fails the matcher twice lands there with an alarm, so a bad read never silently blocks a bill. - Alarms: matcher failures > 0 in an hour; DLQ depth > 0; ack-handler token-verification failures > 5/hour (might mean the signing secret rotated); Textract throttles.
- X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic
bm-cost-alarmsubscribed to the finance admin’s email.
Config and secrets
Service-account credentials for Drive and Sheets live in Secrets Manager under bm/drive/sa. Supplier-portal credentials under bm/portals/*. The email-link signing secret under bm/links/secret. The configured timezone, quiet-hours window, spend threshold, tolerance defaults, and admin fallback approver live in Parameter Store under /bm/config/ (with per-supplier overrides kept in the rules doc so a buyer can change them without a deploy). Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) and AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for bm-orders-source and bm-rules-source so a bad Drive edit rolls back in one click, and give the matcher an SQS source with a DLQ so a single unreadable bill never wedges the pipeline. Total deployable surface: around eight Lambdas, four DDB tables, four S3 buckets, one SQS queue with a DLQ, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts