Engineering reference: the receipt organizer architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, the SQS intake config, the Function URL surfaces, the DynamoDB schemas, and the Slack review flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Textract, Bedrock Global cross-Region inference, and the queue and Function URL surfaces are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a receipt that waits an extra hour on the queue, not a regional outage. One AWS account dedicated to the organizer (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
intake-email— S3 PUT trigger ons3://ro-raw-mime/. Parses the MIME tree, extracts the first image (JPEG/PNG/HEIC) or PDF attachment, or renders the HTML body to an image if the receipt is inline. Writes the original tos3://ro-receipts/<receipt-id>, records the forwarding sender as the submitter, and sends a message to thero-intakeSQS queue. Memory: 512 MB (HEIC decode and HTML render). Timeout: 60 s.intake-upload— Lambda Function URL,AuthType: NONE; verifies a per-device bearer token (issued from Parameter Store under/ro/upload-tokens/) before accepting the body. Serves the drag-and-drop upload page on GET and accepts a multipart file on POST. Writes the file tos3://ro-receipts/, tags the submitter, and enqueues toro-intake. Used by both the phone shortcut and the web page. Memory: 256 MB. Timeout: 30 s.reader— SQS event source onro-intake(batch size 1 for clean per-receipt retries). Runs TextractAnalyzeExpenseon the image; for multi-page PDFs uses the asyncStartExpenseAnalysis+ completion via SNS. Reads the confidence threshold froms3://ro-rules-source/rules.txt; checks for duplicates by queryingro-receiptson a(vendor, date, total)GSI. Decides filed/needs-review/duplicate/rejected. For filed and needs-review, invokes the categorizer in-process, then emitsro.filedorro.needs_review. Memory: 512 MB. Timeout: 120 s. Maximum receives 3, then toro-intake-dlq.categorizer— invoked byreader(or deployable as its own function). Applies vendor hints fromrules.txtfirst; on no match, calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) with the read fields and the chart of accounts, constrained to return one category from that list plus a reason and a confidence. Runs the sanity check (category in chart, score numeric, tax-to-total plausible) and the review-gate comparison. Memory: 256 MB. Timeout: 30 s.filing— EventBridge rule onro.filed. Writes one row to the expense sheet via the Google Sheets API (service-account credentials in Secrets Manager underro/google/sa): date, vendor, total, tax, category, submitter, image link. Moves the image tos3://ro-receipts/YYYY-MM/and writes the final record toro-receipts. Memory: 256 MB. Timeout: 30 s.review— EventBridge rule onro.needs_review. Posts a Slack review card viachat.postMessagewith the receipt image, the read fields, the proposed category, the model’s reason, and Approve/Correct/Reject buttons. Writes a pending row toro-review. Memory: 256 MB. Timeout: 30 s.action-handler— Lambda Function URL, public withAuthType: NONE; verifies the Slack signing secret on the request body. Triggered by Slack button clicks (Approve/Correct/Reject) and modal submissions. On approve or correct, writes the row to the sheet via the Sheets API and files the image; on a category correction, optionally appends a vendor hint torules.txt; on reject, moves the image tos3://ro-rejected/. Always writes toro-audit. Memory: 256 MB. Timeout: 15 s.digest— EventBridge Scheduler target, weekly Sunday 6pm. Readsro-receiptsandro-reviewfor the week; posts a digest to a configured Slack channel summarizing what was filed and what’s still waiting in review. No Bedrock; a plain summary table. Memory: 256 MB.summary— EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’sro-receiptsandro-audit; calls Bedrock Haiku 4.5 to write a one-paragraph spend-by-category note; emails it via SES to the configured stakeholder list. Memory: 512 MB.
Storage
- DynamoDB ·
ro-receipts— one row per receipt. PKreceipt_id; attributes:source,submitter,vendor,date,total,tax,category,result(filed/needs_review/duplicate/rejected),field_scores,image_key. GSI on(vendor, date, total)for the duplicate check. On-demand. - DynamoDB ·
ro-review— one row per review item. PKreceipt_id; attributes:proposed_category,reason,read_fields,status(pending/approved/corrected/rejected),slack_ts. On-demand. - DynamoDB ·
ro-audit— one row per write action of any kind. PK(receipt_id, ts); attributes:action(filed/approved/corrected/rejected),by_user,before,after. On-demand. No TTL — this is the long-term audit trail. - S3 ·
ro-receipts— the original receipt images, filed underYYYY-MM/once a record is filed. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years (the usual record-retention window). - S3 ·
ro-rules-source— mirrored chart of accounts, vendor hints, tax rules, threshold, and the voice doc as plain text. Versioning enabled. - S3 ·
ro-raw-mime— raw inbound MIME from forwarded receipts. Lifecycle to Glacier at 30 days; expiry at 7 years. - S3 ·
ro-rejected— images rejected at review or read time, kept with a one-line reason for audit.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites:categorizerfor the per-receipt category pick, andsummaryfor the monthly spend narrative. Sonnet 4.6 is not used — categorizing is a small, well-framed classification job that Haiku handles cleanly. - Embeddings. Not used. The chart of accounts is a short structured list; a constrained prompt beats vector retrieval here. No Knowledge Base, no S3 Vectors.
- Quotas. Default account quotas are more than enough at SMB volume; the categorizer fires at most once per receipt, and the vendor-hint lane removes the regulars.
Textract
- Receipts and single-page images. Synchronous
AnalyzeExpenseinreader— returnsSummaryFields(vendor, date, total, tax) andLineItemGroups, each with a confidence score. - Multi-page PDFs. Async
StartExpenseAnalysis; completion notified via SNS to a small continuation inreader. Most receipts are single-page, so the sync path dominates. - Formats. JPEG, PNG, and PDF natively; HEIC from phones is converted to JPEG in
intake-email/intake-uploadbefore storage. No DOCX path — receipts are images, not documents.
Queue and Function URL config
ro-intake— standard SQS queue; visibility timeout 180 s (over the reader’s 120 s timeout); redrive toro-intake-dlqafter 3 receives. The reader’s SQS event source uses batch size 1 so one bad image can’t fail a batch.intake-uploadFunction URL —AuthType: NONE, per-device bearer token verified in code; CORS limited to the upload page origin; request body size cap enforced.action-handlerFunction URL —AuthType: NONE; Slack signing-secret verification on every request; rejects requests older than 5 minutes to block replays.
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g.
receipts.your-company.com) toinbound-smtp.ap-southeast-1.amazonaws.com. - SES inbound rule set
ro-inbound-rules: one rule with recipientreceipts@your-company.com→ spam scan → S3 PUT tos3://ro-raw-mime/<message-id>→ stop. The S3 PUT triggersintake-email. - SES outbound for the monthly summary email: verify a sender identity at
books@your-company.comwith DKIM and SPF on the parent domain. Out of sandbox by request.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- reader role:
s3:GetObjectonro-receiptsandro-rules-source;textract:AnalyzeExpense+StartExpenseAnalysis+GetExpenseAnalysis;dynamodb:Queryon thero-receiptsdup GSI;bedrock:InvokeModelon the Haiku ARN (when categorizer runs in-process);events:PutEventson the default bus. - filing role:
secretsmanager:GetSecretValueonro/google/sa;s3:CopyObject+PutObjectonro-receipts;dynamodb:PutItemonro-receipts; outbound network tosheets.googleapis.com. - review role:
secretsmanager:GetSecretValueon the Slack bot token;dynamodb:PutItemonro-review; outbound network toslack.com. - action-handler role:
dynamodb:PutItemonro-auditandro-review;secretsmanager:GetSecretValueon the Sheets-API and Slack signing secrets;s3:CopyObjectonro-receiptsandro-rejected; outbound network tosheets.googleapis.com. - intake-email and intake-upload roles:
s3:GetObject/PutObjecton the receipt and MIME buckets;sqs:SendMessageonro-intake;ssm:GetParameteron/ro/upload-tokens/(upload only).
Slack review flow
Review cards are posted via the chat.postMessage Web API with Block Kit blocks: an image block for the receipt, a fields block for the read values, and an actions block with the Approve, Correct, and Reject buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the action-handler Function URL. action-handler verifies the Slack signing secret, parses the action_id (approve, correct, reject), opens a modal where needed (Correct opens a pre-filled modal; Approve and Reject are one-tap), and processes the modal submission.
The Slack app needs chat:write and files:read, plus the Interactivity URL configured. The bot token lives in Secrets Manager under ro/slack/bot-token; the signing secret under ro/slack/signing-secret.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms:
ro-intake-dlqdepth > 0 (a receipt failed to process); reader Textract failure rate > 2% in 24h; action-handler signature-verification failures > 5/hour (might mean the Slack secret rotated). - X-Ray: off by default. Not worth the cost at SMB volume.
- AWS Budgets: $30/month threshold, alarm at 80% and 100%, posts to SNS topic
ro-cost-alarmsubscribed to the on-call admin’s email and Slack.
Config and secrets
Service-account credentials for the Google Sheets API live in Secrets Manager under ro/google/sa. Slack bot token and signing secret under ro/slack/*. SES sender identity lives in IAM and the verified-domain config. The confidence threshold, the “always review” category list, the chart of accounts location, and the admin owner all live in Parameter Store under /ro/config/; per-device upload tokens under /ro/upload-tokens/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for ro-receipts and ro-rules-source so a bad edit can be rolled back in one click, and give the ro-intake queue a real DLQ from day one so a malformed image never silently vanishes. Total deployable surface: around eight Lambdas, three DDB tables (plus the dup GSI), four S3 buckets, one SQS queue with a DLQ, one EventBridge rule pair on the default bus (plus the Scheduler rules for digest and summary), one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts