Part 7 of 7 · Receipt organizer series · ~8 min read

Engineering reference: the receipt organizer architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, the SQS intake config, the Function URL surfaces, the DynamoDB schemas, and the Slack review flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Textract, Bedrock Global cross-Region inference, SQS, and Lambda Function URLs are all available there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume: the failure mode for an SMB is a receipt that waits an extra hour on the queue, not a regional outage. One AWS account dedicated to the organizer (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the receipt organizer: a diagram with three regions stacked vertically inside one AWS account boundary.

Top region, ingress. Three capture lanes, all feeding the SQS queue ro-intake: an SES inbound rule set (ro-inbound-rules) with action S3 PUT to s3://ro-raw-mime/, plus the capture Lambda intake-email that extracts the image or PDF and enqueues it; a mobile-snap lane where a phone shortcut posts the photo to the Function URL intake-upload, which writes to s3://ro-receipts/ and enqueues it; and a web-upload lane that drag-and-drops a file to the same Function URL. The queue carries one message per receipt (receipt_id, source, submitter, S3 key), with ro-intake-dlq as its DLQ.

Middle region, processing. The reader Lambda is triggered by messages on ro-intake; it runs Textract AnalyzeExpense on the image, checks field confidence against the threshold in s3://ro-rules-source/rules.txt, looks for duplicates in DynamoDB ro-receipts, then for a clean receipt calls the categorizer, which applies vendor hints or Bedrock Haiku 4.5 to pick a category from the chart and runs the sanity check and review gate. It emits one of two events to the EventBridge default bus per receipt: ro.filed for a confident record or ro.needs_review for an unsure one. Duplicates and rejects are logged and emit no event.

Bottom region, filing and review. The filing Lambda is triggered by an EventBridge rule on ro.filed; it writes the row to the expense sheet via the Google Sheets API, files the image under s3://ro-receipts/YYYY-MM/, and writes ro-receipts. The review Lambda is triggered by ro.needs_review; it posts a review card to Slack via the Web API with the image, the read fields, the proposed category, and Approve, Correct, and Reject buttons. Slack button clicks land on a Function URL Lambda, action-handler, which on approve or correct writes the row to the sheet and files the image, on reject moves the image to a rejected prefix, and in every case writes ro-audit.

Across the diagram: CloudWatch Logs collects from every Lambda at 7-day retention. On the right edge, a small box shows the AWS Budgets alarm at a $30 monthly threshold, posting to SNS topic ro-cost-alarm. A note at the bottom reads: every filed record links to its image, and every interaction is logged to ro-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes onto one queue), processing (the reader and categorizer emitting events), filing and review (the record files or the review card ships and the response is recorded). Every Lambda is event- or queue-driven; nothing is synchronous-chained.
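The two events on the default bus can be sketched as a small entry builder. This is an illustration only: the post names the events ro.filed and ro.needs_review but does not say which field carries them, so putting the name in DetailType, the Source string "ro.reader", and the detail payload shape are all assumptions.

```python
import json

def build_event_entry(receipt_id: str, result: str, detail: dict) -> dict:
    """Build one entry for EventBridge PutEvents.

    Assumptions (not from the post): the event name lives in DetailType,
    and "ro.reader" is a made-up Source label.
    """
    # Duplicates and rejects are logged only, so they never become events.
    if result not in ("filed", "needs_review"):
        raise ValueError(f"no event is emitted for result {result!r}")
    return {
        "Source": "ro.reader",
        "DetailType": f"ro.{result}",
        "Detail": json.dumps({"receipt_id": receipt_id, **detail}),
        "EventBusName": "default",
    }

entry = build_event_entry("r-123", "needs_review", {"vendor": "Acme"})
# With boto3 the entry would be sent as:
#   boto3.client("events").put_events(Entries=[entry])
```

Under these assumptions, the rule for the review Lambda would match on the detail type ro.needs_review and the rule for filing on ro.filed.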

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • intake-email — S3 PUT trigger on s3://ro-raw-mime/. Parses the MIME tree, extracts the first image (JPEG/PNG/HEIC) or PDF attachment, or renders the HTML body to an image if the receipt is inline. Writes the original to s3://ro-receipts/<receipt-id>, records the forwarding sender as the submitter, and sends a message to the ro-intake SQS queue. Memory: 512 MB (HEIC decode and HTML render). Timeout: 60 s.
  • intake-upload — Lambda Function URL, AuthType: NONE; verifies a per-device bearer token (issued from Parameter Store under /ro/upload-tokens/) before accepting the body. Serves the drag-and-drop upload page on GET and accepts a multipart file on POST. Writes the file to s3://ro-receipts/, tags the submitter, and enqueues to ro-intake. Used by both the phone shortcut and the web page. Memory: 256 MB. Timeout: 30 s.
  • reader — SQS event source on ro-intake (batch size 1 for clean per-receipt retries). Runs Textract AnalyzeExpense on the image; for multi-page PDFs uses the async StartExpenseAnalysis + completion via SNS. Reads the confidence threshold from s3://ro-rules-source/rules.txt; checks for duplicates by querying ro-receipts on a (vendor, date, total) GSI. Decides filed/needs-review/duplicate/rejected. For filed and needs-review, invokes the categorizer in-process, then emits ro.filed or ro.needs_review. Memory: 512 MB. Timeout: 120 s. Maximum receives 3, then to ro-intake-dlq.
  • categorizer — invoked by reader (or deployable as its own function). Applies vendor hints from rules.txt first; on no match, calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) with the read fields and the chart of accounts, constrained to return one category from that list plus a reason and a confidence. Runs the sanity check (category in chart, score numeric, tax-to-total plausible) and the review-gate comparison. Memory: 256 MB. Timeout: 30 s.
  • filing — EventBridge rule on ro.filed. Writes one row to the expense sheet via the Google Sheets API (service-account credentials in Secrets Manager under ro/google/sa): date, vendor, total, tax, category, submitter, image link. Moves the image to s3://ro-receipts/YYYY-MM/ and writes the final record to ro-receipts. Memory: 256 MB. Timeout: 30 s.
  • review — EventBridge rule on ro.needs_review. Posts a Slack review card via chat.postMessage with the receipt image, the read fields, the proposed category, the model’s reason, and Approve/Correct/Reject buttons. Writes a pending row to ro-review. Memory: 256 MB. Timeout: 30 s.
  • action-handler — Lambda Function URL, public with AuthType: NONE; verifies Slack’s request signature (an HMAC over the timestamp and raw request body, keyed with the signing secret) before doing anything else. Triggered by Slack button clicks (Approve/Correct/Reject) and modal submissions. On approve or correct, writes the row to the sheet via the Sheets API and files the image; on a category correction, optionally appends a vendor hint to rules.txt; on reject, moves the image to s3://ro-rejected/. Always writes to ro-audit. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads ro-receipts and ro-review for the week; posts a digest to a configured Slack channel summarizing what was filed and what’s still waiting in review. No Bedrock; a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s ro-receipts and ro-audit; calls Bedrock Haiku 4.5 to write a one-paragraph spend-by-category note; emails it via SES to the configured stakeholder list. Memory: 512 MB.
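The reader’s four-way decision can be sketched as one pure function. The post names the outcomes (filed, needs-review, duplicate, rejected) but not the exact rules, so the ordering here, the required-field list, and the idea that a missing field means "rejected" are assumptions for illustration.

```python
# Hypothetical: the fields a receipt must have to be usable at all.
REQUIRED_FIELDS = ("vendor", "date", "total")

def decide(fields: dict, scores: dict, threshold: float, is_duplicate: bool) -> str:
    """Return 'rejected', 'duplicate', 'needs_review', or 'filed'.

    scores holds per-field confidences on the same scale as threshold
    (Textract reports confidence as 0-100).
    """
    # Nothing usable was read: treat as rejected (assumed rule).
    if any(not fields.get(name) for name in REQUIRED_FIELDS):
        return "rejected"
    # Same vendor, date, and total already on file.
    if is_duplicate:
        return "duplicate"
    # Any shaky required field sends the receipt to a human.
    if any(scores.get(name, 0.0) < threshold for name in REQUIRED_FIELDS):
        return "needs_review"
    return "filed"

print(decide(
    {"vendor": "Acme", "date": "2025-03-01", "total": "12.50"},
    {"vendor": 99.1, "date": 97.4, "total": 82.0},
    threshold=90.0,
    is_duplicate=False,
))
```

A "filed" result here is still provisional: the categorizer’s own review gate can route the receipt to review afterwards.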

Storage

  • DynamoDB · ro-receipts — one row per receipt. PK receipt_id; attributes: source, submitter, vendor, date, total, tax, category, result (filed/needs_review/duplicate/rejected), field_scores, image_key. GSI on (vendor, date, total) for the duplicate check. On-demand.
  • DynamoDB · ro-review — one row per review item. PK receipt_id; attributes: proposed_category, reason, read_fields, status (pending/approved/corrected/rejected), slack_ts. On-demand.
  • DynamoDB · ro-audit — one row per write action of any kind. Partition key receipt_id, sort key ts; attributes: action (filed/approved/corrected/rejected), by_user, before, after. On-demand. No TTL: this is the long-term audit trail.
  • S3 · ro-receipts — the original receipt images, filed under YYYY-MM/ once a record is filed. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years (the usual record-retention window).
  • S3 · ro-rules-source — mirrored chart of accounts, vendor hints, tax rules, threshold, and the voice doc as plain text. Versioning enabled.
  • S3 · ro-raw-mime — raw inbound MIME from forwarded receipts. Lifecycle to Glacier at 30 days; expiry at 7 years.
  • S3 · ro-rejected — images rejected at review or read time, kept with a one-line reason for audit.
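The duplicate check on ro-receipts only works if the same receipt always produces the same (vendor, date, total) values. A minimal sketch of one way to normalize them before the query follows; the normalization rules (case-folding, whitespace collapsing, two-decimal totals) are assumptions, not something the post specifies.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

def dup_key(vendor: str, date_iso: str, total: str) -> tuple[str, str, Decimal]:
    """Normalize the three duplicate-check values (assumed rules)."""
    # "  ACME   Pte Ltd " and "Acme Pte Ltd" should collide.
    norm_vendor = re.sub(r"\s+", " ", vendor).strip().casefold()
    # Decimal rather than float: boto3 stores DynamoDB numbers as Decimal,
    # and "12.5" must equal "12.50".
    norm_total = Decimal(total).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return norm_vendor, date_iso, norm_total

print(dup_key("  ACME   Pte Ltd ", "2025-03-01", "12.5"))
```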

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: categorizer for the per-receipt category pick, and summary for the monthly spend narrative. Sonnet 4.6 is not used — categorizing is a small, well-framed classification job that Haiku handles cleanly.
  • Embeddings. Not used. The chart of accounts is a short structured list; a constrained prompt beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas are more than enough at SMB volume; the categorizer fires at most once per receipt, and the vendor-hint lane removes the regulars.
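The categorizer’s sanity check on the model reply (category in chart, score numeric, tax-to-total plausible) can be sketched like this. The reply keys, the 0-to-1 confidence range, and the 30% tax ceiling are placeholders chosen for the example, not values from the post.

```python
def sanity_check(reply: dict, chart: list[str], total: float, tax: float,
                 max_tax_ratio: float = 0.30) -> list[str]:
    """Return a list of problems; an empty list means the reply passes.

    Assumed reply shape: {"category": str, "confidence": number, "reason": str}.
    """
    problems = []
    # The model must pick from the chart of accounts, never invent a category.
    if reply.get("category") not in chart:
        problems.append("category not in chart")
    conf = reply.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence not a number in [0, 1]")
    # Tax larger than a plausible share of the total suggests a misread.
    if total <= 0 or tax < 0 or tax > total * max_tax_ratio:
        problems.append("tax-to-total implausible")
    return problems

print(sanity_check({"category": "Travel", "confidence": 0.92}, ["Travel", "Meals"], 109.0, 9.0))
```

Any non-empty result would send the receipt to review rather than straight to the sheet.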

Textract

  • Receipts and single-page images. Synchronous AnalyzeExpense in reader — returns SummaryFields (vendor, date, total, tax) and LineItemGroups, each with a confidence score.
  • Multi-page PDFs. Async StartExpenseAnalysis; completion notified via SNS to a small continuation in reader. Most receipts are single-page, so the sync path dominates.
  • Formats. JPEG, PNG, and PDF natively; HEIC from phones is converted to JPEG in intake-email/intake-upload before storage. No DOCX path — receipts are images, not documents.
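Pulling the four summary values and their confidence scores out of an AnalyzeExpense response might look like the sketch below. The response keys (ExpenseDocuments, SummaryFields, Type, ValueDetection, Confidence) follow the Textract response format; the short output names and the "keep the highest-confidence hit" rule are assumptions.

```python
# Textract field types mapped to the short names used in this series.
WANTED = {
    "VENDOR_NAME": "vendor",
    "INVOICE_RECEIPT_DATE": "date",
    "TOTAL": "total",
    "TAX": "tax",
}

def read_summary_fields(response: dict) -> tuple[dict, dict]:
    """Return (fields, scores) from an AnalyzeExpense response dict.

    Scores are Textract confidences on a 0-100 scale.
    """
    fields, scores = {}, {}
    for doc in response.get("ExpenseDocuments", []):
        for item in doc.get("SummaryFields", []):
            name = WANTED.get(item.get("Type", {}).get("Text"))
            value = item.get("ValueDetection", {})
            if name is None or not value.get("Text"):
                continue
            conf = float(value.get("Confidence", 0.0))
            # A field can appear more than once; keep the most confident read.
            if conf > scores.get(name, -1.0):
                fields[name], scores[name] = value["Text"], conf
    return fields, scores

sample = {"ExpenseDocuments": [{"SummaryFields": [
    {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "Acme", "Confidence": 99.2}},
    {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "12.00", "Confidence": 60.0}},
    {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "12.50", "Confidence": 97.5}},
    {"Type": {"Text": "OTHER"}, "ValueDetection": {"Text": "Thank you", "Confidence": 90.0}},
]}]}
fields, scores = read_summary_fields(sample)
```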

Queue and Function URL config

  • ro-intake — standard SQS queue; visibility timeout 180 s (over the reader’s 120 s timeout); redrive to ro-intake-dlq after 3 receives. The reader’s SQS event source uses batch size 1 so one bad image can’t fail a batch.
  • intake-upload Function URL — AuthType: NONE, per-device bearer token verified in code; CORS limited to the upload page origin; request body size cap enforced.
  • action-handler Function URL — AuthType: NONE; Slack signing-secret verification on every request; rejects requests older than 5 minutes to block replays.

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. receipts.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
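  • SES inbound rule set ro-inbound-rules: one rule with recipient receipts@your-company.com → spam scan → S3 PUT to s3://ro-raw-mime/<message-id> → stop. The S3 PUT triggers intake-email. The recipient address has to sit on the domain whose MX record points at SES, so with the subdomain setup above it would be an address at receipts.your-company.com.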
  • SES outbound for the monthly summary email: verify a sender identity at books@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.
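Once the raw MIME lands in ro-raw-mime, the attachment pick in intake-email can be sketched with the standard library alone. This covers only the "first image or PDF attachment" case; the HTML-body render and the HEIC conversion are left out, and the function name is illustrative.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

ACCEPTED = ("image/jpeg", "image/png", "image/heic", "application/pdf")

def first_receipt_part(raw_mime: bytes):
    """Return (content_type, payload_bytes) for the first image or PDF part,
    or None when the message has no such part."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    # walk() visits the message and every nested part, depth first.
    for part in msg.walk():
        if part.get_content_type() in ACCEPTED:
            payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
            if payload:
                return part.get_content_type(), payload
    return None

# Build a small forwarded-receipt message to try it on.
demo = EmailMessage()
demo["From"] = "alex@example.com"
demo["To"] = "receipts@example.com"
demo["Subject"] = "Fwd: taxi receipt"
demo.set_content("Receipt attached.")
demo.add_attachment(b"%PDF-1.4 demo", maintype="application", subtype="pdf",
                    filename="receipt.pdf")
found = first_receipt_part(demo.as_bytes())
```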

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • reader role: sqs:ReceiveMessage, DeleteMessage, and GetQueueAttributes on ro-intake (required for the SQS event source); s3:GetObject on ro-receipts and ro-rules-source; textract:AnalyzeExpense + StartExpenseAnalysis + GetExpenseAnalysis; dynamodb:Query on the ro-receipts dup GSI; bedrock:InvokeModel on the Haiku ARN (when categorizer runs in-process); events:PutEvents on the default bus.
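  • filing role: secretsmanager:GetSecretValue on ro/google/sa; s3:GetObject, PutObject, and DeleteObject on ro-receipts (S3 has no separate CopyObject permission; a move is a copy authorized by GetObject on the source and PutObject on the destination, followed by a delete); dynamodb:PutItem on ro-receipts; outbound network to sheets.googleapis.com.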
  • review role: secretsmanager:GetSecretValue on the Slack bot token; dynamodb:PutItem on ro-review; outbound network to slack.com.
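  • action-handler role: dynamodb:PutItem on ro-audit and ro-review; secretsmanager:GetSecretValue on the Sheets-API and Slack signing secrets; s3:GetObject and DeleteObject on ro-receipts, and s3:PutObject on ro-receipts and ro-rejected (the copy-then-delete behind filing or rejecting an image; S3 has no separate CopyObject permission); outbound network to sheets.googleapis.com.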
  • intake-email and intake-upload roles: s3:GetObject/PutObject on the receipt and MIME buckets; sqs:SendMessage on ro-intake; ssm:GetParameter on /ro/upload-tokens/ (upload only).
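To make "scoped to exact ARNs" concrete, here is a sketch of the reader role’s policy document built as a Python dict. The account ID and the GSI name dup-index are placeholders, the SQS statement reflects what an SQS event source needs from the execution role, and the Bedrock statement is left out because the exact ARNs depend on how the inference profile is set up.

```python
import json

ACCOUNT_ID = "111122223333"   # placeholder account
REGION = "ap-southeast-1"
DUP_INDEX = "dup-index"       # hypothetical name for the duplicate-check GSI

def reader_policy() -> dict:
    """Least-privilege policy sketch for the reader Lambda."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ConsumeIntakeQueue", "Effect": "Allow",
             "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
             "Resource": f"arn:aws:sqs:{REGION}:{ACCOUNT_ID}:ro-intake"},
            {"Sid": "ReadReceiptsAndRules", "Effect": "Allow",
             "Action": ["s3:GetObject"],
             "Resource": ["arn:aws:s3:::ro-receipts/*", "arn:aws:s3:::ro-rules-source/*"]},
            # Assumption: Textract expense calls are granted on "*" because
            # they are not tied to a resource the account owns.
            {"Sid": "ReadWithTextract", "Effect": "Allow",
             "Action": ["textract:AnalyzeExpense", "textract:StartExpenseAnalysis",
                        "textract:GetExpenseAnalysis"],
             "Resource": "*"},
            {"Sid": "QueryDupIndex", "Effect": "Allow",
             "Action": ["dynamodb:Query"],
             "Resource": f"arn:aws:dynamodb:{REGION}:{ACCOUNT_ID}:table/ro-receipts/index/{DUP_INDEX}"},
            {"Sid": "EmitEvents", "Effect": "Allow",
             "Action": ["events:PutEvents"],
             "Resource": f"arn:aws:events:{REGION}:{ACCOUNT_ID}:event-bus/default"},
        ],
    }

print(json.dumps(reader_policy(), indent=2))
```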

Slack review flow

Review cards are posted via the chat.postMessage Web API with Block Kit blocks: an image block for the receipt, a fields block for the read values, and an actions block with the Approve, Correct, and Reject buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the action-handler Function URL. action-handler verifies the Slack signing secret, parses the action_id (approve, correct, reject), opens a modal where needed (Correct opens a pre-filled modal; Approve and Reject are one-tap), and processes the modal submission.

The Slack app needs chat:write and files:read, plus the Interactivity URL configured. The bot token lives in Secrets Manager under ro/slack/bot-token; the signing secret under ro/slack/signing-secret.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Metric filters on "error", "throttle", and "timeout" turn matching log lines into CloudWatch metrics for alerting.
  • Alarms: ro-intake-dlq depth > 0 (a receipt failed to process); reader Textract failure rate > 2% in 24h; action-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $30/month threshold, alarm at 80% and 100%, posts to SNS topic ro-cost-alarm subscribed to the on-call admin’s email and Slack.
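Structured JSON logging needs nothing beyond the standard library. A minimal sketch follows; the field names (ts, level, message) are assumptions, picked so that a CloudWatch Logs JSON filter pattern such as { $.level = "ERROR" } can match on them.

```python
import json
import time

def log_line(level: str, message: str, **fields) -> str:
    """Format one structured log record as a single JSON line."""
    record = {
        "ts": round(time.time(), 3),
        "level": level.upper(),
        "message": message,
        **fields,  # e.g. receipt_id, function name
    }
    # default=str keeps odd values (Decimal, datetime) from breaking the log call.
    return json.dumps(record, default=str)

# In a Lambda, anything printed to stdout ends up in CloudWatch Logs.
print(log_line("error", "textract failed", receipt_id="r-123", function="reader"))
```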

Config and secrets

Service-account credentials for the Google Sheets API live in Secrets Manager under ro/google/sa. Slack bot token and signing secret under ro/slack/*. The SES sender identity lives in the SES verified-domain config, with permission to send from it granted in IAM. The confidence threshold, the “always review” category list, the chart of accounts location, and the admin owner all live in Parameter Store under /ro/config/; per-device upload tokens under /ro/upload-tokens/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for ro-receipts and ro-rules-source so a bad edit can be rolled back in one click, and give the ro-intake queue a real DLQ from day one so a malformed image never silently vanishes. Total deployable surface: eight or nine Lambdas (nine are listed above; eight if categorizer runs in-process with reader), three DDB tables (plus the dup GSI), four S3 buckets, one SQS queue with a DLQ, one EventBridge rule pair on the default bus (plus the Scheduler schedules for digest and summary), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
