Engineering reference: the quote drafter architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Knowledge Base wiring, IAM scopes, the SES inbound rule set, the presigned-upload portal flow, the PDF rendering Lambda, and the CRM adapters. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: us-east-1. SES inbound, Bedrock cross-Region inference, and S3 Vectors are all available there with current SLAs and full feature support. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the real failure mode for an SMB is the rep missing a draft, not a regional outage. One AWS account dedicated to the drafter (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256–512 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC, so there’s no NAT Gateway and no cold-start ENI provisioning.
- intake-webhook: Lambda Function URL with AuthType: NONE. Verifies a per-form shared secret stored in Secrets Manager (qd/forms/<form-id>/secret) and a captcha token (hCaptcha siteverify or Cloudflare Turnstile siteverify). On success, writes a DynamoDB row to qd-audit and pushes a normalized message to intake-queue.fifo. Memory: 256 MB. Timeout: 5 s.
- intake-portal-presign: Lambda Function URL. Mints a presigned s3:PutObject URL into s3://qd-uploads/<session>/<original-name> with a 30-minute TTL and a 25 MB content-length range. Stores session metadata (buyer name, email, terms-accept timestamp) in qd-sessions (DDB, TTL = 1 hour).
- intake-ses-parser: S3 PUT trigger on s3://qd-raw-mime/. Parses MIME, walks the tree to the latest reply, strips signatures and quoted threads using mail-parser-reply. For attachments: Textract handles PDF, PNG, JPEG, and TIFF via StartDocumentTextDetection + StartDocumentAnalysis (asynchronously, to handle multi-page docs). Textract doesn’t accept DOCX, so DOCX attachments are read with python-docx in the Lambda; XLSX attachments use openpyxl. Emits a normalized message to intake-queue.fifo. Memory: 512 MB. Timeout: 60 s (the wait for Textract is via SNS notification, not blocking).
- intake-upload-parser: S3 PUT trigger on s3://qd-uploads/. Same shape as the SES parser: Textract for non-text formats, normalized output to the queue.
- drafter: SQS event source on intake-queue.fifo, batch size 1. Calls Bedrock InvokeModel three times in parallel (asyncio.gather) for the line-items, constraints, and context extractors using anthropic.claude-haiku-4-5-20251001-v1:0 via Global cross-Region inference. Calls RetrieveAndGenerate on Bedrock Knowledge Base qd-catalog-kb for the line-items resolution. Decides the move and writes to qd-rfqs. Pushes auto-draft moves to draft-pricer-queue; clarify, OOS, reject moves go straight to dispatch via the qd-events EventBridge bus. Memory: 1024 MB. Timeout: 90 s.
- pricer: SQS event source on draft-pricer-queue, batch size 1. Reads the in-memory catalog index (lazy-loaded from s3://qd-catalog-source/catalog.txt on cold start; cache invalidated on the next invocation when the S3 ETag changes, so a Drive edit propagates after the next 5-minute sync). Runs the five-stage pricing pipeline; writes to qd-drafts; pushes to draft-composer-queue. Memory: 512 MB. Timeout: 30 s. No model calls.
- composer: SQS event source on draft-composer-queue, batch size 1. One Bedrock InvokeModel call to Haiku 4.5 with the priced lines and the voice doc passages as context. Runs the four guardrails inline (Gate 1: citation check; Gate 2: SKU regex + catalog lookup; Gate 3: block-list match; Gate 4: cap flag check). On any rejection, retries up to twice; on third failure, falls back to the templated cover paragraph from the voice doc. Updates qd-drafts; emits draft.ready on EventBridge. Memory: 1024 MB. Timeout: 60 s.
- dispatch: EventBridge rule on draft.ready, plus other move events (rfq.clarify, rfq.oos, rfq.reject). Pings the on-call rep in Slack via the Slack incoming webhook stored in Secrets Manager (qd/slack/webhook). Writes the conversation thread + draft to the configured CRM adapter. Memory: 256 MB. Timeout: 30 s.
- escalator: EventBridge Scheduler target. Runs at the 24-hour and 48-hour points after draft.ready if the draft hasn’t been actioned. 24h: re-pings the same rep. 48h: pages the sales lead. Memory: 256 MB. Timeout: 15 s.
- render-pdf: Lambda Function URL invoked when the rep clicks approve. Reads the draft from qd-drafts, generates a PDF via reportlab, writes to s3://qd-quote-pdfs/<rfq-id>.pdf, sends to the customer via SES outbound, writes the final state to the CRM. Memory: 512 MB. Timeout: 30 s.
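The composer's retry-then-fallback behaviour can be sketched as a small loop. This is a minimal illustration, not the production code: compose_with_gates, block_list_gate, and the gate signature (return None to pass, or a reason string to reject) are hypothetical names, and the model call is passed in as a plain function so the sketch runs without Bedrock.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumption: a gate returns None when the text passes, or a short reason when it rejects.
Gate = Callable[[str], Optional[str]]


@dataclass
class ComposeResult:
    paragraph: str
    attempts: int                 # number of model calls made
    used_fallback: bool
    rejections: List[str] = field(default_factory=list)


def first_rejection(candidate: str, gates: List[Gate]) -> Optional[str]:
    """Run gates in order and return the first rejection reason, if any."""
    for gate in gates:
        reason = gate(candidate)
        if reason is not None:
            return reason
    return None


def compose_with_gates(
    generate: Callable[[int], str],   # stand-in for the InvokeModel call
    gates: List[Gate],
    fallback: str,                    # templated cover paragraph from the voice doc
    max_attempts: int = 3,            # one initial call plus two retries
) -> ComposeResult:
    rejections: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(attempt)
        reason = first_rejection(candidate, gates)
        if reason is None:
            return ComposeResult(candidate, attempt, False, rejections)
        rejections.append(reason)
    # Third failure: give up on the model output and use the template.
    return ComposeResult(fallback, max_attempts, True, rejections)


def block_list_gate(phrases: List[str]) -> Gate:
    """Hypothetical Gate 3: reject text containing any block-listed phrase."""
    lowered = [p.lower() for p in phrases]

    def gate(text: str) -> Optional[str]:
        haystack = text.lower()
        for phrase in lowered:
            if phrase in haystack:
                return f"gate3: blocked phrase {phrase!r}"
        return None

    return gate
```

Keeping the rejection reasons on the result makes it straightforward to store them alongside the gate results in qd-drafts.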
Storage
- DynamoDB · qd-audit: one row per intake event. PK rfq_id (UUIDv7); attributes: source lane, raw payload S3 key, dedupe hash, screen result. On-demand. TTL = 90 days.
- DynamoDB · qd-rfqs: one row per RFQ post-extraction. PK rfq_id; attributes: extracted line items (with confidence + KB match), constraints, context, chosen move, drafter version. On-demand. No TTL.
- DynamoDB · qd-drafts: one row per priced + composed draft. PK rfq_id; attributes: priced lines (with citations), cover paragraph, gate results, manager-approval flag, current state (queued, approved, edited, rejected, expired). On-demand. No TTL.
- DynamoDB · qd-sessions: presigned-upload sessions. PK session_id; TTL = 1 hour. On-demand.
- S3 · qd-raw-mime: raw inbound MIME. Lifecycle to Glacier at 30 days; expiry at 365 days.
- S3 · qd-uploads: buyer-uploaded spec docs. Same lifecycle.
- S3 · qd-quote-pdfs: rendered customer-facing PDFs. Lifecycle to Glacier at 30 days; expiry at 7 years (or your retention policy).
- S3 Vectors · qd-kb-vectors: the Bedrock Knowledge Base vector store backing qd-catalog-kb.
- SQS · intake-queue.fifo: FIFO with content-based deduplication (5-minute window). 14-day retention. DLQ intake-queue-dlq.fifo after 3 failures.
- SQS · draft-pricer-queue, draft-composer-queue: standard queues. 14-day retention. DLQs after 3 failures.
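As a rough illustration of the qd-audit row, the sketch below builds an item with a dedupe hash and a 90-day TTL. DynamoDB TTL expects a Number attribute holding epoch seconds; everything else here is an assumption for illustration: the attribute names (source_lane, raw_payload_s3_key, expires_at), the hash inputs, and the helper names. The rfq_id is passed in rather than generated so the sketch does not depend on a UUIDv7 implementation.

```python
import hashlib
import time
from typing import Optional

NINETY_DAYS_S = 90 * 24 * 60 * 60


def dedupe_hash(sender: str, body: str) -> str:
    """Hash the sender plus a whitespace- and case-normalized body (assumed inputs)."""
    normalized_body = " ".join(body.split()).lower()
    material = f"{sender.strip().lower()}\n{normalized_body}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def build_audit_item(
    rfq_id: str,
    source_lane: str,
    raw_s3_key: str,
    sender: str,
    body: str,
    screen_result: str,
    now: Optional[int] = None,
) -> dict:
    """Return a plain dict shaped like a qd-audit row; attribute names are hypothetical."""
    issued_at = int(time.time()) if now is None else now
    return {
        "rfq_id": rfq_id,
        "source_lane": source_lane,
        "raw_payload_s3_key": raw_s3_key,
        "dedupe_hash": dedupe_hash(sender, body),
        "screen_result": screen_result,
        # DynamoDB TTL attribute: epoch seconds after which the row may be deleted.
        "expires_at": issued_at + NINETY_DAYS_S,
    }
```

A dict like this could be handed to a DynamoDB table resource's put_item; the TTL attribute name would need to match whatever is enabled on the table.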
Bedrock
- Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. The drafter and composer use the same model with different system prompts; consolidating on one model keeps cost and quota management simple.
- Embeddings. amazon.titan-embed-text-v2:0, output dimension 1024, normalized. Used by the Knowledge Base for the catalog and rules docs.
- Knowledge Base. qd-catalog-kb, vector store on Amazon S3 Vectors at s3://qd-kb-vectors/, embedding model Titan v2. Data source: s3://qd-catalog-source/, populated by the drive-sync Lambda described below. Sync schedule: every 15 minutes via EventBridge Scheduler invoking StartIngestionJob on the data source. Bedrock KB doesn’t ship a native Google Drive connector, so the Drive folder lives one hop away through the sync Lambda; this also means a versioned S3 bucket gives you point-in-time history of every catalog change for free.
- Lambda · drive-sync: EventBridge Scheduler target, fires every 5 minutes. Uses the Google Drive API (service-account credentials in Secrets Manager under qd/drive/sa) to export catalog.gdoc, rules.gdoc, and voice.gdoc as plain text and write them to s3://qd-catalog-source/<name>.txt if the Drive modifiedTime is newer than the S3 LastModified. After each successful sync, calls StartIngestionJob on qd-catalog-kb only if any file actually changed. Memory: 256 MB. Timeout: 30 s.
- Quotas. Default account quotas are sufficient at SMB volume. Request a quota increase on Haiku TPS if you anticipate burst-mode RFQ volume above ~5/second.
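The drive-sync decision (export only what changed, start ingestion only if something changed) can be sketched as two pure functions. This is an assumption-laden outline: files_to_sync and run_sync are hypothetical names, the Drive and S3 timestamps are passed in as dictionaries, and the export and ingestion steps are injected callables standing in for the Drive export and the StartIngestionJob call.

```python
from datetime import datetime
from typing import Callable, List, Mapping, Optional


def files_to_sync(
    drive_modified: Mapping[str, datetime],
    s3_last_modified: Mapping[str, Optional[datetime]],
) -> List[str]:
    """Names whose Drive modifiedTime is newer than the S3 LastModified (or missing in S3)."""
    stale = []
    for name, modified in drive_modified.items():
        current = s3_last_modified.get(name)
        if current is None or modified > current:
            stale.append(name)
    return sorted(stale)


def run_sync(
    drive_modified: Mapping[str, datetime],
    s3_last_modified: Mapping[str, Optional[datetime]],
    export_to_s3: Callable[[str], None],   # stand-in for Drive export + S3 put
    start_ingestion: Callable[[], None],   # stand-in for StartIngestionJob
) -> List[str]:
    changed = files_to_sync(drive_modified, s3_last_modified)
    for name in changed:
        export_to_s3(name)
    if changed:
        # Only pay for an ingestion job when at least one file actually changed.
        start_ingestion()
    return changed
```

Both timestamp sources should be timezone-aware (Drive and S3 both report UTC), since Python refuses to compare aware and naive datetimes.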
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- drafter role: bedrock:InvokeModel on the Haiku ARN; bedrock:Retrieve + bedrock:RetrieveAndGenerate on qd-catalog-kb; sqs:ReceiveMessage + DeleteMessage on intake-queue.fifo; dynamodb:PutItem on qd-rfqs; events:PutEvents on the qd-events bus; sqs:SendMessage on draft-pricer-queue.
- pricer role: sqs:ReceiveMessage + DeleteMessage on draft-pricer-queue; dynamodb:GetItem on qd-rfqs; dynamodb:PutItem on qd-drafts; s3:GetObject on the catalog cache bucket; sqs:SendMessage on draft-composer-queue. No bedrock:*.
- render-pdf role: dynamodb:GetItem + UpdateItem on qd-drafts; s3:PutObject on qd-quote-pdfs; ses:SendRawEmail from the verified sender identity; CRM-adapter outbound network access via the Lambda’s default outbound internet access (no VPC).
- intake-portal-presign role: s3:PutObject + s3:GetObject presign permission on qd-uploads only; dynamodb:PutItem on qd-sessions. Importantly, the role can generate presigned URLs that allow PUT, but the role itself never PUTs the content; the buyer’s browser does, using the presigned URL.
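To make the exact-ARN scoping concrete, here is one way the pricer role's policy document could look when built as a Python dict. Treat it as a sketch: the account ID is a placeholder, the function names are hypothetical, the catalog cache bucket is assumed to be qd-catalog-source, and sqs:GetQueueAttributes is added because Lambda's SQS event source mapping needs it in addition to the two SQS actions listed above.

```python
import json
from typing import Set


def pricer_policy(region: str, account_id: str) -> dict:
    """Least-privilege policy sketch for the pricer role (ARNs built from placeholders)."""
    sqs = f"arn:aws:sqs:{region}:{account_id}"
    table = f"arn:aws:dynamodb:{region}:{account_id}:table"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ConsumePricerQueue",
                "Effect": "Allow",
                # GetQueueAttributes is required by the SQS event source mapping.
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
                "Resource": f"{sqs}:draft-pricer-queue",
            },
            {"Sid": "ReadRfq", "Effect": "Allow",
             "Action": "dynamodb:GetItem", "Resource": f"{table}/qd-rfqs"},
            {"Sid": "WriteDraft", "Effect": "Allow",
             "Action": "dynamodb:PutItem", "Resource": f"{table}/qd-drafts"},
            # Assumption: the catalog cache object lives in qd-catalog-source.
            {"Sid": "ReadCatalog", "Effect": "Allow",
             "Action": "s3:GetObject", "Resource": "arn:aws:s3:::qd-catalog-source/catalog.txt"},
            {"Sid": "EnqueueComposer", "Effect": "Allow",
             "Action": "sqs:SendMessage", "Resource": f"{sqs}:draft-composer-queue"},
        ],
    }


def allowed_actions(policy: dict) -> Set[str]:
    """Flatten every Action in the policy into one set, for quick checks in tests."""
    found: Set[str] = set()
    for statement in policy["Statement"]:
        action = statement["Action"]
        found.update([action] if isinstance(action, str) else action)
    return found


if __name__ == "__main__":
    print(json.dumps(pricer_policy("us-east-1", "123456789012"), indent=2))
```

A check like allowed_actions is handy in CI to assert that the pricer role never picks up a bedrock: action by accident.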
SES inbound and domains
- Set the MX record on a dedicated subdomain (e.g. quotes.your-company.com) to inbound-smtp.us-east-1.amazonaws.com.
- Configure the SES inbound rule set qd-inbound-rules with one active rule. Conditions: recipient ends with @quotes.your-company.com. Actions, in order: scan for spam (built-in), write to s3://qd-raw-mime/<message-id>, stop. The S3 PUT triggers intake-ses-parser.
- For SES outbound (sending the customer-facing quote PDFs), verify a separate sender identity at quotes@your-company.com and configure DKIM and SPF on the parent domain. SES is in production mode (out of the sandbox) by request.
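For reference, the inbound rule could be expressed as the parameters for the SES create_receipt_rule API. This sketch only builds the dictionary and does not call AWS; the rule name qd-store-and-stop and the function name are hypothetical. In this API shape, spam and virus scanning is a flag on the rule (ScanEnabled) rather than an entry in the action list, and a bare domain in Recipients matches every address at that domain.

```python
def inbound_rule_params(
    rule_set: str = "qd-inbound-rules",
    domain: str = "quotes.your-company.com",
    bucket: str = "qd-raw-mime",
) -> dict:
    """Keyword arguments for an SES create_receipt_rule call (not executed here)."""
    return {
        "RuleSetName": rule_set,
        "Rule": {
            "Name": "qd-store-and-stop",   # hypothetical rule name
            "Enabled": True,
            "ScanEnabled": True,           # built-in spam/virus scan
            "Recipients": [domain],        # all addresses at this domain
            "Actions": [
                # SES writes each message under a key derived from its message ID.
                {"S3Action": {"BucketName": bucket}},
                # Stop evaluating further rules in the set.
                {"StopAction": {"Scope": "RuleSet"}},
            ],
        },
    }
```

With boto3 available, something like boto3.client("ses").create_receipt_rule(**inbound_rule_params()) would apply it, assuming the bucket policy already allows SES to write.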
Presigned-upload portal flow
- Buyer opens the static portal at https://upload.your-company.com/ (CloudFront in front of an S3 bucket; static HTML/JS only).
- Buyer types name + email and accepts terms. Browser POSTs to the intake-portal-presign Function URL.
- Lambda mints an s3:PutObject presigned URL into s3://qd-uploads/<session-id>/<sanitized-filename> with 30-minute TTL, 25 MB max content-length, and the Content-Disposition + Content-Type conditions baked into the signature. Lambda writes the session row to qd-sessions (TTL 1 hour) and returns the URL.
- Browser does an S3 PutObject directly with the signed URL. No content passes through Lambda.
- S3 PUT event triggers intake-upload-parser; from there it’s the same path as the other lanes.
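The <sanitized-filename> part of the object key deserves care, since the original name comes from the buyer's browser. The sketch below shows one conservative approach; the exact rules (ASCII-only, a 100-character cap, the "upload" default) and the function names are assumptions rather than the portal's actual behaviour.

```python
import re
import unicodedata

# Anything outside this conservative set is collapsed to a single underscore.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize_filename(original: str, max_len: int = 100) -> str:
    """Reduce a browser-supplied filename to a safe S3 key segment."""
    # Drop directory components, whether the client used / or \ separators.
    name = original.replace("\\", "/").split("/")[-1]
    # Fold accented characters to ASCII where possible, drop the rest.
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    name = _UNSAFE.sub("_", name).strip("._")
    return name[:max_len] or "upload"


def upload_key(session_id: str, original_name: str) -> str:
    """Build the qd-uploads object key: <session-id>/<sanitized-filename>."""
    return f"{session_id}/{sanitize_filename(original_name)}"
```

Because the session ID prefixes the key, two buyers uploading files with the same name cannot overwrite each other.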
PDF rendering
The render-pdf Lambda uses reportlab packaged in a Lambda layer. The PDF template lives in the deployment artifact (not in S3) so render-time is deterministic; the layout includes the company logo, a fixed header, the priced lines as a table, the cover paragraph, and a footer with the quote validity and a unique reference number tied to rfq_id. Rendering is on-demand: the template is small enough to render in a few hundred milliseconds, and rendering only when the rep approves means drafts that get edited or rejected never burn the cycles.
CRM adapters
A single crm-adapter Lambda layer with one module per CRM, switched at runtime via an environment variable (CRM_ADAPTER=hubspot|salesforce|pipedrive|drive-sheet). Each adapter implements four operations: upsert_contact(email, name, company, domain), create_deal(rfq_id, contact_id, line_items, total), attach_file(deal_id, s3_key), add_note(deal_id, text). The Drive Sheet adapter is the fallback for the smallest setups; it appends to a Google Sheet via the Sheets API with the same schema as the other adapters’ deal table.
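The four-operation adapter contract and the environment-variable switch might look like the following. It is a sketch under assumptions: the return types (string IDs, None), the Decimal total, the InMemoryAdapter stand-in, and the ADAPTERS registry are all hypothetical, and only a fake adapter is registered so the example runs without any CRM SDK.

```python
import os
from decimal import Decimal
from typing import Dict, List, Mapping, Optional, Protocol, Type


class CrmAdapter(Protocol):
    """The four operations every adapter implements (return types are assumed)."""

    def upsert_contact(self, email: str, name: str, company: str, domain: str) -> str: ...
    def create_deal(self, rfq_id: str, contact_id: str, line_items: List[dict], total: Decimal) -> str: ...
    def attach_file(self, deal_id: str, s3_key: str) -> None: ...
    def add_note(self, deal_id: str, text: str) -> None: ...


class InMemoryAdapter:
    """Illustrative stand-in; a real adapter would call the CRM's API instead."""

    def __init__(self) -> None:
        self.contacts: Dict[str, dict] = {}
        self.deals: Dict[str, dict] = {}

    def upsert_contact(self, email: str, name: str, company: str, domain: str) -> str:
        contact_id = f"contact:{email.lower()}"
        self.contacts[contact_id] = {"name": name, "company": company, "domain": domain}
        return contact_id

    def create_deal(self, rfq_id: str, contact_id: str, line_items: List[dict], total: Decimal) -> str:
        deal_id = f"deal:{rfq_id}"
        self.deals[deal_id] = {"contact_id": contact_id, "line_items": line_items,
                               "total": total, "files": [], "notes": []}
        return deal_id

    def attach_file(self, deal_id: str, s3_key: str) -> None:
        self.deals[deal_id]["files"].append(s3_key)

    def add_note(self, deal_id: str, text: str) -> None:
        self.deals[deal_id]["notes"].append(text)


# Hypothetical registry; hubspot, salesforce, and pipedrive modules would register here too.
ADAPTERS: Dict[str, Type] = {"drive-sheet": InMemoryAdapter}


def load_adapter(env: Optional[Mapping[str, str]] = None) -> CrmAdapter:
    """Pick the adapter named by CRM_ADAPTER, failing loudly on an unknown value."""
    env = os.environ if env is None else env
    name = env.get("CRM_ADAPTER", "drive-sheet")
    if name not in ADAPTERS:
        raise ValueError(f"unknown CRM_ADAPTER {name!r}; expected one of {sorted(ADAPTERS)}")
    return ADAPTERS[name]()
```

Using a Protocol rather than a base class means each CRM module only has to match the method signatures, which keeps the per-CRM modules free of shared imports.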
OAuth credentials per CRM live in Secrets Manager under qd/crm/<adapter>/oauth. Refresh tokens are rotated by an EventBridge Scheduler rule firing a refresh-crm-tokens Lambda once a day.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. One log stream per Lambda execution environment. Metric filter on the keywords "error", "throttle", "timeout" feeding a CloudWatch metric for alerting.
- Alarms: queue depth on intake-queue.fifo > 50 for 5 min (someone’s posting fast and the drafter can’t keep up); DLQ depth > 0 (something failed three times); Lambda error rate per function > 1% over 5 min.
- X-Ray: off by default. The pipeline is short and the queues handle correlation; X-Ray cost isn’t worth it at SMB volume.
- AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic qd-cost-alarm, to which the on-call rep’s email and Slack are subscribed.
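Structured JSON logging needs nothing beyond the standard library. The sketch below is one possible formatter, not the one the drafter ships: JsonFormatter, make_logger, and the convention of passing extra fields under a "fields" key are all assumptions made for illustration.

```python
import json
import logging
from typing import IO, Optional


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention: callers pass extra={"fields": {...}} for structured context.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def make_logger(name: str, stream: Optional[IO[str]] = None) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()        # avoid duplicate handlers on warm starts
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as log.error("timeout calling pricer", extra={"fields": {"rfq_id": rfq_id}}) then produces one JSON line carrying both the keyword the alerting looks for and the rfq_id needed to correlate across queues.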
Config and secrets
Per-form shared secrets, the captcha key, the Slack webhook, the CRM OAuth credentials, and the SES sender identity all live in Secrets Manager under the prefix qd/. Application configuration (Bedrock model IDs, the discount cap, the block-list phrases for Gate 3, the Drive folder ID) lives in a single Parameter Store hierarchy /qd/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment; Secrets Manager values are fetched per-invocation only when the secret is actually needed.
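The two caching policies differ in lifetime, which a short sketch can make explicit. ColdStartConfig and LazySecrets are hypothetical names, and the fetch functions are injected placeholders standing in for the Parameter Store and Secrets Manager calls; the pattern relies on the fact that module-level objects in a Lambda survive across invocations within one execution environment.

```python
from typing import Callable, Dict, Optional


class ColdStartConfig:
    """Fetch the whole config hierarchy once, then serve from memory until the environment dies."""

    def __init__(self, fetch_all: Callable[[], Dict[str, str]]) -> None:
        self._fetch_all = fetch_all
        self._values: Optional[Dict[str, str]] = None

    def get(self, key: str) -> str:
        if self._values is None:
            self._values = self._fetch_all()   # happens on the first use after a cold start
        return self._values[key]


class LazySecrets:
    """Create one per invocation: a secret is fetched only if, and when, the handler asks for it."""

    def __init__(self, fetch_one: Callable[[str], str]) -> None:
        self._fetch_one = fetch_one
        self._seen: Dict[str, str] = {}

    def get(self, secret_id: str) -> str:
        if secret_id not in self._seen:
            self._seen[secret_id] = self._fetch_one(secret_id)
        return self._seen[secret_id]
```

In a handler module, a ColdStartConfig instance would sit at module level while a LazySecrets instance would be constructed inside the handler function, so a rotated secret is picked up on the next invocation without a redeploy.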
Deploy
Whichever IaC you prefer. The only opinionated bits are: deploy Function URLs separately from API Gateway (since there isn’t one), configure the SES rule set as a separate stack since rule-set changes can affect mail flow, and turn on S3 versioning for qd-catalog-source so a bad Drive edit can be rolled back in one click. CDK with a Python stack file works well; SAM also fits. Total deployable surface: around thirteen Lambdas, four DDB tables, five S3 buckets, three SQS queues, one EventBridge bus, one Knowledge Base, one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. The repo template (or a deployable starter) lives where the rest of my AWS scaffolding does — if you want to talk about adapting it for your business, see Work with me.