Part 7 of 7 · Quote drafter series ~9 min read

Engineering reference: the quote drafter architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Knowledge Base wiring, IAM scopes, the SES inbound rule set, the presigned-upload portal flow, the PDF rendering Lambda, and the CRM adapters. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: us-east-1. SES inbound, Bedrock cross-Region inference, and S3 Vectors are all available there with current SLAs and full feature support. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the real failure mode for an SMB is the rep missing a draft, not a regional outage. One AWS account dedicated to the drafter (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

Figure description: AWS topology of the quote drafter. A topology diagram with three regions stacked vertically inside one AWS account boundary.

Top region: ingress. Three boxes show the three lanes: a Lambda Function URL labelled "intake-webhook" (Web form lane; no API Gateway), the SES inbound rule set "qd-inbound-rules" with action "S3 PUT to s3://qd-raw-mime/" plus the parser Lambda labelled "intake-ses-parser" (Sales inbox lane), and a Lambda Function URL labelled "intake-portal-presign" plus the upload bucket "s3://qd-uploads/" plus the parser Lambda labelled "intake-upload-parser", which runs Textract on PDFs and images (Direct uploads lane). All three lanes write to a single SQS queue labelled "intake-queue.fifo" with content-based deduplication and a 14-day retention.

Middle region: processing. The drafter Lambda labelled "drafter" reads from the intake queue, calls Bedrock (Claude Haiku 4.5) for the three extractors, queries the Bedrock Knowledge Base "qd-catalog-kb" (data source: an S3 bucket s3://qd-catalog-source/ that a small drive-sync Lambda mirrors from a Google Drive folder every 5 minutes; vector store: an S3 Vectors index in s3://qd-kb-vectors/) for catalog resolution, and writes to a DynamoDB table "qd-rfqs" with the extraction result and chosen move. The pricer Lambda labelled "pricer" reads auto-draft RFQs, applies the five-stage pipeline (plain Python) against the in-memory catalog index lazy-loaded from the qd-catalog-source S3 bucket and refreshed when the S3 ETag changes, and writes the priced quote to "qd-drafts". The composer Lambda labelled "composer" reads priced drafts, calls Bedrock for the cover paragraph (grounded in the voice doc), runs the four guardrails inline (citation, no-fab-SKU, no-availability, cap), and updates "qd-drafts" with the gated draft.

Bottom region: egress and review. The dispatch Lambda labelled "dispatch" pings the rep on Slack and writes the thread and draft to the CRM via the configured adapter. EventBridge Scheduler runs one-off rules per draft for 24-hour reminders and 48-hour escalations into the escalator Lambda labelled "escalator". When the rep clicks approve, the render-pdf Lambda labelled "render-pdf" reads the draft, generates a PDF using ReportLab, writes to "s3://qd-quote-pdfs/" with a 30-day lifecycle rule to Glacier, and emails the customer via SES outbound. CloudWatch Logs collects from every Lambda at 7-day retention.

Across the right edge: a small box labelled AWS Budgets alarm at $25 monthly threshold, posting to SNS topic "qd-cost-alarm." A note at the bottom: every Lambda is invoked from a queue, an event, or a click; no synchronous chains.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into one queue), processing (drafter, pricer, composer), egress and review (dispatch, scheduler, PDF render). All Lambdas are queue- or event-driven; nothing is synchronous-chained.
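The one-off reminder and escalation schedules in the diagram can be sketched as below. This is a minimal sketch, not the series' actual code: the function name, the returned dict keys, and the schedule naming scheme are assumptions. The only AWS-specific detail relied on is EventBridge Scheduler's one-time expression format, at(yyyy-mm-ddThh:mm:ss).

```python
from datetime import datetime, timedelta, timezone


def reminder_schedules(rfq_id: str, ready_at: datetime) -> list[dict]:
    """Describe the 24 h reminder and 48 h escalation for one draft.

    ready_at must be timezone-aware. The returned dicts are a hypothetical
    intermediate shape; a real deployment would map them onto the
    Scheduler CreateSchedule call (Name, ScheduleExpression, Target.Input).
    """
    schedules = []
    for hours, kind in ((24, "reminder"), (48, "escalation")):
        fire_at = (ready_at + timedelta(hours=hours)).astimezone(timezone.utc)
        schedules.append(
            {
                # Hypothetical naming scheme: one schedule per draft per kind.
                "name": f"qd-{kind}-{rfq_id}",
                # One-time schedules use at(...) with second precision, here in UTC.
                "expression": f"at({fire_at.strftime('%Y-%m-%dT%H:%M:%S')})",
                "timezone": "UTC",
                # Payload the escalator Lambda would receive (assumed shape).
                "payload": {"rfq_id": rfq_id, "kind": kind},
            }
        )
    return schedules
```

Creating both schedules when draft.ready fires, and deleting them when the rep actions the draft, keeps the escalator free of any polling logic.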

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256–512 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC, so there’s no NAT Gateway and no cold-start ENI provisioning.

  • intake-webhook — Lambda Function URL with AuthType: NONE. Verifies a per-form shared secret stored in Secrets Manager (qd/forms/<form-id>/secret) and a captcha token (hCaptcha siteverify or Cloudflare Turnstile siteverify). On success, writes a DynamoDB row to qd-audit and pushes a normalized message to intake-queue.fifo. Memory: 256 MB. Timeout: 5 s.
  • intake-portal-presign — Lambda Function URL. Mints a presigned s3:PutObject URL into s3://qd-uploads/<session>/<original-name> with a 30-minute TTL and a 25 MB content-length range. Stores session metadata (buyer name, email, terms-accept timestamp) in qd-sessions (DDB, TTL = 1 hour).
  • intake-ses-parser — S3 PUT trigger on s3://qd-raw-mime/. Parses MIME, walks the tree to the latest reply, strips signatures and quoted threads using mail-parser-reply. For attachments: Textract handles PDF, PNG, JPEG, and TIFF via StartDocumentTextDetection + StartDocumentAnalysis (asynchronously to handle multi-page docs). Textract doesn’t accept DOCX, so DOCX attachments are read with python-docx in the Lambda; XLSX attachments use openpyxl. Emits a normalized message to intake-queue.fifo. Memory: 512 MB. Timeout: 60 s (the wait for Textract is via SNS notification, not blocking).
  • intake-upload-parser — S3 PUT trigger on s3://qd-uploads/. Same shape as the SES parser: Textract for non-text formats, normalized output to the queue.
  • drafter — SQS event source on intake-queue.fifo, batch size 1. Calls Bedrock InvokeModel three times in parallel (asyncio.gather) for the line-items, constraints, and context extractors using anthropic.claude-haiku-4-5-20251001-v1:0 via Global cross-Region inference. Calls RetrieveAndGenerate on Bedrock Knowledge Base qd-catalog-kb for the line-items resolution. Decides the move and writes to qd-rfqs. Pushes auto-draft moves to draft-pricer-queue; clarify, OOS, reject moves go straight to dispatch via the qd-events EventBridge bus. Memory: 1024 MB. Timeout: 90 s.
  • pricer — SQS event source on draft-pricer-queue, batch size 1. Reads the in-memory catalog index (lazy-loaded from s3://qd-catalog-source/catalog.txt on cold start; cache invalidated on the next invocation when the S3 ETag changes, so a Drive edit propagates after the next 5-minute sync). Runs the five-stage pricing pipeline; writes to qd-drafts; pushes to draft-composer-queue. Memory: 512 MB. Timeout: 30 s. No model calls.
  • composer — SQS event source on draft-composer-queue, batch size 1. One Bedrock InvokeModel call to Haiku 4.5 with the priced lines and the voice doc passages as context. Runs the four guardrails inline (Gate 1: citation check; Gate 2: SKU regex + catalog lookup; Gate 3: block-list match; Gate 4: cap flag check). On any rejection, retries up to twice; on third failure, falls back to the templated cover paragraph from the voice doc. Updates qd-drafts; emits draft.ready on EventBridge. Memory: 1024 MB. Timeout: 60 s.
  • dispatch — EventBridge rule on draft.ready, plus other move events (rfq.clarify, rfq.oos, rfq.reject). Pings the on-call rep in Slack via the Slack incoming webhook stored in Secrets Manager (qd/slack/webhook). Writes the conversation thread + draft to the configured CRM adapter. Memory: 256 MB. Timeout: 30 s.
  • escalator — EventBridge Scheduler target. Runs at the 24-hour and 48-hour points after draft.ready if the draft hasn’t been actioned. 24h: re-pings the same rep. 48h: pages the sales lead. Memory: 256 MB. Timeout: 15 s.
  • render-pdf — Lambda Function URL invoked when the rep clicks approve. Reads the draft from qd-drafts, generates a PDF via reportlab, writes to s3://qd-quote-pdfs/<rfq-id>.pdf, sends to the customer via SES outbound, writes the final state to the CRM. Memory: 512 MB. Timeout: 30 s.
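The composer's "retry twice, then fall back to the template" behaviour can be sketched as a small loop. This is an illustration under assumptions: compose_with_gates, ComposeResult, and block_list_gate are hypothetical names, the model call is injected as a plain function, and only a Gate 3 style block-list check is shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A gate returns None when the text passes, or a short rejection reason.
Gate = Callable[[str], Optional[str]]


@dataclass
class ComposeResult:
    text: str
    attempts: int
    used_fallback: bool
    rejections: list[str] = field(default_factory=list)


def block_list_gate(phrases: list[str]) -> Gate:
    """Build a gate that rejects text containing any blocked phrase."""
    lowered = [p.lower() for p in phrases]

    def gate(text: str) -> Optional[str]:
        haystack = text.lower()
        for phrase in lowered:
            if phrase in haystack:
                return f"blocked phrase: {phrase}"
        return None

    return gate


def compose_with_gates(
    generate: Callable[[int], str],  # stand-in for the Bedrock call; gets the attempt number
    gates: list[Gate],
    fallback_text: str,
    max_attempts: int = 3,  # first try plus two retries
) -> ComposeResult:
    rejections: list[str] = []
    for attempt in range(1, max_attempts + 1):
        text = generate(attempt)
        reasons = [reason for gate in gates if (reason := gate(text)) is not None]
        if not reasons:
            return ComposeResult(text, attempt, False, rejections)
        rejections.extend(reasons)
    # Every attempt was rejected: use the templated cover paragraph.
    return ComposeResult(fallback_text, max_attempts, True, rejections)
```

Keeping the rejection reasons on the result makes it easy to store gate outcomes alongside the draft in qd-drafts.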

Storage

  • DynamoDB · qd-audit — one row per intake event. PK rfq_id (UUIDv7); attributes: source lane, raw payload S3 key, dedupe hash, screen result. On-demand. TTL = 90 days.
  • DynamoDB · qd-rfqs — one row per RFQ post-extraction. PK rfq_id; attributes: extracted line items (with confidence + KB match), constraints, context, chosen move, drafter version. On-demand. No TTL.
  • DynamoDB · qd-drafts — one row per priced + composed draft. PK rfq_id; attributes: priced lines (with citations), cover paragraph, gate results, manager-approval flag, current state (queued, approved, edited, rejected, expired). On-demand. No TTL.
  • DynamoDB · qd-sessions — presigned-upload sessions. PK session_id; TTL = 1 hour. On-demand.
  • S3 · qd-raw-mime — raw inbound MIME. Lifecycle to Glacier at 30 days; expiry at 365 days.
  • S3 · qd-uploads — buyer-uploaded spec docs. Same lifecycle.
  • S3 · qd-quote-pdfs — rendered customer-facing PDFs. Lifecycle to Glacier at 30 days; expiry at 7 years (or your retention policy).
  • S3 Vectors · qd-kb-vectors — the Bedrock Knowledge Base vector store backing qd-catalog-kb.
  • SQS · intake-queue.fifo — FIFO with content-based deduplication (5-minute window). 14-day retention. DLQ intake-queue-dlq.fifo after 3 failures.
  • SQS · draft-pricer-queue, draft-composer-queue — standard queues. 14-day retention. DLQs after 3 failures.
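With content-based deduplication, SQS derives the deduplication ID from a SHA-256 hash of the message body, so two deliveries of the same RFQ only collapse if their bodies are byte-identical. A minimal sketch of a canonical message and the matching dedupe hash for qd-audit follows; the field names and normalization rules are assumptions, not the series' actual schema.

```python
import hashlib
import json


def normalize_message(lane: str, sender: str, body_text: str, attachment_keys: list[str]) -> dict:
    """Reduce an intake event to a canonical shape (hypothetical fields)."""
    return {
        "lane": lane,
        "sender": sender.strip().lower(),
        # Collapse runs of whitespace so re-wrapped copies compare equal.
        "body": " ".join(body_text.split()),
        "attachments": sorted(attachment_keys),
    }


def canonical_body(message: dict) -> str:
    """Serialize with sorted keys and fixed separators so equal dicts give equal bytes."""
    return json.dumps(message, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def dedupe_hash(message: dict) -> str:
    """SHA-256 of the canonical body, stored as the dedupe hash attribute."""
    return hashlib.sha256(canonical_body(message).encode("utf-8")).hexdigest()
```

Sending canonical_body(message) as the SQS message body means the queue's own deduplication and the stored hash agree on what counts as a duplicate.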

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. The drafter and composer use the same model with different system prompts; consolidating on one model keeps cost and quota management simple.
  • Embeddings. amazon.titan-embed-text-v2:0, output dimension 1024, normalized. Used by the Knowledge Base for the catalog and rules docs.
  • Knowledge Base. qd-catalog-kb, vector store on Amazon S3 Vectors at s3://qd-kb-vectors/, embedding model Titan v2. Data source: s3://qd-catalog-source/, populated by the drive-sync Lambda described below. Ingestion: drive-sync calls StartIngestionJob on the data source after any 5-minute sync in which a file actually changed, so there is no separate ingestion schedule. Bedrock KB doesn’t ship a native Google Drive connector, so the Drive folder lives one hop away through the sync Lambda; this also means a versioned S3 bucket gives you point-in-time history of every catalog change for free.
  • Lambda · drive-sync — EventBridge Scheduler target, fires every 5 minutes. Uses the Google Drive API (service-account credentials in Secrets Manager under qd/drive/sa) to export catalog.gdoc, rules.gdoc, and voice.gdoc as plain text and write them to s3://qd-catalog-source/<name>.txt if the Drive modifiedTime is newer than the S3 LastModified. After each successful sync, calls StartIngestionJob on qd-catalog-kb only if any file actually changed. Memory: 256 MB. Timeout: 30 s.
  • Quotas. Default account quotas are sufficient at SMB volume. Request an increase to the Haiku requests-per-minute and tokens-per-minute quotas if you anticipate bursts above ~5 RFQs/second.
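The drive-sync rule (copy a file only when Drive is newer than S3, and start ingestion only when something changed) can be sketched without any AWS or Google calls. The function names and the injected callables are hypothetical; a real Lambda would fill the two timestamp maps from the Drive API and from S3 object metadata.

```python
from datetime import datetime, timezone
from typing import Callable


def files_to_sync(
    drive_modified: dict[str, datetime],
    s3_last_modified: dict[str, datetime],
) -> list[str]:
    """Return the names whose Drive copy is newer than (or missing from) S3."""
    changed = []
    for name, modified in drive_modified.items():
        mirrored = s3_last_modified.get(name)
        if mirrored is None or modified > mirrored:
            changed.append(name)
    return sorted(changed)


def run_sync(
    drive_modified: dict[str, datetime],
    s3_last_modified: dict[str, datetime],
    export_and_upload: Callable[[str], None],  # stand-in for Drive export + S3 put
    start_ingestion: Callable[[], None],       # stand-in for StartIngestionJob
) -> list[str]:
    changed = files_to_sync(drive_modified, s3_last_modified)
    for name in changed:
        export_and_upload(name)
    # Only re-index the Knowledge Base when at least one file changed.
    if changed:
        start_ingestion()
    return changed
```

All timestamps should be timezone-aware (UTC is simplest) so the comparison is well defined.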

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • drafter role: bedrock:InvokeModel on the Haiku inference-profile ARN plus the foundation-model ARNs it routes to; bedrock:Retrieve + bedrock:RetrieveAndGenerate on qd-catalog-kb; sqs:ReceiveMessage + DeleteMessage + GetQueueAttributes on intake-queue.fifo (the SQS event source mapping needs all three); dynamodb:PutItem on qd-rfqs; events:PutEvents on the qd-events bus; sqs:SendMessage on draft-pricer-queue.
  • pricer role: sqs:ReceiveMessage + DeleteMessage + GetQueueAttributes on draft-pricer-queue; dynamodb:GetItem on qd-rfqs; dynamodb:PutItem on qd-drafts; s3:GetObject on qd-catalog-source (the catalog cache bucket); sqs:SendMessage on draft-composer-queue. No bedrock:*.
  • render-pdf role: dynamodb:GetItem + UpdateItem on qd-drafts; s3:PutObject on qd-quote-pdfs; ses:SendRawEmail from the verified sender identity. CRM-adapter calls go out over the Lambda’s default internet egress (no VPC), so they need no IAM permission.
  • intake-portal-presign role: s3:PutObject + s3:GetObject on qd-uploads only (a presigned URL carries the permissions of the role that signed it, so there is no separate presign permission); dynamodb:PutItem on qd-sessions. The role signs URLs that allow an upload but never uploads content itself; the buyer’s browser does, using the presigned URL.
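As an illustration of "scoped to exact ARNs", here is a sketch of the pricer role's policy document built as a Python dict. The function name, the Sid values, and the account ID in the usage line are placeholders; sqs:GetQueueAttributes is included because Lambda's SQS event source mapping requires it alongside ReceiveMessage and DeleteMessage.

```python
import json


def pricer_policy(region: str, account_id: str) -> dict:
    """Least-privilege policy sketch for the pricer Lambda (no bedrock:* at all)."""

    def arn(service: str, resource: str) -> str:
        return f"arn:aws:{service}:{region}:{account_id}:{resource}"

    def allow(sid: str, actions: list[str], resource: str) -> dict:
        return {"Sid": sid, "Effect": "Allow", "Action": actions, "Resource": resource}

    return {
        "Version": "2012-10-17",
        "Statement": [
            allow(
                "ConsumePricerQueue",
                ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
                arn("sqs", "draft-pricer-queue"),
            ),
            allow("ReadRfq", ["dynamodb:GetItem"], arn("dynamodb", "table/qd-rfqs")),
            allow("WriteDraft", ["dynamodb:PutItem"], arn("dynamodb", "table/qd-drafts")),
            # S3 object ARNs carry no region or account ID.
            allow("ReadCatalog", ["s3:GetObject"], "arn:aws:s3:::qd-catalog-source/*"),
            allow("EnqueueComposer", ["sqs:SendMessage"], arn("sqs", "draft-composer-queue")),
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(json.dumps(pricer_policy("us-east-1", "123456789012"), indent=2))
```

Generating the document from a function makes it straightforward to assert in a unit test that a role never picks up an action it should not have.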

SES inbound and domains

  • Set the MX record on a dedicated subdomain (e.g. quotes.your-company.com) to inbound-smtp.us-east-1.amazonaws.com.
  • Configure the SES inbound rule set qd-inbound-rules with one active rule. Conditions: recipient ends with @quotes.your-company.com. Actions, in order: scan for spam (built-in), write to s3://qd-raw-mime/<message-id>, stop. The S3 PUT triggers intake-ses-parser.
  • For SES outbound (sending the customer-facing quote PDFs), verify a separate sender identity at quotes@your-company.com and configure DKIM and SPF on the parent domain. New SES accounts start in the sandbox; request production access before sending to customers.
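The inbound rule can be sketched as the Rule structure that the SES CreateReceiptRule API accepts. In that API, spam and virus scanning is the rule-level ScanEnabled flag rather than an entry in Actions, and listing a bare domain under Recipients matches every address at that domain. The function name and the rule name are assumptions.

```python
def inbound_rule(domain: str, bucket: str) -> dict:
    """Build the Rule argument for SES CreateReceiptRule (sketch)."""
    return {
        "Name": "qd-inbound-to-s3",  # hypothetical rule name
        "Enabled": True,
        "ScanEnabled": True,       # built-in spam and virus scan
        "Recipients": [domain],    # matches any address at the domain
        "Actions": [
            # SES stores each message under the bucket, keyed by message ID.
            {"S3Action": {"BucketName": bucket}},
            # Stop evaluating further rules in the rule set.
            {"StopAction": {"Scope": "RuleSet"}},
        ],
    }


# Usage with boto3 (not executed here):
#   ses = boto3.client("ses", region_name="us-east-1")
#   ses.create_receipt_rule(
#       RuleSetName="qd-inbound-rules",
#       Rule=inbound_rule("quotes.your-company.com", "qd-raw-mime"),
#   )
```

The bucket policy on qd-raw-mime also has to allow SES to write objects, which is separate from the Lambda roles above.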

Presigned-upload portal flow

  1. Buyer opens the static portal at https://upload.your-company.com/ (CloudFront in front of an S3 bucket; static HTML/JS only).
  2. Buyer types name + email and accepts terms. Browser POSTs to the intake-portal-presign Function URL.
  3. Lambda mints a presigned upload into s3://qd-uploads/<session-id>/<sanitized-filename> with a 30-minute TTL, a 25 MB maximum size, and the Content-Disposition + Content-Type values baked into the signature. S3 enforces a size range only through a presigned POST policy (the content-length-range condition); a presigned PutObject URL has no equivalent maximum, so the 25 MB cap implies the POST form. Lambda writes the session row to qd-sessions (TTL 1 hour) and returns the signed upload details.
  4. Browser uploads the file directly to S3 with the signed request. No content passes through Lambda.
  5. S3 PUT event triggers intake-upload-parser; from there it’s the same path as the other lanes.
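A minimal sketch of step 3, assuming the upload is signed as a presigned POST, since the content-length-range condition that caps the size is only available in POST policies. The helper names and the sanitization rules are hypothetical; the returned dict matches the keyword arguments of boto3's generate_presigned_post.

```python
import re

MAX_BYTES = 25 * 1024 * 1024  # 25 MB cap from the portal spec
TTL_SECONDS = 30 * 60         # 30-minute TTL


def sanitize_filename(name: str) -> str:
    """Drop any path components and replace unsafe characters (assumed rules)."""
    base = name.replace("\\", "/").rsplit("/", 1)[-1]
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned[:120] or "upload"


def presign_args(session_id: str, filename: str, content_type: str) -> dict:
    """Arguments for s3.generate_presigned_post(**args)."""
    return {
        "Bucket": "qd-uploads",
        "Key": f"{session_id}/{sanitize_filename(filename)}",
        # Fields the browser must send back exactly as signed.
        "Fields": {"Content-Type": content_type},
        "Conditions": [
            {"Content-Type": content_type},
            # S3 rejects uploads outside this byte range.
            ["content-length-range", 1, MAX_BYTES],
        ],
        "ExpiresIn": TTL_SECONDS,
    }


# Usage with boto3 (not executed here):
#   post = boto3.client("s3").generate_presigned_post(
#       **presign_args(session_id, original_name, "application/pdf")
#   )
#   # post["url"] and post["fields"] go back to the browser for a multipart form POST.
```

Because the key is built from the session ID and a sanitized name, a buyer cannot steer the upload outside their own prefix.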

PDF rendering

The render-pdf Lambda uses reportlab packaged in a Lambda layer. The PDF template lives in the deployment artifact (not in S3) so render-time is deterministic; the layout includes the company logo, a fixed header, the priced lines as a table, the cover paragraph, and a footer with the quote validity and a unique reference number tied to rfq_id. Rendering is on-demand: the template is small enough to render in a few hundred milliseconds, and rendering only when the rep approves means drafts that get edited or rejected never burn the cycles.

CRM adapters

A single crm-adapter Lambda layer with one module per CRM, switched at runtime via an environment variable (CRM_ADAPTER=hubspot|salesforce|pipedrive|drive-sheet). Each adapter implements four operations: upsert_contact(email, name, company, domain), create_deal(rfq_id, contact_id, line_items, total), attach_file(deal_id, s3_key), add_note(deal_id, text). The Drive Sheet adapter is the fallback for the smallest setups; it appends to a Google Sheet via the Sheets API with the same schema as the other adapters’ deal table.
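The four-operation contract and the environment-variable switch can be sketched as below. The operation names and arguments come from the description above; the return types, the InMemoryAdapter stand-in, and the ADAPTERS registry are assumptions for illustration, with real HubSpot, Salesforce, Pipedrive, and Sheets modules slotting into the same registry.

```python
import os
from decimal import Decimal
from typing import Mapping, Optional, Protocol


class CrmAdapter(Protocol):
    def upsert_contact(self, email: str, name: str, company: str, domain: str) -> str: ...
    def create_deal(self, rfq_id: str, contact_id: str, line_items: list[dict], total: Decimal) -> str: ...
    def attach_file(self, deal_id: str, s3_key: str) -> None: ...
    def add_note(self, deal_id: str, text: str) -> None: ...


class InMemoryAdapter:
    """Stand-in adapter that keeps everything in dicts (useful for tests)."""

    def __init__(self) -> None:
        self.contacts: dict[str, dict] = {}
        self.deals: dict[str, dict] = {}

    def upsert_contact(self, email: str, name: str, company: str, domain: str) -> str:
        contact_id = email.strip().lower()  # email doubles as the contact key
        self.contacts[contact_id] = {"name": name, "company": company, "domain": domain}
        return contact_id

    def create_deal(self, rfq_id: str, contact_id: str, line_items: list[dict], total: Decimal) -> str:
        deal_id = f"deal-{rfq_id}"
        self.deals[deal_id] = {
            "contact_id": contact_id,
            "line_items": line_items,
            "total": total,
            "files": [],
            "notes": [],
        }
        return deal_id

    def attach_file(self, deal_id: str, s3_key: str) -> None:
        self.deals[deal_id]["files"].append(s3_key)

    def add_note(self, deal_id: str, text: str) -> None:
        self.deals[deal_id]["notes"].append(text)


# Hypothetical registry: only the stand-in is wired up in this sketch.
ADAPTERS = {"drive-sheet": InMemoryAdapter}


def load_adapter(env: Optional[Mapping[str, str]] = None) -> CrmAdapter:
    """Pick the adapter named by CRM_ADAPTER, failing loudly on unknown values."""
    env = os.environ if env is None else env
    name = env.get("CRM_ADAPTER", "drive-sheet")
    if name not in ADAPTERS:
        raise ValueError(f"unknown CRM_ADAPTER: {name!r}")
    return ADAPTERS[name]()
```

Because dispatch and render-pdf depend only on the four methods, swapping CRMs is a configuration change rather than a code change.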

OAuth credentials per CRM live in Secrets Manager under qd/crm/<adapter>/oauth. Refresh tokens are rotated once a day by an EventBridge Scheduler schedule that invokes a refresh-crm-tokens Lambda.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. One log stream per Lambda execution environment (not per invocation). Metric filter on the keywords "error", "throttle", "timeout" feeding a CloudWatch metric for alerting.
  • Alarms: queue depth on intake-queue.fifo > 50 for 5 min (someone’s posting fast and the drafter can’t keep up); DLQ depth > 0 (something failed three times); Lambda error rate per function > 1% over 5 min.
  • X-Ray: off by default. The pipeline is short and the queues handle correlation; X-Ray cost isn’t worth it at SMB volume.
  • AWS Budgets: $25/month threshold, alarm at 80% and 100%, posts to SNS topic qd-cost-alarm, to which the on-call rep’s email and Slack are subscribed.
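Structured JSON logging needs nothing beyond the standard library. A minimal sketch follows; the JsonFormatter class and the "fields" convention for extra context are assumptions, not something the series prescribes.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        # Optional structured context passed as extra={"fields": {...}}.
        extra = getattr(record, "fields", None)
        entry = dict(extra) if isinstance(extra, dict) else {}
        # Core keys are written last so context cannot overwrite them.
        entry.update(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage:
#   log = build_logger("pricer")
#   log.info("draft priced", extra={"fields": {"rfq_id": rfq_id, "lines": 4}})
```

Carrying rfq_id on every line is what lets the queues stand in for tracing: one Logs Insights query across log groups follows an RFQ end to end.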

Config and secrets

Per-form shared secrets, the captcha key, the Slack webhook, the CRM OAuth credentials, and the SES sender identity all live in Secrets Manager under the prefix qd/. Application configuration (Bedrock model IDs, the discount cap, the block-list phrases for Gate 3, the Drive folder ID) lives in a single Parameter Store hierarchy /qd/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment; Secrets Manager values are fetched per-invocation only when the secret is actually needed.
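The caching split (configuration once per execution environment, secrets only when needed) can be sketched with injected fetchers. RuntimeSettings and both callables are hypothetical; in a real Lambda the config fetcher would page through Parameter Store under /qd/config/ and the secret fetcher would call Secrets Manager.

```python
from typing import Callable, Optional


class RuntimeSettings:
    """Config is loaded lazily once and cached; secrets are fetched on each use."""

    def __init__(
        self,
        fetch_config: Callable[[], dict[str, str]],  # e.g. reads /qd/config/ (assumed)
        fetch_secret: Callable[[str], str],          # e.g. reads qd/<name> (assumed)
    ) -> None:
        self._fetch_config = fetch_config
        self._fetch_secret = fetch_secret
        self._config: Optional[dict[str, str]] = None

    def config(self, key: str) -> str:
        # The first call after a cold start populates the cache.
        if self._config is None:
            self._config = self._fetch_config()
        return self._config[key]

    def secret(self, name: str) -> str:
        # Deliberately uncached: fetched only when a code path needs it.
        return self._fetch_secret(name)


# Module-level instance so the cache survives across warm invocations:
#   SETTINGS = RuntimeSettings(load_parameters, load_secret)
#
#   def handler(event, context):
#       cap = SETTINGS.config("discount_cap")
```

Creating the instance at module scope, outside the handler, is what ties the cache lifetime to the execution environment.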

Deploy

Whichever IaC you prefer. The only opinionated bits are: define the Function URLs directly on the Lambdas (there is no API Gateway to deploy), configure the SES rule set as a separate stack since rule-set changes can affect mail flow, and turn on S3 versioning for qd-catalog-source so a bad Drive edit can be rolled back in one click. CDK with a Python stack file works well; SAM also fits. Total deployable surface: around thirteen Lambdas, four DDB tables, five S3 buckets, three SQS queues, one EventBridge bus, one Knowledge Base, one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. The repo template (or a deployable starter) lives where the rest of my AWS scaffolding does — if you want to talk about adapting it for your business, see Work with me.
