Key takeaways · verified May 2026

Single AWS account in ap-southeast-1 (Singapore); Bedrock via Global cross-Region inference.
Five subsystems: Build & Deploy, Knowledge Sync, Intake (4 lanes → SQS), Qualifier (parallel extractors + scorer + composer), Dispatch & routing.
Models: global.anthropic.claude-haiku-4-5-20251001-v1:0 + amazon.titan-embed-text-v2:0; vector store is S3 Vectors (GA Dec 2, 2025).
Lead sources: Meta Lead Ads webhook (Graph API v24/v25), Google Ads lead form asset webhook, Google Ads conversion-import poll, SES inbound for email.
Day-one paperwork: Meta App with leads_retrieval, Google Ads developer token under an MCC, SES domain MX verification.

Posts 1–6 walk through the system in plain language. This page is the dense version. Nothing softened — just the architecture as you’d sketch it on a whiteboard during a design review.

Fig 7. Full architecture, ap-southeast-1. White boxes = AWS resources; dashed AWS container; dashed grey boxes = subsystem groupings; dashed grey arrows = config feed and side branches.

Read this top-down, then column-by-column

Top row is the four external surfaces. Below it, the AWS account contains five subsystems: Build & Deploy across the top, then Knowledge Sync, then three runtime columns (Intake, Qualifier, Dispatch & routing), with a Cross-cutting strip at the bottom. Leads enter through four intake paths (three webhooks behind Lambda Function URLs plus an SES-inbound parser) and all four write into a single SQS queue qu-leads-in after deduplicating against tbl-leads on a normalized email key within a 24-hour window. The SQS event source invokes fn-process, which runs the three extractors in parallel against Bedrock Claude Haiku, runs the passive enrichment, applies the two override gates (partner-allowlist hot, reject), computes the linear fit score against the ICP doc, picks one of four moves, and on hot or warm calls fn-compose. The composer calls Bedrock Retrieve against kb-policies to fetch the top grounded chunks, then calls Bedrock ConverseStream with those chunks plus strict tool_use over four tools (answer, draft, escalate, ignore) — the streaming RAG endpoint RetrieveAndGenerateStream doesn’t accept client-side tool definitions, so we pair the two APIs explicitly. The composer then runs the four guardrails (citation, no fabricated specifics, no commit on availability, no PII in subject) and writes the chosen action to tbl-actions. Dispatch routes by move: fn-route-hot picks an owner round-robin, sets the CRM owner field atomically, fans out via SNS to Slack and optional SMS, and arms a 15-minute escalation timer; fn-route-warm packages the draft and fans out to email; fn-route-nurture tags in the CRM and enrolls in a campaign; fn-route-reject archives with a reason. fn-daily-digest emails the team summary at 8am.

Naming conventions used in the diagram

Lambda functions: fn-<purpose> — fn-intake-form, fn-intake-meta, fn-intake-gads, fn-poll-gads, fn-intake-email, fn-process, fn-compose, fn-route-hot, fn-route-warm, fn-route-nurture, fn-route-reject, fn-daily-digest, fn-drive-sync, fn-archive.
Lambda runtimes: Python 3.13 for the qualifier, composer, daily digest, drive sync, archive, and routing functions (the Bedrock SDK is more ergonomic in Python). Python 3.14 has been available on Lambda since November 2025; 3.13 is the safe production default in May 2026. Node.js 22.x is fine for fn-intake-meta and fn-intake-gads if you prefer JS for HMAC verification; Node.js 24.x is also available since 2025 and either is current.
DynamoDB tables: tbl-leads (partition key email_norm — lowercased, plus-tag-stripped — sort key seen_at, with TTL of 30 days; used for dedupe and 24-hour-window resubmission collapse), tbl-actions (partition key lead_id, sort key action_ts, with move, score, score_breakdown, owner, acked_at, reply_text, cited_passages, guardrail_flags; queried by the escalation timer to detect unacked hot leads).
SQS queues: qu-leads-in (standard queue with 5-minute visibility timeout), qu-leads-dlq (5 retries before failure goes to DLQ; CloudWatch alarm on DLQ depth > 0 fires t-alarms).
SNS topics: t-hot-pings for urgent fan-out (Slack via Amazon Q Developer + optional SMS via SNS), t-warm-drafts for normal-priority human review fan-out (SES email + optional Slack), t-alarms for general failures.
S3 layout: single bucket lead-intake-bot-data with prefixes kb-source/ (Drive mirror), inbound-mime/ (raw SES messages), drafts/{date}/ (full warm draft packages), archive/.
Knowledge Base: kb-policies, a Bedrock managed Knowledge Base with an S3 connector pointed at the synced policies prefix. Bedrock KBs do not have a native Drive connector as of 2026-05 (current native connectors: S3, Confluence, SharePoint, Salesforce, Web Crawler, plus a custom-API option), so a small fn-drive-sync Lambda mirrors the Drive folder to S3 on a 5-minute schedule. Embeddings model is amazon.titan-embed-text-v2:0; vector store is Amazon S3 Vectors (GA December 2, 2025 — cheapest quick-create option for small/medium KBs: no provisioned capacity, no monthly minimum, $0.06/GB-month for stored vectors plus tiered query charges and ~$2.50 per million API calls — provisioned and managed by Bedrock when you create the KB). OpenSearch Serverless and Aurora pgvector remain valid alternatives for higher query throughput.

Region, model access, lead-source APIs, and Drive auth

Everything runs in ap-southeast-1 (Singapore). Bedrock model invocations use the Global cross-Region inference profile (global. prefix on model IDs). Data at rest stays in Singapore. Inference may route to other regions for capacity, billed at on-demand Singapore rates.

The intake Lambdas run as Lambda Function URLs to keep webhook ingress free of API Gateway. Each lane has its own current-2026 reality, and the design accounts for the differences honestly.

Website forms (lane 1). Your form posts JSON over HTTPS to fn-intake-form’s Function URL. Two checks before any real work: a per-form shared secret in the body (random string per origin, stored in Secrets Manager and embedded in the form via a server-rendered include), and a captcha token verified against the captcha provider’s siteverify endpoint. Cloudflare Turnstile is fully free with no monthly cap. hCaptcha’s publisher tier is free up to 10K verifications per month. Google reCAPTCHA Enterprise on Google Cloud has a free tier of 10K assessments per month, with paid pricing above that. Classic free reCAPTCHA v2/v3 no longer accepts new keys — new sites must use reCAPTCHA Enterprise on Google Cloud, and existing v2/v3 keys created outside Google Cloud are being migrated through Q1 2026. CORS on the Function URL is restricted to your own domain(s). Preflight with Access-Control-Allow-Origin: https://yourdomain.com; no wildcards.

Meta Lead Ads (lane 2). Webhooks fire on the Page object with the leadgen field. You subscribe in Meta Business Manager and configure the callback URL plus a verify token. The webhook payload contains leadgen_id and form_id only, not the field answers. fn-intake-meta verifies the X-Hub-Signature-256 header (App Secret as the HMAC key, computed over the raw request body, constant-time comparison), then makes a separate call to GET /v25.0/{leadgen_id}?fields=field_data,form_id,created_time with the page access token to retrieve the answers. Page tokens last ~60 days. A small refresh worker runs weekly to exchange the current token for a fresh long-lived one before it expires. Pin to v24.0 or v25.0 on outgoing calls. v18.0 sunset on 2026-01-26, v19.0 sunsets on 2026-05-21, and v20.0 sunsets on 2026-09-24. v25.0 is the current latest as of May 2026.

Google Ads (lane 3). Two patterns supported. The lead form asset webhook (the surface formerly called the Lead Form Extension before Google’s Extensions-to-Assets rename) fires on submission to a URL you configure in the Google Ads campaign settings, with a google_key in the request body that you verify against the value you set for that lead form. For accounts using conversion-import-only lead capture (older campaigns, search ads with no lead form asset), fn-poll-gads runs on EventBridge cron cron(0 * * * ? *) hourly and queries the Google Ads API for new conversion events using a developer token plus an OAuth 2.0 refresh token (installed-app flow under your manager account). Service-account auth is supported only via Google Workspace domain-wide delegation, which is a narrow path most advertisers can’t use; the OAuth refresh-token flow is the standard production setup. Note that as of April 21, 2026, Google Ads API enforces MFA on new OAuth refresh tokens (service accounts via DwD are exempt) — plan for re-auth as part of operations. The Google Ads API requires an approved developer token under a Manager Account (MCC) — the application takes a few business days; do this on day one.

SES inbound (lane 4). SES inbound rules accept email at your domain. Verify the domain’s MX record points to SES inbound; some regions require Easy DKIM and SPF. Each rule has an action chain — for this system, write to S3 first, then trigger a Lambda. The Lambda receives an S3 PUT event with the raw MIME, parses it with the Python email module (or mailparser for richer extraction), strips signatures and quoted threads with the same heuristics the email-assistant series uses, and writes the cleaned record to qu-leads-in. Inbound mail is also a frequent vector for auto-replies and list confirmations. The parser drops anything with Auto-Submitted: headers or known list-management subject patterns before queueing. SES inbound is not available in every region. In May 2026, ap-southeast-1 supports inbound — but double-check before you commit.

CRM destinations. fn-route-hot, fn-route-warm, fn-route-nurture, and fn-route-reject each have a small CRM client that knows how to upsert_contact, set_owner, add_tag, and archive. A single CRM-adapter module switches on the configured CRM (HUBSPOT, SALESFORCE, PIPEDRIVE, or DRIVE_SHEET for tiny SMBs); each adapter handles the API specifics. The Drive Sheet adapter is the smallest fallback — fn-route-* functions append rows to a configured Google Sheet via the Sheets API. CRM API keys live in Secrets Manager.

Google Drive authentication uses a service account with domain-wide delegation over two scopes: https://www.googleapis.com/auth/drive.readonly on the policies folder, and https://www.googleapis.com/auth/spreadsheets if the Drive Sheet adapter is enabled. The credential lives in AWS Secrets Manager. The fn-drive-sync Lambda runs on a 5-minute EventBridge schedule, pulls any changed docs from Drive, writes them to lead-intake-bot-data/kb-source/, and lets the Bedrock KB’s S3 connector index from there. Editing a doc and saving propagates within about 10 minutes (5 to sync, 5 to index). Manual re-sync is one CLI call to StartIngestionJob.

The composer uses strict tool_use: four tool definitions (answer, draft, escalate, ignore) with required parameter schemas. The answer and draft tools require a citation_passages array referencing one or more retrieved passages by id. The runtime validates each citation against the retrieved set before allowing dispatch. If the model emits an answer with a citation that wasn’t in the retrieved set, the runtime downgrades to draft — the safer-by-default failure mode. The four guardrails (citation, no fabricated specifics, no commit on availability, no PII in subject) all run after the model returns and before the reply is dispatched anywhere.

What’s deliberately not on the diagram

IAM policy details — per-Lambda execution roles are minimal (one bucket prefix, one or two tables, a single Bedrock KB ID, InvokeModel on one model, the relevant outbound permissions via Secrets Manager).
Per-business policies layout — a flat Drive folder is fine for the first few months; subdivide by topic (pricing/, integrations/, icp/) once the file count grows past a couple of dozen.
X-Ray tracing — on for fn-process and fn-compose, sampling 100% during tuning, 10% in steady state.
Bedrock Guardrails — managed contextual grounding (numeric grounding + relevance scores), PII redaction, prompt-attack/jailbreak filters, and the newer Automated Reasoning checks (formal-logic policy validation, GA August 2025). The custom citation-verify, no-fabricated-specifics, no-commit-on-availability, and no-PII-in-subject steps in fn-compose are roughly the contextual-grounding and PII ideas hand-rolled; turning on Guardrails moves the threshold into console configuration and adds prompt-attack defence on every model call. Worth enabling once thresholds are stable.
Multi-language replies — the composer reads the language of the inbound message and falls back to escalate if the language isn’t in the configured set. Adding a language is a config edit and a translated voice-file section, not a code change.
Multi-tenant variant — if running this on behalf of multiple SMBs, namespace the KB and tables per tenant and inject tenant_id into every record. The architecture doesn’t change shape; the IDs do.
Step Functions vs in-Lambda orchestration — the per-lead pipeline (extract → score → pick → compose → route) fits comfortably inside a single Lambda invocation under the 15-minute limit. Step Functions becomes worth it only if you need long-poll waits between human approval and send; for the synchronous draft package pattern shown here, in-Lambda is simpler and cheaper.
Backfill — on day one the system is empty of historical leads. A one-shot backfill script can populate tbl-leads with existing email keys (so re-engagements get re-evaluated as “known contact” instead of cold leads) without triggering a flood of belated drafts. Off the diagram because it runs once.

If you’re recreating this

Day-one paperwork. If you’re going to use Meta Lead Ads, register a Meta App in Business Manager and submit it for the leads_retrieval permission. Review takes a few business days. If you’re going to use Google Ads, request a Google Ads API developer token on day one. It requires a Manager Account (MCC) and the application takes a few business days. SES inbound requires verifying your domain’s MX record points to SES, which can take several hours to propagate.

Start with Build & Deploy alone (a single Lambda, no triggers). Once git push reliably updates an empty stack, wire up fn-drive-sync with one short ICP doc and confirm the doc lands in S3 within five minutes. Create the Bedrock Knowledge Base over that S3 prefix and confirm a one-shot Retrieve call returns a passage. Then add one intake lane. fn-intake-form is the easiest — you control the form, and you can iterate on the shared secret and captcha pattern without waiting on platform approvals. Then add the SQS-driven fn-process with the three extractors, the override gates, the linear scorer, and the four-move picker. Then fn-compose with strict tool_use and the four guardrails. (This is the part most worth integration-testing. Intentionally try to make the model cite a passage outside the retrieved set, fabricate a discount, or commit to a specific time. Confirm each guardrail downgrades the reply correctly.) Then fn-route-warm and fn-route-hot. Then add the Meta and Google Ads intake lanes once the offline path works end-to-end. Cross-cutting (audit, logs, alarms, budget, archive) goes in from day one.

All posts