Part 2 of 7 · Lead intake bot series · ~4 min read

How a lead reaches the intake bot

Leads don’t arrive at your business through one door. Your website form posts to an HTTPS endpoint. Meta Lead Ads pushes a webhook the moment a Facebook or Instagram user submits a lead form. Google Ads pushes through a lead form asset webhook (or, for older campaigns, is polled through the conversion-import API). Your shared sales inbox catches the rest. The intake’s job is to fold those different mechanisms into one queue, drop duplicates, and screen out junk before any AI sees a single field.

Key takeaways

  • Three intake lanes feed one queue: website form, ad platforms (Meta + Google), and the shared sales inbox via SES.
  • Push when the platform offers it; poll when it doesn’t. Each lane reflects how its platform actually works as of 2026.
  • Dedupe drops emails or phones already seen in the last 24 hours.
  • Screen kills banned-domain spam, competitor emails, and missing-required-field junk.
  • Both filters run in plain Lambda code — no AI, no token cost.

Three lanes at the door

Diagram: three intake lanes funnel into one queue.

  • Lane 1 · push · Web form: a visitor submits the "Contact us" form; it posts JSON over HTTPS to a Lambda Function URL; the cloud verifies a per-form shared secret and an hCaptcha or Turnstile token; emits a normalized record into the queue. Latency: under a second.
  • Lane 2 · push · Ad platforms (Meta + Google Ads): the Meta Lead Ads webhook fires on submission; the cloud verifies the X-Hub-Signature-256 HMAC and fetches the lead by ID from the Graph API, since the webhook itself only carries the ID. Google Ads arrives via the lead form asset webhook (with the configured google_key in the request body) or an hourly poll of the conversion-import API. Emits the same record.
  • Lane 3 · SES inbound · Sales inbox: an SES receive rule writes the raw MIME to S3; the S3 PUT triggers a parser Lambda; the parser strips signatures, quoted threads, and trackers; emits the same shape as the form and ad lanes. Latency: a few seconds.
  • Normalize, dedupe, screen (one row across all lanes): normalize folds source-specific shapes into one common lead object; dedupe skips emails or phones seen in the last 24 hours (retries, multi-submits); screen filters banned domains, competitor emails, banned-phrase spam, and missing required fields.
  • Output: to qualifier, in one shape.
Fig 2. Three lanes, one queue. Push when the platform offers it, parse when it doesn’t. Cheap filters first — nothing reaches the AI until the source-specific noise is stripped.

Why three different mechanisms

The three lanes look different because the platforms behind them work differently. The intake mirrors those differences honestly instead of pretending they’re uniform.

Your own form is the easiest lane. The form posts JSON over HTTPS to a Lambda Function URL with a shared secret in the body and a captcha token in the headers. The cloud verifies both before doing any work. Captcha alone catches most automated form spam. The shared secret stops anyone from POSTing to the URL directly, even if they find the URL in your page source. You own this lane end-to-end. No platform deprecation can break it.
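A minimal sketch of that two-step check, assuming the secret travels in a body field called form_secret and the captcha token in an X-Captcha-Token header (both names are hypothetical, not from this series). The captcha check itself is passed in as a function, since the real one would be a network call to hCaptcha or Turnstile.

```python
import hmac
import json
from typing import Callable, Optional


def verify_form_post(
    body: str,
    headers: dict,
    expected_secret: str,
    captcha_ok: Callable[[str], bool],
) -> Optional[dict]:
    """Return the form fields if both checks pass, otherwise None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None

    # "form_secret" is a hypothetical field name. Compare in constant time
    # so response timing does not leak how much of the secret matched.
    supplied = str(payload.pop("form_secret", ""))
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return None

    # Header names are case-insensitive, so lower-case them before lookup.
    # "x-captcha-token" is also a hypothetical name.
    lowered = {key.lower(): value for key, value in headers.items()}
    token = lowered.get("x-captcha-token", "")
    if not token or not captcha_ok(token):
        return None

    return payload
```

Checking the secret before the captcha keeps the paid or rate-limited captcha call off the path for requests that were never going to pass.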

Meta Lead Ads is push, with a small twist. When a Facebook or Instagram user submits a lead form attached to an ad, Meta’s webhook fires within seconds. But the payload is just the lead_id and the form ID, not the answers. The cloud verifies the X-Hub-Signature-256 HMAC (App Secret as the key, computed over the raw body, constant-time comparison), then makes a separate call to the Graph API to fetch the field values. That second call is the only fragile bit. The page access token expires every 60 days, so a small refresh worker rotates it before it lapses. Pin to v24.0 or v25.0 on outgoing calls. v18.0 sunset on 2026-01-26, v19.0 sunsets on 2026-05-21, and v20.0 sunsets on 2026-09-24.
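The signature check described above can be sketched as follows. It assumes the Lambda has access to the raw request bytes (not a re-serialized JSON object) and that the header value arrives in the form "sha256=" followed by a hex digest; the function name is illustrative.

```python
import hashlib
import hmac


def verify_meta_signature(raw_body: bytes, signature_header: str, app_secret: str) -> bool:
    """Check an X-Hub-Signature-256 header value against the raw request body."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    # HMAC-SHA256 over the exact bytes received, keyed with the App Secret.
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    supplied = signature_header[len(prefix):]
    # Constant-time comparison; bytes on both sides avoids errors on odd input.
    return hmac.compare_digest(expected.encode(), supplied.encode())
```

The digest has to be computed before any JSON parsing. Parsing and re-encoding the body can change whitespace or key order, and the signature would then fail on a legitimate webhook.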

Google Ads has two patterns. If your campaigns use a lead form asset (Google’s current name for the surface formerly called the Lead Form Extension), you can configure a webhook URL right in Google Ads with a google_key. Google posts the form fields the moment a user submits, and you have the lead immediately. If your campaigns don’t use a lead form asset (search ads with a different lead surface, older campaigns), the alternative is a small hourly poll of the conversion-import API to pick up anything new. Both patterns end at the same normalized record.
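For the webhook pattern, a handler might look roughly like this. The sketch assumes the payload carries google_key, lead_id, campaign_id, is_test, and a user_column_data list of column_id/string_value pairs; treat those field names as assumptions to confirm against a test lead from your own account. The function name and return shape are illustrative.

```python
import hmac
import json
from typing import Optional


def parse_google_lead(body: str, expected_key: str) -> Optional[dict]:
    """Return the lead fields if google_key matches, otherwise None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None

    # The key configured in Google Ads comes back in the request body.
    supplied = str(payload.get("google_key", ""))
    if not hmac.compare_digest(supplied.encode(), expected_key.encode()):
        return None

    # Flatten the list of answers into {column_id: value}.
    fields = {
        col.get("column_id", ""): col.get("string_value", "")
        for col in payload.get("user_column_data", [])
    }
    return {
        "lead_id": payload.get("lead_id"),
        "campaign_id": payload.get("campaign_id"),
        "is_test": bool(payload.get("is_test", False)),
        "fields": fields,
    }
```

Carrying is_test through lets the intake accept Google's test submissions during setup without sending them on to the sales team.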

The inbox lane catches everything else. Plenty of leads still come in as plain emails: a partner referral, a reply to a cold email, a sign-up from a directory site that doesn’t do webhooks, a forwarded RFP. SES inbound rules accept email at your domain, write the raw MIME to an S3 bucket, and trigger a parser Lambda. The parser strips signatures, quoted threads, and tracking pixels (the same tools the email-assistant series uses), then emits the same normalized record as the other lanes. The bot doesn’t care that the lead came through email; the downstream code reads name, email, and message just like a form submission.
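A rough sketch of the parsing step, using only Python's standard email package. It takes the raw MIME bytes as input (the S3 fetch is omitted) and uses deliberately naive rules for signatures and quoted threads; the series' actual stripping tools are not shown here, so these heuristics are stand-ins.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr


def parse_inbound_email(raw_mime: bytes) -> dict:
    """Turn raw MIME bytes into a small dict with sender, subject, and message."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    name, address = parseaddr(str(msg.get("From", "")))

    # Prefer the plain-text part; HTML-only mail yields an empty message here.
    part = msg.get_body(preferencelist=("plain",))
    text = part.get_content() if part is not None else ""

    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Naive cut-offs: the "-- " signature delimiter, quoted lines,
        # and the "On ... wrote:" line that introduces a quoted thread.
        if stripped == "--" or stripped.startswith(">"):
            break
        if stripped.startswith("On ") and stripped.endswith("wrote:"):
            break
        kept.append(line)

    return {
        "name": name,
        "email": address.lower(),
        "subject": str(msg.get("Subject", "")),
        "message": "\n".join(kept).strip(),
    }
```

The output keys match what a form submission would carry, which is what lets the downstream code ignore the fact that this lead started life as an email.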

Mixing all three in the same intake means the qualifier doesn’t care which source a lead came from once it’s in the queue. The downstream code never branches on source, only on content. Source is a tag on the record, not a control-flow split.

What “normalize” actually means

Each source hands the cloud a different shape. A web form gives you the field names you defined. Meta Lead Ads returns an array of question-answer pairs keyed by your form’s field names (which aren’t always what you’d call them). Google Ads gives a flat key-value object with mostly stable names. SES gives raw MIME you have to parse. Normalization folds those into one common lead object: a source name, a stable lead ID, the contact (name, email, phone), the company (domain, name if provided), the free-text message, the timestamp, the campaign or page identifier, and a small bag of source-specific extras for the engineering reference.

The reason for putting normalization here, before anything else, is so the rest of the system reads exactly one kind of message. The qualifier doesn’t have to know what Meta calls the email field. It just reads contact.email.
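One way to picture the common lead object is a pair of small dataclasses plus one normalizer per source. The sketch below shows a Meta normalizer, assuming the Graph API response has id, created_time, ad_id, and a field_data list of name/values pairs, and that the form uses the keys email, full_name, phone_number, and message. Those key names, the class names, and deriving the company domain from the email address are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contact:
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


@dataclass
class Lead:
    source: str  # a tag for analytics; downstream code does not branch on it
    lead_id: str
    contact: Contact
    message: str = ""
    timestamp: str = ""
    campaign: Optional[str] = None
    company_domain: Optional[str] = None
    extras: dict = field(default_factory=dict)


def from_meta(raw: dict) -> Lead:
    """Fold a Meta lead (assumed field_data shape) into the common Lead."""
    # Each answer is {"name": ..., "values": [...]}; keep the first value.
    answers = {
        item["name"]: (item.get("values") or [""])[0]
        for item in raw.get("field_data", [])
    }
    email = answers.pop("email", "").strip().lower() or None
    contact = Contact(
        name=answers.pop("full_name", "").strip() or None,
        email=email,
        phone=answers.pop("phone_number", "").strip() or None,
    )
    return Lead(
        source="meta",
        lead_id=str(raw["id"]),
        contact=contact,
        message=answers.pop("message", ""),
        timestamp=raw.get("created_time", ""),
        campaign=raw.get("ad_id"),
        # Assumption: take the company domain from the email address.
        company_domain=email.split("@", 1)[1] if email and "@" in email else None,
        # Whatever was not mapped stays available as source-specific extras.
        extras=answers,
    )
```

With this in place the qualifier reads lead.contact.email regardless of source, and anything a form asked that has no common slot lands in extras rather than being lost.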

Dedupe and screen, before any AI runs

Two free filters sit between the lanes and the qualifier.

Dedupe drops a lead whose email or phone has been seen in the last 24 hours. Real buyers occasionally submit your form twice in a minute (different tabs, slow connection). Webhooks retry on transient failures. Meta occasionally delivers the same lead webhook twice. Without dedupe the team gets pinged twice on the same hot lead and one of the two rows goes stale. With it, the second arrival creates no new row and triggers no new alert; the existing row is updated with whatever the second submission added.
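The 24-hour window can be sketched with an in-memory map from contact key to first-seen time. This is only an illustration of the logic: a Lambda keeps no memory between cold starts, so a real deployment would need a shared store with expiry, which this post does not specify. The class and method names are hypothetical.

```python
import time
from typing import Optional

WINDOW_SECONDS = 24 * 60 * 60


class Deduper:
    """In-memory illustration of a 24-hour email/phone dedupe window."""

    def __init__(self, window: int = WINDOW_SECONDS):
        self.window = window
        self._seen = {}  # key -> first-seen timestamp in seconds

    @staticmethod
    def _keys(email: Optional[str], phone: Optional[str]) -> list:
        keys = []
        if email:
            keys.append("email:" + email.strip().lower())
        if phone:
            # Compare phones on digits only so formatting does not matter.
            digits = "".join(ch for ch in phone if ch.isdigit())
            if digits:
                keys.append("phone:" + digits)
        return keys

    def is_duplicate(self, email: Optional[str], phone: Optional[str],
                     now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        keys = self._keys(email, phone)
        duplicate = any(
            key in self._seen and now - self._seen[key] < self.window
            for key in keys
        )
        # Record keys that are new or whose window has expired, so a phone
        # number first supplied on a repeat submission is remembered too.
        for key in keys:
            if key not in self._seen or now - self._seen[key] >= self.window:
                self._seen[key] = now
        return duplicate
```

Matching on either key means a repeat submission with the same phone but a different email still counts as the same person, which is the behavior the paragraph above describes.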

Screen handles the obvious junk: banned-domain spam (the same crypto-pump message posted to a thousand contact forms), competitor email domains (a list the sales team maintains), banned-phrase floods (“buy our SEO service”), and submissions missing required fields (an email with no message). Vendor pitches with “I work for X and would love to discuss” phrasing get their own bucket. They’re archived, not deleted, so a real lead with that phrasing can be retrieved on appeal. The screen runs in plain Lambda code with a small banned-list and a regex. No AI involved.
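As a sketch, the screen can be one function that returns either "pass" or a reason code. The domain lists, the phrases, and the reason names below are placeholders invented for the example; the real lists are whatever the sales team maintains.

```python
import re
from typing import Optional

# Placeholder lists for illustration only.
BANNED_DOMAINS = {"spam-pump.example"}
COMPETITOR_DOMAINS = {"rival.example"}
BANNED_PHRASES = re.compile(r"buy our seo service", re.IGNORECASE)
VENDOR_PITCH = re.compile(
    r"\bI work for\b.*\bwould love to discuss\b", re.IGNORECASE | re.DOTALL
)


def screen(email: Optional[str], message: Optional[str]) -> str:
    """Return "pass" or a short reason code for why the lead was screened out."""
    text = (message or "").strip()
    if not email or "@" not in email or not text:
        return "missing_field"

    domain = email.rsplit("@", 1)[1].strip().lower()
    if domain in BANNED_DOMAINS:
        return "banned_domain"
    if domain in COMPETITOR_DOMAINS:
        return "competitor"
    if BANNED_PHRASES.search(text):
        return "banned_phrase"
    # Vendor pitches get their own code so they can be archived, not deleted.
    if VENDOR_PITCH.search(text):
        return "vendor_pitch"
    return "pass"
```

Returning a reason code instead of a boolean keeps the archive searchable, so a real lead that was caught by the vendor-pitch pattern can be found and restored.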

The point of doing both before the AI runs is simple. Token spend is the only line on the bill that grows fast, and most junk is identifiable without it. Free gates first, paid gates only when the message has already proved it’s a real lead.

What this hands to the next post

By the time a lead leaves the intake, it’s in one shape, with a stable ID, no duplicates, no obvious spam, and a source tag the downstream code uses for analytics but not for branching. The next post covers what the qualifier does with that: how it reads the lead, extracts intent, urgency, and fit signals, and starts to decide which of the four moves applies.
