Part 5 of 7 · AWS autoposting series · ~5 min read

How replies work without making things up

The bot answers from the client’s docs only — or escalates to a human. Citation required, no exceptions.

AI reply bots have a bad reputation, and they’ve earned it. They quote prices that don’t exist, promise features that aren’t real, give medical or financial advice they shouldn’t. The fix isn’t to make the model bigger — it’s to take away its freedom to make things up.

[Figure: the RAG reply pipeline. A Facebook user message hits the Meta webhook → a Lambda Function URL verifies the HMAC signature and returns 200 within milliseconds → the message is enqueued to SQS (with a DLQ). The Reply Lambda embeds the question (Bedrock Titan) and searches the S3 Vectors index for the top 3–5 most relevant chunks. A confidence gate checks the top similarity score: low → escalate to a human via SNS and log as unanswered; high → Claude Haiku 4.5 generates a grounded answer using only the retrieved chunks, refusing if the question isn't covered. A citation guardrail blocks any reply that cites no chunk — replies that pass go out via the Graph API back to the Facebook page.]
Fig 5. Two guardrails: a confidence gate before the AI runs, a citation check before the reply ships.

Why webhook then queue

Facebook’s webhook expects a quick “got it” within a couple of seconds, or it’ll keep retrying the same message. So the receiver does the bare minimum — checks the message is real, drops it into a queue, and answers Facebook right away. The slow work (looking things up, calling the AI) happens after, off the critical path. If the AI happens to be slow that day, Facebook still gets its “got it” on time. And if a message fails partway through, it lands in a separate “something went wrong” queue you can look at later, instead of vanishing.
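The receiver's whole job fits in a few lines. A minimal Python sketch, assuming Meta's documented X-Hub-Signature-256 scheme (HMAC-SHA256 of the raw request body, keyed with your app secret); APP_SECRET, enqueue, and handler are illustrative names, not the actual code:

```python
import hashlib
import hmac

# Illustrative stand-in for the Meta app secret (kept in a secret store in practice).
APP_SECRET = b"your-meta-app-secret"

def verify_signature(raw_body: bytes, signature_header: str) -> bool:
    """Check Meta's X-Hub-Signature-256 header: 'sha256=<hexdigest>'
    computed over the raw request body with the app secret."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(APP_SECRET, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header[len("sha256="):])

def handler(raw_body: bytes, signature_header: str, enqueue) -> dict:
    """Do the bare minimum: verify, enqueue, ack. The slow work happens
    in the Reply Lambda that drains the queue."""
    if not verify_signature(raw_body, signature_header):
        return {"statusCode": 403}
    enqueue(raw_body.decode())   # e.g. an SQS send; hypothetical callable here
    return {"statusCode": 200}   # Facebook gets its "got it" fast
```

The AI never appears in this function — that's the point. Everything slow lives behind the queue.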

Why a confidence gate

The vector search returns the top relevant chunks ranked by similarity. If the top match is weak — below 0.6, say — that’s the system telling you “I don’t actually know this.” Forcing the model to answer anyway is exactly how hallucinations happen.

So the confidence gate is brutal: low score, the message lands in your inbox as an alert and gets logged in an “unanswered” list. That list becomes the next batch of FAQ entries the client should add to their Drive doc. The bot improves over time without anyone training a model.
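The gate itself is just a threshold check on the top hit. A hedged sketch, where hits, escalate, and answer are hypothetical stand-ins for the vector search results, the SNS alert, and the model call:

```python
def route(question, hits, escalate, answer, threshold=0.6):
    """hits: (chunk_id, similarity) pairs, best first.
    Low confidence -> escalate to a human and log; else answer."""
    if not hits or hits[0][1] < threshold:
        escalate(question)           # e.g. SNS alert + 'unanswered' log entry
        return None                  # the model never runs on weak matches
    return answer(question, hits)    # grounded answer from retrieved chunks
```

Note that the model isn't called at all on the low path — escalation happens before any generation, so there's nothing to hallucinate.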

Why a citation guardrail

Even when given the right context, models sometimes freestyle anyway. So the instructions to the model are blunt:

  • Answer using ONLY the provided context.
  • Return both the reply text and the IDs of the chunks you actually used.

The Reply Lambda checks the response. No cited chunk = no send. The reply is logged but never reaches Facebook. This is the structural guarantee that the bot can’t invent things — if the model didn’t ground its answer in actual content from the client’s docs, the reply doesn’t go out.
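The check is mechanical. A sketch, assuming the model is asked to return JSON with "reply" and "cited_chunk_ids" fields (an illustrative response format, not necessarily the exact one used):

```python
import json

def passes_citation_check(model_output: str, retrieved_ids: set):
    """Returns (ok, reply). Blocks the send unless at least one cited
    chunk ID is one we actually retrieved for this question."""
    try:
        parsed = json.loads(model_output)
        reply = parsed["reply"]
        cited = set(parsed.get("cited_chunk_ids", []))
    except (ValueError, KeyError, TypeError):
        return False, None           # malformed output never ships
    if not (cited & retrieved_ids):
        return False, None           # no grounded citation = no send
    return True, reply
```

Checking the citation against the retrieved set (not just "any citation") matters: a model that invents a plausible-looking chunk ID still gets blocked.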

Same denylist as posts

The reply pipeline reuses the page’s denylist from the posting pipeline. A reply on the SFC page can’t mention “XAUUSD,” even by accident. A reply on DailyScalper can’t talk about theology. The same per-page rules govern both directions of the conversation.
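Reuse is cheap because the check is trivial — a minimal sketch of a case-insensitive denylist filter, run on any reply before it ships (function and parameter names are illustrative):

```python
def violates_denylist(text: str, denylist: list) -> bool:
    """Case-insensitive substring check against the page's denylist —
    the same per-page list the posting pipeline uses."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in denylist)
```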

Start in draft mode

For every new client, replies start in draft mode by default: instead of going to Facebook, they go to your inbox. You review them in batches for a week or two. Once you trust the system on a category — say, pricing questions — you flip that category to auto-send. Broader categories like general FAQ come later.

The cost is a slower rollout. The benefit is that no theological or financial mishap reaches a real user before you’ve seen what the bot would have said.
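Per-category rollout can be sketched as one routing function — auto-send only for categories you've already reviewed and trusted, everything else to your inbox (dispatch, send, and draft are hypothetical names for illustration):

```python
def dispatch(reply: str, category: str, auto_send_categories: set, send, draft) -> str:
    """Route a generated reply: trusted categories go straight to
    Facebook; everything else lands in your review inbox."""
    if category in auto_send_categories:
        send(reply)      # e.g. the Graph API call
        return "sent"
    draft(reply)         # e.g. email/inbox for batch review
    return "drafted"
```

Flipping a category to auto-send is then a one-line config change, not a code change.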
