Part 5 of 7 · Email assistant series

How a reply stays accurate

An auto-reply is only safe if it comes from something you wrote. The brain answers from passages in your knowledge file, points at the one it used, and refuses to send if it can’t. Two thresholds, one safety net, no made-up prices.

[Figure: Reply accuracy: retrieve, score, branch on confidence. The cleaned question from the reader flows to Retrieve (top three passages from the knowledge file, an S3 vector index), then to Compose (a draft reply that must cite at least one passage), then to Score (a confidence number). Three lanes branch by threshold: 0.85 and above, auto-send (SES outbound, cited reply, audit log); 0.6 to 0.85, draft queue (reply staged, approve in seconds on your phone); below 0.6, escalate (forwarded to a human with the brief attached, no draft attempted). Bottom note: every auto-reply has a citation. No citation, no auto-send.]
Fig 5. Two thresholds. The brain has to be earning its auto-send privilege every single time.

The two questions every reply has to answer

Before any auto-reply leaves your domain, the brain must say yes to two questions:

  1. Is the answer in the knowledge file? The brain has to point at a specific passage that supports its reply. If it can’t, no draft gets written — the email goes to escalate.
  2. Is the model confident? The model has to send back a number with its tool choice. Below the threshold, the reply becomes a draft for human approval, not an auto-send.

Both have to clear, every time. The model is fully allowed to say “I don’t know” — it just turns into a different tool call.
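
In code, the two gates come down to a few lines. A minimal sketch, with an illustrative Draft shape (the field names here are assumptions, not the actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    reply_text: str
    citation_id: Optional[str]  # ID of the knowledge-file passage, or None if uncited
    confidence: float           # the 0-1 score the model returns with its tool choice

def clears_both_gates(draft: Draft, auto_send_threshold: float = 0.85) -> bool:
    """Gate 1: the reply cites a real passage. Gate 2: the score clears the bar."""
    return draft.citation_id is not None and draft.confidence >= auto_send_threshold
```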

The knowledge file is one Drive doc

Your “knowledge file” isn’t a database, an admin panel, or a SaaS dashboard. It’s one Google Doc you edit like a normal document, with sections roughly like:

  • About the business — what you do, who you serve, where you’re located, hours.
  • Services and pricing — what you offer, what it costs, what’s included.
  • Common questions — the questions you’ve answered ten times each, with the answers in your tone.
  • Policies — refunds, cancellations, lead times, what you do and don’t support.
  • Tone and style — how you want replies to sound. Formal? Warm? First-name basis? Always sign off with your name?

Behind the scenes, when you save the doc, a small sync job pulls the latest version, splits it into chunks, turns each chunk into numbers a computer can search (an embedding), and saves the result to a vector index in S3 (S3 Vectors, AWS's built-in vector storage, made generally available at re:Invent in December 2025, with up to 2 billion vectors per index). The brain searches that index for every email. There's no separate admin screen to keep in sync; the doc is the source of truth.
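
A sketch of that sync job, assuming boto3's s3vectors client and its put_vectors call (verify the exact API against your boto3 version); the bucket and index names are made up, and embed() stands in for whatever embedding model you use:

```python
import boto3

s3vectors = boto3.client("s3vectors")  # needs a recent boto3 with S3 Vectors support

def embed(text: str) -> list[float]:
    """Placeholder: call your embedding model (Bedrock Titan, OpenAI, etc.) here."""
    raise NotImplementedError

def chunk(doc_text: str, max_chars: int = 1200) -> list[str]:
    """Greedy split on blank lines so each chunk stays roughly passage-sized."""
    chunks, current = [], ""
    for para in doc_text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def sync_knowledge_file(doc_text: str) -> None:
    """Re-embed the latest doc and upsert every chunk into the vector index."""
    s3vectors.put_vectors(
        vectorBucketName="assistant-knowledge",  # assumed bucket name
        indexName="knowledge-file",              # assumed index name
        vectors=[
            {
                "key": f"passage-{i}",
                "data": {"float32": embed(passage)},
                "metadata": {"text": passage},
            }
            for i, passage in enumerate(chunk(doc_text))
        ],
    )
```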

How retrieval actually works

For each cleaned email, the brain runs three short steps:

  1. Search. Turn the question into numbers and look up the top three matching passages in the vector index. Cheap and fast.
  2. Read. The brain reads the question and the three passages together. It’s told plainly: only answer based on these passages; if they don’t cover the question, say so.
  3. Cite. If the brain writes an answer, it has to attach the ID of the passage it used. The audit log keeps both the reply and the citation, so you can check any auto-send by reading the source it came from.

The citation is what keeps things honest. A model with no requirement to cite will happily make up prices and policies. A model that must cite either points at a real passage or admits the question isn’t covered.
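
Spelled out as code, the loop looks roughly like this, reusing the Draft shape, embed(), and the s3vectors client from the sketches above; query_vectors is the S3 Vectors search call (again, verify the exact API), and call_model() is a placeholder for your LLM tool-use call:

```python
PROMPT = (
    "Answer ONLY from the passages below. If they do not cover the question, "
    "say you do not know. Always return the ID of the passage you used.\n\n"
    "{passages}\n\nQuestion: {question}"
)

def retrieve(question: str, k: int = 3) -> list[dict]:
    """Top-k passages from the vector index for the cleaned question."""
    resp = s3vectors.query_vectors(
        vectorBucketName="assistant-knowledge",
        indexName="knowledge-file",
        queryVector={"float32": embed(question)},
        topK=k,
        returnMetadata=True,
    )
    return resp["vectors"]  # each item carries a "key" and "metadata" with the text

def call_model(prompt: str) -> Draft:
    """Placeholder: run the LLM with tool use; it must return a Draft with a
    citation_id (or None) and a 0-1 confidence alongside the reply text."""
    raise NotImplementedError

def compose(question: str) -> Draft:
    """Draft a reply that must cite one of the retrieved passages."""
    passages = retrieve(question)
    numbered = "\n\n".join(f"[{p['key']}] {p['metadata']['text']}" for p in passages)
    return call_model(PROMPT.format(passages=numbered, question=question))
```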

The two thresholds

Confidence is a number, not a feeling. The brain sends back a score between 0 and 1 with its tool choice. Two thresholds split that range into three lanes:

  • 0.85 and above — auto-send. The brain found a clean match in the knowledge file and the answer is almost a copy of the cited passage. These tend to be FAQ-shaped questions: “what are your hours?”, “do you ship to X?”, “how do I reschedule?”
  • 0.6 to 0.85 — draft. The brain has a relevant passage but had to do real writing. Maybe the question covers two topics, or the passage is close but not exact. The reply is held for human approval, not sent.
  • Below 0.6 — escalate. No good match. The brain doesn’t even try a draft — the email goes straight to a human, with the brief from part 4 attached.

The thresholds are configurable, and they live in the same Drive doc as the knowledge file. Tighten them when you're launching, loosen them once you trust the assistant.
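
The routing itself is the easiest part to show. A minimal sketch, reusing the Draft shape from earlier; in a real build the two numbers would be read from the doc at sync time rather than hard-coded:

```python
AUTO_SEND_AT = 0.85    # in practice, read from the same Drive doc as the knowledge file
ESCALATE_UNDER = 0.60

def route(draft: Draft) -> str:
    """One score, two thresholds, three lanes. A missing citation always escalates."""
    if draft.citation_id is None or draft.confidence < ESCALATE_UNDER:
        return "escalate"     # forward to a human, brief attached, no draft attempted
    if draft.confidence >= AUTO_SEND_AT:
        return "auto_send"    # SES outbound, cited reply, audit log
    return "draft_queue"      # staged, waiting for a quick human approval
```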

What happens when the model is wrong anyway

Even with citations and confidence thresholds, the model will sometimes auto-send a reply you wouldn’t have. Three layers handle that:

  • Audit log review. Every auto-send writes the question, the cited passage, the reply, and the score to a small table (a minimal write is sketched after this list). Once a week, a short report lists the auto-sent replies so you can scan for misses.
  • Self-correction. If you spot a bad reply, you reply to the original sender from your own inbox — the system sees that thread already has a human in it and stops auto-sending on it.
  • Knowledge-file edits. A bad reply almost always traces back to a missing or unclear passage. Edit the doc, and the next email of that shape gets the better answer. The system improves itself by improving its source.
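
The first layer is just a table write. A minimal sketch, assuming a DynamoDB table (the table name and key shape are made up):

```python
import time
from decimal import Decimal

import boto3

audit = boto3.resource("dynamodb").Table("email-audit-log")  # assumed table name

def log_auto_send(question: str, passage_id: str, reply: str, score: float) -> None:
    """One row per auto-send: everything you need to check the reply later."""
    audit.put_item(
        Item={
            "pk": "auto-send",
            "sk": str(int(time.time() * 1000)),  # millisecond timestamp as sort key
            "question": question,
            "passage_id": passage_id,
            "reply": reply,
            "score": Decimal(str(score)),        # DynamoDB numbers must be Decimal
        }
    )
```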

What it never does

The accuracy rules cash out as a few hard refusals the brain enforces on every email:

  • No reply without a passage citation. If the search comes back empty, the brain escalates.
  • No made-up numbers. If the sender asks “how much” and the file doesn’t say, the reply asks them what package they have in mind, or escalates (a mechanical version of this check is sketched after this list).
  • No promises about future actions. The brain answers questions; it doesn’t book, schedule, or commit on your behalf.
  • No claims of identity beyond what’s in the file. If the file says “Allan replies personally,” the assistant signs off as Allan; if not, it signs off in a general way.
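
The second refusal can even be checked mechanically before anything sends. A small sketch: if the reply contains a number the cited passage doesn't, it never auto-sends:

```python
import re

_NUM = re.compile(r"\d[\d,.]*")

def numbers_are_grounded(reply: str, cited_passage: str) -> bool:
    """Every number in the reply must also appear in the cited passage.
    Crude on purpose: "$40", "40.00", and "40" won't unify; tune the
    pattern for the formats your knowledge file actually uses."""
    return set(_NUM.findall(reply)) <= set(_NUM.findall(cited_passage))
```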

In plain words

The reply has to come from something you wrote. The brain has to point at the passage. The score has to clear the bar. If all three line up, the reply auto-sends and you barely notice it happened. If any of them slip, the email is on your desk before it ever leaves your domain. The model gets to be helpful and confident inside the file, and silent outside it.
