Part 5 of 7 · Email assistant series

How a reply stays accurate

An auto-reply is only safe if it comes from something you wrote. The brain answers from passages in your knowledge file, points at the one it used, and refuses to send if it can’t. Two thresholds, one safety net, no made-up prices.

[Figure: Reply accuracy: retrieve, score, branch on confidence. The cleaned question from the reader flows to Retrieve (top three passages from the knowledge file, an S3 vector index), then to Compose (a draft reply that must cite at least one passage), then to Score (a confidence number). Three lanes branch by threshold: 0.85 and above, auto-send (SES outbound, cited reply, audit log); 0.6 to 0.85, draft queue (reply staged, approve in seconds on your phone); below 0.6, escalate (forwarded to a human with the brief attached, no draft attempted). Bottom note: every auto-reply has a citation. No citation, no auto-send.]
Fig 5. Two thresholds. The brain has to be earning its auto-send privilege every single time.

The two questions every reply has to answer

Before any auto-reply leaves your domain, the brain must say yes to two questions:

  1. Is the answer in the knowledge file? The brain has to point at a specific passage that supports its reply. If it can’t, no draft gets written — the email goes to escalate.
  2. Is the model confident? The model has to send back a number with its tool choice. Below the threshold, the reply becomes a draft for human approval, not an auto-send.

Both have to clear, every time. The model is fully allowed to say “I don’t know” — it just turns into a different tool call.
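
In code, the two gates come down to a few lines. A minimal sketch, with an illustrative Draft shape (the field names here are assumptions, not the actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    reply_text: str
    citation_id: Optional[str]  # ID of the knowledge-file passage, or None if uncited
    confidence: float           # the 0-1 score the model returns with its tool choice

def clears_both_gates(draft: Draft, auto_send_threshold: float = 0.85) -> bool:
    """Gate 1: the reply cites a real passage. Gate 2: the score clears the bar."""
    return draft.citation_id is not None and draft.confidence >= auto_send_threshold
```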

The knowledge file is one Drive doc

Your “knowledge file” isn’t a database, an admin panel, or a SaaS dashboard. It’s one Google Doc you edit like a normal document, with sections roughly like:

  • About the business — what you do, who you serve, where you’re located, hours.
  • Services and pricing — what you offer, what it costs, what’s included.
  • Common questions — the questions you’ve answered ten times each, with the answers in your tone.
  • Policies — refunds, cancellations, lead times, what you do and don’t support.
  • Tone and style — how you want replies to sound. Formal? Warm? First-name basis? Always sign off with your name?

Behind the scenes, when you save the doc, a small sync job pulls the latest version, splits it into chunks, turns each chunk into numbers a computer can search (an embedding), and saves the result to a vector index in S3 (S3 Vectors, AWS's built-in vector storage, made generally available at re:Invent in December 2025, with up to 2 billion vectors per index). The brain searches that index for every email. There's no separate admin screen to keep in sync; the doc is the source of truth.
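
A sketch of that sync job, assuming boto3's s3vectors client and its put_vectors call (verify the exact API against your boto3 version); the bucket and index names are made up, and embed() stands in for whatever embedding model you use:

```python
import boto3

s3vectors = boto3.client("s3vectors")  # needs a recent boto3 with S3 Vectors support

def embed(text: str) -> list[float]:
    """Placeholder: call your embedding model (Bedrock Titan, OpenAI, etc.) here."""
    raise NotImplementedError

def chunk(doc_text: str, max_chars: int = 1200) -> list[str]:
    """Greedy split on blank lines so each chunk stays roughly passage-sized."""
    chunks, current = [], ""
    for para in doc_text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def sync_knowledge_file(doc_text: str) -> None:
    """Re-embed the latest doc and upsert every chunk into the vector index."""
    s3vectors.put_vectors(
        vectorBucketName="assistant-knowledge",  # assumed bucket name
        indexName="knowledge-file",              # assumed index name
        vectors=[
            {
                "key": f"passage-{i}",
                "data": {"float32": embed(passage)},
                "metadata": {"text": passage},
            }
            for i, passage in enumerate(chunk(doc_text))
        ],
    )
```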

How retrieval actually works

For each cleaned email, the brain runs three short steps:

  1. Search. Turn the question into numbers and look up the top three matching passages in the vector index. Cheap and fast.
  2. Read. The brain reads the question and the three passages together. It’s told plainly: only answer based on these passages; if they don’t cover the question, say so.
  3. Cite. If the brain writes an answer, it has to attach the ID of the passage it used. The audit log keeps both the reply and the citation, so you can check any auto-send by reading the source it came from.

The citation is what keeps things honest. A model with no requirement to cite will happily make up prices and policies. A model that must cite either points at a real passage or admits the question isn’t covered.
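
Spelled out as code, the loop looks roughly like this, reusing the Draft shape, embed(), and the s3vectors client from the sketches above; query_vectors is the S3 Vectors search call (again, verify the exact API), and call_model() is a placeholder for your LLM tool-use call:

```python
PROMPT = (
    "Answer ONLY from the passages below. If they do not cover the question, "
    "say you do not know. Always return the ID of the passage you used.\n\n"
    "{passages}\n\nQuestion: {question}"
)

def retrieve(question: str, k: int = 3) -> list[dict]:
    """Top-k passages from the vector index for the cleaned question."""
    resp = s3vectors.query_vectors(
        vectorBucketName="assistant-knowledge",
        indexName="knowledge-file",
        queryVector={"float32": embed(question)},
        topK=k,
        returnMetadata=True,
    )
    return resp["vectors"]  # each item carries a "key" and "metadata" with the text

def call_model(prompt: str) -> Draft:
    """Placeholder: run the LLM with tool use; it must return a Draft with a
    citation_id (or None) and a 0-1 confidence alongside the reply text."""
    raise NotImplementedError

def compose(question: str) -> Draft:
    """Draft a reply that must cite one of the retrieved passages."""
    passages = retrieve(question)
    numbered = "\n\n".join(f"[{p['key']}] {p['metadata']['text']}" for p in passages)
    return call_model(PROMPT.format(passages=numbered, question=question))
```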

The two thresholds

Confidence is a number, not a feeling. The brain sends back a score between 0 and 1 with its tool choice. Two thresholds split that range into three lanes:

  • 0.85 and above — auto-send. The brain found a clean match in the knowledge file and the answer is almost a copy of the cited passage. These tend to be FAQ-shaped questions: “what are your hours?”, “do you ship to X?”, “how do I reschedule?”
  • 0.6 to 0.85 — draft. The brain has a relevant passage but had to do real writing. Maybe the question covers two topics, or the passage is close but not exact. The reply is held for human approval, not sent.
  • Below 0.6 — escalate. No good match. The brain doesn’t even try a draft — the email goes straight to a human, with the brief from part 4 attached.

The thresholds are configurable, and they live in the same Drive doc as the knowledge file. Tighten them when you're launching, loosen them once you trust the assistant.
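
The routing itself is the easiest part to show. A minimal sketch, reusing the Draft shape from earlier; in a real build the two numbers would be read from the doc at sync time rather than hard-coded:

```python
AUTO_SEND_AT = 0.85    # in practice, read from the same Drive doc as the knowledge file
ESCALATE_UNDER = 0.60

def route(draft: Draft) -> str:
    """One score, two thresholds, three lanes. A missing citation always escalates."""
    if draft.citation_id is None or draft.confidence < ESCALATE_UNDER:
        return "escalate"     # forward to a human, brief attached, no draft attempted
    if draft.confidence >= AUTO_SEND_AT:
        return "auto_send"    # SES outbound, cited reply, audit log
    return "draft_queue"      # staged, waiting for a quick human approval
```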

What happens when the model is wrong anyway

Even with citations and confidence thresholds, the model will sometimes auto-send a reply you wouldn’t have. Three layers handle that:

  • Audit log review. Every auto-send writes the question, the cited passage, the reply, and the score to a small table (a minimal write is sketched after this list). Once a week, a short report lists the auto-sent replies so you can scan for misses.
  • Self-correction. If you spot a bad reply, you reply to the original sender from your own inbox — the system sees that thread already has a human in it and stops auto-sending on it.
  • Knowledge-file edits. A bad reply almost always traces back to a missing or unclear passage. Edit the doc, and the next email of that shape gets the better answer. The system improves itself by improving its source.
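
The first layer is just a table write. A minimal sketch, assuming a DynamoDB table (the table name and key shape are made up):

```python
import time
from decimal import Decimal

import boto3

audit = boto3.resource("dynamodb").Table("email-audit-log")  # assumed table name

def log_auto_send(question: str, passage_id: str, reply: str, score: float) -> None:
    """One row per auto-send: everything you need to check the reply later."""
    audit.put_item(
        Item={
            "pk": "auto-send",
            "sk": str(int(time.time() * 1000)),  # millisecond timestamp as sort key
            "question": question,
            "passage_id": passage_id,
            "reply": reply,
            "score": Decimal(str(score)),        # DynamoDB numbers must be Decimal
        }
    )
```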

What it never does

The accuracy rules cash out as a few hard refusals the brain enforces on every email:

  • No reply without a passage citation. If the search comes back empty, the brain escalates.
  • No made-up numbers. If the sender asks “how much” and the file doesn’t say, the reply asks them what package they have in mind, or escalates (a mechanical version of this check is sketched after this list).
  • No promises about future actions. The brain answers questions; it doesn’t book, schedule, or commit on your behalf.
  • No claims of identity beyond what’s in the file. If the file says “Allan replies personally,” the assistant signs off as Allan; if not, it signs off in a general way.
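
The second refusal can even be checked mechanically before anything sends. A small sketch: if the reply contains a number the cited passage doesn't, it never auto-sends:

```python
import re

_NUM = re.compile(r"\d[\d,.]*")

def numbers_are_grounded(reply: str, cited_passage: str) -> bool:
    """Every number in the reply must also appear in the cited passage.
    Crude on purpose: "$40", "40.00", and "40" won't unify; tune the
    pattern for the formats your knowledge file actually uses."""
    return set(_NUM.findall(reply)) <= set(_NUM.findall(cited_passage))
```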

In plain words

The reply has to come from something you wrote. The brain has to point at the passage. The score has to clear the bar. If all three line up, the reply auto-sends and you barely notice it happened. If any of them slip, the email is on your desk before it ever leaves your domain. The model gets to be helpful and confident inside the file, and silent outside it.
