How a policy answer gets grounded

Key takeaways

The question is searched against the handbook by meaning, not by exact words.
Only the top few sections are pulled — enough to answer, few enough to stay precise.
If nothing scores well enough, the answerer routes to “ask HR” without calling the model at all.
The model answers from the pulled sections only and must cite the section it used.
This is retrieval: search first, then answer. The model never works from memory.

Fig 3. The grounding flow, per question. Five steps turn a question into a search, keep the best sections, and let the model answer only from them. If nothing matches, or the model declines, the answer is an honest “ask HR.”

Search by meaning, not by exact words

Here’s the problem the search step solves. The staff member types “how many sick days do I get?” The handbook section is headed “Paid Personal Leave — Accrual and Use,” and the words “sick days” might not appear in it at all. A plain keyword search would miss it. So instead of matching words, the answerer matches meaning.

It does that with embeddings. An embedding turns a piece of text into a long list of numbers — a kind of fingerprint of what the text is about. Texts that mean similar things get similar fingerprints, even when they share no words. The answerer runs the question through Amazon Titan Text Embeddings V2 to get the question’s fingerprint, then compares it against the fingerprints of every handbook section (which the indexer computed ahead of time — that’s Part 5). The sections whose fingerprints are closest to the question’s are the ones most likely to hold the answer. “Sick days” and “paid personal leave” land close together, so the right section comes back.
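To make "closest fingerprint" concrete, here is a minimal sketch that ranks sections by cosine similarity, one common way to compare embeddings. The three-number vectors, the section IDs, and the function names are all invented for illustration; real fingerprints from an embedding model have far more numbers, and the post does not say which distance measure the answerer actually uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_sections(question_vec, section_vecs):
    """Return (section_id, score) pairs, closest match first."""
    scored = [(sid, cosine_similarity(question_vec, vec))
              for sid, vec in section_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy fingerprints, made up for this example (not real model output).
question = [0.9, 0.1, 0.0]  # stands in for "how many sick days do I get?"
sections = {
    "paid-personal-leave": [0.8, 0.2, 0.1],
    "expense-reimbursement": [0.1, 0.9, 0.2],
    "parking": [0.0, 0.2, 0.9],
}

ranked = rank_sections(question, sections)
for section_id, score in ranked:
    print(f"{section_id}: {score:.2f}")
```

The leave section ranks first even though the question and the section ID share no words, which is the whole point of matching on meaning.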

Those fingerprints live in Amazon S3 Vectors, a store built for exactly this kind of search: you ask “which of my stored fingerprints are nearest to this one?” and get the closest matches back, typically in well under a second. There’s no always-on search server to pay for; you query it only when a question comes in.
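A rough sketch of what that nearest-neighbor query could look like through the boto3 `s3vectors` client follows. The bucket name, index name, and function name are hypothetical, and the client is passed in so the function can be exercised without AWS access. Note that the service reports a distance, so a smaller number means a closer match.

```python
def nearest_sections(client, question_vector, top_k=10):
    """Ask the vector index for the stored vectors nearest to the question.

    `client` is expected to behave like boto3.client("s3vectors").
    """
    response = client.query_vectors(
        vectorBucketName="handbook-vectors",   # hypothetical bucket name
        indexName="handbook-sections",         # hypothetical index name
        queryVector={"float32": question_vector},
        topK=top_k,
        returnDistance=True,   # include how far each hit is from the query
        returnMetadata=True,   # include whatever metadata the indexer stored
    )
    # Each hit carries the key it was stored under and its distance.
    return [(hit["key"], hit.get("distance")) for hit in response["vectors"]]
```

In a deployment the client would be created once with `boto3.client("s3vectors")` and reused across questions; check the current boto3 documentation for the exact parameters before relying on this.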

Keep the context thin on purpose

The search can return ten or twenty sections, ranked by how close they are. The answerer keeps only the best three to five. That’s a deliberate constraint. Hand the model too many sections and two bad things happen: the answer gets vaguer (it tries to reconcile sections that don’t all apply), and the call costs more (more text in means more to pay for). Hand it the few sections that actually matter and the answer is sharp, cheap, and easy to cite.

There’s also a confidence floor before the model is called at all. If the best-matching section’s score is still weak — nothing in the handbook is really close to the question — the answerer skips the model entirely and routes straight to “ask HR.” No point paying for a model call to answer a question the handbook plainly doesn’t cover. The cheapest, safest answer to “the docs don’t have this” is to say so directly.
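The two rules in this section, keep only the best few and bail out early when even the best match is weak, fit in a few lines. This is a sketch under assumptions: the function name, the default of four sections, and the 0.5 floor are placeholders, and it assumes scores where higher means closer. If your search returns distances instead, the comparison flips.

```python
def select_context(ranked_hits, keep=4, floor=0.5):
    """Pick the sections to hand the model, or None to route to "ask HR".

    ranked_hits: (section_id, score) pairs, best first, higher score = closer.
    keep and floor are illustrative values, not tuned ones.
    """
    # Confidence floor: if even the best match is weak, skip the model call.
    if not ranked_hits or ranked_hits[0][1] < floor:
        return None
    # Thin context: keep only the top few sections.
    return ranked_hits[:keep]

hits = [("leave-01", 0.91), ("leave-02", 0.80), ("holidays", 0.70),
        ("overtime", 0.60), ("parking", 0.50), ("dress-code", 0.40)]

context = select_context(hits)
weak = select_context([("parking", 0.20)])
```

Here `context` holds the top four hits and `weak` is `None`, which the caller would turn into the “ask HR” reply without ever calling the model.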

Answer from the sections only

With the top sections in hand, the answerer calls Claude Haiku 4.5 (a small, fast, cheap model that’s plenty for reading a few policy sections and writing a short answer) with a strict prompt. The shape of it: “Here is the staff member’s question. Here are the only handbook sections you may use. Answer in plain words. Quote the specific rule. If these sections do not contain the answer, reply exactly that you don’t have it in the handbook — do not guess, do not use general knowledge about employment policy.”
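One way the strict prompt might be assembled is sketched below. The wording follows the shape described above, but the exact fallback sentence, the bracketed section-ID format, and the function names are assumptions made for illustration. The model call itself is left out.

```python
# Hypothetical exact sentence the model is told to return when it cannot answer.
NOT_IN_HANDBOOK = "I don't have that in the handbook."

def build_prompt(question, sections):
    """Build the grounded prompt.

    sections: list of (section_id, text) pairs already chosen by the search.
    """
    section_block = "\n\n".join(f"[{sid}]\n{text}" for sid, text in sections)
    return (
        "Here is the staff member's question:\n"
        f"{question}\n\n"
        "Here are the only handbook sections you may use:\n"
        f"{section_block}\n\n"
        "Answer in plain words. Quote the specific rule and cite the "
        "section ID in brackets. "
        f"If these sections do not contain the answer, reply exactly: {NOT_IN_HANDBOOK} "
        "Do not guess. Do not use general knowledge about employment policy."
    )

def model_declined(reply):
    """True when the model returned the fallback sentence instead of an answer."""
    return reply.strip() == NOT_IN_HANDBOOK
```

Fixing the fallback to one exact sentence lets the answerer detect a decline with a plain string comparison and route it to “ask HR”, instead of trying to interpret a free-form refusal.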

That last line is the whole game. Models are trained largely on public text, so they have plenty of opinions about how sick leave “usually” works. For a staff policy answerer, “usually” is poison: your business’s policy is whatever your docs say, not the industry average. Handing the model only your sections and forbidding outside knowledge pins the answer to your reality. If your handbook says 12 sick days, it answers 12 instead of drifting to “most companies offer 10.”

Why retrieval, and not just a big prompt

You might wonder why we search at all. Why not paste the whole handbook into every model call and let it find the answer? Two reasons. First, cost and speed: sending a 40-page handbook with every call is slow, and the cost adds up fast at any real question volume. Searching first means each call carries only the few relevant sections. Second, and more important, precision: a model handed the entire handbook is more likely to blend unrelated policies or cite the wrong one. Handing it exactly the sections that matched the question keeps the answer tight and makes the citation in Part 4 trustworthy, because the section it answers from is one of the sections it was given.

Next post: how a policy answer stays honest — how the citation gets attached, how off-limits topics get handed to a human, and the guardrails that keep the answerer from ever inventing policy.
