How a proposal gets drafted

Key takeaways

The build job runs as one short Lambda, kicked off the moment you tap “start the draft.”
Retrieval finds your closest past proposals using Titan embeddings over S3 Vectors.
The price summary is computed in plain Python from your rate card — no model touches the math.
One Claude Sonnet 4.6 call writes all five sections, grounded on your templates and retrieved work.
A guard re-reads the draft so the prose can only repeat numbers that were computed.

The build flow, per brief

Fig 3. The build flow, per brief, when you start a draft. Five steps turn a clean brief into a draft. Retrieval grounds it on your past work; the price is computed in code; the model only writes the prose.

Steps 1–2: find your closest past work

Your past proposals are your best template. They beat any generic shell because they already sound like you and they already solved problems like this one. The trick is finding the right past proposals for this brief, and the system does that with embeddings. Whenever a past proposal is added to the Drive folder, the generator turns each of its sections into a vector (a list of numbers that captures what the text is about) using Amazon Titan Text Embeddings V2. It stores each vector in S3 Vectors, Amazon S3’s built-in vector storage and search, so there is no separate database to run.
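To make the indexing step concrete, here is a minimal sketch of turning one past proposal into per-section records. It is an illustration under assumptions, not the generator's actual code: `SectionRecord`, `index_proposal`, and the key format are hypothetical names, and `toy_embed` is a stand-in that counts keywords so the example runs locally. In the system described above, the embedder would wrap a Titan Text Embeddings V2 call and the records would be written to S3 Vectors; neither call is shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that maps text to a vector of floats.
# In the real system this would wrap the Titan Text Embeddings V2 call.
Embedder = Callable[[str], List[float]]


@dataclass
class SectionRecord:
    key: str                   # unique id, e.g. "acme-booking-2023#approach"
    vector: List[float]        # embedding of the section text
    metadata: Dict[str, str]   # kept alongside the vector for later retrieval


def index_proposal(
    proposal_id: str, sections: Dict[str, str], embed: Embedder
) -> List[SectionRecord]:
    """Build one record per non-empty section of a past proposal."""
    records = []
    for name, text in sections.items():
        if not text.strip():
            continue  # nothing to embed for an empty section
        records.append(
            SectionRecord(
                key=f"{proposal_id}#{name}",
                vector=embed(text),
                metadata={"proposal_id": proposal_id, "section": name, "text": text},
            )
        )
    return records


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder for local runs: counts a few keywords. Not a real model."""
    words = text.lower().split()
    return [float(words.count(w)) for w in ("booking", "checkout", "timeline")]


records = index_proposal(
    "acme-booking-2023",
    {
        "approach": "Redesign the booking flow around one booking form",
        "timeline": "",
        "cover": "Hello",
    },
    toy_embed,
)
```

Embedding each section separately, rather than the whole proposal, is what lets a later search return just the approach or just the timeline from a past job.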

When a new brief comes in, the generator embeds the stated need the same way and asks S3 Vectors for the nearest past sections. A brief about “rebuild the booking flow” pulls up the approach and timeline from your two past booking jobs, even if those proposals never used the word “rebuild.” If nothing clears the similarity bar (say, it’s the first proposal of a kind you’ve never done), the generator falls back to the plain section templates and flags that for you, so you know to read the first draft a little more carefully.
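The lookup-with-fallback logic can be sketched in a few lines. This is a local illustration only: it scores vectors with plain cosine similarity instead of querying S3 Vectors, and `MIN_SIMILARITY`, `nearest_sections`, and the sample keys are assumptions (the post does not state the real threshold).

```python
import math
from typing import List, Tuple

MIN_SIMILARITY = 0.75  # hypothetical bar; the real value is not given in the post


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_sections(
    query: List[float],
    indexed: List[Tuple[str, List[float]]],
    top_k: int = 3,
    min_similarity: float = MIN_SIMILARITY,
) -> Tuple[List[Tuple[str, float]], bool]:
    """Return (matches, used_fallback); matches are (key, score), best first."""
    scored = [(key, cosine(query, vec)) for key, vec in indexed]
    above_bar = [m for m in scored if m[1] >= min_similarity]
    matches = sorted(above_bar, key=lambda m: m[1], reverse=True)[:top_k]
    # An empty result means: use the plain templates and tell the reader.
    return matches, not matches


indexed = [
    ("booking-2022#approach", [1.0, 0.0, 0.0]),
    ("booking-2023#timeline", [0.8, 0.6, 0.0]),
    ("seo-audit#approach", [0.0, 1.0, 0.0]),
]

matches, used_fallback = nearest_sections([1.0, 0.1, 0.0], indexed)
novel_matches, novel_fallback = nearest_sections([0.0, 0.0, 1.0], indexed)
```

In this sketch the first query returns the two booking sections, while the second (a brief unlike anything indexed) returns nothing and sets the fallback flag.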

Step 3: compute the price in plain Python

This is the most important rule in the whole system: the model does not decide the price. A small Python function reads your rate card and packages from the rules doc, looks at the brief’s scope and rough budget, and computes a price summary — line items, subtotal, any discount, total — as plain numbers. If the brief says “around $18k” and your day rate and the scope land at $17,500, that’s the number; the function doesn’t round up to match the budget or invent a figure to fill a gap.
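A pricing function of this kind might look like the sketch below. The names (`PriceSummary`, `compute_price`), the per-phase day counts, and the $1,250 day rate are illustrative assumptions chosen to reproduce the $17,500 example; the real rate card and packages live in the rules doc. Note that the rough budget is only reported as a gap in this sketch. It never feeds back into the total.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PriceSummary:
    line_items: List[Tuple[str, Decimal]]
    subtotal: Decimal
    discount: Decimal
    total: Decimal
    budget_gap: Optional[Decimal]  # rough budget minus total; informational only


def compute_price(
    day_rate: Decimal,
    scope_days: Dict[str, int],
    discount_pct: Decimal = Decimal("0"),
    rough_budget: Optional[Decimal] = None,
) -> PriceSummary:
    """Compute line items, subtotal, discount, and total from the rate card."""
    line_items = [(name, day_rate * days) for name, days in scope_days.items()]
    subtotal = sum((amount for _, amount in line_items), Decimal("0"))
    discount = (subtotal * discount_pct / Decimal("100")).quantize(Decimal("0.01"))
    total = subtotal - discount
    # The budget never changes the total; the gap is just surfaced for the reader.
    gap = None if rough_budget is None else rough_budget - total
    return PriceSummary(line_items, subtotal, discount, total, gap)


summary = compute_price(
    day_rate=Decimal("1250"),
    scope_days={"discovery": 2, "design": 5, "build": 7},
    rough_budget=Decimal("18000"),
)
print(summary.total)       # 17500.00
print(summary.budget_gap)  # 500.00
```

Using `Decimal` rather than floats keeps currency arithmetic exact, which matters when a later check compares the prose against these figures.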

Keeping the math in code, not the model, is what makes the price trustworthy. A model asked to “write a price summary” will happily produce a confident, wrong number. A Python function with your rate card produces the same number every time, and you can read the function. The model is handed the computed total and told to present it — not to do arithmetic.
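The guard mentioned in the takeaways follows from the same idea: if the prose may only repeat computed numbers, a check can compare every dollar amount in the draft against the price summary. The sketch below is one assumed way to do that, not the series' actual guard; `unsupported_amounts` and the regular expression are hypothetical, and the pattern only handles plain dollar figures (shorthand like "$18k" would be read as "$18" and flagged).

```python
import re
from decimal import Decimal
from typing import Iterable, List

# Matches "$17,500", "$17500", or "$17,500.00". Assumes US-style formatting.
MONEY = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(\.\d{1,2})?")


def unsupported_amounts(prose: str, computed: Iterable[Decimal]) -> List[str]:
    """Return every dollar amount in the prose that was not computed in code."""
    allowed = {Decimal(c) for c in computed}
    flagged = []
    for match in MONEY.finditer(prose):
        digits = match.group(1).replace(",", "") + (match.group(2) or "")
        if Decimal(digits) not in allowed:
            flagged.append(match.group(0))
    return flagged


computed = [Decimal("2500"), Decimal("6250"), Decimal("8750"), Decimal("17500")]

clean = unsupported_amounts(
    "The total is $17,500, which includes $8,750 for the build.", computed
)
drifted = unsupported_amounts("We can do it for $18,000 all in.", computed)
print(clean)    # []
print(drifted)  # ['$18,000']
```

A non-empty result would be a reason to reject or retry the draft rather than to edit the number in place.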

Steps 4–5: assemble, then write all five sections in one call

Now the generator builds one prompt. Into it go: the clean brief, the retrieved past sections, the plain section templates, the voice notes from the rules doc, and the computed price summary. The instruction is specific: “Write a sales proposal with exactly these five sections — cover, understanding of the need, approach, timeline, price summary. Match the voice of the examples. Use the price summary exactly as given; do not change any number. Do not promise anything the templates don’t support.”
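Assembling that prompt is plain string work. The following sketch shows one possible layout; `build_prompt`, the block labels, and the placeholder for an empty retrieval are assumptions, and the instruction text is adapted from the one quoted above.

```python
from typing import Dict, List

SECTIONS = ["cover", "understanding of the need", "approach", "timeline", "price summary"]

INSTRUCTION = (
    "Write a sales proposal with exactly these five sections: "
    + ", ".join(SECTIONS)
    + ". Match the voice of the examples. "
    "Use the price summary exactly as given; do not change any number. "
    "Do not promise anything the templates don't support."
)


def build_prompt(
    brief: str,
    retrieved: List[str],
    templates: Dict[str, str],
    voice_notes: str,
    price_summary: str,
) -> str:
    """Join the five inputs and the instruction into one labeled prompt."""
    parts = [
        ("INSTRUCTION", INSTRUCTION),
        ("BRIEF", brief),
        ("PAST SECTIONS", "\n\n".join(retrieved) or "(none retrieved; rely on the templates)"),
        ("TEMPLATES", "\n\n".join(f"{name}:\n{body}" for name, body in templates.items())),
        ("VOICE NOTES", voice_notes),
        ("PRICE SUMMARY", price_summary),
    ]
    return "\n\n".join(f"## {label}\n{text}" for label, text in parts)


prompt = build_prompt(
    brief="Rebuild the booking flow for a clinic; budget around $18k.",
    retrieved=[],
    templates={"approach": "We start with a short discovery phase."},
    voice_notes="Plain, direct, no jargon.",
    price_summary="Total: $17,500",
)
```

The price summary is passed in as already-formatted text, so the model receives finished numbers rather than inputs it could recompute.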

One Claude Sonnet 4.6 call returns all five sections as structured fields. Sonnet 4.6 is the heavier model in the stack, used here because writing a whole coherent proposal — one that reads as a single voice across five sections and stays faithful to the grounding — is real reasoning work, the kind that earns the cost. The cheaper Haiku 4.5 handles the small jobs (tidying the brief, classifying scope); the draft is where the better model pays for itself. If the model omits a section or wanders off the templates, the generator retries once with a tighter prompt; if the retry also fails, the brief is held for a human rather than shipping a broken draft.
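The retry-once-then-hold rule can be sketched without any model at all by passing the model call in as a function. Everything named here (`REQUIRED`, `draft_with_retry`, the status strings, the field names) is hypothetical, and the check covers only omitted sections; detecting a draft that wanders off the templates would need a separate check not shown.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical field names for the five structured sections.
REQUIRED = ("cover", "understanding", "approach", "timeline", "price_summary")


def missing_sections(draft: Dict[str, str]) -> List[str]:
    """List required sections that are absent or blank."""
    return [name for name in REQUIRED if not draft.get(name, "").strip()]


def draft_with_retry(
    prompt: str, call_model: Callable[[str], Dict[str, str]]
) -> Tuple[Optional[Dict[str, str]], str]:
    """Call the model once, retry once with a tighter prompt, else hold."""
    draft = call_model(prompt)
    missing = missing_sections(draft)
    if not missing:
        return draft, "ok"
    tighter = (
        prompt
        + "\n\nYour previous draft was missing: "
        + ", ".join(missing)
        + ". Return all five sections."
    )
    draft = call_model(tighter)
    if not missing_sections(draft):
        return draft, "ok_after_retry"
    return None, "held_for_human"  # never ship a broken draft


def flaky_model() -> Callable[[str], Dict[str, str]]:
    """Fake model for local runs: drops the timeline on the first call only."""
    calls: List[str] = []

    def call(prompt: str) -> Dict[str, str]:
        calls.append(prompt)
        draft = {name: f"{name} text" for name in REQUIRED}
        if len(calls) == 1:
            draft["timeline"] = ""
        return draft

    return call


draft, status = draft_with_retry("...prompt...", flaky_model())
print(status)  # ok_after_retry
```

Capping the loop at one retry keeps the job short and bounded, which suits a single Lambda invocation.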

Why one call instead of five? Coherence. A proposal written section-by-section in separate calls reads like five people wrote it — the approach promises something the timeline doesn’t deliver, the cover oversells what the scope under-delivers. Writing all five in one pass, with the whole brief in view, keeps the document consistent with itself.

Why split the work this way

The pattern here is the same one that runs through the whole series: do the deterministic parts in code, and let the model do only the part that genuinely needs language. Retrieval is a search problem — code. Pricing is arithmetic — code. Deciding whether a draft is structurally complete — code. The one job left for the model is the one it’s actually good at: writing five sections of fluent, on-voice prose from grounded material. By the time Sonnet runs, every fact it needs has already been decided; its job is to say those facts well.

Next post: how the draft stays on brand — the voice rules, the banned-claim checks, and the price-and-date guard that re-reads the draft before it ever reaches you.
