Part 3 of 7 · Proposal generator series ~5 min read

How a proposal gets drafted

You tapped “start the draft.” Now the generator has about ninety seconds to turn three lines of brief into a proposal that sounds like you wrote it. It does that in five steps: read the brief, find your closest past work, compute the price in plain code, write all five sections in one grounded call, then check the draft before anyone sees it. The model writes the words. It never decides the numbers.

Key takeaways

  • The build job runs as one short Lambda, kicked off the moment you tap “start the draft.”
  • Retrieval finds your closest past proposals using Titan embeddings over S3 Vectors.
  • The price summary is computed in plain Python from your rate card — no model touches the math.
  • One Claude Sonnet 4.6 call writes all five sections, grounded on your templates and retrieved work.
  • A guard re-reads the draft so the prose can only repeat numbers that were computed.

The build flow, per brief

Build flow per brief when you start a draft A vertical build-flow diagram. At the top, an input box "Clean brief" with the client, the need, the rough budget, and the due date. Below that, step one "Embed the brief" — turn the need into a vector with Titan Text Embeddings V2 so similar past work can be found. Below that, a check "Found close past proposals?" — query S3 Vectors for the nearest past sections; if none clear the similarity bar, route to "Template-only" (draft from the section shells alone). If matches are found, continue and carry the top few. The next step "Compute the price in Python" — apply the rate card and packages to the brief's budget to get a fixed price summary; no model touches this. The next step "Assemble the grounded prompt" — stitch together the brief, the retrieved past sections, the section templates, the voice notes, and the computed price into one prompt for Claude Sonnet 4.6. The next check "Write all five sections in one call" — the model returns cover, understanding of the need, approach, timeline, and price summary as structured fields; if the model omits a section or strays off the templates, route to "Retry once" with a tighter prompt; if it returns a clean set, continue. Each terminal box — Template-only, Drafted, Retry once, Held — leads into the brand and price guard covered in Part 4. A note at the bottom: the rate card holds every number; the model only restates them — change a rate and the next draft prices itself. Clean brief client · need · budget · due Step 1 Embed the brief Titan V2 vector of the need Step 2 Close past proposals? query S3 Vectors Step 3 Compute the price rate card × brief, in Python Step 4 Assemble the prompt none clear → template-only matches → ground on them Step 5 Write five sections, one call Claude Sonnet 4.6 Template-only shells, no past work Drafted five clean sections Retry once tighter prompt Held retry failed, for human if none none fail clean off-template The rate card holds every number — change a rate and the next draft prices itself.
Fig 3. The build flow, per brief, when you start a draft. Five steps turn a clean brief into a draft. Retrieval grounds it on your past work; the price is computed in code; the model only writes the prose.

Steps 1–2: find your closest past work

Your past proposals are your best template — better than any generic shell, because they already sound like you and they already solved problems like this one. The trick is finding the right past proposals for this brief. The system does that with embeddings: when each past proposal is added to the Drive folder, the generator turns each of its sections into a vector — a string of numbers that captures what the text is about — using Amazon Titan Text Embeddings V2, and stores it in S3 Vectors (Amazon’s built-in way to search vectors without running a separate database).

When a new brief comes in, the generator embeds the need the same way and asks S3 Vectors for the nearest past sections. A brief about “rebuild the booking flow” pulls up the approach and timeline from your two past booking jobs, even if those proposals never used the word “rebuild.” If nothing clears a similarity bar — say it’s the first proposal of a kind you’ve never done — the generator falls back to the plain section templates and notes that to you, so you read the first draft a little more carefully.

Step 3: compute the price in plain Python

This is the most important rule in the whole system: the model does not decide the price. A small Python function reads your rate card and packages from the rules doc, looks at the brief’s scope and rough budget, and computes a price summary — line items, subtotal, any discount, total — as plain numbers. If the brief says “around $18k” and your day rate and the scope land at $17,500, that’s the number; the function doesn’t round up to match the budget or invent a figure to fill a gap.

Keeping the math in code, not the model, is what makes the price trustworthy. A model asked to “write a price summary” will happily produce a confident, wrong number. A Python function with your rate card produces the same number every time, and you can read the function. The model is handed the computed total and told to present it — not to do arithmetic.

Steps 4–5: assemble, then write all five sections in one call

Now the generator builds one prompt. Into it go: the clean brief, the retrieved past sections, the plain section templates, the voice notes from the rules doc, and the computed price summary. The instruction is specific: “Write a sales proposal with exactly these five sections — cover, understanding of the need, approach, timeline, price summary. Match the voice of the examples. Use the price summary exactly as given; do not change any number. Do not promise anything the templates don’t support.”

One Claude Sonnet 4.6 call returns all five sections as structured fields. Sonnet 4.6 is the heavier model in the stack, used here because writing a whole coherent proposal — one that reads as a single voice across five sections and stays faithful to the grounding — is real reasoning work, the kind that earns the cost. The cheaper Haiku 4.5 handles the small jobs (tidying the brief, classifying scope); the draft is where the better model pays for itself. If the model omits a section or wanders off the templates, the generator retries once with a tighter prompt; if the retry also fails, the brief is held for a human rather than shipping a broken draft.

Why one call instead of five? Coherence. A proposal written section-by-section in separate calls reads like five people wrote it — the approach promises something the timeline doesn’t deliver, the cover oversells what the scope under-delivers. Writing all five in one pass, with the whole brief in view, keeps the document consistent with itself.

Why split the work this way

The pattern here is the same one that runs through the whole series: do the deterministic parts in code, and let the model do only the part that genuinely needs language. Retrieval is a search problem — code. Pricing is arithmetic — code. Deciding whether a draft is structurally complete — code. The one job left for the model is the one it’s actually good at: writing five sections of fluent, on-voice prose from grounded material. By the time Sonnet runs, every fact it needs has already been decided; its job is to say those facts well.

Next post: how the draft stays on brand — the voice rules, the banned-claim checks, and the price-and-date guard that re-reads the draft before it ever reaches you.

All posts