How answers get grouped into themes

Key takeaways

The grouper runs once a week via EventBridge Scheduler — not on every answer.
Each answer becomes a vector via Amazon Titan Text Embeddings V2 (1024 numbers per answer).
Close vectors are clustered with plain code; the cluster sizes are the theme counts.
Claude Haiku 4.5 names each cluster and picks one real answer as the representative quote.
A cluster has to clear a minimum size to count as a theme; small ones fall into a long tail.

The grouping flow, per weekly run

Fig 3. The grouper’s flow, per weekly run. Five steps turn a week of raw answers into a handful of named, counted, quoted themes. The model labels and quotes; the clustering and the counting are plain code.

Step 1: turn each answer into a vector

The grouper reads the week’s cleaned answers from the store. For each one, it calls Amazon Titan Text Embeddings V2 and gets back a vector — a list of 1024 numbers that captures what the answer is about. The useful property is that two answers with the same meaning produce vectors that sit close together, even when they share no words. “Took forever to pay” and “the line at the till was endless” end up near each other; “loved the staff” ends up somewhere else entirely. The vectors are saved in an Amazon S3 Vectors index so the grouper can search and compare them cheaply without running a database server.

This is the step plain code simply cannot do. Keyword matching would put “till” and “pay” in different buckets and miss that they’re the same gripe. Embeddings are what make grouping by meaning possible at all — which is exactly why this is one of the few places the system reaches for a model.

Step 2: cluster the close vectors

With every answer now a point in space, grouping the close ones is a plain-math job — no model needed. The grouper runs a standard clustering routine in Python that finds dense groups of nearby points and leaves the scattered ones ungrouped. Answers that mean roughly the same thing fall into the same cluster; genuinely different answers form their own. Crucially, the number of answers in each cluster is just a count — a real, exact number, computed by counting rows. When the summary later says “61 people raised slow checkout,” that 61 is a count of actual answers, not a model’s estimate.

Step 3: set aside the tiny clusters

Not every cluster is a theme. Three people mentioning the same oddly specific thing is interesting, but it isn’t a trend, and a summary that lists fifteen “themes” of two answers each is just noise in a nicer font. The rules doc sets a min_theme_size (default around 1% of the week’s answers, with a floor of five). Any cluster below that is folded into a long tail that the summary mentions in one line (“plus a scattering of one-off comments”) but doesn’t break out. This keeps the summary to the handful of themes that actually matter.

Step 4: name each theme

Now the model earns its second keep. For each surviving cluster, the grouper sends Claude Haiku 4.5 a small sample of the answers in it and asks for a short, plain-English name: “Slow checkout,” “Friendly staff,” “Parking is hard.” The prompt is tight: “Name the shared topic in three or four words. Don’t add a topic that isn’t in these answers. Return the name only.” The model never sees the counts and never decides which answers belong to which group — the clustering already did that. It’s purely putting a readable label on a group plain code already formed.

Step 5: pick one real quote

A count and a name tell you what people said and how many; a quote tells you how it felt. For each theme, the grouper picks the answer sitting closest to the centre of its cluster — the most representative single response — and stores it verbatim as the theme’s quote. It’s a real thing a real person wrote, shown word for word, never a paraphrase the model invented. If the most central answer is unusually long, Haiku is allowed to trim it to a clean sentence, but only by cutting — never by rewording.

Why weekly, and why this split of work

The grouper runs weekly rather than on every answer for two reasons. First, grouping only makes sense in bulk — you can’t cluster one answer. Second, embedding and clustering a whole week at once is far cheaper than re-running the math every time a single answer lands. The urgent lane in Part 5 is what handles the “can’t wait” case; the grouper is deliberately the slow, thorough, weekly read.

And the split of work — model for embeddings, naming, and quote-trimming; plain code for clustering and counting — is the whole reason the numbers are trustworthy. The expensive, fuzzy judgment (what does this mean?) goes to the model. The exact, checkable arithmetic (how many?) stays in code where it can’t drift. That’s what lets the summary say a number and mean it.

Next post: how the summary gets written from these themes and reaches the owner’s inbox — and the guardrails that keep it honest.

All posts