Part 1 of 7 · Survey analyzer series · ~5 min read

A survey analyzer on AWS for a few dollars a month

A small business runs a survey and gets back a pile of free-text answers. The “anything else you’d like to tell us?” box, filled in by three hundred people. Most owners scroll the first twenty, feel like they’ve got the gist, and never read the rest. The real signal — the same complaint raised forty different ways, the one furious customer buried on page nine — gets lost in the scroll. Reading every answer is slow, and skimming a few is how you miss the thing that actually matters. This post walks through the design of a small analyzer that reads every answer, groups them into the handful of themes people actually raised, pulls a real quote for each, and emails you a short summary — while flagging anything urgent for a human the moment it lands.

Key takeaways

  • Three sources for answers: a form-submit lane, an inbox forwarding lane, and a Drive sheet of exports.
  • Every answer gets a quick urgent check the moment it lands; everything else waits for the weekly grouping pass.
  • Grouping is embeddings plus plain-code clustering; a model only names each theme and picks one real quote.
  • The weekly summary names each theme, counts it, and quotes it. It is written from the groups, never from a single answer.
  • Designed on AWS for about $3.20/month at typical small-business volume.

The whole system on one page

Before any code, here’s the shape of what we’re designing.

System architecture: three sources, three pieces inside AWS.

At the top, three external boxes in a row:

  • Far left, "Survey answers": the free-text responses arriving from a form-submit lane that posts each response the moment it is sent, an inbox forwarding lane for emailed replies and exports, and a Drive sheet you paste exports into.
  • Centre, "Rules and voice": a Drive folder with a rules doc covering what counts as urgent, who to notify, and how big a theme has to be to report, plus a voice doc with the summary tone and the email layout.
  • Far right, "Owner and on-call": the owner who reads the weekly summary, and the on-call person who gets an urgent flag the moment an angry or at-risk answer lands.

Each connects via an arrow to the AWS account container below. Survey answers have an outgoing arrow into AWS. Rules and voice feed in to ground every theme and every summary. The owner receives the weekly summary; the on-call person receives the urgent flag with the full answer text.

Inside the AWS account are three components in a row, mirroring the layout above:

  • On the left, the Answer intake: receives answers from each source, runs the quick urgent check, and writes the cleaned answer into the store.
  • In the middle, the Grouper: runs weekly; turns each answer into a vector with Titan embeddings; clusters the vectors with plain code; asks Haiku to name each theme and pick a quote.
  • On the right, the Summary: Sonnet writes the short summary from the grouped themes and SES emails it to the owner.

Internal arrows flow left to right. A note at the bottom reads: the summary is written from the grouped themes, never from one cherry-picked answer, and anything urgent jumps the weekly queue.
Fig 1. Three sources outside, three pieces inside AWS. Answers flow in from a form-submit lane, an inbox forwarding lane, and a Drive sheet. The Grouper runs weekly and turns answers into themes. The Summary piece writes a short email and the urgent lane jumps the queue.

What you set up once (the outside)

  • Survey answers. The free-text responses to your survey. The backbone lane is the form-submit lane: your survey form posts each response to a small endpoint the moment someone hits send. Two other lanes (covered in Part 2) catch the rest — an inbox-forwarding lane (forward an email reply or an export to a dedicated address and the analyzer takes it from there) and a Drive sheet (paste an export in and a small sync picks it up). Each answer carries a little context if you have it: the survey it came from, the date, and a rating if your survey collected one.
  • A rules folder. Two short Google Docs in a Drive folder. The rules doc says what counts as urgent — an angry customer, a threat to leave, a safety or legal concern, a request for a callback — who gets the urgent flag, and how big a theme has to be before it’s worth reporting (so a single one-off comment doesn’t become a “theme”). The voice doc holds the tone for the summary and the email layout: how warm, how blunt, how long.
  • Owner and on-call. The owner gets the weekly summary email. The on-call person gets the urgent flag the moment an at-risk answer lands — with the full text and a plain note on why it tripped. In a small shop these are often the same person; the system just lets them be different.
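
As a rough illustration, the rules doc might boil down to a small structure like the one below once it has been read in. This is a sketch under assumptions: the field names, the placeholder address, and the default threshold of 5 are invented for the example and are not the series' actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    """Hypothetical parsed form of the rules doc (all field names are assumed)."""

    # What counts as urgent, taken from the categories listed above.
    urgent_reasons: tuple[str, ...] = (
        "angry customer",
        "threat to leave",
        "safety or legal concern",
        "callback request",
    )
    # Who gets the urgent flag; a placeholder address, not a real one.
    on_call_email: str = "oncall@example.com"
    # Smallest group worth reporting as a theme; 5 is an illustrative guess.
    min_theme_size: int = 5

    def is_reportable(self, cluster_size: int) -> bool:
        """True when a group is big enough to be reported as a theme."""
        return cluster_size >= self.min_theme_size


rules = Rules()
print(rules.is_reportable(1))   # False: a single one-off comment
print(rules.is_reportable(22))  # True
```

Keeping the threshold in the rules doc rather than in code means the owner can tighten or loosen what counts as a theme without a deploy.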

What runs on each answer and each week (the inside)

  • The answer intake. Three sources feed the store. Each new answer is cleaned (strip signatures, drop blanks, trim), then run through a quick urgent check — a small, cheap model call that asks “does this need a human today?” If yes, it’s flagged right away (Part 5). Either way the cleaned answer is written to the store with its context.
  • The grouper. Runs once a week. Reads every answer from the period. Turns each one into a vector with Titan embeddings — a vector is just a list of numbers that captures what the answer is about, so answers that mean the same thing land close together even when the words differ. Plain-code clustering then groups the close ones. For each cluster, a Claude Haiku 4.5 call writes a short plain-English name (“Slow checkout”, “Loved the staff”) and picks one real answer that best represents the group. The clustering math is plain Python; the model only labels.
  • The summary. Reads the named themes, their counts, and their quotes. A Claude Sonnet 4.6 call writes a short summary: the response count, the overall mood, the handful of themes with counts and quotes, and a closing line on anything flagged urgent that week. SES emails it to the owner. The summary is written from the groups, never from one cherry-picked answer.
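
To make "plain-code clustering" concrete, here is a minimal sketch of one way to group vectors by closeness. The post does not name a clustering algorithm, so the greedy single-pass approach, the 0.8 similarity threshold, and the toy two-number vectors (standing in for real Titan embeddings) are all assumptions chosen for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(vectors: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Greedy single pass: each vector joins the most similar existing
    cluster if that cluster's centroid is at least `threshold` similar,
    otherwise it starts a new cluster. Returns lists of indexes."""
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, centroid([vectors[j] for j in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Toy vectors standing in for real embeddings (made up for this example).
answers = ["checkout took forever", "the line at the till was slow", "staff were lovely"]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

groups = cluster(vectors)
for members in groups:
    # The count is len(members): a real count, computed by code.
    print(len(members), [answers[i] for i in members])
```

The two checkout answers share no key words but land in the same group because their vectors point the same way. The model is only asked to name each group afterwards.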

In plain words

You send a post-visit survey and 280 people reply to the open box. Through the week, each reply lands, gets cleaned, and gets a quick urgent check. On Tuesday, one reply says “third time the order was wrong, I’m done, cancel my account”. The urgent check trips and your on-call person gets it within a minute, full text attached, before it ever shows up in a summary. The other 279 wait. On Sunday night the grouper runs: it turns every answer into a vector, clusters them, and finds four real themes, “slow checkout” (61 answers), “friendly staff” (44), “parking is hard” (38), and “prices crept up” (22), plus a long tail of one-off comments too small to report. Monday morning you get one short email: the mood, those four themes with counts, one real quote each, and a note that one urgent answer came in Tuesday and was handled. You read the whole picture in ninety seconds instead of skimming twenty replies and guessing.
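
The per-answer path in that walkthrough (clean, urgent check, store either way) can be sketched as below. This is an assumption-heavy illustration: the "-- " signature marker, the record layout, and the in-memory lists are invented, and `keyword_check` is only a stand-in for the small model call the design actually uses.

```python
import re
from typing import Callable, Optional

# Assumed signature marker: a line holding only "--" plus optional spaces.
_SIGNATURE = re.compile(r"^--\s*$", re.MULTILINE)


def clean(raw: str) -> Optional[str]:
    """Drop anything after a signature marker, collapse whitespace,
    and return None for answers that end up blank."""
    body = _SIGNATURE.split(raw, maxsplit=1)[0]
    text = " ".join(body.split())
    return text or None


def intake(
    raw: str,
    context: dict,
    is_urgent: Callable[[str], bool],
    store: list,
    flagged: list,
) -> None:
    """Clean one answer, flag it if urgent, and store it either way."""
    text = clean(raw)
    if text is None:
        return  # blank answer: nothing to store
    record = {"text": text, **context}
    if is_urgent(text):
        flagged.append(record)  # the real system would notify on-call here
    store.append(record)  # stored regardless, so the weekly pass still sees it


def keyword_check(text: str) -> bool:
    """Stand-in for the model call; a real check would not be a keyword match."""
    return "cancel my account" in text.lower()


store, flagged = [], []
ctx = {"survey": "post-visit"}
intake("third time the order was wrong, I'm done, cancel my account\n-- \nSent from my phone",
       ctx, keyword_check, store, flagged)
intake("   \n", ctx, keyword_check, store, flagged)
intake("Parking is hard on weekends.", ctx, keyword_check, store, flagged)
print(len(store), len(flagged))  # 2 1
```

Passing the urgent check in as a function keeps the intake logic testable without calling a model.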

The cost of running this is about $3.20 a month at SMB volume. The cost of not running it is the survey you paid to send and then only half-read — and the angry customer on page nine you never saw until they were already gone.

Design rules that shaped every decision

  • The summary is written from the grouped themes, with counts and real quotes — never from one answer that caught the model’s eye.
  • Urgent answers jump the queue. An angry customer doesn’t wait until Sunday; they go to a human the minute they land.
  • The model labels and writes; the grouping math is plain code. A theme’s count is a real count, not a guess.
  • Every quote in the summary is a real answer somebody actually wrote, shown verbatim, not a paraphrase.
  • A theme has to clear a minimum size to be reported, so one stray comment never gets dressed up as a trend.
  • Every urgent flag and every weekly run is logged, so you can audit what the analyzer saw and did months later.
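
Several of these rules are cheap to enforce in code. The sketch below shows one possible guard, assuming each group is a list of answer strings and the model returns a name and a quote per group; the function name, the dict layout, and the choice to raise an error on a non-verbatim quote are hypothetical, not taken from the series.

```python
def build_themes(
    clusters: list[list[str]],
    labels: list[dict],
    min_size: int,
) -> list[dict]:
    """Turn labelled groups into reportable themes.

    - Groups below `min_size` are skipped, so a stray comment is never a trend.
    - The count is len(answers), computed here rather than taken from the model.
    - The quote must be exactly one of the group's answers, not a paraphrase.
    """
    themes = []
    for answers, label in zip(clusters, labels):
        if len(answers) < min_size:
            continue
        if label["quote"] not in answers:
            raise ValueError(f"quote for {label['name']!r} is not a verbatim answer")
        themes.append({"name": label["name"], "count": len(answers), "quote": label["quote"]})
    # Largest themes first, as a summary would list them.
    return sorted(themes, key=lambda t: t["count"], reverse=True)


# Made-up example data.
clusters = [
    ["Checkout was slow", "Waited ages to pay", "Slow checkout again"],
    ["Nice mural"],
]
labels = [
    {"name": "Slow checkout", "quote": "Waited ages to pay"},
    {"name": "Decor", "quote": "Nice mural"},
]

themes = build_themes(clusters, labels, min_size=2)
print(themes)
# [{'name': 'Slow checkout', 'count': 3, 'quote': 'Waited ages to pay'}]
```

Rejecting a paraphrased quote outright is one option; falling back to a quote chosen by code would be another.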

Why this shape

Most teams deal with open-ended survey answers in one of three ways: read a handful and call it a sense of the room, run every answer past a person who burns an afternoon tagging them, or never open the export at all. The handful misleads — the first twenty replies are not the other 260. The afternoon of tagging is real work that gets skipped the moment the week gets busy. And the unopened export is the most common outcome of all: you paid to gather feedback and then never actually heard it.

The setup above keeps the answers somewhere the team already has them, but adds a small system that reads every one. It uses AI exactly where plain code can’t cope — grouping messy free text by meaning — and plain code everywhere else, so the counts are trustworthy and the bill stays tiny. It surfaces the few themes that matter instead of a wall of raw text. And it never makes you wait a week for the one answer that needed you today. The analyzer is quiet on the weeks nothing stands out; it only speaks up with the short version and the thing that can’t wait.

The next four posts (Parts 2 to 5) walk through each piece in turn: how a survey answer gets collected, how answers get grouped into themes, how a summary reaches the owner, and how an urgent answer gets flagged. Each post has one diagram. The series closes with a cost breakdown and a final engineering reference.
