A survey analyzer on AWS for a few dollars a month

Key takeaways

Three sources for answers: a form-submit lane, an inbox forwarding lane, and a Drive sheet of exports.
Every answer gets a quick urgent check the moment it lands; everything else waits for the weekly grouping pass.
Grouping is embeddings plus plain-code clustering; a model only names each theme and picks one real quote.
The weekly summary names each theme, counts it, and quotes it. It is written from the groups, never from a single answer.
Designed on AWS for about $3.20/month at typical small-business volume.

The whole system on one page

Before any code, here’s the shape of what we’re designing.

Fig 1. Three sources outside, three pieces inside AWS. Answers flow in from a form-submit lane, an inbox forwarding lane, and a Drive sheet. The Grouper runs weekly and turns answers into themes. The Summary piece writes a short email and the urgent lane jumps the queue.

What you set up once (the outside)

Survey answers. The free-text responses to your survey. The backbone lane is the form-submit lane: your survey form posts each response to a small endpoint the moment someone hits send. Two other lanes (covered in Part 2) catch the rest — an inbox-forwarding lane (forward an email reply or an export to a dedicated address and the analyzer takes it from there) and a Drive sheet (paste an export in and a small sync picks it up). Each answer carries a little context if you have it: the survey it came from, the date, and a rating if your survey collected one.
A rules folder. Two short Google Docs in a Drive folder. The rules doc says what counts as urgent — an angry customer, a threat to leave, a safety or legal concern, a request for a callback — who gets the urgent flag, and how big a theme has to be before it’s worth reporting (so a single one-off comment doesn’t become a “theme”). The voice doc holds the tone for the summary and the email layout: how warm, how blunt, how long.
Owner and on-call. The owner gets the weekly summary email. The on-call person gets the urgent flag the moment an at-risk answer lands — with the full text and a plain note on why it tripped. In a small shop these are often the same person; the system just lets them be different.
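As a rough illustration, the contents of the rules doc could be held in a small structure like the one below. This is only a sketch: the "key: value" line format, the field names, the example address, and the threshold of 5 are all assumptions made for the example, not something the design above prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    urgent_signals: tuple[str, ...]  # what counts as urgent
    on_call_email: str               # who gets the urgent flag
    min_theme_size: int              # smallest group worth reporting


def parse_rules(doc_text: str) -> Rules:
    """Parse hypothetical 'key: value' lines exported from the rules doc."""
    fields: dict[str, str] = {}
    for line in doc_text.splitlines():
        if ":" not in line:
            continue  # skip headings and blank lines
        key, value = line.split(":", 1)
        fields[key.strip().lower()] = value.strip()
    signals = tuple(s.strip() for s in fields["urgent"].split(",") if s.strip())
    return Rules(
        urgent_signals=signals,
        on_call_email=fields["on-call"],
        min_theme_size=int(fields["minimum theme size"]),
    )


# Example doc text (hypothetical layout and values).
EXAMPLE_DOC = """Rules
Urgent: angry customer, threat to leave, safety or legal concern, request for a callback
On-call: oncall@example.com
Minimum theme size: 5
"""
rules = parse_rules(EXAMPLE_DOC)
```

Keeping the rules in a doc and parsing them at run time means the owner can change the threshold or the on-call address without touching code.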

What runs on each answer and each week (the inside)

The answer intake. Three sources feed the store. Each new answer is cleaned (strip signatures, drop blanks, trim), then run through a quick urgent check — a small, cheap model call that asks “does this need a human today?” If yes, it’s flagged right away (Part 5). Either way the cleaned answer is written to the store with its context.
The grouper. Runs once a week. Reads every answer from the period. Turns each one into a vector with Titan embeddings — a vector is just a list of numbers that captures what the answer is about, so answers that mean the same thing land close together even when the words differ. Plain-code clustering then groups the close ones. For each cluster, a Claude Haiku 4.5 call writes a short plain-English name (“Slow checkout”, “Loved the staff”) and picks one real answer that best represents the group. The clustering math is plain Python; the model only labels.
The summary. Reads the named themes, their counts, and their quotes. A Claude Sonnet 4.6 call writes a short summary: the response count, the overall mood, the handful of themes with counts and quotes, and a closing line on anything flagged urgent that week. SES emails it to the owner. The summary is written from the groups, never from one cherry-picked answer.
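To make "plain-code clustering" concrete, here is a minimal sketch of one way to group answer vectors by cosine similarity. The greedy single-pass approach, the function names, and the 0.8 threshold are assumptions chosen for illustration; the series may use a different clustering method. The vectors are toy two-dimensional stand-ins for the embeddings the grouper would get from Titan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.8):
    """Greedy single pass: each vector joins the most similar existing
    cluster if the similarity is at least `threshold`, otherwise it
    starts a new cluster. Returns lists of indices into `vectors`."""
    clusters = []  # each entry: {"members": [indices], "sum": running vector sum}
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            # Cosine ignores scale, so the running sum stands in for the centroid.
            sim = cosine(vec, c["sum"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"members": [i], "sum": list(vec)})
        else:
            best["members"].append(i)
            best["sum"] = [s + x for s, x in zip(best["sum"], vec)]
    return [c["members"] for c in clusters]


# Toy vectors: three point roughly "east", two roughly "north".
toy_vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 0.05]]
groups = cluster(toy_vectors)
counts = [len(g) for g in groups]
```

Because the counts come from `len()` on each group rather than from a model, a theme's count is an actual tally of answers.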

In plain words

You send a post-visit survey and 280 people reply to the open box. Through the week, each reply lands, gets cleaned, and gets a quick urgent check. On Tuesday, one reply says “third time the order was wrong, I’m done, cancel my account”. The urgent check trips and your on-call person gets it within a minute, full text attached, before it ever shows up in a summary. The other 279 wait. On Sunday night the grouper runs: it turns every answer into a vector, clusters them, and finds four real themes plus a long tail: “slow checkout” (61 answers), “friendly staff” (44), “parking is hard” (38), and “prices crept up” (22). Monday morning you get one short email: the mood, those four themes with counts, one real quote each, and a note that one urgent answer came in Tuesday and was handled. You read the whole picture in ninety seconds instead of skimming twenty replies and guessing.
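The per-reply path in that walkthrough (clean, urgent check, store) might look roughly like the sketch below. Everything named here is hypothetical: the signature markers, the `handle_answer` function, and the in-memory list standing in for the store. The urgent check is passed in as a plain callable, because in the real system it would be a small model call rather than the keyword stub used in the example.

```python
import re
from typing import Callable, Optional


def clean(raw: str) -> Optional[str]:
    """Strip a trailing signature, collapse whitespace, drop blanks."""
    kept = []
    for line in raw.splitlines():
        # Assumed signature markers: the "--" delimiter and a mobile footer.
        if line.rstrip() == "--" or line.startswith("Sent from my"):
            break  # everything from the marker onward is dropped
        kept.append(line)
    text = re.sub(r"\s+", " ", " ".join(kept)).strip()
    return text or None


def handle_answer(
    raw: str,
    context: dict,
    is_urgent: Callable[[str], bool],
    notify: Callable[[str], None],
    store: list,
) -> bool:
    """Clean one answer, flag it if urgent, and store it either way.
    Returns False when the answer was blank and nothing was stored."""
    text = clean(raw)
    if text is None:
        return False
    urgent = is_urgent(text)
    if urgent:
        notify(text)  # the on-call person gets the full cleaned text
    store.append({"text": text, "urgent": urgent, **context})
    return True


# Example run with a keyword stub in place of the model call.
store, sent = [], []
stub_check = lambda text: "cancel my account" in text.lower()
handle_answer(
    "third time the order was wrong, I'm done, cancel my account\n-- \nSam",
    {"survey": "post-visit"}, stub_check, sent.append, store,
)
handle_answer("Loved the staff!", {"survey": "post-visit"}, stub_check, sent.append, store)
handle_answer("   \n", {"survey": "post-visit"}, stub_check, sent.append, store)
```

Note that the urgent answer is still written to the store, so the weekly summary can count it and mention that it was flagged.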

The cost of running this is about $3.20 a month at typical small-business volume. The cost of not running it is the survey you paid to send and then only half-read, plus the angry customer on page nine you never saw until they were already gone.

Design rules that shaped every decision

The summary is written from the grouped themes, with counts and real quotes — never from one answer that caught the model’s eye.
Urgent answers jump the queue. An angry customer doesn’t wait until Sunday; they go to a human the minute they land.
The model labels and writes; the grouping math is plain code. A theme’s count is a real count, not a guess.
Every quote in the summary is a real answer somebody actually wrote, shown verbatim, not a paraphrase.
A theme has to clear a minimum size to be reported, so one stray comment never gets dressed up as a trend.
Every urgent flag and every weekly run is logged, so you can audit what the analyzer saw and did months later.
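Two of these rules, the minimum theme size and the verbatim-quote guarantee, are easy to enforce in plain code after the model has labeled the clusters. The sketch below assumes a hypothetical theme shape (a dict with `name`, `members`, and `quote`) and a hypothetical function name; it is one possible guard, not the series' actual implementation.

```python
def reportable_themes(themes, answers, min_size):
    """Keep themes with at least `min_size` members and check that each
    kept theme's quote is one of its own answers, character for character.

    themes:  list of {"name": str, "members": [indices into answers], "quote": str}
    answers: list of cleaned answer texts
    """
    kept = []
    for theme in themes:
        members = theme["members"]
        if len(members) < min_size:
            continue  # a stray comment is not a trend
        member_texts = {answers[i] for i in members}
        if theme["quote"] not in member_texts:
            # The model paraphrased or invented a quote: refuse to report it.
            raise ValueError(f"quote for {theme['name']!r} is not a real answer")
        kept.append(
            {"name": theme["name"], "count": len(members), "quote": theme["quote"]}
        )
    # Largest themes first, with counts taken from the clusters themselves.
    return sorted(kept, key=lambda t: t["count"], reverse=True)


# Example with made-up answers and labels.
sample_answers = ["Checkout took forever", "The line at checkout was slow",
                  "Slow checkout again", "Staff were lovely", "Nice mural"]
sample_themes = [
    {"name": "Nice decor", "members": [4], "quote": "Nice mural"},
    {"name": "Slow checkout", "members": [0, 1, 2], "quote": "Slow checkout again"},
]
report = reportable_themes(sample_themes, sample_answers, min_size=2)
```

Raising on a non-verbatim quote is a deliberately strict choice for the example; a gentler variant could fall back to another member of the cluster instead.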

Why this shape

Most teams deal with open-ended survey answers in one of three ways: read a handful and call it a sense of the room, run every answer past a person who burns an afternoon tagging them, or never open the export at all. The handful misleads — the first twenty replies are not the other 260. The afternoon of tagging is real work that gets skipped the moment the week gets busy. And the unopened export is the most common outcome of all: you paid to gather feedback and then never actually heard it.

The setup above leaves the answers where the team already keeps them, but adds a small system that reads every one. It uses AI exactly where plain code can’t cope (grouping messy free text by meaning) and plain code everywhere else, so the counts are trustworthy and the bill stays tiny. It surfaces the few themes that matter instead of a wall of raw text. And it never makes you wait a week for the one answer that needed you today. The analyzer is quiet on the weeks nothing stands out; it only speaks up with the short version and the thing that can’t wait.

The next four posts walk through each piece in turn: how a survey answer gets collected, how answers get grouped into themes, how a summary reaches the owner, and how an urgent answer gets flagged. Each post has one diagram. A cost breakdown and a final engineering reference come at the end.
