Part 3 of 7 · Review responder series

How the responder reads a review

A review is anywhere from two words to a paragraph of free text, plus a star rating that may or may not match it. Before the responder can pick a move, it has to read the review three different ways: the score, the themes, and the specifics. Three small extractors run in parallel, each returning a confidence score the move-picker reads before it decides anything.

Key takeaways

  • Three extractors run in parallel against every review: rating with sentiment cross-check, themes, and specifics.
  • The rating extractor flags mismatches — a 5-star with strongly negative text, or a 1-star whose body is a typo of “love it”.
  • The themes extractor tags against a small business-specific list (food, service, cleanliness, value, etc.) plus an open-form bucket for anything novel.
  • The specifics extractor pulls names, dishes or products, dates, dollar amounts, order numbers, and which branch.
  • Each extractor returns a 0–1 confidence score; high confidence on all three is the gate to auto-reply, and anything borderline becomes a reason to draft.

Three extractors in parallel

[Figure: the normalized review object from intake (source, review ID, platform score, text, author, timestamp) fans out to three parallel extractors: rating with sentiment cross-check, themes against the business-specific list plus an open-form bucket, and specifics (names, dishes or products, dates and times, dollar amounts, order numbers, branch). Each returns its piece with a 0–1 confidence; the three results merge into one structured review and flow to the move-picker with confidences attached. High confidence on all three means safe to auto; anything borderline becomes a reason to draft instead.]
Fig 3. Three extractors run in parallel against the same review. Each one returns its piece with a confidence score; the move-picker reads all three before it decides anything.
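If it helps to see the fan-out as code, here's a minimal sketch. It assumes each extractor is an ordinary function that takes the normalized review and returns a (payload, confidence) pair; the names and field layout are illustrative, not the responder's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical shape: an extractor takes the normalized review and
# returns (payload, confidence). The three real extractors are
# sketched in the sections below.
Extractor = Callable[[dict], tuple[dict, float]]

def read_review(review: dict, extractors: dict[str, Extractor]) -> dict:
    """Fan the review out to every extractor in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        futures = {name: pool.submit(fn, review) for name, fn in extractors.items()}
        results = {name: f.result() for name, f in futures.items()}
    # Keep the confidences separate from the payloads so the
    # move-picker can read them without digging.
    return {
        "review": review,
        "fields": {name: payload for name, (payload, _) in results.items()},
        "confidences": {name: conf for name, (_, conf) in results.items()},
    }
```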

Why sentiment, not just stars

The platform-given score is a useful signal but not a complete one. The clearest counterexample is a 5-star review whose body says “food was cold, server was rude, will not be back — the only good thing was the door wasn’t locked.” That happens. So does the inverse: a 1-star review whose body reads “Five stars, sorry hit the wrong button.” Auto-replying to either based on the score alone is a small disaster — a chirpy “thanks for the kind words!” under a complaint, or a confused apology under a happy customer.

The cross-check is cheap. Read the score, run the text through a small sentiment classifier, and if they don’t agree, raise a mismatch flag. The flag doesn’t decide anything by itself; it’s a signal the move-picker uses to push the review out of the auto-reply lane. A draft for your eyes is the right move when the rating and the text disagree, no matter which direction the disagreement runs.
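In code, the cross-check is a few lines. This sketch assumes the classifier is passed in as a function that returns a sentiment label and its confidence; the exact score bands that count as a mismatch are a policy choice, not gospel.

```python
def rating_with_sentiment(review: dict, classify) -> tuple[dict, float]:
    """classify(text) -> (sentiment, confidence), where sentiment is
    'positive', 'mixed', or 'negative'. Any small classifier works."""
    score = review["score"]  # platform stars, 1-5
    sentiment, conf = classify(review["text"])
    # Flag only the clear disagreements; 3-star reviews and 'mixed'
    # sentiment never mismatch under these bands.
    mismatch = (
        (score >= 4 and sentiment == "negative")
        or (score <= 2 and sentiment == "positive")
    )
    # The flag decides nothing here; the move-picker reads it and
    # pushes flagged reviews out of the auto-reply lane.
    return {"score": score, "sentiment": sentiment, "mismatch": mismatch}, conf
```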

Themes that come back as tags

Themes are the categories a review touches: for a restaurant, that’s typically food quality, service speed, value, cleanliness, atmosphere, and staff friendliness. The list lives in the same Drive folder as the voice file — small, business-specific, and editable without a deploy. The extractor multi-tags: a single review about “great food, but we waited an hour” lights up food-quality (positive) and service-speed (negative), and the composer can address both in the reply.

An open-form bucket catches anything the pre-defined list doesn’t cover — “the parking situation”, “the new menu format”, “your dog-friendly patio”. Open-form themes don’t drive auto-reply (the responder won’t commit to addressing something it doesn’t have a policy for), but they do feed the themes log. After a quarter, the open-form bucket tells you what your customers are talking about that you haven’t named yet.
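The tagging itself is a model call, so the sketch below only pins down the part worth pinning down: the split between the known list and the open-form bucket. The theme names mirror the restaurant example above; in practice the known set is loaded from the file in Drive, not hard-coded.

```python
# Illustrative known-theme list; the real one lives in the editable
# Drive file next to the voice file.
KNOWN_THEMES = {
    "food_quality", "service_speed", "value",
    "cleanliness", "atmosphere", "staff_friendliness",
}

def split_theme_tags(raw_tags: dict[str, str], known: set[str] = KNOWN_THEMES) -> dict:
    """raw_tags maps theme -> polarity ('+' or '-') as the tagger
    returned them. Known themes keep their polarity; anything novel
    goes to the open-form bucket, which feeds the themes log but
    never drives auto-reply."""
    themes = {t: p for t, p in raw_tags.items() if t in known}
    open_form = sorted(t for t in raw_tags if t not in known)
    return {"themes": themes, "open_form": open_form}
```

So "great food, but we waited an hour, and the parking was a nightmare" comes back as `{"themes": {"food_quality": "+", "service_speed": "-"}, "open_form": ["parking"]}`: two tags the composer can address, one entry for the log.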

Specifics — names, dishes, dates, branches

The specifics extractor pulls the named bits a real human reply would acknowledge: a staff member’s first name (“Sarah was wonderful”, “Alex didn’t apologize”), a dish or product (“the soup”, “the new mug we ordered”), a number like an order number or a dollar figure (“they wouldn’t honour the $50 promo”), a date or time of visit, and — for a multi-location business — which branch the review was about.

These have two distinct uses. First, the composer reads them, so a draft can say “thanks for mentioning Sarah” instead of “thanks for mentioning our staff.” Second, the move-picker uses them as routing signals: a review that names a current staff member by first name is a draft (so a human reads it before it auto-posts under the business’s account), and a review that mentions a specific dollar amount or order number is also a draft (because compose-with-numbers is exactly where the AI most wants to make up the wrong figure).

Names are filtered against a roster — a small list of current staff first names in the policies file. “Sarah” mentioned but not on the roster is treated as a regular noun; “Alex” on the roster is flagged. The roster lives in Drive next to the voice file; updating it when someone joins or leaves is one edit, no deploy.
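Put together, the routing side of the specifics extractor is small. This sketch assumes illustrative field names in the specifics payload (`names`, `dollar_amounts`, `order_numbers`) and takes the roster as the set of first names from the policies file.

```python
def specifics_flags(specifics: dict, roster: set[str]) -> list[str]:
    """Flags the move-picker reads; any flag forces a draft."""
    flags = []
    # A name only counts if it matches the roster; "Sarah" off the
    # roster is treated as a regular noun.
    roster_cf = {r.casefold() for r in roster}
    if any(n.casefold() in roster_cf for n in specifics.get("names", [])):
        flags.append("names_staff_member")
    # Numbers are exactly where a composed reply most wants to invent
    # the wrong figure, so they force a draft too.
    if specifics.get("dollar_amounts") or specifics.get("order_numbers"):
        flags.append("mentions_numbers")
    return flags
```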

Confidence is the gate

Each extractor returns a confidence score between zero and one. The move-picker won’t auto-reply unless all three are above a configured threshold — a single low score is enough to push the review into the draft lane.

That’s the entire role of confidence in this system: a vote on whether the responder is sure enough to act without you reading first. It is not a measure of correctness, and the responder doesn’t pretend it is. It’s a humility threshold, deliberately tuned so the cost of being too cautious (you glance at one more draft) is much lower than the cost of being too confident (a tone-deaf reply posted publicly under your business’s name).
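The gate itself is deliberately boring. A sketch, with an illustrative threshold (the real one is configured, and tuning it is the point of the paragraph above):

```python
def clears_auto_gate(confidences: dict[str, float], threshold: float = 0.85) -> bool:
    """True only if every extractor cleared the bar; one low score
    is enough to push the review into the draft lane."""
    return all(c >= threshold for c in confidences.values())

# With the numbers from Fig 3 (rating 0.91, themes 0.86,
# specifics 0.78), the specifics score misses a 0.85 bar,
# so the review becomes a draft.
```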

What this hands to the next post

The move-picker now has a structured review with a score, a sentiment, a list of themes, a small bag of specifics, and a confidence on each piece. The next post is what it does with that — how it picks one of four moves, and why the boundaries between them are deliberately conservative.
