Part 2 of 7 · Sentiment monitor series · ~4 min read

How a mention gets collected

The monitor only reads what it collects. So the first job is making sure it actually sees the places your business is talked about. There are three ways a mention gets in: a poller checks a review-site feed on a schedule, a social-listening tool drops its export in a bucket, or your own site sends a comment straight to a webhook the moment it’s posted. The three lanes look different, but they all end the same way — one clean, de-duplicated mention in a queue, waiting to be read.

Key takeaways

  • Three intake lanes feed one queue: review feeds, a listening export, and comment webhooks.
  • Each new mention is de-duplicated against a record of everything already seen.
  • The queue evens out bursts — a viral post’s comments don’t overwhelm the reader.
  • Webhook comments arrive in seconds; feed and export lanes are checked on a schedule.
  • Adding or removing a source is a config-doc edit — no deploy needed.

Three lanes into one queue

[Diagram: three intake lanes funnel into one queue]

  • Lane 1 · poll (Review feeds): a poller Lambda checks each configured review-site feed every 15 minutes, computes a stable id per item, skips anything already in the seen-table, and sends new items to the queue. One feed per platform you list.
  • Lane 2 · S3 export (Listening export): a social-listening tool you already pay for drops a periodic export file into an S3 bucket. The S3 PUT triggers a parser Lambda that reads each row, de-duplicates the same way, and queues new mentions. Catches the wider social web.
  • Lane 3 · webhook (Comment webhooks): your own site or app posts a new comment to a Function URL the moment it appears. The handler validates the webhook secret, de-duplicates by id, and queues the mention within seconds. Fastest lane, near real time.
  • SQS queue of new mentions (de-duplicated): id · source · author · text · posted-at · link · first-seen. The queue evens out bursts and feeds the reader one mention at a time, with a dead-letter queue behind it for anything that fails repeatedly.

Fig 2. Three lanes converge on one queue. Review feeds are polled, the listening export is parsed on arrival, and comment webhooks land in seconds. Each mention is de-duplicated first, then queued so the reader can pull them one at a time without being overwhelmed by a burst.

Lane 1: review-site feeds

The steadiest lane. For each review platform you care about, you give the monitor a feed — a URL it can read on a schedule. A small Lambda, the poller, runs every fifteen minutes (driven by EventBridge Scheduler), reads each feed, and looks at every item. For each one it computes a stable id from the platform and the review’s own identifier, then checks that id against the sm-seen table. If the id is already there, the item is skipped — the monitor has seen it before. If it’s new, the id is recorded and the mention is sent to the queue.
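
The id-and-skip step could be sketched roughly as follows. This is an illustration under stated assumptions, not the monitor's actual code: the SHA-256 hashing scheme in `stable_id`, the `SeenTable` class (an in-memory stand-in for the sm-seen table), the `review_id` field name, and the `enqueue` callback are all hypothetical.

```python
import hashlib
from typing import Callable, Iterable


def stable_id(platform: str, review_id: str) -> str:
    """Derive the same id for the same review on every poll.

    Assumption: a SHA-256 hash of "platform:review_id". The post only says
    the id is built from the platform and the review's own identifier.
    """
    return hashlib.sha256(f"{platform}:{review_id}".encode("utf-8")).hexdigest()


class SeenTable:
    """In-memory stand-in for the sm-seen table (hypothetical interface)."""

    def __init__(self) -> None:
        self._ids = set()

    def record_if_new(self, mention_id: str) -> bool:
        """Record the id and return True only the first time it is seen."""
        if mention_id in self._ids:
            return False
        self._ids.add(mention_id)
        return True


def poll_feed(platform: str, items: Iterable[dict], seen: SeenTable,
              enqueue: Callable[[dict], None]) -> int:
    """Queue every item not seen before and return how many were queued."""
    queued = 0
    for item in items:
        mention_id = stable_id(platform, item["review_id"])
        if not seen.record_if_new(mention_id):
            continue  # already seen on an earlier poll
        enqueue({"id": mention_id, "source": platform, **item})
        queued += 1
    return queued
```

Running `poll_feed` twice over the same items queues them on the first pass and nothing on the second. If sm-seen is a DynamoDB table (an assumption), a conditional put using `attribute_not_exists` would fold the check and the record into one write, so two overlapping runs could not both treat the same id as new.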

This lane covers the places that publish a feed and don’t move fast: a new review every few hours at most for a typical small business. Fifteen-minute polling is plenty, and it keeps the number of reads predictable and cheap.

Lane 2: the social-listening export

Many owners already pay for a tool that watches the wider social web: posts and mentions that don’t come with a tidy feed. Rather than rebuild that, the monitor reads the tool’s output. You point the tool at an S3 bucket (sm-listening-export) and have it drop a periodic export file there. The S3 PUT triggers a parser Lambda that reads each row of the file, computes a stable id the same way, de-duplicates against sm-seen, and queues anything new.
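
A rough sketch of the parser's two jobs, finding the uploaded file and filtering its rows, might look like this. It assumes a CSV export with `platform`, `post_id`, `author`, and `text` columns, which is hypothetical since real listening tools each have their own export format. A plain `set` stands in for sm-seen, and the function names are illustrative.

```python
import csv
import hashlib
import io
from urllib.parse import unquote_plus


def export_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded in the event, so they are decoded here.
    """
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def new_mentions(export_text: str, seen: set) -> list:
    """Parse one export file and return only the rows not seen before."""
    fresh = []
    for row in csv.DictReader(io.StringIO(export_text)):
        # Assumed id scheme: hash of "platform:post_id".
        mention_id = hashlib.sha256(
            f"{row['platform']}:{row['post_id']}".encode("utf-8")
        ).hexdigest()
        if mention_id in seen:
            continue  # re-listed from an earlier export, or repeated in this file
        seen.add(mention_id)
        fresh.append({
            "id": mention_id,
            "source": row["platform"],
            "author": row["author"],
            "text": row["text"],
        })
    return fresh
```

Because the id is recorded as soon as a row is accepted, a mention repeated within one file is filtered the same way as one repeated across exports.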

This lane is how the monitor sees beyond the platforms with feeds — the off-hand post on someone’s own account, the mention in a community group, the reply under an ad. The de-duplication matters most here, because listening tools often re-list the same mention across several exports. The seen-table makes sure each one is read once and only once, no matter how many times it shows up in the file.

Lane 3: comment webhooks

The fastest lane, for the comments you own outright. When someone posts a comment on your own site or app, your code sends it straight to a Lambda Function URL (a plain web address that runs a small function, with no heavier gateway in front of it). The handler checks a shared secret so only your site can post, computes the id, de-duplicates, and queues the mention. The trip from comment posted to mention queued takes a couple of seconds.
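
The handler's checks could be sketched as below. Several details are assumptions made for illustration: the `x-webhook-secret` header name, the literal secret, the `site:` id prefix, the JSON body shape, and the `seen` set and `enqueue` callback standing in for sm-seen and the queue. The event is assumed to expose `headers`, `body`, and `isBase64Encoded`, as in the Function URL request payload.

```python
import base64
import hmac
import json

# Hypothetical values: a real deployment would load the secret from
# configuration rather than keep a literal in the source.
WEBHOOK_SECRET = "replace-me"
SECRET_HEADER = "x-webhook-secret"


def handle_webhook(event: dict, seen: set, enqueue) -> dict:
    """Validate the secret, de-duplicate by id, and queue the comment."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    supplied = headers.get(SECRET_HEADER, "")
    # compare_digest avoids leaking, through timing, how much of the secret matched.
    if not hmac.compare_digest(supplied.encode("utf-8"),
                               WEBHOOK_SECRET.encode("utf-8")):
        return {"statusCode": 401, "body": "unauthorized"}

    try:
        body = event.get("body") or ""
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        comment = json.loads(body)
        mention_id = f"site:{comment['id']}"
    except (ValueError, KeyError, TypeError):
        return {"statusCode": 400, "body": "bad request"}

    if mention_id in seen:
        return {"statusCode": 200, "body": "duplicate"}
    seen.add(mention_id)
    enqueue({"id": mention_id, "source": "site", "text": comment.get("text", "")})
    return {"statusCode": 202, "body": "queued"}
```

A retried delivery of the same comment returns `duplicate` instead of queuing a second copy, which keeps the sending side free to retry on timeouts.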

This is the lane that catches a thread souring in real time. A review feed checked every fifteen minutes is fine for a slow trickle; a comment section turning hostile during a launch is exactly when you want the mention in the queue before the next refresh. Webhooks give you that, and they cost nothing to keep open because nothing runs until a comment actually arrives.

Why everything passes through one queue

Three lanes in, but only one queue out. That’s deliberate. If each lane fed the reader directly, a viral post producing four hundred comments in an hour would push four hundred reads at the reader as fast as the comments arrived. The reader makes a model call per mention, so that’s a spike in both load and cost at the worst possible moment. The queue absorbs the burst: lanes drop mentions in as fast as they arrive, and the reader pulls them out at a steady pace it can handle. Behind the queue sits a dead-letter queue, a holding pen for any mention that repeatedly fails to process, so one malformed item never blocks the rest or gets silently lost.
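
The buffering and dead-letter behaviour can be imitated in a few lines with an in-memory queue. This is a toy model of the idea, not SQS itself: `MAX_RECEIVES`, the batch size, and the `receives` counter stored on the mention are all assumptions.

```python
from collections import deque

MAX_RECEIVES = 3  # hypothetical limit on attempts before dead-lettering


def drain(queue: deque, read_mention, batch_size: int = 10):
    """Pull one batch of mentions at the reader's own pace.

    Returns (processed_ids, dead_lettered_ids) for this batch. A mention whose
    read raises goes back on the queue until it has failed MAX_RECEIVES times,
    then moves to the dead-letter list so it stops blocking the rest.
    """
    processed, dead = [], []
    for _ in range(min(batch_size, len(queue))):
        mention = queue.popleft()
        try:
            read_mention(mention)
            processed.append(mention["id"])
        except Exception:
            mention["receives"] = mention.get("receives", 0) + 1
            if mention["receives"] >= MAX_RECEIVES:
                dead.append(mention["id"])
            else:
                queue.append(mention)  # retry later, behind the others
    return processed, dead
```

A burst of four hundred mentions lands in the deque at once, but the reader still sees at most `batch_size` of them per call. With SQS, the equivalent of the retry counter is a redrive policy whose `maxReceiveCount` sets how many attempts a message gets before it moves to the dead-letter queue.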

One queue also means one place to reason about. Every “why was this read?” question has a single answer: it was in the queue, and it was in the queue because exactly one lane put it there, once. Adding a new source — another review feed, another webhook — is an edit to the config doc, not a deploy.
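
To make the "edit, not deploy" point concrete, here is one way such a config doc could be shaped and read. The schema, the source names, the example URL, and the `sources_by_lane` helper are all invented for illustration, since the post does not show the real document.

```python
import json

# Hypothetical config doc: the real schema is not given in the post.
CONFIG_DOC = """
{
  "sources": [
    {"lane": "feed", "name": "reviews-a", "url": "https://example.com/feed-a"},
    {"lane": "export", "name": "listening", "bucket": "sm-listening-export"},
    {"lane": "webhook", "name": "site-comments"}
  ]
}
"""

KNOWN_LANES = ("feed", "export", "webhook")


def sources_by_lane(doc_text: str) -> dict:
    """Group configured source names by intake lane, rejecting unknown lanes."""
    by_lane = {lane: [] for lane in KNOWN_LANES}
    for source in json.loads(doc_text)["sources"]:
        lane = source["lane"]
        if lane not in by_lane:
            raise ValueError(f"unknown lane: {lane}")
        by_lane[lane].append(source["name"])
    return by_lane
```

In a layout like this, adding a second review feed means appending one more `feed` entry to the document; the lane code that reads it does not change.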

Next post: how the reader takes a mention off the queue and reads its mood with a single model call, and how that score becomes a trend.
