Part 7 of 7 · Review responder series ~6 min read

Engineering reference: the review responder architecture

Same system as the rest of the series, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, Knowledge Base wiring, Google Business Profile and Meta Graph API specifics, and the actual flow operations — everything you’d need to recreate this in your own AWS account.

Key takeaways · verified May 2026

  • Single AWS account in ap-southeast-1 (Singapore); Bedrock via Global cross-Region inference.
  • Five subsystems: Build & Deploy, Knowledge Sync, Intake (3 lanes → SQS), Responder (parallel extractors + decision + composer), Dispatch & learning.
  • Models: global.anthropic.claude-haiku-4-5-20251001-v1:0 + amazon.titan-embed-text-v2:0; vector store is S3 Vectors (GA Dec 2, 2025).
  • Review sources: Google Business Profile via Pub/Sub push, Facebook via third-party aggregator (Meta deprecated their own webhook in Graph API v22.0, January 2025), Yelp via hourly Fusion API poll.
  • Day-one paperwork: Google Business Profile API Basic Access (~14-day approval), Facebook aggregator credentials, Drive service account.

Posts 1–6 walk through the system in plain language. This page is the dense version — nothing softened, just the architecture as you’d sketch it on a whiteboard during a design review.

Full technical architecture: serverless review responder in ap-southeast-1

The diagram shows three external surfaces at the top. GitHub: repo and Actions runner, OIDC token requestor. Google Workspace: a Drive folder containing the voice file, the policies file, the menu file if applicable, and the staff roster, reached via a service account with domain-wide delegation. Review platforms: Google Business Profile via OAuth and the Business Profile API with Pub/Sub push notifications; Facebook page reviews via a third-party aggregator, since Meta deprecated its own webhook in January 2025; Yelp via the Fusion API polled on a schedule. Everything below runs in a single AWS account in ap-southeast-1 (Singapore), organised into five subsystems.

Build & Deploy (top strip): GitHub Actions exchanges a token with the IAM OIDC Provider, assumes an IAM Role whose trust policy is scoped to repo:owner/repo:ref:main, and runs SAM/CloudFormation to update the review-responder-prod stack.

Knowledge Sync: a small fn-drive-sync Lambda mirrors the policies and voice files from Drive to an S3 prefix on a five-minute schedule (Bedrock Knowledge Bases have no native Drive connector); the Knowledge Base's S3 connector reads that prefix, chunks the docs, embeds with Titan Text Embeddings v2, and stores vectors in the managed S3 Vectors index.

Intake (left column, three lanes into one queue): every lane writes to the shared SQS queue qu-reviews-in. Lambda fn-intake-gbp, behind a Lambda Function URL, receives Google Business Profile push notifications via Pub/Sub, fetches review details, normalizes, and enqueues. Lambda fn-intake-fb-webhook, also behind a Function URL, receives push events from a third-party Facebook aggregator (Birdeye, Yext, or similar; Meta deprecated its own page recommendations webhook in Graph API v22.0, January 2025), verifies the aggregator's signature, fetches review content, normalizes, and enqueues. Lambda fn-intake-yelp-poll runs on an hourly EventBridge cron, polls the Yelp Fusion API or the public reviews endpoint per listing, diffs against the last-seen review IDs in DynamoDB tbl-reviews, and enqueues only new normalized records. Each intake Lambda also runs the screen step: dedupe against tbl-reviews plus the in-Lambda banned-words list.

Responder (middle column): an SQS event source on qu-reviews-in invokes fn-process, which runs the three extractors against Bedrock Claude Haiku 4.5 in parallel via async tool_use calls (rating-with-sentiment, themes, specifics), consolidates the structured review, and picks one of four moves with a safety-keyword override. On auto-reply, draft, or escalate it calls fn-compose, which issues a Bedrock RetrieveAndGenerateStream against kb-policies with strict tool_use over four tool definitions: answer, draft, escalate, ignore. The composer runtime verifies citations, strips PII, validates names against the staff roster from the policies file, and writes the resulting record to DynamoDB tbl-actions.

Dispatch & learning (right column): on auto-reply, Lambda fn-post-reply calls the platform API for the original source. locations.reviews.updateReply on Google Business Profile is the only fully automated reply lane in 2026; Facebook page recommendations have had no public reply API since Meta deprecated that surface in Graph API v22.0 (January 2025), and Yelp's reply API is partner-gated to enterprise accounts, so for both Facebook and Yelp the dispatch downgrades to a draft package regardless of confidence. On draft, Lambda fn-handoff packages the original review, the proposed reply, and the matched policies excerpt; writes it to S3 under review-responder-data/drafts/; and publishes to SNS topic t-drafts, which fans out to email via SES and optionally Slack via Amazon Q Developer in chat applications (formerly AWS Chatbot). On escalate, the same package routes to SNS topic t-escalations, with optional SMS via SNS. Lambda fn-themes-rollup runs weekly (EventBridge cron, Mondays 06:00 UTC), reads DynamoDB tbl-themes, and generates a rolling-trend Drive doc themes-week-NN.

Cross-cutting (bottom strip): DynamoDB tables tbl-reviews, tbl-actions, tbl-themes; CloudWatch Logs with RetentionInDays: 7 on every log group; SNS topics t-drafts, t-escalations, t-alarms; an AWS Budgets $10 monthly alarm; Lambda fn-archive on a weekly EventBridge cron (Sundays 03:00 UTC) moving old draft and audit blobs to S3 Glacier Instant Retrieval; AWS Secrets Manager holding the platform credentials (secret/gbp-oauth, secret/fb-page-token, secret/yelp-fusion-key). Failures on qu-reviews-in retry 5 times, then land in the DLQ qu-reviews-dlq, which alarms to SNS t-alarms.
Fig 7. Full architecture, ap-southeast-1. White boxes = AWS resources; dashed outer border = the AWS account; dashed grey boxes = subsystem groupings; dashed grey arrows = config feeds and side branches.

Read this top-down, then column-by-column

Top row is the three external surfaces. Below it, the AWS account contains five subsystems: Build & Deploy across the top, then Knowledge Sync, then three runtime columns (Intake, Responder, Dispatch & learning), with a Cross-cutting strip at the bottom. Reviews enter through three intake paths (two webhooks behind Lambda Function URLs, one cron-driven poller) and all three write into a single SQS queue qu-reviews-in after deduplicating against tbl-reviews. The SQS event source invokes fn-process, which runs the three extractors in parallel against Bedrock Claude Haiku, picks one of four moves with safety-keyword override, and on auto-reply / draft / escalate calls fn-compose. The composer issues a Bedrock RetrieveAndGenerateStream against kb-policies with strict tool_use over four tools (answer, draft, escalate, ignore), verifies citations, strips PII, and writes the chosen action to tbl-actions. Dispatch routes by move: auto-reply via fn-post-reply to the originating platform’s reply API, draft and escalate via fn-handoff to S3 plus SNS fan-out. Themes are tallied on every review into tbl-themes and rolled up weekly by fn-themes-rollup.
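The three parallel extractor calls can be sketched as Converse API requests with a forced tool choice. A minimal sketch, assuming boto3's bedrock-runtime converse request shape; the tool name and schema here are hypothetical stand-ins for the real extractor definitions in fn-process:

```python
MODEL_ID = "global.anthropic.claude-haiku-4-5-20251001-v1:0"

def extractor_request(review_text: str, tool_spec: dict) -> dict:
    """Build kwargs for bedrock-runtime converse(); forcing toolChoice to the
    single tool makes the model return structured arguments instead of prose."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": review_text}]}],
        "toolConfig": {
            "tools": [{"toolSpec": tool_spec}],
            "toolChoice": {"tool": {"name": tool_spec["name"]}},
        },
    }

# Hypothetical themes extractor; the real schemas live alongside fn-process.
themes_tool = {
    "name": "extract_themes",
    "inputSchema": {"json": {"type": "object", "properties": {
        "themes": {"type": "array", "items": {"type": "string"}}}}},
}
req = extractor_request("Service was slow but the laksa was great.", themes_tool)
```

In fn-process the three requests run concurrently (e.g. asyncio.gather over an async Bedrock client) and the returned tool_use blocks are consolidated into the structured review.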

Naming conventions used in the diagram

  • Lambda functions use the prefix fn-<purpose>: fn-intake-gbp, fn-intake-fb-webhook, fn-intake-yelp-poll, fn-process, fn-compose, fn-post-reply, fn-handoff, fn-themes-rollup, fn-drive-sync, fn-archive.
  • Lambda runtimes: Python 3.13 for the responder, composer, themes rollup, drive sync, and archive functions (the Bedrock SDK is more ergonomic in Python). Python 3.14 has been available on Lambda since November 2025 and is fully supported; 3.13 is the safe production default in May 2026. Node.js 22.x is fine for fn-intake-fb-webhook if you prefer JS for HMAC verification; Node.js 24.x is also available since 2025 and either is current.
  • DynamoDB tables: tbl-reviews (partition key source#review_id, attribute set: seen_at, raw_payload, screen_verdict; used for dedupe and audit), tbl-actions (partition key review_id, sort key action_ts, with move, reply_text, cited_passages, guardrail_flags), tbl-themes (partition key theme, sort key week_iso, with rolling counts; theme values come from the policies file’s themes list).
  • SQS queues: qu-reviews-in (standard queue, 5-minute visibility timeout) and qu-reviews-dlq (messages move to the DLQ after 5 failed receives; a CloudWatch alarm on DLQ depth > 0 fires t-alarms).
  • SNS topics: t-drafts for normal-priority human review fan-out (email, optional Slack), t-escalations for urgent fan-out (email + optional SMS), t-alarms for general failures.
  • S3 layout: single bucket review-responder-data with prefixes kb-source/ (Drive mirror), drafts/{date}/ (full draft packages), archive/.
  • Knowledge Base: kb-policies, a Bedrock managed Knowledge Base with an S3 connector pointed at the synced policies/voice/menu prefix. Bedrock KBs do not have a native Drive connector as of 2026-05 (current native connectors: S3, Confluence, SharePoint, Salesforce, Web Crawler, plus a custom-API option), so a small fn-drive-sync Lambda mirrors the Drive folder to S3 on a 5-minute schedule. Embeddings model is amazon.titan-embed-text-v2:0; vector store is Amazon S3 Vectors (GA December 2025 — cheapest quick-create option for small/medium KBs: no provisioned capacity, no monthly minimum, ~$0.06/GB-month for stored vectors plus per-query and per-PUT charges — provisioned and managed by Bedrock when you create the KB). OpenSearch Serverless and Aurora pgvector remain valid alternatives for higher query throughput.
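The dedupe half of the screen step is a single conditional write against tbl-reviews. A sketch under stated assumptions: the partition-key attribute name (pk) is hypothetical (the post only fixes its value, source#review_id), and the boto3 call itself is left as a comment:

```python
def dedupe_put_params(source: str, review_id: str, raw_payload: str, seen_at: str) -> dict:
    """Build put_item kwargs for tbl-reviews. The ConditionExpression makes the
    write fail with ConditionalCheckFailedException when the review was already
    seen, so duplicates across retries and lanes are dropped atomically."""
    return {
        "TableName": "tbl-reviews",
        "Item": {
            "pk": {"S": f"{source}#{review_id}"},   # partition key: source#review_id
            "seen_at": {"S": seen_at},
            "raw_payload": {"S": raw_payload},
        },
        "ConditionExpression": "attribute_not_exists(pk)",
    }

params = dedupe_put_params("gbp", "r-123", "{}", "2026-05-01T00:00:00Z")
# With boto3: client.put_item(**params), treating
# ConditionalCheckFailedException as "duplicate, skip".
```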

Region, model access, platform APIs, and Drive auth

Everything runs in ap-southeast-1 (Singapore). Bedrock model invocations use the Global cross-Region inference profile (global. prefix on model IDs) — data at rest stays in Singapore; inference may route to other regions for capacity, billed at on-demand Singapore rates.

The intake Lambdas run as Lambda Function URLs to keep webhook ingress free of API Gateway. Each lane has its own current-2026 reality and the design accounts for the differences honestly.

Google Business Profile (lane 1, fully automated). Push notifications go through the My Business Notifications API v1 at mybusinessnotifications.googleapis.com/v1/accounts/{accountId}/notificationSetting; you create a Pub/Sub topic in your own GCP project, grant pubsub.topics.publish on that topic to mybusiness-api-pubsub@system.gserviceaccount.com, and PATCH the notification setting with pubsubTopic and notificationTypes: ["NEW_REVIEW", "UPDATED_REVIEW"]. The Pub/Sub subscription pushes to fn-intake-gbp, which verifies Google’s OIDC JWT before accepting the payload. Reading and replying to reviews stayed on the legacy v4 surface even after the broader v4 deprecation in 2024 — the canonical endpoints are still GET mybusiness.googleapis.com/v4/accounts/{accountId}/locations/{locationId}/reviews for list and the accounts.locations.reviews.updateReply method (PUT to {name}/reply) for the reply. Single OAuth scope: https://www.googleapis.com/auth/business.manage.
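Wiring the notification setting is one PATCH. A sketch of the request builder only; the updateMask query parameter is an assumption based on standard Google API PATCH semantics, and the account, project, and topic values are placeholders:

```python
def notification_setting_patch(account_id: str, project: str, topic: str) -> tuple[str, dict]:
    """URL and JSON body for enabling NEW_REVIEW / UPDATED_REVIEW push on one
    GBP account. The Pub/Sub topic must already grant pubsub.topics.publish to
    mybusiness-api-pubsub@system.gserviceaccount.com."""
    url = (
        "https://mybusinessnotifications.googleapis.com/v1/"
        f"accounts/{account_id}/notificationSetting"
        "?updateMask=pubsubTopic,notificationTypes"   # assumption: standard updateMask
    )
    body = {
        "pubsubTopic": f"projects/{project}/topics/{topic}",
        "notificationTypes": ["NEW_REVIEW", "UPDATED_REVIEW"],
    }
    return url, body

url, body = notification_setting_patch("1234567890", "my-gcp-project", "gbp-reviews")
# Send with an OAuth-authorized PATCH (scope: business.manage).
```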

GBP API access is allowlist-gated, not partner-gated. A new GCP project starts at 0 queries-per-minute (every API call returns quotaExceeded) until you submit the GBP “Application for Basic API Access” form (free) and Google approves, typically within ~14 days. Approved projects are bumped to 300 QPM with a hard cap of 10 edits-per-minute per profile (the reply call counts as an edit). Prerequisites: a verified Business Profile that’s been active 60+ days, a website on the profile, and an applicant email ideally on the website’s domain. A regular single-location owner can apply directly; you don’t need Partner status. The 0-QPM trap is the #1 first-time gotcha and worth surfacing in the SAM template README, so future-you doesn’t spend half a day debugging before realising the project is correctly enabled and just not allowlisted yet.

Facebook (lane 2, draft-only in 2026). Meta deprecated the Page recommendations webhook in Graph API v22.0 (January 21, 2025): the ratings field on the Page object no longer fires, and reading a recommendation returns error code 12. There is no v23+ replacement and no documented reply-to-recommendation API (the Recommendation node reference now states the endpoint “cannot be queried directly”). The realistic Facebook path in 2026 is therefore one of two patterns: a third-party aggregator (Birdeye, Yext, ReviewTrackers) that watches the page on your behalf and pushes normalized events to your webhook URL, or a periodic page-scraping fallback if you accept the fragility. Either way the Facebook lane is read-only from the platform’s perspective, which means the move-picker downgrades all Facebook reviews to draft regardless of confidence: fn-compose still produces the reply text, but fn-handoff drops the package in your Pages-app paste-in queue rather than calling a non-existent reply API. HMAC-SHA256 signature verification (X-Hub-Signature-256, App Secret as the key, computed over the raw request body, constant-time comparison) is still the right pattern for any Meta webhook you do subscribe to. Pin to v23.0 or v24.0 on outgoing calls; v22.0 is the youngest version that received the recommendations deprecation enforcement, and v18.0 already sunset on 2026-01-26.

Yelp (lane 3, draft-only for SMBs). The reply API exists — the Respond to Reviews API v2 at partner-api.yelp.com/reviews/v1/{review_id} — but it’s partner-gated and effectively enterprise-only: access requires either a Yelp + Listing Management subscription or a chain of 10+ Branded/Enhanced Profile locations. For a single-location SMB, the only programmatic surface is the public Fusion API GET /v3/businesses/{id}/reviews, which returns up to 3 truncated review excerpts per call, on Enhanced or Premium pricing tiers (the free tier was cut to 500 calls/day total in May 2023). fn-intake-yelp-poll runs on EventBridge cron cron(0 * * * ? *) (hourly), reads the truncated endpoint per listing, diffs against the latest review_id seen in tbl-reviews, and queues only-new IDs. Like the Facebook lane, the dispatch column treats Yelp as draft-only: the responder produces the reply, the human pastes it into biz.yelp.com. Auth: API key bearer (Authorization: Bearer <API_KEY>) on the public Fusion surface; OAuth on the partner surface if you ever upgrade.
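The poller's diff step is pure set logic once the last-seen IDs are loaded from tbl-reviews. A sketch, assuming the Fusion response items carry an id field:

```python
def new_reviews(fetched: list[dict], seen_ids: set[str]) -> list[dict]:
    """Keep only reviews not yet recorded in tbl-reviews; the caller enqueues
    the survivors to qu-reviews-in and records their IDs."""
    return [r for r in fetched if r["id"] not in seen_ids]

batch = [{"id": "ab1", "rating": 2}, {"id": "cd2", "rating": 5}]
fresh = new_reviews(batch, seen_ids={"ab1"})
# only the "cd2" review gets queued
```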

Architecturally, a single per-source auto_reply_supported boolean flag in the lane config is enough to handle this: dispatch reads the flag and routes auto-reply moves through to either fn-post-reply (when supported) or fn-handoff as a draft (when not). The decision logic in the move-picker doesn’t change shape; only the destination of the produced reply does.
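That flag check can be sketched as a tiny routing function. The lane names and config shape here are illustrative, but the behaviour matches the text: auto-reply reaches fn-post-reply only when the source supports programmatic replies, and everything else that carries text goes to fn-handoff:

```python
LANES = {
    "gbp":      {"auto_reply_supported": True},    # updateReply works
    "facebook": {"auto_reply_supported": False},   # no public reply API
    "yelp":     {"auto_reply_supported": False},   # reply API is partner-gated
}

def dispatch_target(source: str, move: str) -> str:
    """Route a composed reply by move and per-lane capability."""
    if move == "auto-reply" and LANES[source]["auto_reply_supported"]:
        return "fn-post-reply"
    if move in ("auto-reply", "draft", "escalate"):
        return "fn-handoff"   # downgraded to a draft/escalation package
    return "none"             # ignore: nothing to dispatch
```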

Google Drive authentication uses a service account with domain-wide delegation over a single scope: https://www.googleapis.com/auth/drive.readonly on the policies-and-voice folder only. The credential lives in AWS Secrets Manager. The fn-drive-sync Lambda runs on a 5-minute EventBridge schedule, pulls any changed docs from Drive, writes them to review-responder-data/kb-source/, and lets the Bedrock KB’s S3 connector index from there. Editing a doc and saving propagates within ~10 minutes (5 to sync + 5 to index); manual re-sync is one CLI call to StartIngestionJob.
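The sync Lambda reduces to a change filter between the Drive listing and what was last mirrored. A sketch, assuming the watermark is a stored map of file id → modifiedTime (where that map lives, e.g. S3 object metadata or a small DynamoDB item, is an implementation choice):

```python
def files_to_sync(drive_listing: list[dict], last_synced: dict[str, str]) -> list[dict]:
    """Pick the Drive docs whose modifiedTime differs from the recorded
    watermark; new files (no watermark) are included automatically."""
    return [
        f for f in drive_listing
        if last_synced.get(f["id"]) != f["modifiedTime"]
    ]

listing = [
    {"id": "doc-policies", "modifiedTime": "2026-05-02T09:00:00Z"},
    {"id": "doc-voice",    "modifiedTime": "2026-04-01T08:00:00Z"},
]
todo = files_to_sync(listing, {"doc-voice": "2026-04-01T08:00:00Z"})
# only doc-policies gets downloaded and written to kb-source/
```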

The composer uses strict tool_use: four tool definitions (answer, draft, escalate, ignore) with required parameter schemas. The answer and draft tools require a citation_passages array referencing one or more retrieved passages by id; the runtime validates each citation against the retrieved set before allowing dispatch. If the model emits an answer with a citation that wasn’t in the retrieved set, the runtime downgrades to draft — the safer-by-default failure mode. The PII strip and the staff-roster check both run after the model returns and before the reply is dispatched anywhere.
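The downgrade rule is small enough to show in full. A sketch, with the function name and passage-id shape hypothetical:

```python
def enforce_citations(move: str, cited_ids: list[str], retrieved_ids: set[str]) -> str:
    """An answer must cite at least one passage, and every citation must come
    from the retrieved set; anything else falls back to draft, the
    safer-by-default failure mode."""
    if move == "answer":
        if not cited_ids or any(c not in retrieved_ids for c in cited_ids):
            return "draft"
    return move
```

The PII strip and the staff-roster check then run on the reply text of whichever move survives, before anything is dispatched.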

What’s deliberately not on the diagram

  • IAM policy details — per-Lambda execution roles are minimal (one bucket prefix, one or two tables, a single Bedrock KB ID, InvokeModel on one model, the relevant platform-API outbound permissions via Secrets Manager).
  • Per-business policies layout — a flat Drive folder is fine for the first few months; subdivide by topic (refunds/, hours/, roster/) once the file count grows past a couple of dozen.
  • X-Ray tracing — on for fn-process and fn-compose, sampling 100% during tuning, 10% in steady state.
  • Bedrock Guardrails — managed contextual grounding (numeric grounding + relevance scores), PII redaction, prompt-attack/jailbreak filters, and the newer Automated Reasoning checks (formal-logic policy validation, GA in 2025). The custom citation-verify, PII-strip, and roster-check steps in fn-compose are roughly the contextual-grounding and PII ideas hand-rolled; turning on Guardrails moves the threshold into console configuration and adds prompt-attack defence on every model call. Worth enabling once thresholds are stable.
  • Multi-language replies — the composer reads the language of the inbound review and falls back to ignore if the language isn’t in the configured set. Adding a language is a config edit and a translated voice-file section, not a code change.
  • Multi-tenant variant — if running this on behalf of multiple SMBs, namespace the KB and tables per tenant and inject tenant_id into every record. The architecture doesn’t change shape; the IDs do.
  • Step Functions vs in-Lambda orchestration — the per-review pipeline (extract → pick → compose → dispatch) fits comfortably inside a single Lambda invocation under the 15-minute limit. Step Functions becomes worth it only if you need long-poll waits between human approval and post; for the synchronous draft package pattern shown here, in-Lambda is simpler and cheaper.
  • Retroactive backfill — on day one the system is empty of historical reviews. A one-shot backfill script can populate tbl-reviews with existing review IDs (so they’re marked “seen, not actioned”) without triggering a flood of belated drafts. Off the diagram because it runs once.

If you’re recreating this

Day-one paperwork: submit the Google Business Profile API “Application for Basic API Access” on day one of the project — approval takes ~14 days, your GCP project sits at 0 QPM until then, and there’s nothing technical you can do to skip the wait. If you’re planning to use a Facebook aggregator (Birdeye, Yext, ReviewTrackers), get the credential / webhook URL from them on day one too; their onboarding can be a few business days.

Start with Build & Deploy alone (a single Lambda, no triggers). Once git push reliably updates an empty stack, wire up fn-drive-sync with one short policies doc and confirm the doc lands in S3 within five minutes. Create the Bedrock Knowledge Base over that S3 prefix and confirm a one-shot RetrieveAndGenerateStream call returns a passage. Then one intake lane — the Yelp poller is the easiest, since it’s a pure cron and doesn’t require a webhook URL to be reachable from the public internet. Then the SQS-driven fn-process with the three extractors and the in-Lambda decision step. Then fn-compose with strict tool_use and citation verification (this is the part most worth integration-testing — intentionally try to make the model cite a passage outside the retrieved set and confirm the runtime downgrades to draft). Then fn-handoff for drafts. Add the GBP intake lane (assuming your allowlisting came through) and the Facebook aggregator lane once the offline path works. Cross-cutting (audit, logs, alarms, budget, archive) goes in from day one.
