Engineering reference: the review responder architecture
Same system as the rest of the series, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, Knowledge Base wiring, Google Business Profile and Meta Graph API specifics, and the actual flow operations — everything you’d need to recreate this in your own AWS account.
Key takeaways · verified May 2026
- Single AWS account in `ap-southeast-1` (Singapore); Bedrock via Global cross-Region inference.
- Five subsystems: Build & Deploy, Knowledge Sync, Intake (3 lanes → SQS), Responder (parallel extractors + decision + composer), Dispatch & learning.
- Models: `global.anthropic.claude-haiku-4-5-20251001-v1:0` + `amazon.titan-embed-text-v2:0`; vector store is S3 Vectors (GA Dec 2, 2025).
- Review sources: Google Business Profile via Pub/Sub push, Facebook via third-party aggregator (Meta deprecated their own webhook in Graph API v22.0, January 2025), Yelp via hourly Fusion API poll.
- Day-one paperwork: Google Business Profile API Basic Access (~14-day approval), Facebook aggregator credentials, Drive service account.
Posts 1–6 walk through the system in plain language. This page is the dense version — nothing softened, just the architecture as you’d sketch it on a whiteboard during a design review.
Read this top-down, then column-by-column
Top row is the three external surfaces. Below it, the AWS account contains five subsystems: Build & Deploy across the top, then Knowledge Sync, then three runtime columns (Intake, Responder, Dispatch & learning), with a Cross-cutting strip at the bottom.

Reviews enter through three intake paths (two webhooks behind Lambda Function URLs, one cron-driven poller) and all three write into a single SQS queue `qu-reviews-in` after deduplicating against `tbl-reviews`.

The SQS event source invokes `fn-process`, which runs the three extractors in parallel against Bedrock Claude Haiku, picks one of four moves with safety-keyword override, and on auto-reply / draft / escalate calls `fn-compose`. The composer issues a Bedrock `RetrieveAndGenerateStream` against `kb-policies` with strict tool_use over four tools (answer, draft, escalate, ignore), verifies citations, strips PII, and writes the chosen action to `tbl-actions`.

Dispatch routes by move: auto-reply via `fn-post-reply` to the originating platform's reply API, draft and escalate via `fn-handoff` to S3 plus SNS fan-out. Themes are tallied on every review into `tbl-themes` and rolled up weekly by `fn-themes-rollup`.
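The safety-keyword override mentioned above can be sketched as a pure function: the model's chosen move stands unless a keyword match forces an escalation. Names here (`SAFETY_KEYWORDS`, `pick_move`) and the keyword list itself are illustrative assumptions, not the production code.

```python
# Hypothetical sketch of the move-picker's safety-keyword override.
# The real keyword list would live in the policies config, not in code.
SAFETY_KEYWORDS = {"lawyer", "lawsuit", "food poisoning", "allergic reaction", "health inspector"}

def pick_move(review_text: str, model_move: str) -> str:
    """Return the final move: the model's choice, unless a safety
    keyword forces `escalate` regardless of model confidence."""
    lowered = review_text.lower()
    if any(kw in lowered for kw in SAFETY_KEYWORDS):
        return "escalate"
    return model_move
```

The override running outside the model call is the point: a keyword hit can never be argued away by a confident extraction.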
Naming conventions used in the diagram
- Lambda functions: `fn-<purpose>` — `fn-intake-gbp`, `fn-intake-fb-webhook`, `fn-intake-yelp-poll`, `fn-process`, `fn-compose`, `fn-post-reply`, `fn-handoff`, `fn-themes-rollup`, `fn-drive-sync`, `fn-archive`.
- Lambda runtimes: Python 3.13 for the responder, composer, themes rollup, drive sync, and archive functions (the Bedrock SDK is more ergonomic in Python). Python 3.14 has been available on Lambda since November 2025 and is fully supported; 3.13 is the safe production default in May 2026. Node.js 22.x is fine for `fn-intake-fb-webhook` if you prefer JS for HMAC verification; Node.js 24.x is also available since 2025 and either is current.
- DynamoDB tables: `tbl-reviews` (partition key `source#review_id`, attribute set: `seen_at`, `raw_payload`, `screen_verdict`; used for dedupe and audit), `tbl-actions` (partition key `review_id`, sort key `action_ts`, with `move`, `reply_text`, `cited_passages`, `guardrail_flags`), `tbl-themes` (partition key `theme`, sort key `week_iso`, with rolling counts; `theme` values come from the policies file's themes list).
- SQS queues: `qu-reviews-in` (standard queue with 5-minute visibility timeout), `qu-reviews-dlq` (5 retries before failure goes to DLQ; CloudWatch alarm on DLQ depth > 0 fires `t-alarms`).
- SNS topics: `t-drafts` for normal-priority human review fan-out (email, optional Slack), `t-escalations` for urgent fan-out (email + optional SMS), `t-alarms` for general failures.
- S3 layout: single bucket `review-responder-data` with prefixes `kb-source/` (Drive mirror), `drafts/{date}/` (full draft packages), `archive/`.
- Knowledge Base: `kb-policies`, a Bedrock managed Knowledge Base with an S3 connector pointed at the synced policies/voice/menu prefix. Bedrock KBs do not have a native Drive connector as of 2026-05 (current native connectors: S3, Confluence, SharePoint, Salesforce, Web Crawler, plus a custom-API option), so a small `fn-drive-sync` Lambda mirrors the Drive folder to S3 on a 5-minute schedule. Embeddings model is `amazon.titan-embed-text-v2:0`; vector store is Amazon S3 Vectors (GA December 2025 — cheapest quick-create option for small/medium KBs: no provisioned capacity, no monthly minimum, ~$0.06/GB-month for stored vectors plus per-query and per-PUT charges — provisioned and managed by Bedrock when you create the KB). OpenSearch Serverless and Aurora pgvector remain valid alternatives for higher query throughput.
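The `tbl-themes` key shape above (partition key `theme`, sort key `week_iso`) can be made concrete with a small helper. This is an illustrative sketch — `theme_key` is a hypothetical name, and the `YYYY-Www` sort-key format is an assumption consistent with the `week_iso` attribute described in the list.

```python
from datetime import date

def theme_key(theme: str, seen: date) -> dict:
    """Build the tbl-themes item key for a review seen on a given date:
    partition key `theme`, sort key `week_iso` as the ISO year-week,
    e.g. '2026-W21'. Using isocalendar() keeps year-boundary weeks
    correct (late-December reviews can fall into week 1 of next year)."""
    iso = seen.isocalendar()
    return {"theme": theme, "week_iso": f"{iso.year}-W{iso.week:02d}"}
```

The weekly `fn-themes-rollup` then reads one partition per theme and a contiguous sort-key range per week.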
Region, model access, platform APIs, and Drive auth
Everything runs in ap-southeast-1 (Singapore). Bedrock model invocations use the Global cross-Region inference profile (global. prefix on model IDs) — data at rest stays in Singapore; inference may route to other regions for capacity, billed at on-demand Singapore rates.
The intake Lambdas run as Lambda Function URLs to keep webhook ingress free of API Gateway. Each lane has its own current-2026 reality and the design accounts for the differences honestly.
Google Business Profile (lane 1, fully automated). Push notifications go through the My Business Notifications API v1 at mybusinessnotifications.googleapis.com/v1/accounts/{accountId}/notificationSetting; you create a Pub/Sub topic in your own GCP project, grant pubsub.topics.publish on that topic to mybusiness-api-pubsub@system.gserviceaccount.com, and PATCH the notification setting with pubsubTopic and notificationTypes: ["NEW_REVIEW", "UPDATED_REVIEW"]. The Pub/Sub subscription pushes to fn-intake-gbp, which verifies Google’s OIDC JWT before accepting the payload. Reading and replying to reviews stayed on the legacy v4 surface even after the broader v4 deprecation in 2024 — the canonical endpoints are still GET mybusiness.googleapis.com/v4/accounts/{accountId}/locations/{locationId}/reviews for list and the accounts.locations.reviews.updateReply method (PUT to {name}/reply) for the reply. Single OAuth scope: https://www.googleapis.com/auth/business.manage.
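The PATCH body for the notification setting is small enough to show. A minimal sketch, assuming the two field names given above (`pubsubTopic`, `notificationTypes`) and the standard Pub/Sub topic path format; the function name is illustrative.

```python
def notification_setting_body(gcp_project: str, topic: str) -> dict:
    """Request body for PATCH .../accounts/{accountId}/notificationSetting
    on the My Business Notifications API v1. The topic must already grant
    pubsub.topics.publish to mybusiness-api-pubsub@system.gserviceaccount.com."""
    return {
        "pubsubTopic": f"projects/{gcp_project}/topics/{topic}",
        "notificationTypes": ["NEW_REVIEW", "UPDATED_REVIEW"],
    }
```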
GBP API access is allowlist-gated, not partner-gated. A new GCP project starts at 0 queries-per-minute — every API call returns quotaExceeded — until you submit the GBP “Application for Basic API Access” form (free) and Google approves, typically within ~14 days. Approved projects are bumped to 300 QPM with a hard cap of 10 edits-per-minute per profile (the reply call counts as an edit). Prerequisites: a verified Business Profile that’s been active 60+ days, a website on the profile, and an applicant email ideally on the website’s domain. A regular single-location owner can apply directly; you don’t need Partner status. The 0-QPM trap is the #1 first-time gotcha and worth surfacing in the SAM template README so future-you doesn’t debug a half-day before realising the project is correctly enabled and just not allowlisted yet.
Facebook (lane 2, draft-only in 2026). Meta deprecated the Page recommendations webhook in Graph API v22.0 (January 21, 2025): the ratings field on the Page object no longer fires, and reading a recommendation returns error code 12. There is no v23+ replacement and no documented reply-to-recommendation API (the Recommendation node reference now states the endpoint “cannot be queried directly”). The realistic Facebook path in 2026 is therefore one of two patterns: a third-party aggregator (Birdeye, Yext, ReviewTrackers) that watches the page on your behalf and pushes normalized events to your webhook URL, or a periodic page-scraping fallback if you accept the fragility. Either way the Facebook lane is read-only from the platform’s perspective, which means the move-picker downgrades all Facebook reviews to draft regardless of confidence: fn-compose still produces the reply text, but fn-handoff drops the package in your Pages-app paste-in queue rather than calling a non-existent reply API. HMAC-SHA256 signature verification (X-Hub-Signature-256, App Secret as the key, computed over the raw request body, constant-time comparison) is still the right pattern for any Meta webhook you do subscribe to. Pin to v23.0 or v24.0 on outgoing calls; v22.0 is the youngest version that received the recommendations deprecation enforcement, and v18.0 already sunset on 2026-01-26.
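The HMAC pattern described above is a few lines of stdlib Python. This sketch follows Meta's documented scheme (`sha256=` prefix, App Secret as key, raw body as message, constant-time compare); the function name is illustrative.

```python
import hashlib
import hmac

def verify_meta_signature(app_secret: str, raw_body: bytes, header: str) -> bool:
    """Verify an X-Hub-Signature-256 header: 'sha256=' + hex HMAC-SHA256
    of the RAW request body (not a re-serialized JSON parse), keyed with
    the App Secret. hmac.compare_digest gives the constant-time compare."""
    expected = "sha256=" + hmac.new(
        app_secret.encode(), raw_body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, header)
```

The raw-body detail matters: parsing and re-serializing the JSON before hashing will silently break verification on whitespace or key-order differences.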
Yelp (lane 3, draft-only for SMBs). The reply API exists — the Respond to Reviews API v2 at partner-api.yelp.com/reviews/v1/{review_id} — but it’s partner-gated and effectively enterprise-only: access requires either a Yelp + Listing Management subscription or a chain of 10+ Branded/Enhanced Profile locations. For a single-location SMB, the only programmatic surface is the public Fusion API GET /v3/businesses/{id}/reviews, which returns up to 3 truncated review excerpts per call, on Enhanced or Premium pricing tiers (the free tier was cut to 500 calls/day total in May 2023). fn-intake-yelp-poll runs on EventBridge cron cron(0 * * * ? *) (hourly), reads the truncated endpoint per listing, diffs against the latest review_id seen in tbl-reviews, and queues only the new IDs. Like the Facebook lane, the dispatch column treats Yelp as draft-only: the responder produces the reply, the human pastes it into biz.yelp.com. Auth: API key bearer (Authorization: Bearer <API_KEY>) on the public Fusion surface; OAuth on the partner surface if you ever upgrade.
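The poller's diff step reduces to a pure function, which keeps it unit-testable without DynamoDB. A sketch under assumed names — `new_review_ids` is illustrative, and the `id` field matches the Fusion reviews response shape.

```python
def new_review_ids(fetched: list[dict], seen: set[str]) -> list[str]:
    """From the (up to 3) truncated reviews the Fusion endpoint returns,
    keep only IDs not already recorded in tbl-reviews, preserving order
    so the oldest unseen review is enqueued first."""
    return [r["id"] for r in fetched if r["id"] not in seen]
```

The 3-excerpt window is why the hourly cadence is safe for a single location: a restaurant getting more than three Yelp reviews an hour has bigger problems than polling frequency.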
Architecturally, a single per-source auto_reply_supported boolean flag in the lane config is enough to handle this: dispatch reads the flag and routes auto-reply moves through to either fn-post-reply (when supported) or fn-handoff as a draft (when not). The decision logic in the move-picker doesn’t change shape; only the destination of the produced reply does.
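The flag-driven routing can be shown in one function. A sketch with illustrative names (`dispatch_target`); the move names and Lambda targets are the ones defined earlier in this page.

```python
def dispatch_target(move: str, auto_reply_supported: bool) -> str:
    """Route a produced reply by move. `auto-reply` goes to fn-post-reply
    only when the source platform exposes a reply API (GBP); on read-only
    lanes (Facebook, Yelp) it is demoted to a draft package via fn-handoff.
    Other moves route the same way on every lane."""
    if move == "auto-reply" and not auto_reply_supported:
        return "fn-handoff"  # demotion: same reply text, human posts it
    return {
        "auto-reply": "fn-post-reply",
        "draft": "fn-handoff",
        "escalate": "fn-handoff",
        "ignore": "none",
    }[move]
```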
Google Drive authentication uses a service account with domain-wide delegation over a single scope: https://www.googleapis.com/auth/drive.readonly on the policies-and-voice folder only. The credential lives in AWS Secrets Manager. The fn-drive-sync Lambda runs on a 5-minute EventBridge schedule, pulls any changed docs from Drive, writes them to review-responder-data/kb-source/, and lets the Bedrock KB’s S3 connector index from there. Editing a doc and saving propagates within ~10 minutes (5 to sync + 5 to index); manual re-sync is one CLI call to StartIngestionJob.
The composer uses strict tool_use: four tool definitions (answer, draft, escalate, ignore) with required parameter schemas. The answer and draft tools require a citation_passages array referencing one or more retrieved passages by id; the runtime validates each citation against the retrieved set before allowing dispatch. If the model emits an answer with a citation that wasn’t in the retrieved set, the runtime downgrades to draft — the safer-by-default failure mode. The PII strip and the staff-roster check both run after the model returns and before the reply is dispatched anywhere.
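The citation check itself is a set-containment test plus the downgrade rule. A minimal sketch, assuming passage IDs are plain strings; `verify_citations` is a hypothetical name for the runtime step described above.

```python
def verify_citations(move: str, cited: list[str], retrieved: set[str]) -> str:
    """Enforce the safer-by-default rule: an `answer` is dispatched only
    if every cited passage id was actually in the retrieved set (and at
    least one passage was cited). Anything else becomes a `draft` for a
    human to check. Other moves pass through unchanged."""
    if move == "answer" and (not cited or not set(cited) <= retrieved):
        return "draft"
    return move
```

This is the step the rebuild guide below calls out as most worth integration-testing: feed the model a prompt engineered to cite outside the retrieved set and confirm the downgrade fires.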
What’s deliberately not on the diagram
- IAM policy details — per-Lambda execution roles are minimal (one bucket prefix, one or two tables, a single Bedrock KB ID, `InvokeModel` on one model, the relevant platform-API outbound permissions via Secrets Manager).
- Per-business policies layout — a flat Drive folder is fine for the first few months; subdivide by topic (`refunds/`, `hours/`, `roster/`) once the file count grows past a couple of dozen.
- X-Ray tracing — on for `fn-process` and `fn-compose`, sampling 100% during tuning, 10% in steady state.
- Bedrock Guardrails — managed contextual grounding (numeric grounding + relevance scores), PII redaction, prompt-attack/jailbreak filters, and the newer Automated Reasoning checks (formal-logic policy validation, GA in 2025). The custom citation-verify, PII-strip, and roster-check steps in `fn-compose` are roughly the contextual-grounding and PII ideas hand-rolled; turning on Guardrails moves the threshold into console configuration and adds prompt-attack defence on every model call. Worth enabling once thresholds are stable.
- Multi-language replies — the composer reads the language of the inbound review and falls back to `ignore` if the language isn’t in the configured set. Adding a language is a config edit and a translated voice-file section, not a code change.
- Multi-tenant variant — if running this on behalf of multiple SMBs, namespace the KB and tables per tenant and inject `tenant_id` into every record. The architecture doesn’t change shape; the IDs do.
- Step Functions vs in-Lambda orchestration — the per-review pipeline (extract → pick → compose → dispatch) fits comfortably inside a single Lambda invocation under the 15-minute limit. Step Functions becomes worth it only if you need long-poll waits between human approval and post; for the synchronous draft package pattern shown here, in-Lambda is simpler and cheaper.
- Retroactive backfill — on day one the system is empty of historical reviews. A one-shot backfill script can populate `tbl-reviews` with existing review IDs (so they’re marked “seen, not actioned”) without triggering a flood of belated drafts. Off the diagram because it runs once.
If you’re recreating this
Day-one paperwork: submit the Google Business Profile API “Application for Basic API Access” on day one of the project — approval takes ~14 days, your GCP project sits at 0 QPM until then, and there’s nothing technical you can do to skip the wait. If you’re planning to use a Facebook aggregator (Birdeye, Yext, ReviewTrackers), get the credential / webhook URL from them on day one too; their onboarding can be a few business days.
Start with Build & Deploy alone (a single Lambda, no triggers). Once git push reliably updates an empty stack, wire up fn-drive-sync with one short policies doc and confirm the doc lands in S3 within five minutes. Create the Bedrock Knowledge Base over that S3 prefix and confirm a one-shot RetrieveAndGenerateStream call returns a passage. Then one intake lane — the Yelp poller is the easiest, since it’s a pure cron and doesn’t require a webhook URL to be reachable from the public internet. Then the SQS-driven fn-process with the three extractors and the in-Lambda decision step. Then fn-compose with strict tool_use and citation verification (this is the part most worth integration-testing — intentionally try to make the model cite a passage outside the retrieved set and confirm the runtime downgrades to draft). Then fn-handoff for drafts. Add the GBP intake lane (assuming your allowlisting came through) and the Facebook aggregator lane once the offline path works. Cross-cutting (audit, logs, alarms, budget, archive) goes in from day one.