Engineering reference: the chat assistant architecture
Same system as the rest of the series, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, Knowledge Base wiring, and the actual flow operations — everything you’d need to recreate this in your own AWS account.
Key takeaways · verified May 2026
- Single AWS account in ap-southeast-1 (Singapore); Bedrock via Global cross-Region inference.
- Five subsystems: Build & Deploy, Knowledge Sync, Conversation gateway (Lambda Function URL + API Gateway WebSocket), Answerer (RetrieveAndGenerateStream + strict tool_use), Handoff & learning.
- Models: global.anthropic.claude-haiku-4-5-20251001-v1:0 + amazon.titan-embed-text-v2:0; vector store is S3 Vectors (managed by Bedrock).
- WebSocket routes $connect / $default / $disconnect on wss-chat; replies stream via ApiGatewayManagementApi.PostToConnection; widget mints a short-lived JWT from fn-mint-token to keep long-lived secrets out of the browser.
- Citation enforced at runtime: if the model emits answer with a citation outside the retrieved set, the runtime downgrades to hand_off.
Posts 1–6 walk through the system in plain language. This page is the dense version — nothing softened, just the architecture as you’d sketch it on a whiteboard during a design review.
Read this top-down, then column-by-column
Top row is the three external surfaces. Below it, the AWS account contains five subsystems: Build & Deploy across the top, then Knowledge Sync, then three runtime columns (Conversation gateway, Answerer, Handoff & learning), with a Cross-cutting strip at the bottom.
A visitor opens the widget, the page calls fn-mint-token for a short-lived JWT, and connects to wss-chat. The $connect route writes a session row; each $default message invokes fn-ws-message, which appends to the session scratchpad and invokes fn-answerer. The answerer issues a Bedrock RetrieveAndGenerateStream against kb-website-knowledge with strict tool_use over four tools (answer, clarify, hand_off, decline), streams the reply back via PostToConnection, and writes the turn into tbl-transcripts. hand_off invokes fn-handoff; clarify and decline append to tbl-gaps for weekly review.
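The fan-out at the end of that flow is small enough to sketch as a pure function. This is a minimal illustration rather than the production handler: the effect labels (post_to_connection, write_transcript, invoke_fn_handoff, append_tbl_gaps) are hypothetical names for the side effects described above, and the sketch assumes every tool result still produces a visitor-facing reply.

```python
from typing import List

# The four tool names come from the flow description above.
TOOLS = {"answer", "clarify", "hand_off", "decline"}


def plan_side_effects(tool_name: str) -> List[str]:
    """Return the ordered side effects for one completed turn.

    The returned labels are hypothetical; a real handler would map each
    one to an AWS call (PostToConnection, a DynamoDB put, a Lambda invoke).
    """
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name!r}")

    # Assumption: every turn is streamed to the visitor and recorded.
    effects = ["post_to_connection", "write_transcript"]

    if tool_name == "hand_off":
        # hand_off additionally invokes fn-handoff.
        effects.append("invoke_fn_handoff")
    elif tool_name in ("clarify", "decline"):
        # clarify and decline are logged to tbl-gaps for weekly review.
        effects.append("append_tbl_gaps")

    return effects
```

Keeping the routing decision separate from the AWS calls makes it straightforward to unit-test without any cloud resources.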
Naming conventions used in the diagram
- Lambda functions: fn-<purpose> (fn-mint-token, fn-ws-connect, fn-ws-message, fn-ws-disconnect, fn-answerer, fn-handoff, fn-gaps-batch, fn-drive-sync, fn-archive).
- Lambda runtimes: Python 3.13 for the answerer, handoff, gaps batch, drive sync, and archive functions; Node.js 22.x is fine for fn-mint-token and the WebSocket route handlers if you prefer JS. Both runtimes are current LTS-equivalents for Lambda as of 2026-05.
- DynamoDB tables: tbl-sessions (partition key connection_id, with a scratchpad list trimmed to the last few turns and a TTL on idle), tbl-transcripts (partition key session_id, sort key turn_index), tbl-gaps (partition key week_iso, sort key created_at#turn_id, storing the visitor turn, page URL, and closest-passage scores).
- SNS topics: t-handoffs for human-handoff fan-out (email, optional Slack), t-alarms for general failures.
- S3 layout: single bucket chat-assistant-data with prefixes kb-source/, transcripts/{date}/, archive/.
- Knowledge Base: kb-website-knowledge, a Bedrock managed Knowledge Base with an S3 connector pointed at the synced help/policies prefix. Bedrock KBs do not have a native Drive connector as of 2026-05, so a small fn-drive-sync Lambda mirrors the Drive folder to S3 on a 5-minute schedule. Embeddings model is amazon.titan-embed-text-v2:0; vector store is Amazon S3 Vectors (the cheapest quick-create option since mid-2025, with zero idle cost, provisioned and managed by Bedrock when you create the KB). OpenSearch Serverless and Aurora PostgreSQL Serverless remain valid alternatives if you outgrow S3 Vectors’ query throughput.
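The table conventions above can be sketched as two small helpers: one that builds a tbl-gaps key and one that trims the session scratchpad. Treat the details as assumptions: the week format ("2026-W19"), the sort-key attribute name sk, and the limit of 6 turns are all placeholders, since the text only specifies the key components and "the last few turns".

```python
from datetime import datetime, timezone
from typing import Dict, List

# Assumption: the text only says "the last few turns"; 6 is a placeholder.
MAX_SCRATCHPAD_TURNS = 6


def gaps_key(created_at: datetime, turn_id: str) -> Dict[str, str]:
    """Build the tbl-gaps primary key for one clarify/decline turn."""
    utc = created_at.astimezone(timezone.utc)
    iso_year, iso_week, _ = utc.isocalendar()
    return {
        # Assumed week format, e.g. "2026-W19".
        "week_iso": f"{iso_year}-W{iso_week:02d}",
        # "sk" is a hypothetical attribute name for created_at#turn_id.
        "sk": f"{utc.isoformat()}#{turn_id}",
    }


def trim_scratchpad(turns: List[dict], keep: int = MAX_SCRATCHPAD_TURNS) -> List[dict]:
    """Return only the most recent `keep` turns, oldest first."""
    if keep <= 0:
        # Guard: turns[-0:] would return the whole list, not an empty one.
        return []
    return turns[-keep:]
```

Partitioning by ISO week keeps each weekly review to a single partition query, with the timestamp-prefixed sort key returning turns in chronological order.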
Region, model access, websocket details, and Drive auth
Everything runs in ap-southeast-1 (Singapore). Bedrock model invocations use the Global cross-Region inference profile (global. prefix on model IDs) — data at rest stays in Singapore; inference may route to other regions for capacity, billed at on-demand Singapore rates.
The widget mints its session token from fn-mint-token rather than authenticating directly against API Gateway; the JWT is short-lived (a few minutes) and is checked in fn-ws-connect via a Lambda authorizer. This keeps long-lived secrets out of browsers entirely. Streaming replies use ApiGatewayManagementApi.PostToConnection with chunked writes — the answerer flushes partial responses every few tokens so the visitor sees words appear within a second.
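The token exchange can be illustrated with the standard library alone. This is a hedged sketch, not the actual fn-mint-token code: it assumes an HS256-signed JWT, a shared secret that would be loaded from a secret store, a sub claim carrying a session id, and a 300-second lifetime standing in for "a few minutes". A production deployment would more likely use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    digest = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def mint_token(secret: bytes, session_id: str, now: int, ttl_seconds: int = 300) -> str:
    """Mint a short-lived HS256 JWT (ttl_seconds is an assumed default)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = {"sub": session_id, "iat": now, "exp": now + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.{_sign(secret, header + '.' + payload)}"


def verify_token(secret: bytes, token: str, now: int) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    # The signature is always recomputed as HS256, so the header's alg
    # field is never trusted to select an algorithm.
    expected = _sign(secret, header + "." + payload)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    claims = json.loads(_b64url_decode(payload))
    if claims.get("exp", 0) <= now:
        return None
    return claims
```

Passing now explicitly rather than calling time.time() inside the functions keeps expiry behaviour deterministic in tests.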
Google Drive authentication uses a service account with domain-wide delegation over a single scope: https://www.googleapis.com/auth/drive.readonly on the help-docs folder only. The credential lives in AWS Secrets Manager. The fn-drive-sync Lambda runs on a 5-minute EventBridge schedule, pulls any changed docs from Drive, writes them to chat-assistant-data/kb-source/, and lets the Bedrock KB’s S3 connector index from there. Editing a doc and saving propagates within ~10 minutes (5 to sync + 5 to index); manual re-sync is one CLI call to StartIngestionJob.
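The "pulls any changed docs" step can be sketched as a pure planning function that compares a Drive listing against the last-synced state. This is an illustration under stated assumptions: the listing uses the Drive API v3 file fields id, name, and modifiedTime; the last-synced map (file id to modifiedTime) is a hypothetical piece of state the Lambda would persist somewhere; and the S3 key layout kb-source/<file id> is invented for the example. Deletions are not handled here.

```python
from typing import Dict, List


def plan_sync(
    drive_files: List[Dict[str, str]],
    last_synced: Dict[str, str],
    prefix: str = "kb-source/",
) -> List[Dict[str, str]]:
    """Return one upload instruction per new or modified Drive file.

    drive_files: entries with "id", "name", "modifiedTime" (RFC 3339 text).
    last_synced: hypothetical state mapping file id -> modifiedTime at last sync.
    """
    uploads = []
    for f in drive_files:
        # Compare timestamps for inequality only, so no date parsing is
        # needed; any change in modifiedTime triggers a re-upload.
        if last_synced.get(f["id"]) != f["modifiedTime"]:
            uploads.append(
                {
                    "file_id": f["id"],
                    # Assumed key layout: prefix + Drive file id.
                    "s3_key": f"{prefix}{f['id']}",
                    "modified_time": f["modifiedTime"],
                }
            )
    return uploads
```

Keying objects by file id rather than by name means a renamed doc overwrites its previous copy instead of leaving a stale duplicate for the KB to index.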
The answerer uses strict tool_use: four tool definitions (answer, clarify, hand_off, decline) with required parameter schemas. The answer tool requires a citation_id parameter referencing one of the retrieved passages by id; the runtime validates the citation against the retrieved set before allowing PostToConnection to flush. If the model emits an answer with a citation that wasn’t in the retrieved set, the runtime downgrades to hand_off — the safer-by-default failure mode.
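The citation check described above reduces to a few lines. The sketch below assumes the tool call arrives as a dict with name and input keys (the shape of an Anthropic tool_use block) and that retrieved passages are identified by string ids; the reason field on the downgraded hand_off call is a hypothetical parameter added for illustration.

```python
from typing import Any, Dict, Set


def enforce_citation(tool_call: Dict[str, Any], retrieved_ids: Set[str]) -> Dict[str, Any]:
    """Downgrade an `answer` call to `hand_off` if its citation is not verifiable.

    Calls to the other tools (clarify, hand_off, decline) pass through unchanged.
    """
    if tool_call.get("name") != "answer":
        return tool_call

    citation_id = (tool_call.get("input") or {}).get("citation_id")

    # Accept only a string id that was actually in the retrieved set.
    if isinstance(citation_id, str) and citation_id in retrieved_ids:
        return tool_call

    # Missing, malformed, or unknown citation: fail towards a human.
    return {
        "name": "hand_off",
        # "reason" is a hypothetical field, not part of the documented schema.
        "input": {"reason": "citation_not_in_retrieved_set"},
    }
```

Running this before anything is flushed to the visitor is what makes the check an enforcement step rather than an after-the-fact audit.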
What’s deliberately not on the diagram
- IAM policy details: per-Lambda execution roles are minimal (one bucket prefix, one or two tables, a single Bedrock KB ID, InvokeModel on one model, execute-api:ManageConnections on one API).
- Per-business knowledge layout: a flat Drive folder is fine for the first few months; subdivide by topic (shipping/, returns/, pricing/) once it grows past a couple of dozen docs, so writers know where new paragraphs go.
- X-Ray tracing: on for fn-answerer and fn-handoff, sampling 100% during tuning, 10% in steady state.
- Bedrock Guardrails: managed contextual grounding (numeric grounding + relevance scores), PII redaction, prompt-attack/jailbreak filters, and the newer Automated Reasoning checks (formal-logic policy validation, GA in 2025). The custom citation-verification step in fn-answerer is roughly the contextual-grounding idea hand-rolled; turning on Guardrails moves the threshold into console configuration and adds PII redaction and prompt-attack defence on every model call. Worth enabling once thresholds are stable.
- Long-lived visitor identity: for logged-in customers who want their order history available, swap connection_id for an authenticated customer_id at $connect and bind the scratchpad to that identity. Keep it opt-in.
- Multi-tenant variant: if running this on behalf of multiple SMBs, namespace the KB and tables per tenant and inject tenant_id into every record. The architecture doesn’t change shape; the IDs do.
- Slack two-way handoff: the diagram fans out to Slack as a notification only. A bidirectional Slack-to-visitor reply path (agent types in Slack, visitor sees it in the widget) is an additional Lambda + Slack Events Subscription; off the default diagram to keep the per-message cost in the always-free band.
If you’re recreating this
Start with Build & Deploy alone (a single Lambda, no triggers). Once git push reliably updates an empty stack, wire up fn-drive-sync with one help doc and confirm the doc lands in S3 within five minutes. Create the Bedrock Knowledge Base over that S3 prefix and confirm a one-shot RetrieveAndGenerateStream call returns a passage. Then the WebSocket API with stub $connect/$default/$disconnect handlers that just echo back. Then the real fn-answerer with strict tool_use and citation verification (this is the part most worth integration-testing — intentionally try to make the model cite a passage outside the retrieved set and confirm the runtime downgrades to hand_off). Then the handoff fan-out and the gaps log. Cross-cutting (audit, logs, alarms, budget, archive) goes in from day one.