Engineering reference: the chat assistant architecture
Same system as the rest of the series, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, Knowledge Base wiring, and the actual flow operations — everything you’d need to recreate this in your own AWS account.
Posts 1–6 walk through the system in plain language. This page is the dense version — nothing softened, just the architecture as you’d sketch it on a whiteboard during a design review.
Read this top-down, then column-by-column
The top row is the three external surfaces. Below it, the AWS account contains five subsystems: Build & Deploy across the top, then Knowledge Sync, then three runtime columns (Conversation gateway, Answerer, Handoff & learning), with a Cross-cutting strip at the bottom. A visitor opens the widget, the page calls fn-mint-token for a short-lived JWT, and connects to wss-chat. The $connect route writes a session row; each $default message invokes fn-ws-message, which appends to the session scratchpad and invokes fn-answerer. The answerer issues a Bedrock RetrieveAndGenerate against kb-website-knowledge with strict tool_use over four tools (answer, clarify, hand_off, decline), streams the reply back via PostToConnection, and writes the turn into tbl-transcripts. hand_off invokes fn-handoff; clarify and decline append to tbl-gaps for weekly review.
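As a concrete anchor for that flow, here is a minimal sketch of an fn-ws-message handler, assuming the table and function names above; the message payload shape, the scratchpad trim length, and the async invoke are assumptions, and error handling is omitted.

```python
import json

import boto3

dynamodb = boto3.resource("dynamodb")
lambda_client = boto3.client("lambda")
sessions = dynamodb.Table("tbl-sessions")

SCRATCHPAD_MAX_TURNS = 6  # assumption for "trimmed to the last few turns"


def handler(event, context):
    """$default route: append the visitor turn, then hand the work to fn-answerer."""
    connection_id = event["requestContext"]["connectionId"]
    visitor_text = json.loads(event["body"])["text"]  # assumed message shape

    # Read, append, and trim the session scratchpad in tbl-sessions.
    item = sessions.get_item(Key={"connection_id": connection_id}).get("Item", {})
    scratchpad = item.get("scratchpad", [])
    scratchpad = (scratchpad + [{"role": "visitor", "text": visitor_text}])[-SCRATCHPAD_MAX_TURNS:]
    sessions.update_item(
        Key={"connection_id": connection_id},
        UpdateExpression="SET scratchpad = :s",
        ExpressionAttributeValues={":s": scratchpad},
    )

    # Invoke the answerer asynchronously so the $default route returns immediately.
    lambda_client.invoke(
        FunctionName="fn-answerer",
        InvocationType="Event",
        Payload=json.dumps({"connection_id": connection_id, "scratchpad": scratchpad}),
    )
    return {"statusCode": 200}
```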
Naming conventions used in the diagram
- Lambda functions: fn-<purpose> (fn-mint-token, fn-ws-connect, fn-ws-message, fn-ws-disconnect, fn-answerer, fn-handoff, fn-gaps-batch, fn-archive).
- DynamoDB tables: tbl-sessions (partition key connection_id, with a scratchpad list trimmed to the last few turns and a TTL on idle), tbl-transcripts (partition key session_id, sort key turn_index), tbl-gaps (partition key week_iso, sort key created_at#turn_id, carrying the visitor turn, page URL, and closest-passage scores). Key schemas are sketched in the snippet after this list.
- SNS topics: t-handoffs for human-handoff fan-out (email, optional Slack), t-alarms for general failures.
- S3 layout: a single bucket, chat-assistant-data, with prefixes transcripts/{date}/ and archive/.
- Knowledge Base: kb-website-knowledge, a Bedrock managed Knowledge Base with a Drive connector pointed at the help/policies folder, embeddings model amazon.titan-embed-text-v2:0, vector store on Amazon OpenSearch Serverless (provisioned and managed by Bedrock when you create the KB).
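If it helps to see those key schemas spelled out, the three tables could be created roughly as below; on-demand billing and the TTL attribute name are assumptions, and the tbl-gaps sort key is stored under a plain attribute whose value is the created_at#turn_id concatenation.

```python
import boto3

ddb = boto3.client("dynamodb")

# tbl-sessions: one row per live WebSocket connection, expired on idle via TTL.
ddb.create_table(
    TableName="tbl-sessions",
    AttributeDefinitions=[{"AttributeName": "connection_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "connection_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
ddb.get_waiter("table_exists").wait(TableName="tbl-sessions")
ddb.update_time_to_live(
    TableName="tbl-sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},  # assumed attribute name
)

# tbl-transcripts: every turn of every conversation, ordered by turn_index.
ddb.create_table(
    TableName="tbl-transcripts",
    AttributeDefinitions=[
        {"AttributeName": "session_id", "AttributeType": "S"},
        {"AttributeName": "turn_index", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "session_id", "KeyType": "HASH"},
        {"AttributeName": "turn_index", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)

# tbl-gaps: clarify/decline turns grouped by ISO week for the weekly review.
ddb.create_table(
    TableName="tbl-gaps",
    AttributeDefinitions=[
        {"AttributeName": "week_iso", "AttributeType": "S"},
        {"AttributeName": "created_at_turn_id", "AttributeType": "S"},  # value: "<created_at>#<turn_id>"
    ],
    KeySchema=[
        {"AttributeName": "week_iso", "KeyType": "HASH"},
        {"AttributeName": "created_at_turn_id", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
```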
Region, model access, websocket details, and Drive auth
Everything runs in ap-southeast-1 (Singapore). Bedrock model invocations use the Global cross-Region inference profile (global. prefix on model IDs) — data at rest stays in Singapore; inference may route to other regions for capacity, billed at on-demand Singapore rates.
The widget mints its session token from fn-mint-token rather than authenticating directly against API Gateway; the JWT is short-lived (a few minutes) and is verified on the $connect route by a Lambda authorizer before fn-ws-connect runs. This keeps long-lived secrets out of browsers entirely. Streaming replies use ApiGatewayManagementApi.PostToConnection with chunked writes — the answerer flushes partial responses every few tokens so the visitor sees words appear within a second.
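A minimal sketch of that streaming helper, assuming the wss-chat callback URL arrives via an environment variable and that "every few tokens" maps to a small character buffer:

```python
import os

import boto3

# Callback endpoint shape: https://<api-id>.execute-api.ap-southeast-1.amazonaws.com/<stage>
apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url=os.environ["WSS_CHAT_CALLBACK_URL"],  # assumed environment variable name
)

FLUSH_EVERY_CHARS = 40  # assumption: roughly "every few tokens"


def stream_reply(connection_id: str, token_iterator) -> None:
    """Push partial model output to the visitor as it arrives."""
    buffer = ""
    for token in token_iterator:
        buffer += token
        if len(buffer) >= FLUSH_EVERY_CHARS:
            apigw.post_to_connection(ConnectionId=connection_id, Data=buffer.encode("utf-8"))
            buffer = ""
    if buffer:
        apigw.post_to_connection(ConnectionId=connection_id, Data=buffer.encode("utf-8"))
```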
Google Drive authentication uses a service account with domain-wide delegation over a single scope: https://www.googleapis.com/auth/drive.readonly on the help-docs folder only. The Bedrock Knowledge Base Drive connector consumes that credential out of AWS Secrets Manager. Editing a doc and saving triggers a re-sync within minutes; manual re-sync is one CLI call.
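That manual re-sync is a single start-ingestion-job call against the KB's Drive data source (aws bedrock-agent start-ingestion-job from the CLI). The same thing from boto3, with placeholder IDs:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="ap-southeast-1")

response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KBXXXXXXXXXX",   # placeholder: the ID behind kb-website-knowledge
    dataSourceId="DSXXXXXXXXXX",      # placeholder: the Drive connector's data source ID
    description="Manual re-sync after editing the help docs",
)
print(response["ingestionJob"]["status"])  # STARTING, then IN_PROGRESS / COMPLETE on later polls
```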
The answerer uses strict tool_use: four tool definitions (answer, clarify, hand_off, decline) with required parameter schemas. The answer tool requires a citation_id parameter referencing one of the retrieved passages by id; the runtime validates the citation against the retrieved set before allowing PostToConnection to flush. If the model emits an answer with a citation that wasn’t in the retrieved set, the runtime downgrades to hand_off — the safer-by-default failure mode.
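A sketch of how those four tool definitions and the citation check might look if expressed as a Bedrock Converse toolConfig; everything beyond the four tool names and the required citation_id parameter (field names, descriptions, the forced-choice setting) is an assumption.

```python
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "text": {"type": "string"},
        "citation_id": {"type": "string", "description": "id of one retrieved passage"},
    },
    "required": ["text", "citation_id"],
}


def _simple_tool(name: str, description: str, field: str) -> dict:
    """Helper for the three single-field tools (clarify, hand_off, decline)."""
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": {
                "type": "object",
                "properties": {field: {"type": "string"}},
                "required": [field],
            }},
        }
    }


TOOL_CONFIG = {
    "tools": [
        {"toolSpec": {"name": "answer", "description": "Answer using a cited passage",
                      "inputSchema": {"json": ANSWER_SCHEMA}}},
        _simple_tool("clarify", "Ask the visitor a clarifying question", "question"),
        _simple_tool("hand_off", "Escalate the conversation to a human", "reason"),
        _simple_tool("decline", "Politely decline an out-of-scope request", "reason"),
    ],
    "toolChoice": {"any": {}},  # strict tool_use: the model must call exactly one of the four
}


def enforce_citation(tool_name: str, tool_input: dict, retrieved_ids: set) -> tuple:
    """Downgrade an answer whose citation_id is not in the retrieved set to hand_off."""
    if tool_name == "answer" and tool_input.get("citation_id") not in retrieved_ids:
        return "hand_off", {"reason": "citation not in retrieved set"}
    return tool_name, tool_input
```

The enforce_citation step sits between the model's tool call and the PostToConnection flush, which is exactly the safer-by-default downgrade path described above.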
What’s deliberately not on the diagram
- IAM policy details — per-Lambda execution roles are minimal (one bucket prefix, one or two tables, a single Bedrock KB ID, InvokeModel on one model, execute-api:ManageConnections on one API). A rough sketch of the fn-answerer policy follows this list.
- Per-business knowledge layout — a flat Drive folder is fine for the first few months; subdivide by topic (shipping/, returns/, pricing/) once it grows past a couple of dozen docs, so writers know where new paragraphs go.
- X-Ray tracing — on for fn-answerer and fn-handoff, sampling 100% during tuning, 10% in steady state.
- Bedrock Guardrails contextual grounding check — managed grounding-and-relevance scoring. The custom citation-verification step in fn-answerer is roughly the same idea hand-rolled; turning on Guardrails moves the threshold into console configuration and adds PII redaction on every model call. Worth enabling once thresholds are stable.
- Long-lived visitor identity — for logged-in customers who want their order history available, swap connection_id for an authenticated customer_id at $connect and bind the scratchpad to that identity. Keep it opt-in.
- Multi-tenant variant — if running this on behalf of multiple SMBs, namespace the KB and tables per tenant and inject tenant_id into every record. The architecture doesn't change shape; the IDs do.
- Slack two-way handoff — the diagram fans out to Slack as a notification only. A bidirectional Slack-to-visitor reply path (agent types in Slack, visitor sees it in the widget) is an additional Lambda plus a Slack Events API subscription; it stays off the default diagram to keep the per-message cost in the always-free band.
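For the IAM point above, here is a rough shape of the fn-answerer execution policy, expressed as a Python dict. The account ID, KB ID, API ID, and table access are placeholders and assumptions; in practice the InvokeModel resource would be pinned to the single model or inference profile actually in use.

```python
FN_ANSWERER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Retrieval against the one Knowledge Base.
            "Effect": "Allow",
            "Action": ["bedrock:Retrieve", "bedrock:RetrieveAndGenerate"],
            "Resource": "arn:aws:bedrock:ap-southeast-1:111122223333:knowledge-base/KBXXXXXXXXXX",
        },
        {   # Generation: scope this down to the single model / inference profile used.
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:*::foundation-model/*",
        },
        {   # Streaming back over the one WebSocket API.
            "Effect": "Allow",
            "Action": "execute-api:ManageConnections",
            "Resource": "arn:aws:execute-api:ap-southeast-1:111122223333:a1b2c3d4e5/*/POST/@connections/*",
        },
        {   # The two tables this function actually touches.
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:UpdateItem", "dynamodb:PutItem"],
            "Resource": [
                "arn:aws:dynamodb:ap-southeast-1:111122223333:table/tbl-sessions",
                "arn:aws:dynamodb:ap-southeast-1:111122223333:table/tbl-transcripts",
            ],
        },
    ],
}
```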
If you’re recreating this
Start with Build & Deploy alone (a single Lambda, no triggers). Once git push reliably updates an empty stack, create the Bedrock Knowledge Base with one Drive doc and confirm a one-shot RetrieveAndGenerate call returns a passage. Then the WebSocket API with stub $connect/$default/$disconnect handlers that just echo back. Then the real fn-answerer with strict tool_use and citation verification (this is the part most worth integration-testing — intentionally try to make the model cite a passage outside the retrieved set and confirm the runtime downgrades to hand_off). Then the handoff fan-out and the gaps log. Cross-cutting (audit, logs, alarms, budget, archive) goes in from day one.
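The one-shot RetrieveAndGenerate check in the second step could look like this, with placeholder IDs and a hypothetical inference-profile ARN standing in for the real global. model ID:

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="ap-southeast-1")

response = runtime.retrieve_and_generate(
    input={"text": "What is the returns policy?"},  # any question the one Drive doc can answer
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBXXXXXXXXXX",  # placeholder for kb-website-knowledge
            "modelArn": "arn:aws:bedrock:ap-southeast-1:111122223333:inference-profile/global.<generation-model-id>",
        },
    },
)

print(response["output"]["text"])
# Confirm at least one passage actually came back from the KB.
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(ref["content"]["text"][:120])
```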