Part 7 of 7 · Website chat assistant series · ~5 min read

Engineering reference: the chat assistant architecture

Same system as the rest of the series, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, Knowledge Base wiring, and the actual flow operations — everything you’d need to recreate this in your own AWS account.

Key takeaways · verified May 2026

  • Single AWS account in ap-southeast-1 (Singapore); Bedrock via Global cross-Region inference.
  • Five subsystems: Build & Deploy, Knowledge Sync, Conversation gateway (Lambda Function URL + API Gateway WebSocket), Answerer (RetrieveAndGenerateStream + strict tool_use), Handoff & learning.
  • Models: global.anthropic.claude-haiku-4-5-20251001-v1:0 + amazon.titan-embed-text-v2:0; vector store is S3 Vectors (managed by Bedrock).
  • WebSocket routes $connect/$default/$disconnect on wss-chat; replies stream via ApiGatewayManagementApi.PostToConnection; widget mints a short-lived JWT from fn-mint-token to keep long-lived secrets out of the browser.
  • Citation enforced at runtime: if the model emits answer with a citation outside the retrieved set, the runtime downgrades to hand_off.

Posts 1–6 walk through the system in plain language. This page is the dense version — nothing softened, just the architecture as you’d sketch it on a whiteboard during a design review.

Full technical architecture: serverless website chat assistant in ap-southeast-1

A detailed engineering diagram of the entire chat assistant. Three external surfaces at the top: GitHub (repo and Actions runner, OIDC token requestor); Google Workspace (Drive folder of help docs, FAQ, policies; reached via service account with domain-wide delegation); and the public internet (visitors loading the marketing site, the embedded widget script, and a websocket connection to the chat endpoint).

Everything runs in a single AWS account in region ap-southeast-1 (Singapore). The AWS account contains five subsystems.

Build and Deploy strip at the top: GitHub Actions exchanges with IAM OIDC Provider, assumes an IAM Role with a trust policy scoped to repo:owner/repo:ref:main, and runs SAM/CloudFormation to update the chat-assistant-prod stack.

Knowledge sync strip below: a small sync Lambda mirrors the Drive folder to an S3 prefix on a schedule (since Bedrock KBs have no native Drive connector); a Bedrock Knowledge Base S3 connector reads that prefix, chunks the docs, embeds with Titan Text Embeddings v2, and stores vectors in the managed S3 Vectors index that Bedrock provisions; manual re-sync is one CLI call away.

Three runtime columns below.

Conversation gateway (left): the embedded widget posts an auth token request to a Lambda Function URL fn-mint-token which returns a short-lived signed token; the widget then opens a WSS connection to the API Gateway WebSocket API wss-chat; the $connect route invokes Lambda fn-ws-connect which writes a session row to DynamoDB tbl-sessions; the $default route invokes Lambda fn-ws-message which appends the visitor turn to the session scratchpad and invokes the answerer; the $disconnect route invokes Lambda fn-ws-disconnect which expires the scratchpad.

Answerer (middle): Lambda fn-answerer issues a Bedrock retrieve-and-generate call against the managed Knowledge Base kb-website-knowledge; the model is invoked via global.anthropic.claude-haiku-4-5-20251001-v1:0 with strict tool_use over four tool definitions answer, clarify, hand_off, decline; on the answer tool the runtime verifies the cited passage was in the retrieved set, streams the reply back over the websocket, and writes the turn to DynamoDB tbl-transcripts; on hand_off it invokes the Handoff column.

Handoff and learning (right): Lambda fn-handoff packages the transcript, writes it to S3 chat-assistant-data/transcripts/, publishes a notification to SNS topic t-handoffs which fans out to email and optionally Slack via Amazon Q Developer in chat applications (formerly AWS Chatbot); on clarify, decline, and low-confidence answer it appends a row to DynamoDB tbl-gaps; a separate Lambda fn-gaps-batch runs on EventBridge cron 0 6 ? * MON * weekly, groups similar gaps using Titan Embeddings cosine similarity, and writes a Drive doc grouped-gaps-week-NN.

Cross-cutting bottom strip: DynamoDB tables tbl-sessions, tbl-transcripts, tbl-gaps with appropriate TTL on session rows; CloudWatch Logs are configured with RetentionInDays of 7 across every log group; SNS topics t-handoffs and t-alarms; AWS Budgets has a $10 monthly alarm; Lambda fn-archive runs on a separate weekly cron 0 3 ? * SUN * to move old transcript blobs to S3 Glacier Instant Retrieval storage class.
Fig 7. Full architecture, ap-southeast-1. White boxes = AWS resources; dashed outer box = AWS account container; dashed grey boxes = subsystem groupings; dashed grey arrows = config feed and side branches.

Read this top-down, then column-by-column

Top row is the three external surfaces. Below it, the AWS account contains five subsystems: Build & Deploy across the top, then Knowledge Sync, then three runtime columns (Conversation gateway, Answerer, Handoff & learning), with a Cross-cutting strip at the bottom. A visitor opens the widget, the page calls fn-mint-token for a short-lived JWT, and connects to wss-chat. The $connect route writes a session row; each $default message invokes fn-ws-message which appends to the session scratchpad and invokes fn-answerer. The answerer issues a Bedrock RetrieveAndGenerateStream against kb-website-knowledge with strict tool_use over four tools (answer, clarify, hand_off, decline), streams the reply back via PostToConnection, and writes the turn into tbl-transcripts. hand_off invokes fn-handoff; clarify and decline append to tbl-gaps for weekly review.
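The weekly review of tbl-gaps is prepared by fn-gaps-batch, which the figure description says groups similar gaps using cosine similarity over Titan embeddings. A minimal sketch of one way that grouping could work, in plain Python. The greedy first-match strategy, the 0.85 threshold, and the function names are assumptions for illustration, not details taken from the series; in the real function the vectors would come from amazon.titan-embed-text-v2:0.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_gaps(gaps, threshold=0.85):
    """Greedy grouping of (text, embedding) pairs.

    Each gap joins the first existing group whose seed vector is at least
    `threshold` similar; otherwise it starts a new group.
    The threshold value is an assumption and would need tuning.
    """
    groups = []  # list of (seed_vector, [member_texts])
    for text, vector in gaps:
        for seed, members in groups:
            if cosine(seed, vector) >= threshold:
                members.append(text)
                break
        else:
            # No existing group was close enough: this gap seeds a new one.
            groups.append((vector, [text]))
    return [members for _, members in groups]
```

Greedy grouping is order-dependent, which is usually acceptable for a weekly human review list; a proper clustering pass would be the next step if the groups turn out noisy.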

Naming conventions used in the diagram

  • Lambda functions: fn-<purpose> (fn-mint-token, fn-ws-connect, fn-ws-message, fn-ws-disconnect, fn-answerer, fn-handoff, fn-gaps-batch, fn-drive-sync, fn-archive).
  • Lambda runtimes: Python 3.13 for the answerer, handoff, gaps batch, drive sync, and archive functions; Node.js 22.x is fine for fn-mint-token and the WebSocket route handlers if you prefer JS. Both runtimes are current LTS-equivalents for Lambda as of 2026-05.
  • DynamoDB tables: tbl-sessions (partition key connection_id, with a scratchpad list trimmed to the last few turns and a TTL on idle), tbl-transcripts (partition key session_id, sort key turn_index), tbl-gaps (partition key week_iso, sort key created_at#turn_id with the visitor turn, page URL, closest-passage scores).
  • SNS topics: t-handoffs for human-handoff fan-out (email, optional Slack), t-alarms for general failures.
  • S3 layout: single bucket chat-assistant-data with prefixes transcripts/{date}/, archive/.
  • Knowledge Base: kb-website-knowledge, a Bedrock managed Knowledge Base with an S3 connector pointed at the synced help/policies prefix. Bedrock KBs do not have a native Drive connector as of 2026-05, so a small fn-drive-sync Lambda mirrors the Drive folder to S3 on a 5-minute schedule. Embeddings model is amazon.titan-embed-text-v2:0; vector store is Amazon S3 Vectors (the cheapest quick-create option since mid-2025 — zero idle cost — provisioned and managed by Bedrock when you create the KB). OpenSearch Serverless and Aurora PostgreSQL Serverless remain valid alternatives if you outgrow S3 Vectors’ query throughput.
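The table key schemas above can be sketched as plain item builders. This is an illustration only: the scratchpad length, the idle timeout, the expires_at attribute name, and the 2026-W19 week format are assumptions, since the series only says "last few turns", "TTL on idle", and week_iso. DynamoDB TTL does expect an epoch-seconds number attribute.

```python
from datetime import datetime, timezone

SCRATCHPAD_TURNS = 6        # assumption: "last few turns" is not quantified
IDLE_TTL_SECONDS = 30 * 60  # assumption: the idle timeout is not specified


def session_item(connection_id, scratchpad, now_epoch):
    """Row for tbl-sessions: partition key connection_id, trimmed scratchpad, TTL."""
    return {
        "connection_id": connection_id,
        "scratchpad": scratchpad[-SCRATCHPAD_TURNS:],
        # Hypothetical attribute name; DynamoDB TTL reads epoch seconds.
        "expires_at": int(now_epoch) + IDLE_TTL_SECONDS,
    }


def transcript_key(session_id, turn_index):
    """Key for tbl-transcripts: partition key session_id, sort key turn_index."""
    return {"session_id": session_id, "turn_index": turn_index}


def gap_key(created_at, turn_id):
    """Key for tbl-gaps: partition key week_iso, sort key created_at#turn_id."""
    iso_year, iso_week, _ = created_at.isocalendar()
    return {
        "week_iso": f"{iso_year}-W{iso_week:02d}",
        "created_at#turn_id": f"{created_at.isoformat()}#{turn_id}",
    }


# Example: a gap logged on Monday 4 May 2026 falls in ISO week 19.
example = gap_key(datetime(2026, 5, 4, 6, 0, tzinfo=timezone.utc), "t-1")
```

Putting the ISO week in the partition key means the weekly batch can read one partition per run, and the timestamp-prefixed sort key returns gaps in arrival order.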

Region, model access, websocket details, and Drive auth

Everything runs in ap-southeast-1 (Singapore). Bedrock model invocations use the Global cross-Region inference profile (global. prefix on model IDs) — data at rest stays in Singapore; inference may route to other regions for capacity, billed at on-demand Singapore rates.

The widget mints its session token from fn-mint-token rather than authenticating directly against API Gateway; the JWT is short-lived (a few minutes) and is checked in fn-ws-connect via a Lambda authorizer. This keeps long-lived secrets out of browsers entirely. Streaming replies use ApiGatewayManagementApi.PostToConnection with chunked writes — the answerer flushes partial responses every few tokens so the visitor sees words appear within a second.
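To make the token flow concrete, here is a minimal standard-library sketch of minting a short-lived JWT (as fn-mint-token might) and checking it at $connect. The HS256 algorithm, the claim names, the 300-second lifetime, and the function names are assumptions; the series only says the token is signed and lasts a few minutes. A maintained JWT library would be the safer choice in production.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret, signing_input):
    return _b64url(hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest())


def mint_token(secret, session_id, ttl_seconds=300, now=None):
    """Return a compact HS256 JWT that expires ttl_seconds after `now`."""
    issued = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = {"sub": session_id, "iat": issued, "exp": issued + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode("utf-8"))
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_sign(secret, signing_input)}"


def verify_token(secret, token, now=None):
    """Return the claims dict if the signature and expiry check out, else None."""
    current = time.time() if now is None else now
    try:
        header, payload, signature = token.split(".")
        expected = _sign(secret, f"{header}.{payload}")
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, signature):
            return None
        claims = json.loads(_b64url_decode(payload))
        if claims["exp"] <= current:
            return None
        return claims
    except (ValueError, TypeError, KeyError):
        # Malformed token, bad encoding, or missing claims: reject.
        return None
```

The verifier always recomputes an HS256 signature and ignores the algorithm named in the header, which sidesteps algorithm-substitution tricks in this simplified form.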

Google Drive authentication uses a service account with domain-wide delegation over a single scope: https://www.googleapis.com/auth/drive.readonly on the help-docs folder only. The credential lives in AWS Secrets Manager. The fn-drive-sync Lambda runs on a 5-minute EventBridge schedule, pulls any changed docs from Drive, writes them to chat-assistant-data/kb-source/, and lets the Bedrock KB’s S3 connector index from there. Editing a doc and saving propagates within ~10 minutes (5 to sync + 5 to index); manual re-sync is one CLI call to StartIngestionJob.
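The manual re-sync mentioned above maps to the StartIngestionJob operation, which boto3 exposes on the bedrock-agent client as start_ingestion_job. A minimal sketch, assuming the client is passed in and that the Knowledge Base ID and data source ID are known; the helper name and the placeholder IDs in the comment are hypothetical.

```python
def start_resync(bedrock_agent, knowledge_base_id, data_source_id):
    """Kick off a Knowledge Base ingestion job and return its job ID.

    `bedrock_agent` is expected to be a boto3 "bedrock-agent" client, e.g.
        boto3.client("bedrock-agent", region_name="ap-southeast-1")
    It is injected so the function can be exercised without AWS access.
    """
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return response["ingestionJob"]["ingestionJobId"]


# Hypothetical usage (IDs are placeholders, not values from the series):
#   import boto3
#   client = boto3.client("bedrock-agent", region_name="ap-southeast-1")
#   job_id = start_resync(client, "KB_ID_PLACEHOLDER", "DS_ID_PLACEHOLDER")
```

Ingestion runs asynchronously, so a caller that needs to know when indexing has finished would poll the job status rather than assume completion on return.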

The answerer uses strict tool_use: four tool definitions (answer, clarify, hand_off, decline) with required parameter schemas. The answer tool requires a citation_id parameter referencing one of the retrieved passages by id; the runtime validates the citation against the retrieved set before allowing PostToConnection to flush. If the model emits an answer with a citation that wasn’t in the retrieved set, the runtime downgrades to hand_off — the safer-by-default failure mode.
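The downgrade rule is small enough to sketch directly. This is an illustrative version, not the series' actual code: the ToolCall shape, the reason strings, and the handling of an unrecognised tool name are assumptions. Only the four tool names, the citation_id parameter, and the downgrade to hand_off come from the description above.

```python
from dataclasses import dataclass, field

TOOLS = {"answer", "clarify", "hand_off", "decline"}


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def enforce_citation(call, retrieved_ids):
    """Return the tool call to act on, downgrading unsafe answers to hand_off.

    `retrieved_ids` is the set of passage ids returned by retrieval for this turn.
    """
    if call.name not in TOOLS:
        # Assumption: treat anything outside the four tools as a handoff.
        return ToolCall("hand_off", {"reason": "unknown_tool"})
    if call.name != "answer":
        # clarify, hand_off, and decline carry no citation to check.
        return call
    if call.args.get("citation_id") in retrieved_ids:
        return call
    # Missing or unretrieved citation: fail safe before anything is streamed.
    return ToolCall("hand_off", {"reason": "citation_not_in_retrieved_set"})
```

Because the check runs before any PostToConnection write, a rejected answer never reaches the visitor, even partially.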

What’s deliberately not on the diagram

  • IAM policy details — per-Lambda execution roles are minimal (one bucket prefix, one or two tables, a single Bedrock KB ID, InvokeModel on one model, execute-api:ManageConnections on one API).
  • Per-business knowledge layout — a flat Drive folder is fine for the first few months; subdivide by topic (shipping/, returns/, pricing/) once it grows past a couple of dozen docs, so writers know where new paragraphs go.
  • X-Ray tracing — on for fn-answerer and fn-handoff, sampling 100% during tuning, 10% in steady state.
  • Bedrock Guardrails — managed contextual grounding (numeric grounding + relevance scores), PII redaction, prompt-attack/jailbreak filters, and the newer Automated Reasoning checks (formal-logic policy validation, GA in 2025). The custom citation-verification step in fn-answerer is roughly the contextual-grounding idea hand-rolled; turning on Guardrails moves the threshold into console configuration and adds PII redaction and prompt-attack defence on every model call. Worth enabling once thresholds are stable.
  • Long-lived visitor identity — for logged-in customers who want their order history available, swap connection_id for an authenticated customer_id at $connect and bind the scratchpad to that identity. Keep it opt-in.
  • Multi-tenant variant — if running this on behalf of multiple SMBs, namespace the KB and tables per tenant and inject tenant_id into every record. The architecture doesn’t change shape; the IDs do.
  • Slack two-way handoff — the diagram fans out to Slack as a notification only. A bidirectional Slack-to-visitor reply path (agent types in Slack, visitor sees it in the widget) is an additional Lambda + Slack Events Subscription; off the default diagram to keep the per-message cost in the always-free band.

If you’re recreating this

Start with Build & Deploy alone (a single Lambda, no triggers). Once git push reliably updates an empty stack, wire up fn-drive-sync with one help doc and confirm the doc lands in S3 within five minutes. Create the Bedrock Knowledge Base over that S3 prefix and confirm a one-shot RetrieveAndGenerateStream call returns a passage. Then the WebSocket API with stub $connect/$default/$disconnect handlers that just echo back. Then the real fn-answerer with strict tool_use and citation verification (this is the part most worth integration-testing — intentionally try to make the model cite a passage outside the retrieved set and confirm the runtime downgrades to hand_off). Then the handoff fan-out and the gaps log. Cross-cutting (audit, logs, alarms, budget, archive) goes in from day one.
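For the stub stage of the WebSocket API, an echo handler for the $default route can be as small as the sketch below. The handler name and the echo payload shape are assumptions; the event fields (requestContext.connectionId, body) and post_to_connection are the standard API Gateway WebSocket event shape and boto3 apigatewaymanagementapi call. The client is injected so the handler can be tried without a deployed API.

```python
import json


def handle_default(event, management_client):
    """Stub $default handler: echo the visitor's message back over the socket.

    `management_client` is expected to be a boto3 "apigatewaymanagementapi"
    client created with endpoint_url set to the API's
    https://{domainName}/{stage} address from the request context.
    """
    connection_id = event["requestContext"]["connectionId"]
    body = event.get("body") or ""
    management_client.post_to_connection(
        ConnectionId=connection_id,
        Data=json.dumps({"echo": body}).encode("utf-8"),
    )
    # API Gateway expects a status code from the route integration.
    return {"statusCode": 200}
```

Once the echo round-trips from the widget, the same handler signature can be kept while the body is swapped for the call into fn-answerer.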
