Part 7 of 7 · Website chat assistant series · ~5 min read

Engineering reference: the chat assistant architecture

Same system as the rest of the series, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs, Knowledge Base wiring, and the actual flow operations — everything you’d need to recreate this in your own AWS account.

Posts 1–6 walk through the system in plain language. This page is the dense version — nothing softened, just the architecture as you’d sketch it on a whiteboard during a design review.

Full technical architecture: serverless website chat assistant in ap-southeast-1. A detailed engineering diagram of the entire chat assistant.

Three external surfaces at the top: GitHub (repo and Actions runner, OIDC token requestor); Google Workspace (Drive folder of help docs, FAQ, policies; reached via a service account with domain-wide delegation); and the public internet (visitors loading the marketing site, the embedded widget script, and a websocket connection to the chat endpoint). Everything runs in a single AWS account in region ap-southeast-1 (Singapore), and that account contains five subsystems.

Build & Deploy strip at the top: GitHub Actions exchanges its OIDC token with the IAM OIDC Provider, assumes an IAM Role with a trust policy scoped to repo:owner/repo:ref:main, and runs SAM/CloudFormation to update the chat-assistant-prod stack.

Knowledge sync strip below: the Bedrock Knowledge Bases Drive connector reads the folder, chunks the docs, embeds with Titan Embeddings, and stores vectors in the managed OpenSearch Serverless index that Bedrock provisions; sync runs on a schedule, with manual re-sync available.

Three runtime columns below. Conversation gateway (left): the embedded widget posts an auth token request to the Lambda Function URL fn-mint-token, which returns a short-lived signed token; the widget then opens a WSS connection to the API Gateway WebSocket API wss-chat. The $connect route invokes Lambda fn-ws-connect, which writes a session row to DynamoDB tbl-sessions; the $default route invokes Lambda fn-ws-message, which appends the visitor turn to the session scratchpad and invokes the answerer; the $disconnect route invokes Lambda fn-ws-disconnect, which expires the scratchpad.

Answerer (middle): Lambda fn-answerer issues a Bedrock retrieve-and-generate call against the managed Knowledge Base kb-website-knowledge; the model is invoked via global.anthropic.claude-haiku-4-5-20251001-v1:0 with strict tool_use over four tool definitions (answer, clarify, hand_off, decline). On the answer tool the runtime verifies the cited passage was in the retrieved set, streams the reply back over the websocket, and writes the turn to DynamoDB tbl-transcripts; on hand_off it invokes the handoff column.

Handoff and learning (right): Lambda fn-handoff packages the transcript, writes it to S3 chat-assistant-data/transcripts/, and publishes a notification to SNS topic t-handoffs, which fans out to email and optionally Slack via Chatbot. On clarify, decline, and low-confidence answer the runtime appends a row to DynamoDB tbl-gaps; a separate Lambda fn-gaps-batch runs weekly on EventBridge cron(0 6 ? * MON *), groups similar gaps using Titan Embeddings cosine similarity, and writes a Drive doc grouped-gaps-week-NN.

Cross-cutting bottom strip: DynamoDB tables tbl-sessions, tbl-transcripts, and tbl-gaps, with a TTL on session rows; CloudWatch Logs with RetentionInDays: 7 on every log group; SNS topics t-handoffs and t-alarms; an AWS Budgets $10 monthly alarm; and Lambda fn-archive on a separate weekly cron(0 3 ? * SUN *), moving old transcript blobs to the S3 Glacier Instant Retrieval storage class.
Fig 7. Full architecture, ap-southeast-1. White boxes = AWS resources; dashed outer box = the AWS account; dashed grey boxes = subsystem groupings; dashed grey arrows = config feeds and side branches.

Read this top-down, then column-by-column

The top row holds the three external surfaces. Below it, the AWS account contains five subsystems: Build & Deploy across the top, then Knowledge Sync, then three runtime columns (Conversation gateway, Answerer, Handoff & learning), with a Cross-cutting strip at the bottom. A visitor opens the widget, the page calls fn-mint-token for a short-lived JWT, and the widget connects to wss-chat. The $connect route writes a session row; each $default message invokes fn-ws-message, which appends to the session scratchpad and invokes fn-answerer. The answerer issues a Bedrock RetrieveAndGenerate against kb-website-knowledge with strict tool_use over four tools (answer, clarify, hand_off, decline), streams the reply back via PostToConnection, and writes the turn into tbl-transcripts. hand_off invokes fn-handoff; clarify and decline append to tbl-gaps for weekly review.
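The per-turn path through fn-ws-message is small enough to sketch. The table and invoke wrappers below are hypothetical stand-ins for the DynamoDB and Lambda clients (so the logic is testable offline); the field names and the scratchpad trim limit are assumptions, not the post's exact schema:

```python
import json
import time

MAX_SCRATCHPAD_TURNS = 6  # assumption: "trimmed to the last few turns"

def handle_default(event, sessions_table, invoke_answerer):
    """Sketch of the $default route handler (fn-ws-message).

    sessions_table is anything with get/put semantics (DynamoDB in
    production, a dict-backed fake in tests); invoke_answerer stands in
    for the async Lambda invoke of fn-answerer.
    """
    connection_id = event["requestContext"]["connectionId"]
    visitor_turn = json.loads(event["body"])["text"]

    session = sessions_table.get(connection_id) or {
        "connection_id": connection_id,
        "scratchpad": [],
        "ttl": int(time.time()) + 1800,  # assumption: 30-minute idle TTL
    }
    session["scratchpad"].append({"role": "visitor", "text": visitor_turn})
    # Keep only the most recent turns, per the diagram's "trimmed" note.
    session["scratchpad"] = session["scratchpad"][-MAX_SCRATCHPAD_TURNS:]
    sessions_table.put(connection_id, session)

    # Hand the trimmed context to fn-answerer for this turn.
    invoke_answerer({"connection_id": connection_id,
                     "scratchpad": session["scratchpad"]})
    return {"statusCode": 200}
```

Injecting the clients this way is also what makes the downgrade-to-hand_off behaviour integration-testable later without a live websocket.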

Naming conventions used in the diagram

  • Lambda functions follow fn-<purpose>: fn-mint-token, fn-ws-connect, fn-ws-message, fn-ws-disconnect, fn-answerer, fn-handoff, fn-gaps-batch, fn-archive.
  • DynamoDB tables: tbl-sessions (partition key connection_id, with a scratchpad list trimmed to the last few turns and a TTL on idle), tbl-transcripts (partition key session_id, sort key turn_index), tbl-gaps (partition key week_iso, sort key created_at#turn_id with the visitor turn, page URL, closest-passage scores).
  • SNS topics: t-handoffs for human-handoff fan-out (email, optional Slack), t-alarms for general failures.
  • S3 layout: single bucket chat-assistant-data with prefixes transcripts/{date}/, archive/.
  • Knowledge Base: kb-website-knowledge, a Bedrock managed Knowledge Base with a Drive connector pointed at the help/policies folder, embeddings model amazon.titan-embed-text-v2:0, vector store on Amazon OpenSearch Serverless (provisioned and managed by Bedrock when you create the KB).
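The table key shapes above can be sketched as small helpers. Only the partition/sort key names come from the post; the 30-minute idle TTL and any extra fields are assumptions:

```python
import time
from datetime import datetime

def session_item(connection_id, idle_ttl_seconds=1800):
    """Row for tbl-sessions; the ttl attribute lets DynamoDB expire
    idle sessions (the 30-minute window is an assumption)."""
    return {"connection_id": connection_id,
            "scratchpad": [],
            "ttl": int(time.time()) + idle_ttl_seconds}

def transcript_key(session_id, turn_index):
    """Composite key for tbl-transcripts: partition + numeric sort key,
    so one Query returns a session's turns in order."""
    return {"session_id": session_id, "turn_index": turn_index}

def gap_key(created_at: datetime, turn_id: str):
    """Key for tbl-gaps: week_iso partition, created_at#turn_id sort key,
    so one Query per ISO week returns that week's gaps in time order."""
    iso_year, iso_week, _ = created_at.isocalendar()
    return {"week_iso": f"{iso_year}-W{iso_week:02d}",
            "sort_key": f"{created_at.isoformat()}#{turn_id}"}
```

The week_iso partition is what makes fn-gaps-batch cheap: the weekly job issues one Query for the closing week rather than a Scan.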

Region, model access, websocket details, and Drive auth

Everything runs in ap-southeast-1 (Singapore). Bedrock model invocations use the Global cross-Region inference profile (global. prefix on model IDs) — data at rest stays in Singapore; inference may route to other regions for capacity, billed at on-demand Singapore rates.

The widget mints its session token from fn-mint-token rather than authenticating directly against API Gateway; the JWT is short-lived (a few minutes) and is checked in fn-ws-connect via a Lambda authorizer. This keeps long-lived secrets out of browsers entirely. Streaming replies use ApiGatewayManagementApi.PostToConnection with chunked writes — the answerer flushes partial responses every few tokens so the visitor sees words appear within a second.
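In spirit, the mint-and-verify pair looks like the following stdlib HS256 sketch. The claim names and five-minute TTL are assumptions, and production code would use a vetted JWT library plus the real Lambda authorizer contract rather than hand-rolling:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_token(secret: bytes, ttl_seconds: int = 300) -> str:
    """What fn-mint-token does in spirit: a short-lived HS256 JWT."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"exp": int(time.time()) + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(secret: bytes, token: str) -> bool:
    """What the authorizer on $connect checks: signature, then expiry."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return False
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False  # tampered or signed with a different key
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims["exp"] > time.time()
```

The important properties are the ones the post names: the secret lives only in the two Lambdas, and a leaked token is worthless within minutes.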

Google Drive authentication uses a service account with domain-wide delegation over a single scope: https://www.googleapis.com/auth/drive.readonly on the help-docs folder only. The Bedrock Knowledge Base Drive connector consumes that credential out of AWS Secrets Manager. Editing a doc and saving triggers a re-sync within minutes; manual re-sync is one CLI call.

The answerer uses strict tool_use: four tool definitions (answer, clarify, hand_off, decline) with required parameter schemas. The answer tool requires a citation_id parameter referencing one of the retrieved passages by id; the runtime validates the citation against the retrieved set before allowing PostToConnection to flush. If the model emits an answer with a citation that wasn’t in the retrieved set, the runtime downgrades to hand_off — the safer-by-default failure mode.
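That verification step is simple enough to sketch exactly. The four tool names come from the post; the return shape and downgrade reason are assumptions:

```python
def resolve_tool_call(tool_name: str, tool_input: dict, retrieved_ids: set):
    """Post-model check in fn-answerer: an answer whose citation_id is
    not in the retrieved set is downgraded to hand_off before anything
    is flushed to the websocket."""
    if tool_name == "answer":
        if tool_input.get("citation_id") not in retrieved_ids:
            # Safer-by-default failure mode: never stream an unverified citation.
            return ("hand_off", {"reason": "uncited_answer"})
        return ("answer", tool_input)
    if tool_name in ("clarify", "hand_off", "decline"):
        return (tool_name, tool_input)
    raise ValueError(f"unexpected tool: {tool_name}")
```

Because the check runs on the tool call rather than the prose, it stays cheap and deterministic: set membership on passage ids, no second model call.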

What’s deliberately not on the diagram

  • IAM policy details — per-Lambda execution roles are minimal (one bucket prefix, one or two tables, a single Bedrock KB ID, InvokeModel on one model, execute-api:ManageConnections on one API).
  • Per-business knowledge layout — a flat Drive folder is fine for the first few months; subdivide by topic (shipping/, returns/, pricing/) once it grows past a couple of dozen docs, so writers know where new paragraphs go.
  • X-Ray tracing — on for fn-answerer and fn-handoff, sampling 100% during tuning, 10% in steady state.
  • Bedrock Guardrails contextual grounding check — managed grounding-and-relevance scoring. The custom citation-verification step in fn-answerer is roughly the same idea hand-rolled; turning on Guardrails moves the threshold into console configuration and adds PII redaction on every model call. Worth enabling once thresholds are stable.
  • Long-lived visitor identity — for logged-in customers who want their order history available, swap connection_id for an authenticated customer_id at $connect and bind the scratchpad to that identity. Keep it opt-in.
  • Multi-tenant variant — if running this on behalf of multiple SMBs, namespace the KB and tables per tenant and inject tenant_id into every record. The architecture doesn’t change shape; the IDs do.
  • Slack two-way handoff — the diagram fans out to Slack as a notification only. A bidirectional Slack-to-visitor reply path (agent types in Slack, visitor sees it in the widget) is an additional Lambda + Slack Events Subscription; off the default diagram to keep the per-message cost in the always-free band.

If you’re recreating this

Start with Build & Deploy alone (a single Lambda, no triggers). Once git push reliably updates an empty stack, create the Bedrock Knowledge Base with one Drive doc and confirm a one-shot RetrieveAndGenerate call returns a passage. Then the WebSocket API with stub $connect/$default/$disconnect handlers that just echo back. Then the real fn-answerer with strict tool_use and citation verification (this is the part most worth integration-testing — intentionally try to make the model cite a passage outside the retrieved set and confirm the runtime downgrades to hand_off). Then the handoff fan-out and the gaps log. Cross-cutting (audit, logs, alarms, budget, archive) goes in from day one.
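The gaps-log milestone ends in the weekly grouping pass. A minimal sketch of greedy cosine-similarity grouping, in pure Python over whatever embedding vectors you store (in production they come from Titan; the 0.85 threshold and the group structure are assumptions to tune):

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def group_gaps(gaps, threshold=0.85):
    """Greedy single pass: each gap joins the first existing group whose
    seed embedding is within `threshold` cosine similarity, otherwise it
    starts a new group. `gaps` is a list of (text, embedding) pairs."""
    groups = []  # each group: {"seed": embedding, "texts": [...]}
    for text, emb in gaps:
        for g in groups:
            if cosine(g["seed"], emb) >= threshold:
                g["texts"].append(text)
                break
        else:
            groups.append({"seed": emb, "texts": [text]})
    return groups
```

Greedy seeding is O(gaps × groups), which is fine at a week of SMB chat volume; swap in proper clustering only if the gap table ever gets large enough to care.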

All posts