Part 1 of 7 · Website chat assistant series ~5 min read

A website chat assistant on AWS for a few dollars a month

A visitor lands on your site at 9pm with a question. They don’t want to fill out a form, they don’t want to email and wait, and they definitely don’t want to read your FAQ. Here’s how to design a small chat widget that answers them from your own knowledge in real time, hands the rest to you cleanly, and quietly logs every question it couldn’t answer so the assistant gets smarter every week.

The whole system on one page

Before any code, here’s the shape of what we’re building.

[Figure: system architecture, four outside surfaces and three pieces inside AWS. Across the top: the visitor (message in, streaming reply out), your knowledge (a Drive folder of help articles, policies, FAQ, pricing, hours), your team (receives handoffs with the full transcript), and your gaps log (receives unanswered or low-confidence turns for weekly review). Inside the AWS account, left to right: the conversation gateway (accepts the websocket, opens a session, holds short-term memory), the answerer (searches your knowledge, requires a citation, picks one of four moves: answer, clarify, hand off, or decline), and handoff & learning (packages the transcript for a human, queues misses into the gaps log). A note at the bottom reads: every visitor turn ends in one of four outcomes, and the assistant never invents.]
Fig 1. Four outside surfaces, three pieces inside AWS. Visitor opens the widget, the Answerer replies from your knowledge or hands off, and every miss feeds back into a weekly review.

What you set up once (the outside)

  • A small embed snippet — a few lines of script you paste into your site template once. The widget bubble appears in the corner; clicking it opens a chat panel that reuses your site’s own font and colour. The snippet does nothing on its own — the work happens after the websocket opens (a sketch of that bootstrap follows this list).
  • A knowledge folder — the help docs, FAQ, policies, pricing, hours, return rules, and anything else you’d want a new hire to read on day one. Lives in a Google Drive folder; you edit a doc, the assistant picks up the change on its next refresh, no deploy.
  • A handoff destination — the inbox, Slack channel, or shared queue you already use. The assistant packages the transcript, adds a one-line summary of what the visitor wanted, and drops it where you’ll see it.
  • A gaps log — a small Drive doc or sheet that fills up with the visitor turns the assistant couldn’t confidently answer. It’s the to-do list for next week’s knowledge updates.
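
To make the “does nothing on its own” point concrete, here is roughly what the pasted snippet boots. A sketch in TypeScript with made-up names: the websocket URL, the token endpoint, and the class names are placeholders for illustration, not a real API.

```typescript
// Hypothetical widget bootstrap — endpoints and class names are illustrative.
const WIDGET_WS_URL = "wss://chat.example.com/session"; // your gateway endpoint

function mountChatBubble(): void {
  const bubble = document.createElement("button");
  bubble.textContent = "Chat";
  bubble.className = "chat-bubble"; // styled to inherit the site's font and colour
  bubble.addEventListener("click", openPanel, { once: true });
  document.body.appendChild(bubble);
}

function openPanel(): void {
  const panel = document.createElement("div");
  panel.className = "chat-panel";
  document.body.appendChild(panel);
  // Nothing happens until the click: only then fetch a short-lived
  // session token and open the websocket to the gateway.
  fetch("/chat/token")
    .then((res) => res.json())
    .then(({ token }) => {
      const ws = new WebSocket(`${WIDGET_WS_URL}?token=${encodeURIComponent(token)}`);
      ws.onmessage = (ev) => panel.append(String(ev.data)); // reply streams word by word
    });
}

mountChatBubble();
```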

What runs on every conversation (the inside)

  • The conversation gateway — accepts the websocket from the widget, opens a session keyed to a short-lived token, and holds the last few turns of context so follow-up questions don’t need to repeat the whole story. Idle conversations time out cleanly.
  • The answerer — on every visitor turn, searches your knowledge for relevant passages, hands the cleanest passages to a small AI, and asks for one of four moves: answer with a citation, ask one short clarifying question, hand off to a human, or politely decline (off-topic, unsafe, or out of remit). The answer streams back word by word so the visitor isn’t staring at a spinner. (The decision gate is sketched after this list.)
  • Handoff and learning — when the answerer picks “hand off,” the transcript and a short summary go to your inbox or Slack. When confidence is low or no useful passage was found, the turn lands in the gaps log instead, with a timestamp and the question. You batch-review weekly.
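
The four moves are easier to see as code. A minimal sketch of the decision gate in TypeScript, assuming retrieval returns scored passages and the model reports a confidence; the thresholds and field names are illustrative, something you would tune against real traffic, not fixed values from this system:

```typescript
// Hypothetical decision gate — thresholds and shapes are illustrative.
type Move =
  | { kind: "answer"; text: string; citation: string } // citation is mandatory
  | { kind: "clarify"; question: string }
  | { kind: "handoff" }
  | { kind: "decline" };

interface Passage { text: string; source: string; score: number } // from retrieval

function route(passages: Passage[], draft: { text: string; confidence: number }): Move {
  const best = passages[0];
  // No useful passage at all: off-topic or out of remit, so decline.
  if (!best || best.score < 0.3) return { kind: "decline" };
  // A passage exists but the model is unsure: one short question, or hand off.
  if (draft.confidence < 0.7) {
    return best.score > 0.5
      ? { kind: "clarify", question: "Could you say a bit more about what you need?" }
      : { kind: "handoff" };
  }
  // High confidence with a source to cite: auto-answer.
  return { kind: "answer", text: draft.text, citation: best.source };
}
```

Note that the answer arm is the only one that carries a citation: if there is nothing to cite, the gate structurally cannot auto-answer.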

In plain words

A visitor opens the chat. The widget connects to the cloud, the cloud loads a tiny scratchpad for the conversation, and the visitor types “do you ship to Canada?” The cloud searches your help docs, finds the shipping policy, and streams back a one-line answer with a small “from: shipping policy” citation. The visitor follows up — “how long does it take?” — and the cloud uses the same scratchpad, no need to repeat “to Canada.” Two turns later they ask something the docs don’t cover; the assistant doesn’t guess. It says it’ll get a human and drops the transcript in your inbox. The unanswered turn lands in your gaps log so you can write a paragraph about it and the next visitor won’t have to wait.
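
That “scratchpad” is nothing exotic: a short window of recent turns keyed to the session, dropped when the conversation goes idle. A minimal sketch, assuming a six-turn window and a ten-minute timeout (both numbers are placeholders):

```typescript
// Hypothetical session scratchpad — window size and timeout are placeholders.
interface Turn { role: "visitor" | "assistant"; text: string }

class Scratchpad {
  private turns: Turn[] = [];
  private lastSeen = Date.now();
  constructor(private maxTurns = 6, private idleMs = 10 * 60 * 1000) {}

  add(turn: Turn): void {
    this.turns.push(turn);
    if (this.turns.length > this.maxTurns) this.turns.shift(); // keep only recent context
    this.lastSeen = Date.now();
  }

  context(): Turn[] {
    return [...this.turns]; // "how long does it take?" arrives with "to Canada" still in view
  }

  expired(): boolean {
    return Date.now() - this.lastSeen > this.idleMs; // idle sessions time out cleanly
  }
}
```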

Total cost runs in coffee-money territory at typical small-business volume — cents per conversation, going up smoothly with how often the widget gets opened.

Design rules that shaped every decision

  • The assistant answers from your knowledge folder only — never invents. No citation, no auto-answer.
  • Streaming first. The visitor sees words within a second; spinners on a chat widget are how visitors give up.
  • Short-term memory only. The session knows the last few turns; it does not remember the visitor next week. No long-lived profiles unless the visitor signs in.
  • Confidence gates the route. High-confidence with citation auto-answers; borderline becomes a clarifying question or a handoff; off-topic gets a polite decline.
  • Configuration lives in a Drive folder. Updating the FAQ, hours, or return policy never needs a deploy.
  • Misses are not failures — they are the input to next week’s knowledge update. (A sketch of the log entry follows this list.)
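
To picture the miss path behind that last rule: each low-confidence or passage-less turn becomes one small record, and the weekly review groups repeats so a single new FAQ paragraph can close many misses at once. The shape below is an assumption for illustration, not a fixed schema:

```typescript
// Hypothetical gaps-log entry — field names are illustrative.
interface GapEntry {
  at: string;          // ISO timestamp of the turn
  question: string;    // what the visitor asked, verbatim
  reason: "no_passage" | "low_confidence";
}

const gapsLog: GapEntry[] = [];

function logMiss(question: string, reason: GapEntry["reason"]): void {
  gapsLog.push({ at: new Date().toISOString(), question, reason });
  // In the real system this would append a row to the Drive doc or sheet.
}

// Weekly review: count repeats so the most-asked gaps get written up first.
function weeklyReview(): Map<string, number> {
  const counts = new Map<string, number>();
  for (const g of gapsLog) counts.set(g.question, (counts.get(g.question) ?? 0) + 1);
  return counts;
}
```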

Why this shape

Most chat widgets fall into one of two traps. The first kind is a generic AI bot — happy to answer anything, including things about your business that aren’t true. The second kind is a glorified contact form — every message becomes a ticket in your queue, even the ones that have a one-line answer in your FAQ. Neither feels good for a visitor at 9pm who just wants to know if you ship to Canada.

The setup above splits the difference. A small AI grounded strictly in your own docs handles the answerable questions in real time and cites where the answer came from. Anything beyond its remit becomes a clean human handoff with the full transcript — no “please tell me more” loops. And the questions it can’t answer don’t disappear; they pile up in a single review queue you spend ten minutes on each week, which means month two’s assistant is meaningfully better than month one’s.

The next four posts walk through each piece in turn — how a conversation starts and stays alive, how the answerer stays grounded in your docs, how a handoff to a human works without making the visitor repeat themselves, and how gaps become better answers. One diagram per post. A cost breakdown and a final engineering reference at the end.
