A website chat assistant on AWS for a few dollars a month

The whole system on one page

Before any code, here’s the shape of what we’re building.

Fig 1. Four outside surfaces, three pieces inside AWS. A visitor opens the widget, the answerer replies from your knowledge or hands off, and every miss feeds back into a weekly review.

What you set up once (the outside)

A small embed snippet — a few lines of script you paste into your site template once. The widget bubble appears in the corner; clicking it opens a chat panel that reuses your site’s own font and colour. The snippet does nothing on its own — the work happens after the websocket opens.
A knowledge folder — the help docs, FAQ, policies, pricing, hours, return rules, and anything else you’d want a new hire to read on day one. It lives in a Google Drive folder. Edit a doc and the assistant picks up the change on its next refresh, with no deploy.
A handoff destination — the inbox, Slack channel, or shared queue you already use. The assistant packages the transcript, adds a one-line summary of what the visitor wanted, and drops it where you’ll see it.
A gaps log — a small Drive doc or sheet that fills up with the visitor turns the assistant couldn’t confidently answer. It’s the to-do list for next week’s knowledge updates.
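The handoff package from the list above is mostly text layout. Here is a minimal sketch of how it might be assembled before it goes to an inbox or Slack channel. It assumes the one-line summary has already been written by the AI and that delivery (email, Slack) happens elsewhere. `Turn` and `format_handoff` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One message in the conversation (hypothetical shape)."""
    role: str   # "visitor" or "assistant"
    text: str


def format_handoff(summary: str, turns: List[Turn]) -> str:
    """Lay out a handoff message: one-line summary first, full transcript below.

    `summary` is assumed to be produced elsewhere (e.g. by the AI);
    this function only formats it together with the transcript.
    """
    lines = [f"Visitor wants: {summary.strip()}", "", "Transcript:"]
    for turn in turns:
        speaker = "Visitor" if turn.role == "visitor" else "Assistant"
        lines.append(f"{speaker}: {turn.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative conversation; the last question is one the docs don't cover.
    print(format_handoff(
        "Asks whether a bulk order can be invoiced monthly",
        [
            Turn("visitor", "Do you ship to Canada?"),
            Turn("assistant", "Yes, we ship to Canada. (from: shipping policy)"),
            Turn("visitor", "Can you invoice my bulk order monthly?"),
            Turn("assistant", "I'll get a human to help with that."),
        ],
    ))
```

Putting the summary on top means whoever picks up the handoff knows what the visitor wanted before reading the transcript. That is what lets them reply without asking the visitor to start over.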

What runs on every conversation (the inside)

The conversation gateway — accepts the websocket from the widget, opens a session keyed to a short-lived token, and holds the last few turns of context so follow-up questions don’t need to repeat the whole story. Idle conversations time out cleanly.
The answerer — on every visitor turn, searches your knowledge for relevant passages, hands the cleanest passages to a small AI, and asks for one of four moves: answer with a citation, ask one short clarifying question, hand off to a human, or politely decline (off-topic, unsafe, or out of remit). The answer streams back word by word so the visitor isn’t staring at a spinner.
Handoff and learning — when the answerer picks “hand off,” the transcript and a one-line summary go to your inbox or Slack. Separately, whenever confidence is low or no useful passage was found, the turn is also written to the gaps log with a timestamp and the question. You review that log in one weekly batch.
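The gateway’s “last few turns” scratchpad can be sketched as a small store. Each session is keyed by a random token, keeps only the most recent turns, and expires after a stretch of inactivity. This is a minimal in-memory sketch. `SessionStore`, `max_turns`, and `idle_timeout_s` are hypothetical names with placeholder values. A real deployment on AWS would likely keep this state in a managed store rather than in process memory, and this overview doesn’t say which one.

```python
import secrets
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Session:
    turns: Deque[Tuple[str, str]]   # (role, text), oldest first
    last_seen: float                # time of the last added turn


class SessionStore:
    """Hypothetical in-memory scratchpad: last few turns per session, idle expiry."""

    def __init__(self, max_turns: int = 6, idle_timeout_s: float = 900.0) -> None:
        self.max_turns = max_turns
        self.idle_timeout_s = idle_timeout_s
        self._sessions: Dict[str, Session] = {}

    def open(self, now: float) -> str:
        """Start a session and return its short-lived token."""
        token = secrets.token_urlsafe(16)
        self._sessions[token] = Session(deque(maxlen=self.max_turns), now)
        return token

    def _live(self, token: str, now: float) -> Optional[Session]:
        session = self._sessions.get(token)
        if session is None:
            return None
        if now - session.last_seen > self.idle_timeout_s:
            # Idle too long: drop the scratchpad so the session times out cleanly.
            del self._sessions[token]
            return None
        return session

    def add_turn(self, token: str, role: str, text: str, now: float) -> bool:
        """Append a turn; returns False if the session is unknown or expired."""
        session = self._live(token, now)
        if session is None:
            return False
        session.turns.append((role, text))  # deque(maxlen=...) discards the oldest turn
        session.last_seen = now
        return True

    def context(self, token: str, now: float) -> List[Tuple[str, str]]:
        """Recent turns to send along with the next question (empty if expired)."""
        session = self._live(token, now)
        return list(session.turns) if session else []


if __name__ == "__main__":
    store = SessionStore(max_turns=4, idle_timeout_s=600)
    token = store.open(now=0.0)
    store.add_turn(token, "visitor", "Do you ship to Canada?", now=5.0)
    store.add_turn(token, "assistant", "Yes. (from: shipping policy)", now=6.0)
    store.add_turn(token, "visitor", "How long does it take?", now=30.0)
    print(store.context(token, now=31.0))   # follow-up still has "Canada" in context
    print(store.context(token, now=700.0))  # [] after the idle timeout
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the timeout logic easy to test.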

In plain words

A visitor opens the chat. The widget connects to the cloud, the cloud loads a tiny scratchpad for the conversation, and the visitor types “do you ship to Canada?” The cloud searches your help docs, finds the shipping policy, and streams back a one-line answer with a small “from: shipping policy” citation. The visitor follows up — “how long does it take?” — and the cloud uses the same scratchpad, no need to repeat “to Canada.” Two turns later they ask something the docs don’t cover; the assistant doesn’t guess. It says it’ll get a human and drops the transcript in your inbox. The unanswered turn lands in your gaps log so you can write a paragraph about it and the next visitor won’t have to wait.

At typical small-business volume the total cost stays in coffee-money territory: cents per conversation, rising roughly in proportion to how often the widget gets opened.

Design rules that shaped every decision

The assistant answers only from your knowledge folder and never invents. No citation, no auto-answer.
Streaming first. The visitor sees words within a second; spinners on a chat widget are how visitors give up.
Short-term memory only. The session knows the last few turns; it does not remember the visitor next week. No long-lived profiles unless the visitor signs in.
Confidence gates the route. A high-confidence answer with a citation goes out automatically; a borderline one becomes a clarifying question or a handoff; an off-topic question gets a polite decline.
Configuration lives in a Drive folder. Updating the FAQ, hours, or return policy never needs a deploy.
Misses are not failures — they are the input to next week’s knowledge update.
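The “confidence gates the route” and “no citation, no auto-answer” rules can be read as a small decision function over whatever the answerer proposes. This sketch assumes the answerer returns a proposed move, a confidence score between 0 and 1, and the citations it used. The thresholds (0.8 and 0.5) and every name here are hypothetical placeholders, not tuned values from this series. The function also flags which turns belong in the gaps log.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Move(Enum):
    ANSWER = "answer"
    CLARIFY = "clarify"
    HANDOFF = "handoff"
    DECLINE = "decline"


@dataclass
class Draft:
    """What the answerer proposes for one visitor turn (hypothetical shape)."""
    move: Move
    confidence: float                   # assumed to be in [0, 1]
    citations: List[str] = field(default_factory=list)


@dataclass
class Decision:
    move: Move
    log_gap: bool                       # True if the turn goes to the gaps log


AUTO_ANSWER_MIN = 0.8                   # hypothetical threshold
CLARIFY_MIN = 0.5                       # hypothetical threshold


def route(draft: Draft) -> Decision:
    grounded = bool(draft.citations)
    # A miss: low confidence or no useful passage found.
    miss = draft.confidence < CLARIFY_MIN or not grounded
    if draft.move is Move.DECLINE:
        # Off-topic, unsafe, or out of remit: polite decline, not a knowledge gap.
        return Decision(Move.DECLINE, log_gap=False)
    if draft.move is Move.ANSWER and grounded and draft.confidence >= AUTO_ANSWER_MIN:
        # High confidence with a citation: auto-answer.
        return Decision(Move.ANSWER, log_gap=False)
    if draft.move in (Move.ANSWER, Move.CLARIFY) and not miss:
        # Borderline but grounded: ask one short clarifying question.
        return Decision(Move.CLARIFY, log_gap=False)
    # Everything else goes to a human; misses are also logged for review.
    return Decision(Move.HANDOFF, log_gap=miss)
```

For example, `route(Draft(Move.ANSWER, 0.62, ["shipping policy"]))` yields a clarifying question rather than an auto-answer, and an uncited answer of any confidence becomes a handoff that is also logged as a gap. Keeping the gate outside the model means the no-citation rule holds even when the model is overconfident about its own move.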

Why this shape

Most chat widgets fall into one of two traps. The first kind is a generic AI bot — happy to answer anything, including things about your business that aren’t true. The second kind is a glorified contact form — every message becomes a ticket in your queue, even the ones that have a one-line answer in your FAQ. Neither feels good for a visitor at 9pm who just wants to know if you ship to Canada.

The setup above splits the difference. A small AI grounded strictly in your own docs handles the answerable questions in real time and cites where the answer came from. Anything beyond its remit becomes a clean human handoff with the full transcript — no “please tell me more” loops. And the questions it can’t answer don’t disappear; they pile up in a single review queue you spend ten minutes on each week, which means month two’s assistant is meaningfully better than month one’s.

The next four posts walk through each piece in turn: how a conversation starts and stays alive, how the answerer stays grounded in your docs, how a handoff to a human works without making the visitor repeat themselves, and how gaps become better answers. Each post has one diagram, and the series closes with a cost breakdown and a final engineering reference.
