Part 2 of 7 · Website chat assistant series · ~4 min read

How a conversation starts and stays alive

A chat widget feels live, but underneath it’s a careful little dance — the page loads a tiny script, the visitor clicks, a websocket opens, a scratchpad gets minted, and somewhere in between the cloud has to decide how much it’s willing to remember. Here’s the lifecycle of a single conversation, end to end.

Three moments in a conversation

[Figure: a horizontal three-stage flow — connect, exchange, idle.]
  • Stage 1, Connect: the visitor clicks the widget bubble; the page mints a short-lived session token and opens a websocket to the cloud; the cloud creates a session row in the conversation store; the widget shows a small greeting.
  • Stage 2, Exchange: the visitor types a message; it streams over the websocket; the cloud appends it to the session scratchpad, hands it to the answerer (next post), and streams the reply back word by word; the scratchpad is trimmed to the last few turns so it never grows unbounded.
  • Stage 3, Idle: no message for a few minutes; the cloud closes the websocket gently and marks the session ended; the scratchpad expires after a short window so nothing visitor-typed lingers.
Note: the session scratchpad is short-term memory only; the system does not remember the visitor next week, and there is no long-lived profile unless the visitor explicitly signs in.
Fig 2. Connect, exchange, idle. A websocket and a small scratchpad — that’s the whole shape.

Why a websocket and not a plain request

A chat widget is the one place on a website where a visitor expects words to appear in real time, the way a human typing would. Plain request/response works for a single question, but it falls apart on the second turn — the visitor sees a frozen UI while the cloud thinks, and follow-up questions feel laggy. A websocket makes the connection feel alive: messages stream back word by word as they’re generated, and the visitor can type again without waiting for a full reply to arrive.

It also keeps the per-message overhead low. Once the connection is open, each turn is a few bytes over the wire instead of a fresh TLS handshake plus authentication. That’s the difference between a 200ms chat and a 1.5s chat — a difference visitors notice without being able to name.
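On the server side, streaming word by word is just an incremental producer: each chunk goes onto the open socket as soon as it exists, rather than waiting for the full reply. A small sketch of the shape, with an async generator standing in for the model and the list standing in for the socket writes (both are assumptions for illustration):

```python
import asyncio

async def stream_reply(reply: str, delay: float = 0.0):
    """Yield a reply word by word, the way the cloud streams tokens
    over the open websocket instead of sending one big response."""
    for word in reply.split():
        await asyncio.sleep(delay)  # stand-in for per-token generation latency
        yield word

async def demo() -> list[str]:
    chunks = []
    async for word in stream_reply("Yes, we ship to Canada."):
        chunks.append(word)  # in the real widget, each chunk is appended to the bubble
    return chunks
```

The visitor sees the first word as soon as it is generated; with plain request/response they would see nothing until the last word was done.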

The session scratchpad

Every conversation has a small scratchpad — a short list of recent turns that the answerer reads before each reply. It exists so a visitor doesn’t have to re-establish context on every message: if turn one was “do you ship to Canada?” and turn two is “how long does it take?”, the scratchpad lets the answerer connect “it” to “shipping to Canada.”

The rules for the scratchpad are deliberately strict:

  • Short. Only the last few turns. Long histories make replies slower, more expensive, and weirdly off-topic as the AI reads things from earlier in the conversation that aren’t relevant anymore.
  • Trimmed automatically. Once the scratchpad reaches its limit, the oldest turn falls off. No manual cleanup, no growth without bound.
  • Session-scoped. The scratchpad belongs to one websocket session. When the session ends, it’s gone. The next time the visitor opens the widget, it’s a fresh start.
  • Expires quickly. Even before the session ends, idle scratchpads time out. Nothing the visitor typed lingers in storage longer than the conversation lasted.
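All four rules fit in a few lines of code. A minimal sketch of a session scratchpad under these constraints — the turn limit and TTL values are illustrative, not the series' actual numbers:

```python
import time
from collections import deque

class Scratchpad:
    """Session-scoped short-term memory: the last N turns, with an idle expiry."""

    def __init__(self, max_turns: int = 6, ttl_seconds: float = 300.0):
        self.turns = deque(maxlen=max_turns)  # oldest turn falls off automatically
        self.ttl = ttl_seconds
        self.last_used = time.monotonic()

    def append(self, role: str, text: str) -> None:
        """Record one turn and refresh the idle clock."""
        self.turns.append((role, text))
        self.last_used = time.monotonic()

    def recent(self) -> list[tuple[str, str]]:
        """What the answerer reads before each reply."""
        return list(self.turns)

    def expired(self) -> bool:
        """True once the conversation has been idle past the TTL."""
        return time.monotonic() - self.last_used > self.ttl
```

The `deque(maxlen=...)` does the trimming: once the limit is reached, appending a new turn silently drops the oldest one, so there is no cleanup code to forget.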

The reason for all this restraint is that “memory” in a chat assistant is the source of half its problems. Long-lived memory invites privacy questions (what did the assistant store about that visitor?), surprise behaviours (the assistant brings up something from three weeks ago), and expensive prompts (every turn pays for a long history). Short-term memory, scoped to a session, sidesteps all of that. If you want long-lived memory — for a logged-in customer who wants their order history available — that’s a separate, opt-in feature, layered on top.

What happens at the edges

Three edge cases are worth designing for from the start, because they are common and fail silently when mishandled:

  • The visitor refreshes the page. The websocket drops; the widget reopens it on the new page; the new connection gets a fresh scratchpad. Treating “same visitor” across page loads adds complexity that almost no SMB needs.
  • The visitor opens two tabs. Each tab gets its own session. They don’t share a scratchpad. This is the simplest behaviour and the one visitors expect — if they want to compare two threads, they expect them to be independent.
  • The connection drops mid-reply. The cloud finishes generating the reply, stores it on the (now-ended) session, and on next reconnect the widget shows it as the last turn. The visitor sees their answer the moment they come back online.

How this plugs into the next post

Everything in this post is plumbing — how a turn arrives at the answerer with the right context attached. The next post is about what the answerer actually does: how it searches your knowledge, how it requires a citation, and how it picks one of four moves on every visitor turn. The session and scratchpad are what let it focus on the current message without losing the thread.
