Part 4 of 7 · Website chat assistant series

How a handoff to a human works

The handoff is the most important part of a chat assistant, and the part most assistants get wrong. The bad version makes the visitor repeat themselves to a human; the worse version drops the transcript on the floor; the worst version pretends to escalate and never does. Here’s how to design a clean one.

What a handoff has to do, in order

From the moment the assistant picks the “hand off” tool, four things have to happen, in this order, before the visitor closes the tab:

[Figure: Four moves on every handoff: tell, package, deliver, hold. A vertical flow of four numbered steps: (1) Tell the visitor, (2) Package the transcript, (3) Deliver to one place, (4) Hold the rest of the session. Banner: "The visitor never repeats themselves to the human. The transcript is the ticket."]
Fig 4. Four steps. The visitor experience is built around “tell first, deliver second”: the visitor sees the message before the payload reaches the team’s queue.

Step 1: tell the visitor first

The very first thing that happens, before any backend work, is the assistant streams a short, plain line back to the visitor: “Got it — I’ll have a human follow up. We usually reply within business hours.” Two things matter here. The window is realistic (not “within an hour” if it’s 9pm Friday) and the language is human (not “a ticket has been created”).

If the assistant doesn’t already have a way to reach the visitor — no email captured during onboarding, no phone — this is the moment to ask, inline. One field, one prompt: “What’s the best email to follow up on?” Not a form, not a popup. A chat line that the visitor types into the same chat. The point is to make “getting in touch” feel like part of the conversation, not a tax for getting an answer.
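As a rough illustration of step 1, the sketch below picks a realistic response window from the current time and adds the inline contact question only when no contact is on file. The business hours (Monday to Friday, 09:00 to 17:00), the closed-hours wording, and the function names are assumptions made for the example, not part of any particular product.

```python
from datetime import datetime
from typing import List, Optional

# Assumed business hours: Monday-Friday, 09:00-17:00 local time (hypothetical).
OPEN_HOUR, CLOSE_HOUR = 9, 17


def in_business_hours(now: datetime) -> bool:
    """True when `now` falls on a weekday inside the assumed opening hours."""
    return now.weekday() < 5 and OPEN_HOUR <= now.hour < CLOSE_HOUR


def handoff_lines(now: datetime, contact: Optional[str]) -> List[str]:
    """Chat lines the assistant streams before any backend work starts."""
    if in_business_hours(now):
        window = "We usually reply within business hours."
    else:
        # Avoid promising a fast reply at 9pm on a Friday.
        window = "We're closed right now, so expect a reply on the next business day."
    lines = [f"Got it, I'll have a human follow up. {window}"]
    if not contact:
        # One inline question in the same chat: not a form, not a popup.
        lines.append("What's the best email to follow up on?")
    return lines
```

Calling `handoff_lines(datetime.now(), None)` yields the acknowledgement followed by the email question; passing a known contact yields the acknowledgement alone.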

Step 2: package the transcript

Once the visitor knows a human is coming, the cloud assembles a small payload. The contents are unglamorous but specific:

  • The full transcript — every visitor turn and every assistant turn from this session. Not a summary, not the “important parts.” The whole thing, so the human can read it the way the visitor experienced it.
  • A one-line summary — the AI’s last useful job. What did the visitor want? “Visitor asked about a refund on order #12345 placed two weeks ago, after we couldn’t find a matching policy.” This is what the human reads in the notification preview.
  • Where they were — the page URL the chat opened on. “On the pricing page” is meaningfully different from “on the contact page.”
  • How to reach them — the email or phone they shared, or a session reference if they’re still on the page.
  • Why the handoff happened — one of: out of remit, low confidence, explicit request from the visitor, or hostile/unsafe message. The team uses this to spot patterns.
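One way to picture the five items above is as a small typed record. This is a minimal sketch: the class and field names are hypothetical, and the validation rules (non-empty transcript, single-line summary, at least one way to reach the visitor) are assumptions drawn from the descriptions in the list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HandoffReason(str, Enum):
    """The four reasons listed above; the string values are illustrative."""
    OUT_OF_REMIT = "out_of_remit"
    LOW_CONFIDENCE = "low_confidence"
    VISITOR_REQUEST = "visitor_request"
    HOSTILE_OR_UNSAFE = "hostile_or_unsafe"


@dataclass
class Turn:
    role: str  # "visitor" or "assistant"
    text: str


@dataclass
class HandoffPayload:
    transcript: List[Turn]             # every turn, not a summary
    summary: str                       # one line, shown in the notification preview
    page_url: str                      # the page the chat opened on
    reason: HandoffReason              # why the handoff happened
    contact: Optional[str] = None      # email or phone the visitor shared
    session_ref: Optional[str] = None  # fallback if they are still on the page

    def __post_init__(self) -> None:
        if not self.transcript:
            raise ValueError("transcript must contain at least one turn")
        if "\n" in self.summary:
            raise ValueError("summary must be a single line")
        if not (self.contact or self.session_ref):
            raise ValueError("need a contact or a session reference")
```

Rejecting a payload with no contact and no session reference at construction time keeps an unreachable visitor from ever landing in the team's inbox.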

Step 3: deliver to one place

The package goes to exactly one destination. The team’s existing inbox, or Slack channel, or shared queue — whichever they’re already watching. Not the inbox and Slack: in practice that means two people see it, both assume the other will reply, and neither does. Pick one place per business and route everything there.

The format matches the destination. To Slack, the summary becomes a one-line preview with a button to expand the full transcript. To email, the summary is the subject line and the transcript is the body. To a queue, it’s a structured row. Same payload underneath, three packagings.
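The "same payload, three packagings" idea can be sketched as one render function that takes a single destination name. The dictionary shapes below are placeholders invented for illustration; they are not the real Slack, email, or queue formats, and a real integration would map them onto whatever the chosen service expects.

```python
from typing import Dict, List, Tuple

Turns = List[Tuple[str, str]]  # (role, text) pairs, in conversation order


def transcript_text(turns: Turns) -> str:
    """Flatten the turns into readable 'role: text' lines."""
    return "\n".join(f"{role}: {text}" for role, text in turns)


def render(destination: str, summary: str, turns: Turns) -> Dict[str, object]:
    """Package one payload for exactly one destination (hypothetical shapes)."""
    if destination == "slack":
        # One-line preview, with the full transcript behind an expand action.
        return {"preview": summary, "expanded": transcript_text(turns)}
    if destination == "email":
        # Summary as the subject line, transcript as the body.
        return {"subject": summary, "body": transcript_text(turns)}
    if destination == "queue":
        # A structured row.
        return {"summary": summary, "turn_count": len(turns), "turns": list(turns)}
    raise ValueError(f"unknown destination: {destination!r}")
```

Because `render` accepts one destination string rather than a list, the "one place per business" rule is built into the signature instead of being left to configuration discipline.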

Step 4: hold the session for a moment

After step three, the websocket doesn’t close immediately. The visitor often types one more thing — the order number they forgot to include, the email they meant to send, “actually never mind, I figured it out.” The session stays open for a couple of turns; any new turns get appended to the transcript and forwarded as a follow-up note to the same destination. After that, the websocket closes and the session ends.

The “actually never mind” case is the underrated one. The system should let the visitor cancel cleanly — the assistant acknowledges, the team gets a small “visitor self-resolved” note appended to the original handoff, and the human knows not to send a wasted follow-up.
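The hold-then-close behaviour, including the self-resolved case, might look roughly like the sketch below. The limit of two extra turns is one reading of "a couple of turns", and the phrase matching used to detect a cancellation is deliberately naive; both are assumptions, as are all the names.

```python
from dataclasses import dataclass, field
from typing import List

MAX_EXTRA_TURNS = 2  # assumed reading of "a couple of turns"
CANCEL_PHRASES = ("never mind", "nevermind", "figured it out")  # naive, hypothetical


@dataclass
class HeldSession:
    transcript: List[str]
    notes: List[str] = field(default_factory=list)  # follow-ups sent to the same destination
    extra_turns: int = 0
    is_open: bool = True

    def on_visitor_turn(self, text: str) -> None:
        """Handle a visitor message that arrives after the handoff was delivered."""
        if not self.is_open:
            raise RuntimeError("session already closed")
        self.transcript.append(text)
        self.extra_turns += 1
        if any(phrase in text.lower() for phrase in CANCEL_PHRASES):
            # Tell the team not to send a wasted follow-up, then end the session.
            self.notes.append("visitor self-resolved")
            self.is_open = False
            return
        self.notes.append(f"follow-up: {text}")
        if self.extra_turns >= MAX_EXTRA_TURNS:
            self.is_open = False  # graceful close after the hold window
```

In a real deployment the cancellation check would more plausibly be a decision made by the assistant itself than a substring match; the point here is only the shape of the state: append, forward, count, close.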

What this is not

This is intentionally not a full inbound ticketing system. There’s no priority queue, no SLA timer, no auto-assignment, no agent dashboard. For an SMB, those are overkill on day one and a maintenance burden every day after. The handoff is a thin pipe from a chat session into your existing inbox; the inbox you already use becomes the “agent dashboard.” If you outgrow that, you can layer a real ticketing tool later — but most businesses won’t.
