Series · 7 parts · Published April 28, 2026

Voice agent

A serverless voice agent on AWS that answers your business phone, replies from your own knowledge in real time, and politely passes the rest to a human. Seven posts on the same system — one diagram at a time — with an engineering reference at the end.

  1. A voice agent on AWS for the price of a phone plan

    The whole system on one page — a listener, a brain, a speaker, and the under-one-second loop they share.

  2. How a call connects

    Three ways a call can go: voicemail after hours, AI session in business hours, or direct human transfer for VIP numbers.

  3. How the listener hears

    Streaming transcription, partial guesses refined live, locked the moment the caller pauses.

  4. How the brain decides what to say

    Four tools, one pick per turn: answer, book, transfer, end. The AI is allowed to be confident or to defer — never to invent.

  5. How the speaker stays natural

    The latency budget for a one-second-or-less reply. Where each millisecond goes, and what happens when the budget blows.

  6. What the voice agent costs

    Phone-bill territory at SMB volume. The phone number is the floor; everything else scales with how often it rings.

  7. Engineering reference: the voice agent architecture

    Same system, drawn purely for engineers. Service names, resource identifiers, region, Bedrock model IDs.

What does the voice agent do?
It answers your business phone, replies in real time from your own knowledge file (hours, services, prices, FAQs), and politely transfers anything beyond its remit to a human. Each turn, the brain picks exactly one of four tools: answer from the knowledge file, book an appointment, transfer to a human, or end the call gracefully. It never invents a price, hour, or promise.
How much does it cost to run?
About $35–$50 per month at a typical SMB volume of around 200 call minutes. The phone number is the floor (~$22/month) and Secrets Manager adds ~40¢ per secret per month; per-minute costs (Connect inbound minutes, Transcribe streaming, Polly synthesis) and per-call Bedrock Haiku invocations scale with how often the line rings. An $80 monthly AWS Budgets alarm catches anything unexpected.
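The cost shape is linear: a fixed floor plus usage that grows with call minutes. A minimal sketch, assuming hypothetical blended `UsageRates` values that you would fill in from current AWS pricing; only the ~$22 phone-number floor, the ~40¢ per secret, and the $80 alarm threshold come from this post. `monthly_cost` and `trips_budget_alarm` are illustrative names.

```python
from dataclasses import dataclass

# Fixed figures quoted in the post.
PHONE_NUMBER_FLOOR = 22.00  # ~$22/month for the phone number
SECRET_MONTHLY = 0.40       # ~40 cents per Secrets Manager secret per month
BUDGET_ALARM = 80.00        # monthly AWS Budgets alarm threshold


@dataclass
class UsageRates:
    """Hypothetical blended rates; look up current AWS prices before relying on them."""
    per_minute: float  # Connect inbound + Transcribe streaming + Polly, per call minute
    per_call: float    # Bedrock Haiku invocations, averaged per call


def monthly_cost(minutes: float, calls: int, secrets: int, rates: UsageRates) -> float:
    """Fixed floor plus usage that scales with how often the line rings."""
    fixed = PHONE_NUMBER_FLOOR + SECRET_MONTHLY * secrets
    usage = rates.per_minute * minutes + rates.per_call * calls
    return round(fixed + usage, 2)


def trips_budget_alarm(cost: float) -> bool:
    """True when the month's spend exceeds the alarm threshold."""
    return cost > BUDGET_ALARM
```

Setting `minutes` and `calls` to zero gives the cost of a line that never rings; everything above that floor scales in direct proportion to call volume.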
Which phone number provider does it use?
Amazon Connect. You claim a direct inward dialing (DID) number in Connect, or port your existing number, and the call lands on a Contact Flow that greets the caller, screens for time-of-day and VIP rules, and routes the call. Connect Audio Stream pipes both directions of audio through Kinesis Video Streams into the AI session.
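The routing reduces to a three-way decision. A minimal Python sketch of that logic, assuming the hours check runs before the VIP screen (the posts don't state the order) and that caller numbers arrive as E.164 strings. `Route` and `route_call` are hypothetical names; in the real system this decision is made by Contact Flow blocks, not by code.

```python
from enum import Enum


class Route(Enum):
    VOICEMAIL = "voicemail"     # closed: closed-hours message, voicemail, no AI
    AI_SESSION = "ai_session"   # open: stream audio into the AI session
    HUMAN_TRANSFER = "human"    # VIP caller: transfer straight to a person


def route_call(caller: str, is_open: bool, vip_numbers: set[str]) -> Route:
    """Three-way routing decision; assumes the hours check comes before the VIP screen."""
    if not is_open:
        return Route.VOICEMAIL
    if caller in vip_numbers:
        return Route.HUMAN_TRANSFER
    return Route.AI_SESSION
```

Putting the hours check first means a VIP calling after hours still reaches voicemail rather than an empty desk; swap the two checks if VIPs should always ring through.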
How does the agent avoid making things up?
It uses strict tool_use: four tool definitions (answer_from_knowledge, book_appointment, transfer_to_human, end_call) with required parameter schemas, so the model can only emit a structured tool call, never free text. The answer tool reads only from the knowledge file indexed in S3 Vectors. If a fact isn’t in the file or the brain isn’t confident, it transfers to a human rather than guessing.
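A minimal sketch of the four tools expressed as a Bedrock Converse API `toolConfig`, with `toolChoice` set to `any` so the model must call a tool. The tool names come from this post; the descriptions, the parameter schemas (including the hypothetical `source_ids` field), and the `pick_tool` guard are illustrative assumptions, not the series' actual definitions. The guard treats a missing tool call, an unknown tool, or an answer that cites no knowledge-file sources as a reason to transfer.

```python
# Tool definitions in the Bedrock Converse API toolConfig shape.
# Names come from the post; descriptions and parameter schemas are illustrative.
STRING = {"type": "string"}


def _tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Wrap one tool as a Converse `toolSpec` with a JSON Schema for its input."""
    return {"toolSpec": {
        "name": name,
        "description": description,
        "inputSchema": {"json": {
            "type": "object",
            "properties": properties,
            "required": required,
        }},
    }}


TOOLS = [
    _tool("answer_from_knowledge", "Reply using only facts retrieved from the knowledge file.",
          {"reply": STRING, "source_ids": {"type": "array", "items": STRING}},
          ["reply", "source_ids"]),
    _tool("book_appointment", "Book a slot the caller has confirmed.",
          {"service": STRING, "start_iso": STRING, "callback_number": STRING},
          ["service", "start_iso", "callback_number"]),
    _tool("transfer_to_human", "Hand the call to a person.",
          {"reason": STRING}, ["reason"]),
    _tool("end_call", "Close the call politely.",
          {"goodbye": STRING}, ["goodbye"]),
]
TOOL_NAMES = {t["toolSpec"]["name"] for t in TOOLS}

# toolChoice "any" requires the model to call at least one tool.
TOOL_CONFIG = {"tools": TOOLS, "toolChoice": {"any": {}}}


def pick_tool(response: dict) -> tuple[str, dict]:
    """Return the first valid (tool name, input) from a Converse response.

    Anything else (no tool call, an unknown tool, or an answer that cites no
    knowledge-file sources) is treated as low confidence and becomes a transfer.
    """
    content = response.get("output", {}).get("message", {}).get("content", [])
    for block in content:
        use = block.get("toolUse")
        if not use or use.get("name") not in TOOL_NAMES:
            continue
        name, args = use["name"], use.get("input", {})
        if name == "answer_from_knowledge" and not args.get("source_ids"):
            break
        return name, args
    return "transfer_to_human", {"reason": "no grounded tool call"}
```

In a brain Lambda, `TOOL_CONFIG` would be passed as the `toolConfig` argument of the `bedrock-runtime` client's `converse` call, and the response dictionary handed to `pick_tool`.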
What happens for after-hours calls?
The Contact Flow checks the current time against your business hours from the knowledge file. Outside hours, the caller hears your closed-hours message and gets a chance to leave a voicemail with a callback number — the AI never spins up. You see the voicemail the next morning. After-hours is essentially free.
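The time-of-day check itself is small. A sketch assuming the knowledge file's hours have already been parsed into a weekday-to-(opens, closes) map and that `now` is already in the business's local time zone; `HOURS` and `is_open` are hypothetical names, and overnight windows and holidays are deliberately not handled.

```python
from datetime import datetime, time

# Hypothetical hours parsed from the knowledge file: weekday (Monday=0) -> (opens, closes).
HOURS = {day: (time(9, 0), time(17, 0)) for day in range(5)}  # Mon-Fri, 9:00-17:00


def is_open(now: datetime, hours: dict[int, tuple[time, time]]) -> bool:
    """True if `now`, already in the business's local time, falls inside today's window.

    Days missing from `hours` count as closed; the closing time itself counts as closed.
    """
    window = hours.get(now.weekday())
    if window is None:
        return False
    opens, closes = window
    return opens <= now.time() < closes
```

Treating the closing minute as closed means a caller who rings at exactly 17:00 hears the closed-hours message instead of starting a session nobody can finish.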
How fast does the voice agent reply?
The design target is under one second, end-to-end, from when the caller stops talking to when the bot starts speaking. The latency budget is ~100ms for the listener to lock the final transcript, ~400ms for the brain to pick a tool and write the reply, ~200ms for the speaker’s first byte of audio, ~150ms for network and audio buffering, and ~150ms spare, for 1,000ms in total. Real traffic typically lands around 1.0–1.2 seconds once cold starts and network jitter are included, so one second is the budget rather than a guarantee.
Which AWS services does it use?
Amazon Connect (DID + Contact Flow), Kinesis Video Streams (two-way audio bridge), Lambda (Python 3.14 functions for the orchestrator, brain, speaker, booking, config sync, and archive), Amazon Transcribe Streaming, Amazon Bedrock (Claude Haiku 4.5 via Global cross-Region inference) with S3 Vectors for the knowledge index, Amazon Polly Generative voices via the Bidirectional Streaming API, DynamoDB on-demand, S3, EventBridge, SNS, Secrets Manager, CloudWatch Logs with seven-day retention, and AWS Budgets. No always-on compute, no NAT Gateway.