Part 7 of 7 · Ticket router series ~8 min read

Engineering reference: the ticket router architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, the SQS queue config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, SQS, and Lambda Function URLs are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a ticket sitting in the wrong queue for an hour, not a regional outage. One AWS account dedicated to the router (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the ticket router A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three intake lanes — an SES inbound rule set with action S3 PUT to s3://tr-raw-mail/ plus the intake-mail Lambda that builds a ticket, a web-form Function URL on the intake-form Lambda that builds a ticket from posted form fields, and a chat Function URL on the intake-chat Lambda that folds a finished conversation into a ticket. All three write the ticket once to DynamoDB tr-tickets, de-duplicate against recent tickets, and send the ticket id to the tr-intake SQS queue. Middle region: processing. The read Lambda is triggered by the SQS queue; it reads the ticket body and calls Bedrock Haiku 4.5 to return topic, urgency, and tone, writing those tags back to tr-tickets; then the router Lambda reads the rules and voice objects from s3://tr-rules-source/, checks the VIP list and priority rules, and picks one of four moves — route, priority, escalate, or hold — writing a row to DynamoDB tr-routes. Bottom region: dispatch and correction. The dispatch Lambda resolves the team, sets the queue position, runs the duplicate check, composes the hand-off card, and posts to Slack via chat.postMessage or sends an email via SES outbound. Slack interactive button clicks (Reassign, Bump, Split) land on a Function URL Lambda correct-handler that updates tr-tickets and tr-routes, writes a row to tr-corrections and tr-audit, and on reassign refreshes the labelled examples in s3://tr-rules-source/examples.jsonl. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $15 monthly threshold, posting to SNS topic tr-cost-alarm. A note at the bottom: the router only sorts and routes — and every correction is logged to tr-corrections. Ingress SES inbound rule set tr-inbound-rules action: S3 PUT s3://tr-raw-mail/ trigger: intake-mail Function URL · web form intake-form Lambda AuthType: NONE shared-secret check → builds a ticket Function URL · chat intake-chat Lambda folds chat into one ticket → tr-tickets tr-intake SQS queue ticket id · de-duped · with DLQ Processing Lambda · read SQS event source Bedrock Haiku 4.5: topic, urgency, tone → tr-tickets tags Lambda · router reads rules.csv + voice.txt from S3 checks VIPs, picks one of four moves DynamoDB tr-routes move: route move: priority move: escalate move: hold Dispatch & correction Lambda · dispatch resolves team, queue pos, de-dupe; Slack post or SES outbound Slack interactive card with [Reassign] [Bump] [Split] button clicks → Function URL Lambda · correct-handler writes tr-corrections, tr-audit; on reassign refreshes the labelled examples in S3 The router only sorts and routes — and every correction is logged to tr-corrections.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes onto one queue), processing (the read call then the router picking a move), dispatch and correction (the ticket lands and a human’s correction is recorded and fed back). Every Lambda is event-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • intake-mail — S3 PUT trigger on s3://tr-raw-mail/. Parses the MIME, extracts sender/subject/body, strips quoted reply history, matches thread headers against open tickets in tr-tickets to merge replies, writes a new ticket (or appends to an existing one), and sends the ticket id to the tr-intake SQS queue. Memory: 256 MB. Timeout: 30 s.
  • intake-form — Lambda Function URL, AuthType: NONE, verifies a shared secret (in Secrets Manager under tr/form/secret) on the POST body. Builds a ticket from the form fields in the same shape as the mail lane, writes to tr-tickets, enqueues the id. Memory: 256 MB. Timeout: 15 s.
  • intake-chat — Lambda Function URL, signature-verified against the chat tool’s secret. Folds a finished conversation into one ticket body, writes to tr-tickets, enqueues the id. Memory: 256 MB. Timeout: 15 s.
  • read — SQS event source on tr-intake (batch size 5, partial-batch responses enabled). For each ticket, calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) with the ticket body, the topic list from rules.csv, and the labelled examples from examples.jsonl; parses the returned JSON (topic, urgency, tone); writes the tags back to tr-tickets. On a malformed model response, retries once with a stricter prompt, then tags topic: unsure so the router holds it. Memory: 512 MB. Timeout: 30 s. This is the only Bedrock callsite.
  • router — invoked by read after tagging (or as a second SQS stage). Reads s3://tr-rules-source/rules.csv (topic-to-team map) and voice.txt (urgency words, VIP list, priority rules). Applies the decision flow from Part 3, picks one of route, priority, escalate, hold, and writes a row to tr-routes. No Bedrock calls. Memory: 256 MB. Timeout: 15 s.
  • dispatch — triggered on new tr-routes rows (DynamoDB Streams). Resolves the team, sets the queue position, runs the duplicate check against recent open tickets, formats the hand-off card, and ships via Slack chat.postMessage (tr/slack/bot-token in Secrets Manager) or SES SendRawEmail to the team’s shared inbox. Writes the dispatch outcome back to tr-routes. Memory: 256 MB. Timeout: 30 s.
  • correct-handler — Lambda Function URL, public with AuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Reassign/Bump/Split) and by email-link clicks. Updates tr-tickets and tr-routes; writes to tr-corrections and tr-audit; on reassign or bump, refreshes examples.jsonl in s3://tr-rules-source/ with the corrected label (capped to the most recent N examples per topic). On split, creates two new tickets and re-enqueues both. Memory: 256 MB. Timeout: 15 s.
  • drive-sync — EventBridge Scheduler target, fires every 15 minutes. Uses the Google Sheets API + Docs API (service-account credentials in Secrets Manager under tr/drive/sa) to export the rules sheet and the rules doc, writing rules.csv and voice.txt to s3://tr-rules-source/ only if changed since the last sync. Memory: 256 MB. Timeout: 30 s.
  • digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads tr-routes and tr-corrections for the past week; posts a summary to a configured Slack channel: volume by topic and team, the correction rate, and the slowest queues. No Bedrock; a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s tr-routes, tr-corrections, and tr-audit; calls Bedrock Haiku 4.5 to write a one-paragraph narrative (busiest topics, slowest queues, what the corrections taught); emails it via SES to the configured stakeholder list. Memory: 512 MB.

Storage

  • DynamoDB · tr-tickets — one row per ticket. PK ticket_id; attributes: customer, source (mail/form/chat), subject, body, topic, urgency, tone, status, received_at. GSI on (customer, topic) for the duplicate check. On-demand.
  • DynamoDB · tr-routes — one row per routing decision. PK ticket_id; attributes: topic, urgency, tone, team, move (route/priority/escalate/hold), queue_pos, dispatched_via, decided_at. DynamoDB Streams enabled to trigger dispatch. On-demand.
  • DynamoDB · tr-corrections — one row per human correction. PK (ticket_id, ts); attributes: action (reassign/bump/split), orig_topic, orig_team, new_topic, new_team, by_user. This table feeds the labelled-example refresh and the correction-rate metric. On-demand.
  • DynamoDB · tr-audit — one row per write action of any kind. PK (ticket_id, ts); attributes: action, by_user, before, after. On-demand. No TTL — this is the long-term audit trail.
  • S3 · tr-raw-mail — raw inbound MIME from the email lane. Lifecycle to Glacier at 30 days; expiry at 1 year.
  • S3 · tr-rules-source — mirrored rules.csv, voice.txt, and examples.jsonl. Versioning enabled so a bad rules edit or example flood can be rolled back in one click.

SQS

  • tr-intake — standard queue between the three intake lanes and the read Lambda. Visibility timeout 60 s (6× the read function timeout). Absorbs bursts so a spike of tickets never overruns Bedrock’s rate limit.
  • tr-intake-dlq — dead-letter queue, maxReceiveCount 3. A ticket that fails the read three times (malformed body, model error) lands here and pages the on-call admin instead of silently disappearing. Redrive back to tr-intake once the cause is fixed.

Bedrock

  • Foundation model. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites: read for the per-ticket topic/urgency/tone classification, and summary for the monthly narrative. Sonnet 4.6 is not used — classification is well within Haiku’s reach, and a heavier model on the hot path would multiply the dominant cost for no gain.
  • Embeddings. Not used. Routing is a fresh read plus a sheet lookup; deterministic mapping beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas cover SMB volume comfortably. SQS in front of read smooths bursts so the per-ticket calls stay under the on-demand throughput limit.

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. support.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set tr-inbound-rules: one rule with recipient support@your-company.com → spam scan → S3 PUT to s3://tr-raw-mail/<message-id> → stop. The S3 PUT triggers intake-mail.
  • SES outbound for the email-fallback hand-offs and the monthly summary: verify a sender identity at router@your-company.com with DKIM and SPF on the parent domain. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • read role: sqs:ReceiveMessage + DeleteMessage on tr-intake; bedrock:InvokeModel on the Haiku ARN; s3:GetObject on the rules and examples keys; dynamodb:UpdateItem on tr-tickets.
  • router role: s3:GetObject on rules.csv and voice.txt; dynamodb:GetItem on tr-tickets; dynamodb:PutItem on tr-routes. No bedrock:*.
  • dispatch role: dynamodb:GetRecords on the tr-routes stream; dynamodb:Query on the tr-tickets GSI for the duplicate check; secretsmanager:GetSecretValue on the Slack bot token; ses:SendRawEmail from the verified sender; outbound network to slack.com.
  • correct-handler role: dynamodb:UpdateItem on tr-tickets and tr-routes; dynamodb:PutItem on tr-corrections and tr-audit; s3:GetObject + PutObject on examples.jsonl; sqs:SendMessage on tr-intake (for split). Verifies the Slack signing secret on every request.
  • intake-* roles: s3:GetObject on tr-raw-mail (mail only); dynamodb:PutItem + Query on tr-tickets; sqs:SendMessage on tr-intake; the secret for the lane’s shared secret or signature.
  • drive-sync role: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on tr-rules-source; outbound network to www.googleapis.com.

Slack interactive flow

Hand-off cards are posted via the chat.postMessage Web API with Block Kit blocks containing the action buttons (Reassign, Bump, Split). Button clicks are sent by Slack to the configured Interactivity request URL, which is the correct-handler Function URL. correct-handler verifies the Slack signing secret on the inbound request, parses the action_id (reassign, bump, split), opens a menu or modal where needed (Reassign opens a team menu; Split opens a two-field modal; Bump is one-tap), and processes the response on submit.

The Slack app needs chat:write and the Interactivity URL configured. The bot token lives in Secrets Manager under tr/slack/bot-token; the signing secret under tr/slack/signing-secret.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
  • Alarms: tr-intake-dlq depth > 0 (a ticket failed to read three times); read Bedrock throttle rate > 1% in 24h; dispatch failure rate > 1% in 24h; correct-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
  • Custom metric: correction rate — tr-corrections writes over tr-routes writes — tracked weekly. A rising correction rate on a topic means the examples for it need a refresh or the rules sheet needs a new row.
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $15/month threshold for a typical SMB, alarm at 80% and 100%, posts to SNS topic tr-cost-alarm subscribed to the on-call admin’s email and Slack. Raise the ceiling to match higher steady volume.

Config and secrets

Service-account credentials for the Sheets and Docs APIs live in Secrets Manager under tr/drive/sa. Slack bot token and signing secret under tr/slack/*. The web-form and chat lane secrets under tr/form/secret and tr/chat/secret. SES sender identity lives in IAM and the verified-domain config. The topic list, the VIP list reference, the priority rules, and the admin fallback team all live in Parameter Store under /tr/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for tr-rules-source so a bad rules edit or example flood can be rolled back in one click, and keep the DLQ alarm wired before going live so a read failure pages someone instead of vanishing. Total deployable surface: around ten Lambdas, four DDB tables, two S3 buckets, two SQS queues (one DLQ), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts