Part 7 of 7 · Renewal negotiator series ~8 min read

Engineering reference: the renewal negotiator architecture

Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.

Region and account shape

Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at SMB volume — the failure mode for an SMB is a renewal offer not getting drafted on time, not a regional outage. One AWS account dedicated to the negotiator (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.

Topology

AWS topology of the renewal negotiator A topology diagram with three regions stacked vertically inside one AWS account boundary. Top region: ingress. Three boxes show the three intake lanes — a Drive sheet sync via the drive-sync Lambda triggered every 15 minutes by EventBridge Scheduler that mirrors the registry CSV to s3://rn-registry-source/, an SES inbound rule set with action S3 PUT to s3://rn-raw-mime/ plus the parser Lambda intake-ses-parser that runs Textract on contract PDFs and Bedrock Haiku 4.5 to propose an account row for Slack approval, and a calendar-sync Lambda triggered hourly by EventBridge Scheduler that polls Google Calendars for events tagged hashtag-renews and proposes rows the same way. Middle region: scheduled processing. The drafter Lambda is triggered daily by EventBridge Scheduler; it reads s3://rn-registry-source/registry.csv, iterates rows, computes days_to_renewal per account, looks up the plan menu and caps in s3://rn-rules-source/rules.txt, reads offer state from DynamoDB, picks the plan and discount with plain Python, calls Bedrock to write the offer, re-checks the discount against the cap, and emits one of three events to the EventBridge default bus per account needing a draft: rn.upgrade, rn.loyalty, or rn.right_size. Bottom region: queue and approval. The queue Lambda is triggered by an EventBridge rule on those three event types; it resolves the owner, re-checks the cap, checks quiet hours, fetches the voice template from s3://rn-rules-source/voice.txt, writes the queue card to DynamoDB rn-queue, and notifies the owner via Slack with Approve, Edit, and Skip buttons or via SES outbound. Slack interactive button clicks land on a Function URL Lambda action-handler that, on approve or a saved edit, sends the offer to the customer via SES outbound and updates the registry sheet via the Google Sheets API; on skip, records the reason. CloudWatch Logs collects from every Lambda at 7-day retention. Across the right edge: a small box labelled AWS Budgets alarm at $20 monthly threshold, posting to SNS topic rn-cost-alarm. A note at the bottom: only a human send reaches the customer — and every interaction is logged to rn-audit. Ingress Lambda · drive-sync every 15 min Sheets API → s3://rn-registry-source/ registry.csv SES inbound rule set rn-inbound-rules action: S3 PUT s3://rn-raw-mime/ trigger: intake-ses-parser Lambda · calendar-sync hourly poll Calendar API for events tagged #renews → Slack proposal Drive account registry canonical store · mirrored to S3 Scheduled processing EventBridge Scheduler cron(0 8 * * ? *) in TZ_NAME target: drafter Lambda + deferred one-offs Lambda · drafter reads CSV from S3 + rules.txt + voice.txt picks plan + discount, Bedrock writes the offer EventBridge default bus rn.upgrade rn.loyalty rn.right_size (no offer → no event) Queue & approval Lambda · queue resolves owner, cap + quiet hours; writes rn-queue, notifies owner Slack interactive DM with [Approve] [Edit] [Skip] button clicks → Function URL Lambda · action-handler on send: SES to customer; updates Sheet via Sheets API; writes rn-audit Only a human send reaches the customer — and every interaction is logged to rn-audit.
Fig 7. AWS topology, in three regions of the diagram: ingress (three lanes into the registry), scheduled processing (the daily drafter emitting draft events), queue and approval (the offer waits, then a human send reaches the customer). Every Lambda is event- or schedule-driven; nothing is synchronous-chained.

Lambda functions

All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.

  • drive-sync — EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager under rn/drive/sa) to export the registry sheet as CSV and write to s3://rn-registry-source/registry.csv only if the sheet has changed since the last sync. Same pattern syncs the rules and voice docs to s3://rn-rules-source/. Memory: 256 MB. Timeout: 30 s.
  • calendar-sync — EventBridge Scheduler target, hourly. Uses the Google Calendar API events.list to scan configured calendars for events with #renews in the description; for any new events, creates a Slack interactive proposal message. For lower-latency setups you can switch to events.watch and have Calendar push notifications to a Function URL instead of polling, at the cost of renewing the channel before it expires (Calendar push channels have a finite TTL and need a small refresh job). Memory: 256 MB. Timeout: 30 s.
  • intake-ses-parser — S3 PUT trigger on s3://rn-raw-mime/. Parses MIME, extracts the attachment. For PDFs, runs Textract via StartDocumentTextDetection + StartDocumentAnalysis (asynchronously to handle multi-page contracts); on Textract completion (via SNS notification), reads the structured text and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0 via global.anthropic.claude-haiku-4-5-20251001-v1:0) to propose an account row. For CSV or XLSX usage exports (Textract isn’t needed), reads rows directly with the CSV module or openpyxl. Posts the proposal to Slack via the incoming webhook with Approve/Edit/Discard buttons. Both packages are stable and widely used in 2026, though their maintenance velocity is light — acceptable for a path that runs a few times a month. Memory: 512 MB. Timeout: 60 s.
  • drafter — EventBridge Scheduler target, daily at 8am local time (the schedule expression runs in TZ_NAME set to the SMB’s timezone, e.g. Asia/Singapore). Reads s3://rn-registry-source/registry.csv and the rules and voice docs. For each row, computes days_to_renewal; for accounts inside the prepare-ahead window, reads offer state from rn-offers, picks the plan and discount band with plain Python from the rules, calls Bedrock to write the offer, then re-checks the discount against the cap and floor. Emits one event per account needing a draft: rn.upgrade, rn.loyalty, or rn.right_size, with the account context and the fixed plan/discount as the event payload. No-offer accounts emit nothing. Memory: 512 MB. Timeout: 120 s. Bedrock callsite.
  • queue — EventBridge rule on the three draft events. Resolves owner, re-checks the cap and floor, checks quiet hours, formats the queue card from the voice template, writes it to rn-queue, and notifies the owner via Slack chat.postMessage (rn/slack/bot-token in Secrets Manager) or SES SendRawEmail. On a quiet-hours defer, creates a one-off EventBridge Scheduler rule that re-invokes the owner notification at the next available business minute; the card itself is queued immediately. Writes a row to rn-offers marking the account drafted-and-waiting. Memory: 256 MB. Timeout: 30 s. No Bedrock; no customer contact.
  • action-handler — Lambda Function URL, public with AuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive button clicks (Approve/Edit/Skip) and by email-link clicks. On approve or a saved edit, sends the offer to the customer via SES SendRawEmail from the owner’s verified identity, updates the Drive sheet via the Sheets API (mark offered, plan, discount, last_offered), and writes rn-audit. An edit re-checks any discount change against the cap before sending. On skip, records the reason in rn-offers and rn-audit; not-ready schedules a re-draft via a one-off rule. Memory: 256 MB. Timeout: 15 s.
  • digest — EventBridge Scheduler target, weekly Sunday 6pm. Reads rn-offers and rn-queue for the past week and the registry; sends a digest message to a configured Slack channel summarizing offers sent, accepted, skipped, and renewals coming up. No Bedrock; the message is a plain summary table. Memory: 256 MB.
  • summary — EventBridge Scheduler target, monthly on the first Monday at 9am. Reads the past month’s rn-offers and rn-audit; calls Bedrock Haiku 4.5 to write a one-paragraph board narrative on renewals offered, accepted, and lost; emails it via SES to the configured stakeholder list. Memory: 512 MB.

Storage

  • DynamoDB · rn-offers — one row per account per cycle. PK (account_id, cycle_id); attributes: state (drafted/queued/sent/skipped), plan, discount, drafted_at, skip_reason (if skipped). On-demand. No TTL.
  • DynamoDB · rn-queue — one row per queued draft awaiting the owner. PK account_id; attributes: card_json, owner, email_body, plan, discount, queued_at. On-demand. Items deleted on send or skip.
  • DynamoDB · rn-audit — one row per write action of any kind. PK (account_id, ts); attributes: action (sent/edited/skipped/undo), by_user, before, after. On-demand. No TTL — this is the long-term audit trail.
  • DynamoDB · rn-state — small per-account scheduling state (last sync hash, next re-draft time for not-ready skips). PK account_id. On-demand.
  • S3 · rn-registry-source — mirrored CSV from the Drive registry sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 7 years.
  • S3 · rn-rules-source — mirrored rules and voice docs as plain text. Versioning enabled.
  • S3 · rn-raw-mime — raw inbound MIME from forwarded contracts and usage exports. Lifecycle to Glacier at 30 days; expiry at 7 years.
  • S3 · rn-source-files — parsed source contracts and exports after the inbound parser handles them, kept for reference if the registry row links to one.

Bedrock

  • Foundation models. anthropic.claude-haiku-4-5-20251001-v1:0 via the Global cross-Region inference profile global.anthropic.claude-haiku-4-5-20251001-v1:0 for the inbound parsing, the per-renewal draft, and the monthly summary. For accounts flagged tricky (long negotiation history, sensitive note), the drafter escalates to anthropic.claude-sonnet-4-6-20250930-v1:0 via global.anthropic.claude-sonnet-4-6-20250930-v1:0 for the heavier reasoning. The model receives the plan and discount as fixed inputs and writes prose only; it never sets a number.
  • Embeddings. Not used. The registry is structured rows; deterministic lookup beats vector retrieval here. No Knowledge Base, no S3 Vectors.
  • Quotas. Default account quotas are more than enough at SMB volume. The daily check doesn’t call Bedrock; drafts fire roughly one-twelfth of accounts per month.

EventBridge Scheduler config

  • rn-daily-checkcron(0 8 * * ? *) in the SMB’s timezone. Target: drafter Lambda.
  • rn-drive-syncrate(15 minutes). Target: drive-sync Lambda.
  • rn-calendar-syncrate(1 hour). Target: calendar-sync Lambda.
  • rn-weekly-digestcron(0 18 ? * SUN *) in TZ. Target: digest Lambda.
  • rn-monthly-summarycron(0 9 ? * 2#1 *) (first Monday at 9am) in TZ. Target: summary Lambda.
  • One-off rules — created on the fly by queue (deferred owner ping) and action-handler (re-draft after a not-ready skip). Use at(YYYY-MM-DDTHH:MM:SS) expressions with --action-after-completion DELETE so the rule self-cleans.

SES inbound and outbound

  • Set the MX record on a dedicated subdomain (e.g. renewals.your-company.com) to inbound-smtp.ap-southeast-1.amazonaws.com.
  • SES inbound rule set rn-inbound-rules: one rule with recipient renewals@your-company.com → spam scan → S3 PUT to s3://rn-raw-mime/<message-id> → stop. The S3 PUT triggers intake-ses-parser.
  • SES outbound for the offer emails and the email-fallback queue notifications: verify each owner’s sender identity (e.g. maria@your-company.com) with DKIM and SPF on the parent domain, so customer replies land in the owner’s real inbox. Out of sandbox by request.

IAM (least privilege per Lambda)

Each Lambda has its own role with policies scoped to exact ARNs. Sketch:

  • drafter role: s3:GetObject on the registry, rules, and voice keys; dynamodb:Query + GetItem on rn-offers, rn-state; bedrock:InvokeModel on the Haiku and Sonnet ARNs; events:PutEvents on the default bus.
  • queue role: events:ListSchedules + CreateSchedule for deferred owner pings; secretsmanager:GetSecretValue on the Slack bot token; ses:SendRawEmail for email-fallback notifications; dynamodb:PutItem on rn-queue, rn-offers; outbound network access to slack.com. No customer-facing send rights.
  • action-handler role: dynamodb:PutItem on rn-audit, rn-offers; dynamodb:DeleteItem on rn-queue; secretsmanager:GetSecretValue on the Sheets-API and Slack secrets; ses:SendRawEmail from the verified owner identities; outbound network access to sheets.googleapis.com; events:CreateSchedule for not-ready re-drafts.
  • intake-ses-parser role: s3:GetObject on rn-raw-mime; textract:StartDocumentTextDetection + StartDocumentAnalysis; bedrock:InvokeModel on the Haiku ARN; secretsmanager:GetSecretValue on the Slack bot token.
  • drive-sync and calendar-sync roles: secretsmanager:GetSecretValue on the Google service-account secret; s3:PutObject on the registry and rules buckets; outbound network to www.googleapis.com.

Slack interactive flow

The Slack incoming webhook is the simplest delivery surface but doesn’t support interactive button responses. So the queue cards are posted via the chat.postMessage Web API instead, with Block Kit blocks containing the action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the action-handler Function URL. action-handler verifies the Slack signing secret on the inbound request, parses the action_id (approve_send, edit, skip), opens a modal if needed (Edit and Skip open modals; Approve is one-tap), and processes the response when the modal is submitted. The Edit modal enforces the discount cap before allowing submit.

The Slack app needs chat:write, im:write, and the Interactivity URL configured. The bot token lives in Secrets Manager under rn/slack/bot-token. The signing secret is rn/slack/signing-secret.

Observability and cost gates

  • CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on "error" + "throttle" + "timeout" to a CloudWatch metric for alerting.
  • Alarms: drafter Lambda failures > 0 in a day (the daily check is the one piece that has to run); action-handler signature-verification failures > 5/hour (might mean the Slack secret rotated); any cap-breach hold logged by the queue Lambda (should be rare; a spike means a drafter bug).
  • X-Ray: off by default. Not worth the cost at SMB volume.
  • AWS Budgets: $20/month threshold, alarm at 80% and 100%, posts to SNS topic rn-cost-alarm subscribed to the on-call admin’s email and Slack.

Config and secrets

Service-account credentials for Drive, Sheets, and Calendar APIs all live in Secrets Manager under rn/drive/sa (one service account with scopes for all three APIs). Slack bot token and signing secret under rn/slack/*. SES sender identities live in IAM and the verified-domain config. The configured timezone, quiet-hours window, prepare-ahead window, per-tier discount caps and floor prices, and admin fallback owner all live in Parameter Store under /rn/config/ (the rules doc is the human-editable source; Parameter Store holds the deploy-time defaults). Lambdas fetch config on cold start and cache for the lifetime of the execution environment.

Deploy

GitHub Actions with OIDC into a deploy role (no long-lived keys) running AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both rn-registry-source and rn-rules-source so a bad Drive edit can be rolled back in one click, and version the EventBridge Scheduler timezone setting so you don’t accidentally start running the daily check in UTC after a CI rotation. Total deployable surface: around eight Lambdas, four DDB tables, four S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.

That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.

All posts