Engineering reference: the shift scheduler architecture
Same system, drawn for engineers. Region, service names, resource identifiers, Bedrock model IDs, Lambda inventory, IAM scopes, the SES inbound rule set, EventBridge Scheduler config, the DynamoDB schemas, and the Slack interactive flow. Read alongside the previous six posts; this one’s the build sheet.
Region and account shape
Default region: ap-southeast-1 (Singapore). SES inbound, Bedrock cross-Region inference, and EventBridge Scheduler are all in good shape there. A second region for multi-region resilience isn’t worth the extra setup work at small-team volume — the failure mode for a small team is a manager publishing a rota a few hours late, not a regional outage. One AWS account dedicated to the scheduler (separate from your other workloads) keeps the IAM blast radius small and lets a single AWS Budgets alarm cover the whole system.
Topology
Lambda functions
All Lambdas use the arm64 architecture, the smallest memory size that meets latency targets (typically 256 MB), Python 3.14 runtime, and CloudWatch Logs at 7-day retention. Each function has its own least-privilege IAM role. None run inside a VPC.
drive-sync— EventBridge Scheduler target, fires every 15 minutes. Uses the Google Drive API + Sheets API (service-account credentials in Secrets Manager underss/drive/sa) to export the roster sheet as CSV and write tos3://ss-roster-source/roster.csvonly if the sheet has changed since the last sync. Same pattern syncs the rules and voice docs tos3://ss-rules-source/. Memory: 256 MB. Timeout: 30 s.template-sync— EventBridge Scheduler target, weekly (a few hours before the draft). Copies the standing weekly pattern tab into next week’s tab via the Sheets API, then checks the result against approved time-off; any clash becomes a Slack interactive proposal for the manager. Memory: 256 MB. Timeout: 30 s.intake-timeoff— S3 PUT trigger ons3://ss-raw-mime/. Parses MIME, extracts the note text, and calls Bedrock Haiku 4.5 (anthropic.claude-haiku-4-5-20251001-v1:0viaglobal.anthropic.claude-haiku-4-5-20251001-v1:0) to read the plain-English request into{start_date, end_date, reason}, resolving relative dates against the configured timezone. Posts the proposal to the manager’s Slack with Approve/Edit/Decline buttons. No Textract — the notes are plain text, not documents. Memory: 512 MB. Timeout: 30 s.drafter— EventBridge Scheduler target, weekly on Thursday at 2pm local time (the schedule expression runs inTZ_NAMEset to the team’s timezone, e.g.Asia/Singapore). Readss3://ss-roster-source/roster.csvand the rules and voice docs. Sorts shifts, lists qualified-and-available candidates per shift, ranks by hours-below-target, places each, and tracks running hours inss-hours. Emits the assembled draft plus one event per flagged shift:ss.short_staffedorss.held; the whole draft goes to the manager asss.draft_ready. Memory: 512 MB. Timeout: 60 s. No Bedrock calls.publish— EventBridge rule on the manager’s approval event. Splits the rota per person, checks quiet hours, formats each person’s own shifts from the voice template, attaches calendar invites, and ships via Slack incoming webhook (ss/slack/webhookin Secrets Manager) or SESSendRawEmail. On a quiet-hours defer, creates a one-off EventBridge Scheduler rule that re-invokespublishat the next reasonable minute. Writes rows toss-shiftsafter a successful send. Memory: 256 MB. Timeout: 30 s.action-handler— Lambda Function URL, public withAuthType: NONE; verifies a Slack signature on the request body. Triggered by Slack interactive clicks (Approve/Edit/Re-draft and the swap actions Cover/Drop/Time-off) and by email-link clicks. On approve, fires the publish event. On a swap, reuses the drafter’s candidate logic to propose a replacement back to the manager, then on the manager’s yes updatesss-shiftsandss-hours. Writes toss-auditon every action. Memory: 256 MB. Timeout: 15 s.digest— EventBridge Scheduler target, weekly Sunday 6pm. Readsss-shiftsfor the coming week; sends each person a short reminder of their upcoming shifts and the manager a heads-up on any still-open shifts. No Bedrock; the message is a plain summary table. Memory: 256 MB.summary— EventBridge Scheduler target, weekly Friday 5pm. Reads the week’sss-hoursandss-audit; calls Bedrock Haiku 4.5 to write a short fairness narrative (each person’s hours against target, any drift); posts it to the manager’s Slack. Memory: 512 MB.
Storage
- DynamoDB ·
ss-shifts— one row per published shift. PK(week_id, shift_id); attributes:day,start,end,role,assigned_to,status(filled/open/held),dispatched_via(slack/email). On-demand. No TTL. - DynamoDB ·
ss-hours— running placed-hours per person per week. PKperson_id; sort keyweek_id; attributes:hours_placed,hours_target,cap. On-demand. - DynamoDB ·
ss-audit— one row per write action of any kind. PK(shift_id, ts); attributes:action(drafted/approved/swapped/dropped/timeoff),by_user,before,after. On-demand. No TTL — this is the long-term audit trail. - DynamoDB ·
ss-shifts-archive— archived weeks after they pass. Same shape asss-shifts; PK(week_id, shift_id). On-demand. - S3 ·
ss-roster-source— mirrored CSV from the Drive roster sheet. Versioning enabled. Lifecycle to Glacier at 90 days; expiry at 3 years. - S3 ·
ss-rules-source— mirrored rules and voice docs as plain text. Versioning enabled. - S3 ·
ss-raw-mime— raw inbound MIME from forwarded time-off notes. Lifecycle to Glacier at 30 days; expiry at 3 years. - S3 ·
ss-published— a snapshot of each approved weekly rota as published, kept for reference and for re-sending a person their week on request.
Bedrock
- Foundation model.
anthropic.claude-haiku-4-5-20251001-v1:0via the Global cross-Region inference profileglobal.anthropic.claude-haiku-4-5-20251001-v1:0. Two callsites:intake-timeofffor reading plain-English time-off notes, andsummaryfor the weekly fairness narrative. The heavieranthropic.claude-sonnet-4-6profile is wired but unused by default — the tasks here are light enough for Haiku, and a model isn’t on the hot path at all. - Embeddings. Not used. The roster is structured rows; rule-based matching beats vector retrieval here. No Knowledge Base, no S3 Vectors.
- Quotas. Default account quotas are more than enough at small-team volume. The drafter itself doesn’t call Bedrock; the time-off lane fires a few times a week at most.
EventBridge Scheduler config
ss-weekly-draft—cron(0 14 ? * 5 *)(Thursday 2pm) in the team’s timezone. Target:drafterLambda.ss-drive-sync—rate(15 minutes). Target:drive-syncLambda.ss-template-sync—cron(0 10 ? * 5 *)(Thursday 10am, before the draft) in TZ. Target:template-syncLambda.ss-weekly-digest—cron(0 18 ? * SUN *)in TZ. Target:digestLambda.ss-weekly-summary—cron(0 17 ? * 6 *)(Friday 5pm) in TZ. Target:summaryLambda.- One-off rules — created on the fly by
publishwhen a quiet-hours defer is needed. Useat(YYYY-MM-DDTHH:MM:SS)expressions with--action-after-completion DELETEso the rule self-cleans.
SES inbound and outbound
- Set the MX record on a dedicated subdomain (e.g.
timeoff.your-company.com) toinbound-smtp.ap-southeast-1.amazonaws.com. - SES inbound rule set
ss-inbound-rules: one rule with recipienttimeoff@your-company.com→ spam scan → S3 PUT tos3://ss-raw-mime/<message-id>→ stop. The S3 PUT triggersintake-timeoff. - SES outbound for the email-fallback schedules: verify a sender identity at
rota@your-company.comwith DKIM and SPF on the parent domain. Out of sandbox by request.
IAM (least privilege per Lambda)
Each Lambda has its own role with policies scoped to exact ARNs. Sketch:
- drafter role:
s3:GetObjecton the roster, rules, and voice keys;dynamodb:Query+GetItem+PutItemonss-hours;events:PutEventson the default bus. Nobedrock:*. - publish role:
events:ListSchedules+CreateSchedulefor the deferred-publish one-offs;secretsmanager:GetSecretValueon the Slack webhook secret;ses:SendRawEmailfrom the verified sender identity;dynamodb:PutItemonss-shifts; outbound network access tohooks.slack.com. - action-handler role:
dynamodb:PutItemonss-shifts,ss-hours, andss-audit;secretsmanager:GetSecretValueon the Sheets-API service-account secret; outbound network access tosheets.googleapis.com;dynamodb:Queryfor candidate lookup;dynamodb:BatchWriteItemfor archiving a passed week toss-shifts-archive. - intake-timeoff role:
s3:GetObjectonss-raw-mime;bedrock:InvokeModelon the Haiku ARN;secretsmanager:GetSecretValueon the Slack webhook. - drive-sync and template-sync roles:
secretsmanager:GetSecretValueon the Google service-account secret;s3:PutObjecton the roster and rules buckets; outbound network towww.googleapis.com.
Slack interactive flow
The Slack incoming webhook is the simplest delivery surface but doesn’t support interactive button responses. So the manager-facing messages and the per-person schedules are posted via the chat.postMessage Web API instead, with Block Kit blocks containing the action buttons. Button clicks are sent by Slack to the configured Interactivity request URL, which is the action-handler Function URL. action-handler verifies the Slack signing secret on the inbound request, parses the action_id (approve, edit, redraft, cover, drop, timeoff), opens a modal if needed (Edit and Cover open modals; Approve is one-tap), and processes the response when the modal is submitted.
The Slack app needs chat:write, im:write, and the Interactivity URL configured. The bot token lives in Secrets Manager under ss/slack/bot-token. The signing secret is ss/slack/signing-secret.
Observability and cost gates
- CloudWatch Logs: all Lambdas, 7-day retention, structured JSON. Subscription filter on
"error"+"throttle"+"timeout"to a CloudWatch metric for alerting. - Alarms: drafter Lambda failures > 0 in a week (the weekly draft is the one piece that has to run); publish failure rate > 1% in 24h; action-handler signature-verification failures > 5/hour (might mean the Slack secret rotated).
- X-Ray: off by default. Not worth the cost at small-team volume.
- AWS Budgets: $15/month threshold, alarm at 80% and 100%, posts to SNS topic
ss-cost-alarmsubscribed to the on-call admin’s email and Slack.
Config and secrets
Service-account credentials for Drive, Sheets, and Calendar APIs all live in Secrets Manager under ss/drive/sa (one service account with scopes for all three APIs). Slack bot token, signing secret, and webhook URL all under ss/slack/*. SES sender identity lives in IAM and the verified-domain config. The configured timezone, quiet-hours window, default max-hours cap, rest gap, and admin fallback all live in Parameter Store under /ss/config/. Lambdas fetch config on cold start and cache for the lifetime of the execution environment.
Deploy
GitHub Actions with OIDC into a deploy role (no long-lived keys) and AWS SAM. The opinionated bits: deploy the SES rule set as a separate stack (rule-set changes affect mail flow), turn on S3 versioning for both ss-roster-source and ss-rules-source so a bad Drive edit can be rolled back in one click, and version the EventBridge Scheduler timezone setting so you don’t accidentally start running the weekly draft in UTC after a CI rotation. SAM fits this surface well; CDK with a Python stack file also works. Total deployable surface: around eight Lambdas, four DDB tables, four S3 buckets, one EventBridge rule on the default bus (plus the Scheduler rules), one SES rule set, and one Budgets alarm.
That’s the full system. Six narrative posts and this engineering reference. If you want to talk about adapting it for your business, see Work with me.
All posts