Part 2 of 7 · Weekly report builder series ~4 min read

How the numbers get gathered

The report is only as good as the numbers behind it. So the first job is reading every source cleanly and turning it into one set of weekly figures, without ever depending on a live system at 7am on a Monday. There are three kinds of source a small business usually has — a sheet someone keeps by hand, a CSV that a tool like Stripe or the bank produces, and a point-of-sale roll-up. Each gets mirrored to S3 on its own schedule, and then the gather step reads them all from one place.

Key takeaways

  • Three kinds of source feed one set of figures: a hand-kept sheet, a tool CSV, and a point-of-sale roll-up.
  • A small source-sync Lambda mirrors each source to S3 on a schedule, so the Monday run never hits a live API.
  • The gather step reads everything from S3 and normalizes it into one clean shape.
  • It computes this week, last week, and the four-week average, so each figure arrives with its comparison.
  • S3 versioning means a bad source edit can be rolled back in one click.

Three sources into one set of figures

Three kinds of source funnel into one figure set A diagram with three vertical source columns at the top and a single unified row at the bottom. Source one, Hand-kept sheet: someone updates a Google Sheet by hand with sales or customer rows; the source-sync Lambda exports it to S3 on a schedule, and the gather step reads from there. Source two, Tool CSV: a tool like Stripe or the bank drops a CSV in a Drive folder; source-sync mirrors the file to S3 when it changes; the gather step parses the columns named in the config doc, with no model involved. Source three, Point-of-sale roll-up: the POS produces a daily summary file; source-sync pulls it to S3 on a schedule; the gather step reads the daily totals and the top-selling items. All three sources converge into one normalized figure set, which the gather step turns into this week's numbers plus last week and the four-week average. A note at the bottom: each source is mirrored to S3 first, so the Monday run never depends on a live system being up at 7am. Source 1 · by hand Hand-kept sheet • Someone updates a Google Sheet • source-sync exports to S3 on schedule • Gather reads from S3 • Columns named in config doc Source 2 · CSV export Tool CSV • Stripe or bank drops a CSV • source-sync mirrors to S3 on change • Gather parses the named columns • Plain Python, no model Source 3 · daily roll-up Point-of-sale • POS writes a daily summary file • source-sync pulls it to S3 on schedule • Gather reads totals and top items • Same figure shape as the others One normalized figure set (this week) sales · orders · new customers · cash in · cash out · refunds · top items plus last week and the four-week average, computed alongside — plain Python to writer, weekly run Each source is mirrored to S3 first — the Monday run never depends on a live system being up at 7am.
Fig 2. Three kinds of source converge into one figure set. Each is mirrored to S3 on its own schedule; the gather step reads everything from S3 and normalizes it into one shape, with last week and the four-week average computed alongside this week.

Source 1: the hand-kept sheet

The most common source in a small business is a sheet somebody keeps by hand — the sales log, the new-customer list, the running tally of jobs done. The config doc names the sheet, the tab, and which columns hold which figure: which column is the amount, which is the date, which marks a new customer. A small Lambda — source-sync — runs on a schedule, exports the sheet as plain CSV via the Google Sheets API, and writes it to s3://wr-source-data/sales-sheet.csv only if the sheet has changed since the last sync. The gather step reads from S3, not Drive directly. That keeps Sheets API calls predictable and gives you S3 versioning for free, so a bad bulk-edit on Friday can be rolled back in one click on Monday.

This covers the figures a person types in as the week goes on. The builder never edits the sheet — it only reads a copy.

Source 2: the tool CSV (the one most owners already have)

Almost every business has at least one tool that already produces a clean export: Stripe for card payments, the bank for the account statement, the invoicing app for what was billed and what was paid. These arrive as CSV files. You set up the tool (or a tiny scheduled export) to drop the file in a Drive folder, and source-sync mirrors it to s3://wr-source-data/ whenever it changes. The gather step parses the columns named in the config doc — amount, date, customer, status — and pulls only the figures the report needs. No model reads the CSV; it’s plain column-by-column Python, which is exactly what you want for money figures: predictable, checkable, and the same every run.

Because these exports are the system of record for cash, the builder treats them as authoritative for the cash-in and cash-out figures. If the sales sheet and the Stripe export disagree on a total, that disagreement is itself something the checks in Part 5 will surface — rather than the builder silently picking one.

Source 3: the point-of-sale roll-up

A shop or a café has a point-of-sale system that already knows the daily totals and the best-selling items. Most of them can write a daily summary — a small file with the day’s sales, order count, and top products. The config doc points source-sync at wherever that file lands, and it’s pulled to S3 on a schedule like the others. The gather step reads the daily totals for the week and the top-item list, so the report can say not just “sales were up” but “the lunch special drove most of it.”

Point-of-sale is the most optional of the three. A business without one loses nothing; a shop that has one gets the top-items line in the report, which is often the most useful sentence in it.

Why everything funnels into one figure set

Three sources in, but only one set of figures the writer ever sees. That’s deliberate. If the writer had to reach into three different files in three different shapes, every “where did this number come from?” question would mean checking three places and three formats. Normalizing everything into one figure set — with a fixed list of fields and a fixed shape — means there is exactly one place the report’s numbers come from, and exactly one place to look when a figure seems wrong. The gather step does the messy reading; everything downstream works from one clean set.

Next post: how the weekly report gets written — how plain Python computes every comparison first, and how exactly one Bedrock call turns those real numbers into a short paragraph.

All posts