How an application gets read

Key takeaways

Three intake lanes feed one queue: an apply inbox, a careers-form upload, and a job-board export.
Word and text resumes read straight through; scanned PDFs go through Textract to pull the words off the page.
Personal fields — name, age, gender, photo, address, school — are stripped before the resume is scored.
The original resume is always kept for the human; only the stripped text is sent to the reader.
Every application lands in one queue, so “why was this scored that way?” has one place to look.

Three lanes into one queue

Fig 2. Three lanes converge on one queue. The apply inbox, the careers form, and the job-board export all feed the same place. Each resume is turned into plain text and the personal fields are stripped before scoring; the original is kept in S3 for the human.

Lane 1: the apply inbox

Set up a dedicated address, something like apply@your-company.com, through Amazon SES. A candidate emails their resume to it. SES writes the raw message to s3://as-raw-mail/. The S3 write triggers the intake Lambda, which walks the message to the resume attachment and turns it into plain text. A Word file or a text resume reads straight through. A PDF exported from a document already carries its text, so it reads directly too. A PDF that’s really a scanned photo of a printed page goes through Amazon Textract first: Textract reads the words off the image so the resume becomes text the reader can work with.
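To make "walks the message to the resume attachment" concrete, here is a minimal sketch using Python's standard email module. It assumes the raw message bytes have already been fetched from S3. The function name and the list of accepted extensions are illustrative assumptions, not the post's actual intake code.

```python
from email import message_from_bytes, policy

# Assumed set of file types the intake treats as a resume (not from the post).
RESUME_EXTENSIONS = (".pdf", ".docx", ".doc", ".txt")


def find_resume_attachment(raw_message: bytes):
    """Return (filename, file_bytes) for the first resume-like attachment, or None."""
    # policy.default gives the modern EmailMessage API, including iter_attachments().
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        if filename.lower().endswith(RESUME_EXTENSIONS):
            # decode=True undoes base64 / quoted-printable and returns raw bytes.
            return filename, part.get_payload(decode=True)
    return None
```

A message with no usable attachment returns None. That is a natural point to flag the email for a human instead of guessing.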

This lane is the one most candidates use without being told to. People email resumes. Meeting them where they already are means you don’t lose applicants to a form they didn’t feel like filling out.

Lane 2: the careers form (the tidy one)

Your careers page has an upload box: the candidate picks the role, attaches a resume, and submits. The file is sent straight to s3://as-uploads/ along with the role they applied for and the date. The same intake Lambda runs over it — same to-text step, same Textract fallback for scanned PDFs. The form lane gives you cleaner data than the inbox (you know exactly which role they applied for, because they picked it), which is why it’s worth offering even though the inbox alone would work.

Nothing about the form judges the candidate. It just collects the resume and the role. Everything that follows — the stripping, the reading, the scoring — is the same no matter which lane the resume came in through.
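One way to picture "same no matter which lane" is a single record shape that every lane fills in. The sketch below is hypothetical: the bucket names come from this post, but the lane labels, the field names, and the make_entry helper are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Bucket names are from the post; the lane labels are hypothetical.
LANE_BUCKETS = {
    "as-raw-mail": "inbox",
    "as-uploads": "form",
    "as-jobboard": "jobboard",
}


@dataclass(frozen=True)
class QueueEntry:
    lane: str            # which lane the resume arrived through
    original_key: str    # where the untouched original lives in S3
    received: date
    role: Optional[str]  # known for certain on the form lane; may be missing elsewhere


def make_entry(bucket: str, key: str, received: date, role: Optional[str] = None) -> QueueEntry:
    """Build the same queue record regardless of the source bucket."""
    return QueueEntry(
        lane=LANE_BUCKETS[bucket],
        original_key=f"s3://{bucket}/{key}",
        received=received,
        role=role,
    )
```

Because the record has the same shape for every lane, the stripping and scoring steps never need to know where a resume came from.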

Lane 3: the job-board export

If you post the role to a job board, candidates apply there and the board collects their resumes. Most boards let you export those applications — either as a download or to a folder. A small sync drops each exported resume into s3://as-jobboard/ on a schedule, and the same intake flow turns each one into text and adds it to the queue. This lane saves you from copying resumes by hand out of a job board’s dashboard, which is exactly the kind of dull task that gets skipped when the pile is large.

Job-board exports are the messiest lane — formats vary, and some boards strip the resume down to their own template. The intake handles what it can and flags anything it can’t read cleanly for a human to look at, rather than guessing.
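The "flag rather than guess" rule could be approximated with a simple readability check on the extracted text. This is a sketch under assumptions: the function name and both thresholds are made up for illustration and would need tuning against real exports.

```python
def needs_human_review(text: str, min_chars: int = 200, min_letter_ratio: float = 0.6) -> bool:
    """Return True when extracted text looks too short or too garbled to score.

    Both thresholds are illustrative assumptions, not values from the post.
    """
    stripped = text.strip()
    if len(stripped) < min_chars:
        # Almost nothing came out of the file: likely an empty or failed extraction.
        return True
    visible = [c for c in stripped if not c.isspace()]
    letters = sum(c.isalpha() for c in visible)
    # A resume is mostly words; a low share of letters suggests extraction debris.
    return letters / len(visible) < min_letter_ratio
```

Anything that returns True goes to a person with the original file attached, and nothing is scored from a bad read.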

Stripping the personal fields, before anything is scored

This is the step that makes the whole thing fair, so it happens before the reader ever sees the resume. The rubric names the fields to remove: name, age or date of birth, gender, photo, home address, and the name of the school the candidate attended. The intake finds and blanks those out of the text it passes on. What reaches the reader is the work history, the skills, the certifications, the dates — the things the job actually depends on.

Two things are worth saying plainly. First, the original resume is never changed; the full version is kept in S3 for the human, who will of course see the name when they decide to interview. Stripping only affects the copy that gets scored. Second, no stripping step is perfect: a name can survive inside an email address in the resume body, for example. The point isn’t a flawless scrub; it’s removing the obvious signals that have nothing to do with the job, so the reader can’t lean on them. Where the intake is unsure, it strips more rather than less.
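As a rough illustration of stripping on a copy, here is a pattern-based sketch. The patterns, the [removed] marker, and the function name are all assumptions. Catching unlabelled names and addresses would take more than regular expressions, and photos would be dropped at the to-text step, not here.

```python
import re

# Hypothetical patterns; in practice the rubric supplies the list of fields.
LABELLED_FIELD = re.compile(
    r"^[ \t]*(name|age|date of birth|dob|gender|sex|address|home address)[ \t]*:.*$",
    re.IGNORECASE | re.MULTILINE,
)
# Any line that mentions a school is blanked whole: strip more rather than less.
SCHOOL_LINE = re.compile(
    r"^.*\b(university|college|institute|school)\b.*$",
    re.IGNORECASE | re.MULTILINE,
)
# Email addresses often contain the candidate's name.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def strip_personal_fields(text: str) -> str:
    """Return a redacted copy of the text; the caller's original is untouched."""
    text = LABELLED_FIELD.sub("[removed]", text)
    text = SCHOOL_LINE.sub("[removed]", text)
    text = EMAIL.sub("[removed]", text)
    return text
```

Blanking the whole school line also removes a degree named on the same line. That is the over-stripping trade-off described above, and a fuller version would keep the degree and drop only the institution.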

Next post: how the reader checks each must-have against the stripped text, and how plain Python turns that into a yes, a maybe, or a no using your pass marks.
