Part 2 of 7 · Applicant screener series · ~4 min read

How an application gets read

The screener can only screen what reaches its queue. So the first job is getting every application in, no matter how a candidate chose to apply. There are three lanes: somebody emails a resume to your apply address, somebody uploads one through your careers form, or a resume arrives from a job board you post to. Whatever the lane, the resume is turned into plain text and the personal fields that don’t belong in a fair screen are removed before any scoring happens. The candidate’s original is kept safe for the human; the reader only ever sees the stripped version.

Key takeaways

  • Three intake lanes feed one queue: an apply inbox, a careers-form upload, and a job-board export.
  • Word files, text files, and text-based PDFs read straight through; scanned PDFs go through Textract to pull the words off the page.
  • Personal fields — name, age, gender, photo, address, school — are stripped before the resume is scored.
  • The original resume is always kept for the human; only the stripped text is sent to the reader.
  • Every application lands in one queue, so “why was this scored that way?” has one place to look.

Three lanes into one queue

[Figure: three lane columns (Lane 1 · SES inbox, Lane 2 · upload, Lane 3 · export) converging on a single application queue. Each queue entry holds the resume text, the role, the source, the received-at time, and the original file kept in S3; entries go to the reader one per application.]
Fig 2. Three lanes converge on one queue. The apply inbox, the careers form, and the job-board export all feed the same place. Each resume is turned into plain text and the personal fields are stripped before scoring; the original is kept in S3 for the human.

Lane 1: the apply inbox

Set up a dedicated address — something like apply@your-company.com — through Amazon SES. A candidate emails their resume to it. SES writes the raw message to s3://as-raw-mail/. The S3 write triggers the intake Lambda, which walks the message to the resume attachment and turns it into plain text. A Word file or a text resume reads straight through. A PDF that was made from a document reads directly too. A PDF that’s really a scanned photo of a printed page goes through Amazon Textract first — Textract reads the words off the image so the resume becomes text the reader can work with.
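As a rough sketch of the "walks the message to the resume attachment" step, here is one way it could look using only Python's standard email package. The function name, the list of accepted extensions, and the choice to take the first matching attachment are all assumptions made for illustration; the post does not specify them, and the S3 read that would supply the raw bytes is left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Optional, Tuple

# Assumption: the file types the intake treats as resumes.
RESUME_EXTENSIONS = (".pdf", ".docx", ".doc", ".txt")


def find_resume_attachment(raw_message: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (filename, file bytes) for the first resume-like attachment, or None."""
    # policy.default makes the parser return an EmailMessage,
    # which provides iter_attachments().
    msg: EmailMessage = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        if filename.lower().endswith(RESUME_EXTENSIONS):
            # decode=True undoes the base64 / quoted-printable transfer encoding.
            return filename, part.get_payload(decode=True)
    return None
```

A message with no attachment simply returns None, which the intake could treat as "nothing to screen" rather than an error.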

This lane is the one most candidates use without being told to. People email resumes. Meeting them where they already are means you don’t lose applicants to a form they didn’t feel like filling out.

Lane 2: the careers form (the tidy one)

Your careers page has an upload box: the candidate picks the role, attaches a resume, and submits. The file is sent straight to s3://as-uploads/ along with the role they applied for and the date. The same intake Lambda runs over it — same to-text step, same Textract fallback for scanned PDFs. The form lane gives you cleaner data than the inbox (you know exactly which role they applied for, because they picked it), which is why it’s worth offering even though the inbox alone would work.
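The Textract fallback can be pictured as two small pieces: a check for "did direct extraction give us almost nothing?" and a helper that turns a Textract response into plain text. The sketch below assumes a 200-character threshold and hypothetical function names. The response shape (a "Blocks" list whose "LINE" blocks carry a "Text" field) is what Textract's text detection returns, for example from boto3's `detect_document_text`; page limits for scanned PDFs are worth checking in the Textract documentation before relying on that call.

```python
def needs_ocr(extracted_text: str, min_chars: int = 200) -> bool:
    """Treat a PDF as scanned if direct text extraction yielded almost nothing.

    The 200-character threshold is an illustrative assumption, not a tuned value.
    """
    return len(extracted_text.strip()) < min_chars


def lines_from_textract(response: dict) -> str:
    """Join the LINE blocks of a Textract text-detection response into plain text."""
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    return "\n".join(lines)
```

Only LINE blocks are kept because WORD blocks repeat the same text word by word.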

Nothing about the form judges the candidate. It just collects the resume and the role. Everything that follows — the stripping, the reading, the scoring — is the same no matter which lane the resume came in through.
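One way to picture "the same no matter which lane" is a single queue entry shape that every lane fills in. The sketch below uses the three bucket names from this post; the lane labels, the `QueueEntry` class, and `make_entry` are hypothetical, and treating the role as optional for lanes that don't capture it is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Bucket names come from the post; the lane labels are illustrative.
LANE_BY_BUCKET = {
    "as-raw-mail": "inbox",
    "as-uploads": "careers_form",
    "as-jobboard": "job_board",
}


@dataclass(frozen=True)
class QueueEntry:
    resume_text: str        # plain text of the resume
    role: Optional[str]     # known for form uploads; may be missing elsewhere
    source: str             # which lane the resume came in through
    received_at: datetime
    original_s3_uri: str    # the untouched original, kept for the human


def make_entry(bucket: str, key: str, resume_text: str,
               role: Optional[str] = None,
               received_at: Optional[datetime] = None) -> QueueEntry:
    """Build one queue entry, whichever lane the file arrived through."""
    source = LANE_BY_BUCKET.get(bucket)
    if source is None:
        raise ValueError(f"unknown intake bucket: {bucket}")
    return QueueEntry(
        resume_text=resume_text,
        role=role,
        source=source,
        received_at=received_at or datetime.now(timezone.utc),
        original_s3_uri=f"s3://{bucket}/{key}",
    )
```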

Lane 3: the job-board export

If you post the role to a job board, candidates apply there and the board collects their resumes. Most boards let you export those applications — either as a download or to a folder. A small sync drops each exported resume into s3://as-jobboard/ on a schedule, and the same intake flow turns each one into text and adds it to the queue. This lane saves you from copying resumes by hand out of a job board’s dashboard, which is exactly the kind of dull task that gets skipped when the pile is large.

Job-board exports are the messiest lane — formats vary, and some boards strip the resume down to their own template. The intake handles what it can and flags anything it can’t read cleanly for a human to look at, rather than guessing.
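"Flags anything it can’t read cleanly" needs some definition of clean. A minimal sketch, assuming two crude signals (too little text, or too few letters and digits among the non-space characters), might look like this. The thresholds and both function names are illustrative assumptions, not values from the screener.

```python
def readable_enough(text: str, min_chars: int = 200,
                    min_alnum_ratio: float = 0.6) -> bool:
    """Rough check that extracted text looks like a real resume, not debris."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    non_space = [c for c in stripped if not c.isspace()]
    alnum = sum(c.isalnum() for c in non_space)
    # Mostly symbols or replacement characters suggests a failed extraction.
    return alnum / len(non_space) >= min_alnum_ratio


def route(text: str) -> str:
    """Send clean text to the queue; hand everything else to a human."""
    return "queue" if readable_enough(text) else "needs_human"
```

The design choice here mirrors the text: when in doubt, the resume goes to a person instead of being scored on a guess.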

Stripping the personal fields, before anything is scored

This is the step that makes the whole thing fair, so it happens before the reader ever sees the resume. The rubric names the fields to remove: name, age or date of birth, gender, photo, home address, and the name of the school the candidate attended. The intake finds and blanks those out of the text it passes on. What reaches the reader is the work history, the skills, the certifications, the dates — the things the job actually depends on.

Two things are worth saying plainly. First, the original resume is never changed; the full version is kept in S3 for the human, who will of course see the name when they decide to interview. Stripping only affects the copy that gets scored. Second, no stripping step is perfect: a name can survive inside an email address in the resume body, for example. The point isn’t a flawless scrub; it’s removing the obvious signals that have nothing to do with the job, so the reader can’t lean on them. Where the screener is unsure, it strips more rather than less.
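To make the stripping step concrete, here is a deliberately simple sketch built on regular expressions. It assumes resumes sometimes carry labelled lines such as "Name:" or "Date of birth:", and that the intake can be handed a list of known terms to blank (for instance a candidate name or a school name found elsewhere). The label list, the `[removed]` marker, the `known_terms` parameter, and the decision to blank every email address are all assumptions for illustration. Photos are not covered, since they never make it into plain text.

```python
import re

REDACTED = "[removed]"

# Labelled lines like "Name: ..." keep their label and lose their value.
LABELLED_FIELD = re.compile(
    r"^(?P<label>[ \t]*(?:name|age|date of birth|dob|gender|sex"
    r"|home address|address|school)[ \t]*:).*$",
    re.IGNORECASE | re.MULTILINE,
)
# Email addresses often contain the candidate's name, so blank them all.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def strip_personal_fields(text: str, known_terms=()) -> str:
    """Blank labelled personal fields, emails, and any known terms."""
    out = LABELLED_FIELD.sub(lambda m: f"{m.group('label')} {REDACTED}", text)
    out = EMAIL.sub(REDACTED, out)
    for term in known_terms:
        term = term.strip()
        if term:
            out = re.sub(re.escape(term), REDACTED, out, flags=re.IGNORECASE)
    return out
```

Blanking every email address is an example of stripping more rather than less: it costs the reader nothing about the job and closes one obvious leak.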

Next post: how the reader checks each must-have against the stripped text, and how plain Python turns that into a yes, a maybe, or a no using your pass marks.
