Name: Extraction Check
Author: wcygan

ATS text-extraction regression gate for the resume repo. Runs the compiled PDF through three open-source parsers (pdftotext, pdftotext -layout, Apache Tika) that real ATSs actually use, asserts clean extraction across eleven structural checks, and surfaces disagreement between parsers as diagnostic signal. Use when the user asks about ATS parseability, text extraction, PDF parsing reliability, whether a resume variant breaks for ATSs, how the extraction gate works, running or debugging extraction-check, comparing extractor outputs, adding assertions, writing new broken fixtures, interpreting cross-extractor disagreement, or the text-extraction hypothesis at .claude/context/text-extraction-hypothesis.md. Keywords: ATS, applicant tracking system, text extraction, PDF parsing, pdftotext, poppler, tika, Apache Tika, reading order, mojibake, cross-extractor, extraction check, regression gate, resume parser, parseability, soft hyphen, url dedup.

Updated: 15 April 2026, 16:09

SKILL.md

name	extraction-check
allowed-tools	Read, Write, Edit, Grep, Glob, Bash

Extraction Check

A deterministic gate on will_cygan_resume.pdf. It shells out to three open-source PDF-to-text parsers that enterprise ATSs are built on (pdftotext, pdftotext -layout, and Apache Tika), runs eleven assertions on their output, and fails if any parser sees something structurally wrong.
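As a rough illustration of what "shelling out" to the three extractors could look like, here is a minimal Python sketch. It is not the repo's actual script: the function names `extractor_commands` and `run_available_extractors` are hypothetical, and it assumes that `pdftotext` (from poppler) and a `tika` command-line wrapper around the Tika app jar are on `PATH`. Extractors that are not installed are simply skipped.

```python
import shutil
import subprocess
from pathlib import Path


def extractor_commands(pdf: Path) -> dict[str, list[str]]:
    """Hypothetical command table: one argv per extractor.

    "-" tells pdftotext to write to stdout; "--text" asks the Tika CLI
    for plain-text output. The `tika` wrapper name is an assumption.
    """
    return {
        "pdftotext": ["pdftotext", str(pdf), "-"],
        "pdftotext-layout": ["pdftotext", "-layout", str(pdf), "-"],
        "tika": ["tika", "--text", str(pdf)],
    }


def run_available_extractors(pdf: Path) -> dict[str, str]:
    """Run every extractor whose binary is on PATH and collect its text."""
    outputs: dict[str, str] = {}
    for name, argv in extractor_commands(pdf).items():
        if shutil.which(argv[0]) is None:
            continue  # extractor not installed locally, skip it
        result = subprocess.run(
            argv, capture_output=True, text=True, encoding="utf-8", check=True
        )
        outputs[name] = result.stdout
    return outputs
```

Calling `run_available_extractors(Path("will_cygan_resume.pdf"))` would then return one text blob per installed extractor, which is the shape the assertions need.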

The underlying hypothesis is in .claude/context/text-extraction-hypothesis.md: text-extraction fidelity is the single highest-leverage ATS-side optimization, and real-world ATSs virtually all run on the same handful of open-source parsers. Extracting cleanly through those parsers is therefore treated as equivalent to passing ATS parsing at the file-format level.

When to invoke this skill

"Is this resume change safe for ATSs?" / "Will this parse cleanly?"
"Why did extraction-check fail on [assertion X]?"
"Run the extraction check" / "Compare what each extractor sees"
"Add a new assertion / broken fixture / extractor"
"How does the gate work?"
"The CI Extraction Check job is failing — help me diagnose"
Any mention of pdftotext, poppler, Tika, mojibake, reading order, or cross-extractor divergence

Quickstart

just compile            # produce will_cygan_resume.pdf
just extraction-check   # run the 11-assertion gate on the compiled PDF
just test               # run the negative-fixture regression suite (6 pytest cases)

Prereqs on macOS: brew install poppler tika typst. CI installs pinned Tika 3.3.0.

The eleven assertions

| # | Name | What it checks |
|---|------|----------------|
| 1 | `1-non-empty` | Extracted text is >500 bytes (catches image-only or encoding-broken PDFs) |
| 2 | `2-section-order` | Declared section headers appear in the expected visual order |
| 3 | `3-name-contact` | Name and email are present, not glued together (ATS field-map footgun) |
| 4 | `4-job-contiguity` | Title / company / date-start / date-end co-occur within a 300-char window per job |
| 5 | `5-date-format` | At least one date range matches a consistent "Mon YYYY – …" format |
| 6 | `6-mojibake` | Zero replacement chars, no flagged smart-quote/em-dash, and no Private Use Area codepoints U+E000–U+F8FF (icon-font tofu from stray `fa-icon(...)` calls) |
| 7 | `7-cross-extractor` | All available extractors agree on section order and job count, and no two extractor outputs differ by more than 1.5× in byte count |
| 8 | `8-soft-hyphen` | No U+00AD soft hyphens in any extractor's output (breaks hyphenated words across paragraphs in Tika) |
| 9 | `9-keyword-roundtrip` | Every ATS-searchable keyword declared in `[keywords].required` survives extraction as an exact substring (guards against ligature collapse and font-substitution regressions) |
| 10 | `10-url-dedup` | Every URL that appears in extractor output appears exactly once (redundant title-links triple-emit in Tika's URL block) |
| 11 | `11-section-boundary` | Every adjacent pair of top-level section headers is separated by at least one blank line (prevents section-boundary parsers from merging sections) |
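To make two of the rows above concrete, the following Python sketch shows one plausible way to implement the Private Use Area / replacement-character scan from assertion 6 and the 300-character window from assertion 4. The function names and the sample job data are hypothetical, the "flagged smart-quote/em-dash" part of assertion 6 is left out because the source does not define it, and the real script may work differently.

```python
REPLACEMENT_CHAR = "\ufffd"


def find_mojibake(text: str) -> list[str]:
    """Return offending codepoints as 'U+XXXX', in order of first appearance.

    Flags U+FFFD and anything in the Private Use Area U+E000..U+F8FF.
    """
    bad: list[str] = []
    for ch in text:
        is_pua = 0xE000 <= ord(ch) <= 0xF8FF
        if ch == REPLACEMENT_CHAR or is_pua:
            label = f"U+{ord(ch):04X}"
            if label not in bad:
                bad.append(label)
    return bad


def job_is_contiguous(text: str, fields: list[str], window: int = 300) -> bool:
    """True if all fields occur together inside some window-sized slice.

    Any valid window can be shifted to start at its leftmost field, so it
    is enough to try a slice starting at each occurrence of each field.
    """
    for field in fields:
        pos = text.find(field)
        while pos != -1:
            chunk = text[pos:pos + window]
            if all(f in chunk for f in fields):
                return True
            pos = text.find(field, pos + 1)
    return False
```

For example, `job_is_contiguous(extracted, ["Software Engineer", "Acme Corp", "Jan 2020", "Mar 2022"])` would return `False` if a two-column layout caused an extractor to emit the title hundreds of characters away from the company and dates.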

Assertion 7 is the canonical reading-order-scramble signal. When pdftotext says 2 jobs but pdftotext -layout says 1, the layout has a bug real ATSs will hit. The byte-ratio extension catches quieter divergence where extractors agree on structure but one is emitting dramatically more or less text than the others.
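The agreement rule described above can be sketched as a pairwise comparison. This is an illustrative Python version under stated assumptions: `ExtractorSummary` and `cross_extractor_problems` are hypothetical names, and the per-extractor section order, job count, and byte count are assumed to have been computed already.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ExtractorSummary:
    section_order: tuple[str, ...]
    job_count: int
    byte_count: int


def cross_extractor_problems(
    summaries: dict[str, ExtractorSummary], max_ratio: float = 1.5
) -> list[str]:
    """Compare every pair of extractors and describe each disagreement."""
    problems: list[str] = []
    for (a, sa), (b, sb) in combinations(summaries.items(), 2):
        if sa.section_order != sb.section_order:
            problems.append(f"{a} vs {b}: section order differs")
        if sa.job_count != sb.job_count:
            problems.append(
                f"{a} vs {b}: job count {sa.job_count} != {sb.job_count}"
            )
        small, large = sorted((sa.byte_count, sb.byte_count))
        # An empty output counts as divergent rather than dividing by zero.
        if small == 0 or large / small > max_ratio:
            problems.append(
                f"{a} vs {b}: byte counts {sa.byte_count} and "
                f"{sb.byte_count} differ by more than {max_ratio}x"
            )
    return problems
```

An empty list means the extractors agree; any entry names the pair that diverged, which is usually the fastest pointer to the layout construct that scrambled reading order.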

Reading further

For anything deeper, read the relevant reference:

Architecture — script layout, data model, how evaluate_pdf orchestrates parallel extractors
Running checks — local commands, CI workflow, interpreting output
Comparing outputs — what each extractor tells you, resolving disagreement, manual cross-checks
Extending — adding an assertion, a new extractor, or a broken fixture; updating fixtures when the resume changes
Failure playbook — one entry per assertion: how failures look, root causes, how to fix

Ground rules

Do not silently loosen an assertion to make it pass. If the 300-char window fires on a legitimate layout, investigate the layout first. Adjust thresholds only with a commit-message reason.
Do not add a new extractor unless it represents a real ATS parser family. pypdf / pdfplumber are Python-native and don't match ATS behavior — see the hypothesis spec for why.
The real resume is the ground truth for fixtures. scripts/extraction-check.fixtures.toml is hand-maintained against will_cygan_resume.typ. When the resume gains a job or renames a section, update the TOML in the same commit.
Negative fixtures must fail cleanly. Each broken .typ exists to prove one assertion fires. If a fixture trips multiple assertions unintentionally, tighten or split the fixture before shipping.
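The "fail cleanly" rule for negative fixtures can be expressed as a small check that a fixture trips its target assertion and nothing else. The helper below is a hypothetical Python sketch, not the repo's pytest suite; it assumes the gate can report the set of failed assertion names for a fixture.

```python
def check_negative_fixture(failed: set[str], expected: str) -> None:
    """Raise AssertionError unless `expected` is the only failed assertion."""
    if expected not in failed:
        raise AssertionError(f"fixture did not trip {expected}")
    extra = sorted(failed - {expected})
    if extra:
        raise AssertionError(
            f"fixture also tripped {extra}; tighten or split the fixture"
        )
```

A fixture built to exercise `8-soft-hyphen` would pass with `check_negative_fixture({"8-soft-hyphen"}, "8-soft-hyphen")` and be rejected if `6-mojibake` fired alongside it.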

Related skills from the same repository

resume-panel.md

from "wcygan/resume"

Fan out a 12-persona review panel against will_cygan_resume.typ in parallel, with work-experience/*.md supplied as the source-of-truth evidence base, and produce a consolidated dashboard with consensus verdict, dissent highlights, and deduplicated top issues. Use when the user wants a comprehensive multi-angle resume review, asks for a "panel review", "full review", "multi-angle review", or wants every reviewer to weigh in at once. Covers ATS parsing, recruiter triage, hiring manager, technical screener, bar raiser, staff IC peer, leveling committee, skip-level exec, domain SME, typography, future-self skeptic, and career coach perspectives.

2026-04-19

resume-debate.md

from "wcygan/resume"

Structured 3-round debate about a single resume bullet. Runs hiring-manager, bar-raiser, and future-self-skeptic through opening critique, rebuttal, and rewrite rounds. Produces either a consensus rewrite or an articulated disagreement for the user to adjudicate. Accepts either the bullet text or a line number as $ARGUMENTS. Use when the user is uncertain about a specific bullet, asks "is this bullet good?", wants multiple perspectives on one line, or says a reviewer flagged a specific bullet.

2026-04-19

resume-interview-rehearsal.md

from "wcygan/resume"

Generate a ranked interview prep sheet from the resume. Three interviewer personas (hiring-manager, bar-raiser, domain-sme-systems) each generate 5 questions they would ask based on specific resume bullets. The questions are then deduplicated and ranked by difficulty tier (softball, standard, hard, trap). Includes a coverage audit flagging bullets no persona asked about. Optionally accepts a job description as $ARGUMENTS to target the questions for a specific role. Use when the user has an interview coming up, asks for interview prep, or wants to rehearse what they'd be asked about their resume.

2026-04-19

resume-optimizer.md

from "wcygan/resume"

Optimize engineering resumes using proven STAR/XYZ methodologies, ATS best practices, and hiring manager insights. Use when reviewing resumes, improving bullet points, tailoring to job descriptions, or enhancing professional presentation. Keywords: resume, CV, bullet points, STAR, XYZ, ATS, job description, optimize, tailor, action verbs, quantify, achievements

2026-04-19

resume-panel-focus.md

from "wcygan/resume"

Run a section-focused resume review panel. Routes to a curated subset of the 12 reviewer sub-agents based on the section argument (work-experience, skills, projects, formatting, narrative, or header). Faster and more targeted than a full panel — uses 3-6 relevant personas instead of all 12. Use when the user wants to review only a specific part of their resume, has just edited one section, or asks about one area like "the skills section" or "the formatting" or "my bullets".

2026-04-19

typst-vendor.md

from "wcygan/resume"

Vendor @preview Typst packages into template/ or bootstrap new in-house Typst templates from scratch. Use when the user asks to vendor, fork, copy, pin, localize, or maintain a Typst template; when a @preview package breaks on a newer typst version; when creating a new template.typ from scratch; or says things like "vendor modern-cv", "fork this typst package", "bring this template in-repo", "write our own resume template". Keywords typst, vendor, fork, @preview, template, modern-cv, fontawesome, linguify, typst.toml, lib.typ, MIT attribution, version drift.

2026-04-15

package.json

"author": "wcygan"

"repository": "wcygan/resume"
