Name: Job Pipeline
Author: ymuromcev

Multi-profile job-search pipeline — scan ATS adapters, validate the pipeline, prepare Inbox jobs (fit scoring, CL gen, Notion push), sync with Notion, check Gmail for responses, and answer application Q&A questions with reuse from a Notion-backed answer bank. Trigger on: /job-pipeline, /job-pipeline scan, /job-pipeline validate, /job-pipeline sync, /job-pipeline prepare, /job-pipeline check, /job-pipeline answer, or when user asks to scan/validate/sync/prepare/check/answer jobs for a specific profile (see the `profiles/` directory for the current list).


updated:May 6, 2026 at 08:29

Repository: ymuromcev/ai-job-searcher


job-pipeline — Multi-profile Job Search Pipeline

Single engine, per-profile data. All commands take --profile <id>. Currently supported profiles: jared.

Commands

/job-pipeline scan — Discover new jobs across configured ATS adapters (greenhouse / lever / ashby / smartrecruiters / workday / calcareers / usajobs / indeed / remoteok). Append to shared pool + per-profile pipeline.
/job-pipeline validate — Pre-flight: TSV hygiene, company-cap check, URL liveness on active applications.
/job-pipeline sync — Reconcile per-profile applications with Notion. Default = dry-run, must pass --apply to commit.
/job-pipeline prepare — Two-phase processing of Inbox jobs (RFC 014): mechanical pre-phase (filter / URL check / JD fetch / salary) + Claude LLM phase (geo check / fit score / CL gen / Notion push). Commit transitions row from Inbox → To Apply (decision=to_apply) or Inbox → Archived (decision=archive).
/job-pipeline check — Two-phase Gmail response check: --prepare builds Gmail search batches for Claude MCP, --apply consumes Claude-written emails and updates Notion + TSV + logs.
/job-pipeline indeed-prep — Phase 1 of Indeed ingest. Prints scan URLs + JS extraction snippet + filter context for the Claude browser MCP session. Phase 2 (browser fetch) is manual via Chrome MCP. Phase 3 = standard scan (the indeed adapter ingests the file Claude wrote).
/job-pipeline answer — Generate or reuse application answers (Why join? / Influences? / Motivation? etc.). Three-phase: search Notion Q&A DB by dedup key → reuse if exact match else generate via Humanizer Rules → push answer back to Notion + write local .md backup. Per-profile DB at profile.notion.application_qa_db_id.

If no mode is specified, show this help and ask which to run.

Required context per session

Before running any command, verify:

Profile id — resolution policy:
- Default profile is jared. If the user invokes /job-pipeline <cmd> cold with no profile hint, use jared and do NOT ask.
- NLP extraction: if the user's phrase mentions another profile by name (e.g. "для Лили" (Russian for "for Lilia"), "scan Lilia's pipeline", "Lilia profile"), extract that name, lowercase it, map it to the matching profiles/<id>/ directory, and use it. Valid profiles = subdirectories of profiles/ (excluding _example).
- Session-sticky: once a non-default profile is resolved in a session, keep using it for subsequent commands in the same session. Switch back to jared only on an explicit phrase like "switch to Jared" / "для Джареда" (Russian for "for Jared").
- Ask only when ambiguous: if the phrase mentions a name that doesn't match any profile dir, ask once (list valid profiles), then stick.
- Always pass the resolved id to the CLI via --profile <id>.
Working directory = ai-job-searcher/ (public repo clone at ~/Desktop/Claude Code/ai-job-searcher/). All commands resolve data/ and profiles/ relative to cwd.
Secrets in env. For profile <id>, the CLI reads <ID_UPPER>_* env vars only:
- JARED_NOTION_TOKEN (required for sync)
- JARED_USAJOBS_API_KEY, JARED_USAJOBS_EMAIL (required if discovery:usajobs is enabled)
Companies pool. data/companies.tsv must exist. Bootstrap once with:
```
node engine/bin/seed_companies.js
```
(parses 247 targets from the legacy Job Search/find_jobs.js).
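The profile-resolution policy and the `<ID_UPPER>_*` secret naming above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: `resolve_profile`, `valid_profiles`, and `env_name` are hypothetical helpers, and the sketch assumes the profile name has already been extracted from the user's phrase.

```python
from pathlib import Path

DEFAULT_PROFILE = "jared"


def valid_profiles(profiles_dir):
    """Subdirectories of profiles/, excluding the committed _example template."""
    root = Path(profiles_dir)
    return sorted(p.name for p in root.iterdir() if p.is_dir() and p.name != "_example")


def resolve_profile(mentioned_name, sticky_profile, valid):
    """Return (profile_id, must_ask) following the resolution policy.

    mentioned_name: name pulled from the user's phrase, or None (hypothetical input).
    sticky_profile: non-default profile resolved earlier in the session, or None.
    """
    if mentioned_name:
        candidate = mentioned_name.lower()
        if candidate in valid:
            return candidate, False
        # Unknown name: ask once, listing the valid profiles.
        return None, True
    # No hint: stay on the sticky profile, else fall back to the default.
    return (sticky_profile or DEFAULT_PROFILE), False


def env_name(profile_id, suffix):
    """Secrets are read from <ID_UPPER>_* variables only."""
    return f"{profile_id.upper()}_{suffix}"
```

For example, `env_name("jared", "NOTION_TOKEN")` yields `JARED_NOTION_TOKEN`, the variable the CLI requires for sync.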

Step-by-step

scan

node engine/cli.js scan --profile <id> [--dry-run] [--verbose]

Load profiles/<id>/profile.json → enabled discovery modules + filter rules.
Load data/companies.tsv → group targets by adapter source.
Apply per-profile discovery.companies_whitelist/blacklist.
Invoke each enabled adapter via engine/core/scan.js orchestrator (errors per source isolated, do not block the run).
Dedupe new jobs against data/jobs.tsv master pool by (source, jobId).
Atomically write data/jobs.tsv + append fresh rows to profiles/<id>/applications.tsv with status="Inbox" (RFC 014 — TSV-only state for fresh-after-scan rows; transitions to To Apply after prepare --phase commit decision=to_apply). Notion DBs never see Inbox — they keep the 8-status set.

--dry-run prints planned writes without touching disk.
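The dedupe and atomic-write steps above might look roughly like this. It is a sketch under stated assumptions: the real engine is a Node.js CLI, rows are modelled here as plain dicts, and `dedupe_new_jobs` / `atomic_write_text` are hypothetical names.

```python
import os
import tempfile


def dedupe_new_jobs(master_rows, scanned_rows):
    """Keep only scanned rows whose (source, jobId) is not already in the pool."""
    seen = {(row["source"], row["jobId"]) for row in master_rows}
    fresh = []
    for row in scanned_rows:
        key = (row["source"], row["jobId"])
        if key in seen:
            continue
        seen.add(key)  # also drops duplicates within the same scan
        fresh.append(row)
    return fresh


def atomic_write_text(path, text):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)  # readers never observe a half-written file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The rename protects against partial writes only; it does not merge concurrent hand edits, which is why editing the TSVs during a scan is discouraged.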

validate

node engine/cli.js validate --profile <id> [--dry-run]

Read-only checks. Exit 0 if clean, 1 if any issue:

TSV hygiene — both data/jobs.tsv and profiles/<id>/applications.tsv parse cleanly.
company_cap — counts active applications per company against profile.filter_rules.company_cap.max_active plus per-company overrides. Active = To Apply / Applied / Interview / Offer. (Inbox is intentionally excluded — those rows are pre-triage and may yet be archived.)
URL liveness — HEAD-pings each active application URL with concurrency 8, falls back to GET on 405/501. Reports any 4xx/5xx/timeout. Skipped under --dry-run.
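As an illustration of the company_cap check, the sketch below counts active applications per company and reports those over the limit. The row shape (`companyName`, `status`) and the function name are assumptions; only the set of active statuses and the exclusion of Inbox come from the description above.

```python
from collections import Counter

# Inbox is deliberately absent: those rows are pre-triage.
ACTIVE_STATUSES = {"To Apply", "Applied", "Interview", "Offer"}


def company_cap_violations(rows, max_active, overrides=None):
    """Return {company: active_count} for companies above their cap."""
    overrides = overrides or {}
    counts = Counter(
        row["companyName"] for row in rows if row["status"] in ACTIVE_STATUSES
    )
    return {
        company: count
        for company, count in counts.items()
        if count > overrides.get(company, max_active)
    }
```

A non-empty result would correspond to validate exiting with status 1.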

sync

node engine/cli.js sync --profile <id> [--apply] [--verbose]

Reconcile with the profile's notion.jobs_pipeline_db_id. The flow has a push half and a pull half, but only pull is active:

Push — bare push from sync is disabled (legacy). Notion pages are created exclusively by prepare --phase commit decision=to_apply, which assembles fit + CL + salary + Notion page atomically. sync is pull-only.
Pull — fetch all Notion pages, match by key field (<source>:<jobId>), apply Notion's status and notion_page_id to local TSV (Notion wins on status).

Default mode prints the plan and runs the read-only pull preview. Pass --apply to actually mutate Notion and TSV.
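The pull step can be pictured as building a change plan first and applying it only on request. This is a sketch, not the engine's implementation: the page shape (`key`, `status`, `id`) is a simplified stand-in for real Notion page objects, and `plan_pull` is a hypothetical name.

```python
def plan_pull(local_rows, notion_pages):
    """Return [(key, updates)] where Notion's status and page id win."""
    by_key = {f'{row["source"]}:{row["jobId"]}': row for row in local_rows}
    plan = []
    for page in notion_pages:
        row = by_key.get(page["key"])
        if row is None:
            continue  # page has no local counterpart; nothing to update
        updates = {}
        if row.get("status") != page["status"]:
            updates["status"] = page["status"]
        if row.get("notion_page_id") != page["id"]:
            updates["notion_page_id"] = page["id"]
        if updates:
            plan.append((page["key"], updates))
    return plan
```

Printing the plan corresponds to the default dry-run; writing the updates into the TSV rows corresponds to `--apply`.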

Failure modes / how to recover

companies pool is empty — run node engine/bin/seed_companies.js once.
missing JARED_NOTION_TOKEN — load it from ~/.bashrc / .env. Token format: ntn_….
adapter source mismatch or no adapter for source X — profile.json references an unknown discovery module. Either remove from modules or add the adapter file.
HTTP 429 from greenhouse/lever — adapters retry with exponential backoff, up to 3 attempts; on persistent 429, the source is reported in the summary and the rest of the run continues.
CalCareers HTML changed — the sanity warning "ResultCount marker missing" indicates the upstream HTML changed; investigate the regexes in engine/modules/discovery/calcareers.js.
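The retry behaviour described for HTTP 429 can be sketched as below. The delay schedule (1 s, then 2 s) and the names `RateLimited` and `fetch_with_backoff` are assumptions for illustration; the source states only "exponential backoff, up to 3 attempts".

```python
import time


class RateLimited(Exception):
    """Raised by a fetch callable when the server answers HTTP 429 (hypothetical)."""


def fetch_with_backoff(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited wait base_delay * 2**n and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts:
                raise  # persistent 429: the caller reports the source and moves on
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.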

Anti-patterns — do NOT

Do not run any command without --profile — the CLI will refuse and print help.
Do not invoke sync without --apply and assume it will write — it always defaults to dry-run.
Do not invoke check --apply without first running check --prepare + the Phase-2 Gmail MCP reads — the script will error on missing raw_emails.json.
Do not edit data/jobs.tsv or profiles/<id>/applications.tsv by hand while a scan is running. Atomic-rename protects against partial writes but not against logical conflicts.
Do not commit data/ or profiles/<id>/ to git — both are in .gitignore for a reason. Only profiles/_example/ is committed.
Do not mix profiles in one process. Each CLI invocation loads exactly one profile's secrets — never load JARED_* and PAT_* together.
Do not generate a new application answer without first running answer --phase search and inspecting matches. Reuse before regenerate is the rule for /job-pipeline answer.
Do not push an answer to the Notion Q&A DB without an explicit user approval signal (пойдет ("that works") / good / submitted / залил ("uploaded")). Same shared-state rule as the cover-letter flow.
Do not invent new Q&A categories. Use one of the 8 canonical names from the DB; the categorizer picks a default automatically.

prepare

prepare is autonomous (BL-9 Step 5, 2026-05-05). The user invokes /job-pipeline prepare once. Claude orchestrates a multi-iteration loop that calls the CLI repeatedly and judges batches, until either the target (30 jobs in Notion) is met or the Inbox is exhausted. Then a single push to Notion finalizes the result.

The loop uses three modes of the pre-phase CLI invocation, all sharing the same prepare_context.json file:

| Mode | Source pool | When |
| --- | --- | --- |
| `--mode fresh` | Inbox rows that pass blocklist / cap / URL-check (the standard pipeline) | First iteration of every run |
| `--mode topup` | `deferredQueue` (rows that passed filter but didn't fit the previous batch) | Iterations 2–3 if Strong < 30 |
| `--mode weak-fallback` | `deferredQueue` + already-Weak Inbox rows in TSV | After 3 iterations if Strong+Medium still < 30 |

Phase 1 — pre (CLI, called by Claude)

node engine/cli.js prepare --profile <id> --phase pre --mode <fresh|topup|weak-fallback> [--batch 30] [--need K] [--dry-run]

Runs without LLM cost. Outputs profiles/<id>/prepare_context.json:

batch[] — jobs ready for SKILL judgement (URL-checked, JD-fetched, salary computed). Topup / weak-fallback APPEND to this array.
batch[i].wasAlreadyWeak: true (weak-fallback only) — row was judged Weak in a prior run; carry over priorFitScore / priorFitRationale instead of re-judging.
deferredQueue[] — TSV keys that passed the filter pipeline but weren't URL-checked yet (used by the next topup / weak-fallback call).
skipped[] — engine-level skip reasons (company_cap / title_blocklist / url_dead / already_evaluated_weak / etc.).
stats.inboxExhausted: bool — true when no more fresh rows remain in TSV outside the current batch (excluding duplicate-flagged rows). The loop must stop iterating when this flips true.
stats.skipReasons / stats.deferred / stats.weakFallbackRuns etc.

Phase 2 — autonomous loop (Claude orchestrates)

When the user invokes /job-pipeline prepare, Claude runs the following loop. Loop logic lives in the SKILL; the engine is stateless.

Step A — initialize
  iteration = 0
  verdicts = { strong: [], medium: [], weak: [] }    # per-job evaluation buckets
  Run: prepare --phase pre --mode fresh --batch 30
  Read: profiles/<id>/prepare_context.json

Step B — per-iteration evaluation loop (chase Strong)
  while True:
    iteration += 1
    new_entries = batch entries we haven't judged yet
    For each entry in new_entries:
      run Steps 1, 3, 4, 5, 5.7, 6, 7, 8 (fit + CL gen, NOT Notion push)
      append to verdicts[<fitScore-bucket>]

    # Stop conditions — keep iterating to chase more Strong even if
    # Medium is plentiful. Selection priority is Strong → Medium → Weak,
    # so 30 pure Strong always beats 15 Strong + 15 Medium.
    if len(verdicts.strong) >= 30: break
    if stats.inboxExhausted: break
    if iteration >= 3: break

    # Topup — try to find more Strong (need = how many Strong we still want)
    Run: prepare --phase pre --mode topup --need (30 - len(verdicts.strong))
    Re-read prepare_context.json → continue

Step C — weak fallback (only if needed)
  strongMedium = len(verdicts.strong) + len(verdicts.medium)
  if strongMedium < 30 AND NOT stats.inboxExhausted:
    Run: prepare --phase pre --mode weak-fallback --need (30 - strongMedium)
    Re-read prepare_context.json
    new_entries = batch entries not yet judged
    For each entry:
      if entry.wasAlreadyWeak:
        # Reuse saved verdict, do NOT spend tokens re-judging
        fitScore = entry.priorFitScore  (always "Weak")
        fitRationale = entry.priorFitRationale
        skip Step 4
        run Steps 7, 8 (CL gen) so the row is push-ready
        append to verdicts.weak
      else:
        run Steps 1, 3, 4, 5, 5.7, 6, 7, 8 as usual

Step D — pick top 30 by sequential priority
  # Sequential fill: take ALL Strong first, then Medium to fill, then Weak.
  # Mediums never bump Strongs; Weaks never bump Mediums.
  candidates = verdicts.strong + verdicts.medium + verdicts.weak  (in that order)
  final = candidates[:30]
  if len(final) < 30:
    warn user: "only N candidates found, push N instead of 30"

Step E — push final to Notion
  For each job in `final`: run Step 9 (Notion page creation)

Step F — write results + commit
  Step 10: write prepare_results_<timestamp>.json
    decision = "to_apply" for each job in `final`
    decision = "skip" for everything else evaluated this run (carries fitScore so the
               row gets persisted to TSV and won't be re-judged next prepare run)
  Step 11: prepare --phase commit --results-file <path>
  Step 12: report to user
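Step D's sequential fill can be written as a few lines of Python. This is a sketch of the selection rule only; `pick_final` is a hypothetical name and each job is assumed to be a dict carrying its `key`.

```python
def pick_final(verdicts, target=30):
    """Take all Strong first, then Medium, then Weak, up to `target` jobs.

    Returns (final, leftover): `final` gets decision="to_apply",
    `leftover` gets decision="skip" in the results file.
    """
    candidates = verdicts["strong"] + verdicts["medium"] + verdicts["weak"]
    final = candidates[:target]
    final_keys = {job["key"] for job in final}
    leftover = [job for job in candidates if job["key"] not in final_keys]
    return final, leftover
```

Because the list is concatenated in priority order before slicing, a Medium can never displace a Strong and a Weak can never displace a Medium.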

Stdout per iteration (mirror this format so the user sees progress):

prepare loop iteration 1/3: 30 candidates → 8 Strong, 12 Medium, 10 Weak
prepare loop iteration 2/3: 27 new candidates → 5 Strong, 9 Medium, 13 Weak
prepare loop iteration 3/3: 21 new candidates → 2 Strong, 9 Medium, 10 Weak
loop done: 15 Strong, 30 Medium, 33 Weak (target 30)
selecting top 30: 15 Strong + 15 Medium → push to Notion

If weak-fallback runs:

strong+medium below target (8+5=13 of 30); falling back to weak…
weak-fallback: pulled 17 entries (12 already-Weak, 5 fresh)
final: 8 Strong + 5 Medium + 17 Weak = 30 → push to Notion

If inboxExhausted:

loop done: 4 Strong, 7 Medium (Inbox exhausted)
warn: only 11 candidates found, push 11 instead of 30
selecting all 11 (4 Strong + 7 Medium) → push to Notion

Anti-patterns — do NOT

Do not push to Notion (Step 9) inside the iteration loop. Push happens ONCE on the top-30 final list (Step E above). Otherwise we'd create 60–90 Notion pages and only some would be marked final.
Do not re-judge entry.wasAlreadyWeak === true rows in Step 4. The verdict is already in TSV; re-asking burns tokens. Carry priorFitScore / priorFitRationale straight into the verdict bucket.
Do not keep iterating past 3 even if Strong < 30 and Inbox isn't exhausted. The cap is by design — token budget protection. After 3, the SKILL falls back to weak.
Do not stop the loop just because Strong+Medium ≥ 30. Selection priority is Strong → Medium → Weak; we keep iterating to find more Strong (Mediums don't satisfy the target). Stop only on Strong ≥ 30, iter ≥ 3, or inboxExhausted.
Do not terminate the loop early just because iteration N had 0 Strong. Strong from later iterations can still hit the target; keep going until one of the three stop conditions is true.
Do not invoke --mode topup or --mode weak-fallback directly without first running --mode fresh (or another mode) to seed prepare_context.json. Topup / weak-fallback exit 1 if the context file is missing.
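The three stop conditions and the weak-fallback trigger can be checked with two small predicates. This is a sketch with hypothetical function names; the thresholds (30 jobs, 3 iterations) are the ones stated in the loop description.

```python
def stop_reason(strong_count, inbox_exhausted, iteration, target=30, max_iterations=3):
    """Return why the loop should stop, or None to keep iterating.

    Medium counts are ignored on purpose: only Strong satisfies the target.
    """
    if strong_count >= target:
        return "strong_target_met"
    if inbox_exhausted:
        return "inbox_exhausted"
    if iteration >= max_iterations:
        return "iteration_cap"
    return None


def needs_weak_fallback(strong_count, medium_count, inbox_exhausted, target=30):
    """Step C runs only when Strong+Medium fall short and rows remain."""
    return strong_count + medium_count < target and not inbox_exhausted
```

Note that an iteration with zero Strong does not appear among the stop reasons, so the loop continues until one of the three conditions holds.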

The reference for each Step (1, 3, 4, 5, 5.7, 6, 7, 8, 9, 10, 11, 12) is below — these describe the per-job evaluation logic that the loop above orchestrates.

After the CLI writes prepare_context.json, Claude then:

Step 1 — Load memory

Read profiles/<id>/prepare_context.json first. The memory block (populated by profile_loader from profile.json.memory) contains:

memory.writingStyle — content of profiles/<id>/memory/user_writing_style.md
memory.resumeKeyPoints — content of profiles/<id>/memory/user_resume_key_points.md
memory.feedback — array of {file, content} for each feedback_*.md under the configured dir

Use those strings directly. If memory.writingStyle or memory.resumeKeyPoints is null, the per-profile file is missing — fall back to profiles/<id>/resume_versions.json and profiles/<id>/cover_letter_template.md for tone hints. Do not re-read the memory files from disk — the engine already loaded them.

Step 2 — Read prepare_context.json

Read profiles/<id>/prepare_context.json

Report stats: inboxTotal / afterFilter / inBatch / urlAlive / urlDead. Proceed without confirmation — the CLI's --batch N flag already gates batch size; Claude does not re-prompt the user. Default is 30; adjust by re-running pre-phase with a different --batch.

Step 3 — Geo decision (now profile-driven, L-4 / RFC 013)

The engine pre-phase already enforces profile.geo policy and surfaces the result on every batch entry. Read prepare_context.batch[i].geo_decision — do NOT WebFetch for geo.

For each job in batch:

geo_decision === "allowed" → proceed to Step 4. The geo_matched_by field describes WHY it passed ("city:Sacramento" / "remote" / "country:US" / "unrestricted") — useful when generating the fit rationale.
geo_decision === "rejected" → already pruned by engine. You won't see it in batch[] (it's in prepare_context.skipped[] with reason "geo_metro_miss" / "geo_country_miss" / "geo_remote_only_miss" / "geo_blocklist" / "geo_no_location"). The stats.skipReasons breakdown shown in Step 12 includes geo-counters.
geo_decision field absent (legacy prepare_context.json from before L-4 migration, or profile without profile.geo block) → fallback: WebFetch JD location, apply simple US-policy as before. Engine version post-2026-05-04 always populates geo_decision.

Step 4 — Fit scoring (per job)

Apply Fit Score rules from ## Global Guard Rails below. Assign one of: Strong / Medium / Weak.

Write a 1-sentence fit rationale (concrete domain overlap, not generic praise). This goes into the Notion Notes field.

Early-startup modifier: if company is pre-Series B or <50 employees — downgrade one level.
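The early-startup modifier amounts to shifting the score one step down an ordered scale. The sketch below assumes that a Weak score stays Weak (the source does not say what happens below Weak) and uses hypothetical stage labels for "pre-Series B".

```python
FIT_LEVELS = ["Strong", "Medium", "Weak"]
# Hypothetical labels for funding stages earlier than Series B.
PRE_SERIES_B = {"pre-seed", "seed", "series a"}


def apply_early_startup_modifier(fit_score, funding_stage=None, employees=None):
    """Downgrade one level for pre-Series B or <50-employee companies."""
    is_early = (funding_stage or "").lower() in PRE_SERIES_B or (
        employees is not None and employees < 50
    )
    if not is_early:
        return fit_score
    index = FIT_LEVELS.index(fit_score)
    # Assumption: Weak is the floor, so it cannot be downgraded further.
    return FIT_LEVELS[min(index + 1, len(FIT_LEVELS) - 1)]
```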

Step 5 — Bucket: fit (loop-aware)

Geo filtering already happened in the engine pre-phase (Step 3 is read-only) — only entries with geo_decision === "allowed" reach the batch. The remaining gate is fit.

In the autonomous loop (Step B above), assign every judged job to one of three in-memory buckets — verdicts.strong / verdicts.medium / verdicts.weak — based on fitScore. Do NOT mark anything decision="skip" yet. The final skip / to_apply call is made in Step D, once the top-30 is picked, and recorded in Step F.

If the entry has wasAlreadyWeak: true, it goes straight to verdicts.weak with fitScore="Weak" carried over from entry.priorFitScore. No Step 4 re-judging.

Report iteration counts to the user (per the stdout format in Phase 2 above): how many landed in each bucket per iteration, what the running total is.

Step 5.7 — Auto-tier unknown companies

If prepare_context.unknownTierCompanies is non-empty, assign each company a tier (S / A / B / C). Do not prompt the user — the user never tiers companies manually; this is your job.

Tier criteria (apply the column that matches the active profile):

| Tier | Jared (PM / fintech-leaning) | Lilia (healthcare RN) |
| --- | --- | --- |
| S | Public big-tech / top fintech, $10B+ market cap, top brand recognition. AI-native or strong AI investment. | Major regional health systems (Kaiser, UC Davis, Sutter, Dignity), 10K+ employees, full benefits, RN union presence. |
| A | Late-stage / public mid-cap. Strong funding, well-known brand. AI presence non-trivial. | Large medical foundations / dialysis chains / managed care. 1K+ employees, multi-site. |
| B | Growth-stage Series C–E, $1–5B valuation. Recognized in their sector. | Specialty clinic chains (eye, dental, dermatology). Multi-location, regional reach. |
| C | Early/mid-stage Series A–B, <$1B. Less known. | Small private practices, single-clinic operations, local services. |

Sources of signal (use freely, in this order):

Company name recognition + general knowledge (size, valuation, funding round).
jdText if present — funding mentions, employee count, "Series X", "publicly traded".
WebFetch the company website / Crunchbase / LinkedIn for ambiguous cases (one extra fetch per unknown company is fine).

Output: every name from unknownTierCompanies MUST appear in results.companyTiers with a value in {"S","A","B","C"}. Don't skip companies — the engine treats absence as "still unknown" and the next prepare run will re-ask.

After tiering, the engine will persist the assignments to profile.json.company_tiers on commit (one-shot per company). It also uses them to set the Notion Companies DB Tier field on first push.
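Before writing results, the tier output can be checked against the completeness rule above. This is a sketch with a hypothetical helper name; it only verifies that every unknown company received one of the four allowed tiers.

```python
VALID_TIERS = {"S", "A", "B", "C"}


def tier_problems(unknown_companies, company_tiers):
    """Return (missing, invalid) for results.companyTiers.

    missing: companies from unknownTierCompanies with no assignment.
    invalid: assignments whose value is outside S / A / B / C.
    """
    missing = [name for name in unknown_companies if name not in company_tiers]
    invalid = {
        name: tier for name, tier in company_tiers.items() if tier not in VALID_TIERS
    }
    return missing, invalid
```

Both return values should be empty before Step 10; a missing company would otherwise be re-asked on the next prepare run.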

Step 6 — Salary (auto-fill)

For each remaining job:

If prepare_context.batch[i].salary is non-null: use it as-is.
If salary is null AND the entry has unknownTier: true: look up the tier you just assigned in Step 5.7, then use prepare_context.salaryConfig (per-profile matrix + level parser + COL config from profile.json.salary). When salaryConfig is null the engine's default fintech-PM matrix in engine/core/salary_calc.js applies. Pick the row at Tier × Level (level = engine parseLevel(title, salaryConfig.levelParser)). Compute salaryMin / salaryMax from the matrix and apply the COL adjustment defined in salaryConfig.colAdjustment (defaults: SF/NYC +7.5% unless Remote).
If salary is null AND unknownTier is not true: this means the tier is known but the matrix doesn't cover the level — flag to user with the company name and title, do NOT invent a range.
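The Tier × Level lookup with the default cost-of-living adjustment can be sketched as follows. The matrix passed in is the caller's data (the example band in the usage note is a placeholder, not a value from `salary_calc.js`), and the city labels "SF" / "NYC" and the work-format value "Remote" are assumed spellings.

```python
HIGH_COL_CITIES = {"SF", "NYC"}
COL_PER_MILLE = 75  # +7.5%, kept in integer math to avoid float rounding


def estimate_salary(matrix, tier, level, city, work_format):
    """Return (salaryMin, salaryMax), or None when the matrix has no such row.

    None means: flag the company and title to the user, do not invent a range.
    """
    band = matrix.get((tier, level))
    if band is None:
        return None
    low, high = band
    if work_format != "Remote" and city in HIGH_COL_CITIES:
        low = low * (1000 + COL_PER_MILLE) // 1000
        high = high * (1000 + COL_PER_MILLE) // 1000
    return low, high
```

With a placeholder band of `{("A", "Senior"): (140000, 190000)}`, an on-site SF role comes out at `(150500, 204250)`, while the same role marked Remote stays at `(140000, 190000)`.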

Step 7 — Archetype selection (per job)

Choose the best resume archetype from profiles/<id>/resume_versions.json for this specific role. Prefer the archetype whose domain keywords overlap most with the JD / job title.

Mandatory validation: resumeVer MUST be a key that literally exists in profile.resume_versions.versions. Do NOT invent or paraphrase a key. If no archetype is a clear match, pick the closest existing key (or the profile's default if defined) and note the partial match in the rationale — never write a key that isn't in the file. The prepare commit phase will hard-fail on unknown keys, so catch the mismatch here.

Record resumeVer = archetype key (e.g. "fintech-pm-v3").
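The mandatory validation can be expressed as a guard that never returns a key missing from the versions map. This is a sketch: `checked_resume_ver` is a hypothetical helper and `default_key` stands in for "the profile's default if defined".

```python
def checked_resume_ver(resume_ver, versions, default_key=None):
    """Return a key that literally exists in `versions`, or raise.

    versions: the mapping from profile.resume_versions.versions.
    """
    if resume_ver in versions:
        return resume_ver
    if default_key is not None and default_key in versions:
        return default_key  # partial match: say so in the fit rationale
    raise KeyError(
        f"unknown resumeVer {resume_ver!r}; valid keys: {sorted(versions)}"
    )
```

Failing here is cheaper than letting the commit phase hard-fail on the same key later.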

Step 8 — Cover letter generation (template-first, per job)

Per G-17: do NOT write CLs from scratch. Reuse the profile's saved cover-letter library so proof paragraphs (achievements, metrics, candidate facts) stay locked across the batch and only company-specific copy is regenerated. This keeps tone consistent across same-batch letters and cuts token cost roughly in half.

8a — Load base template.

Read profiles/<id>/cover_letter_versions.json and detect its shape:

Template-variants shape (Lilia and similar profiles): top-level defaults object with locked-down p2, p3, p4_template, availability, sign strings, plus a letters array of { job_id, role, company, p1 } variants.
Library shape (Jared and similar profiles): top-level keys map to entries shaped as { filename, paragraphs: [p1, p2, p3, p4] }.

If cover_letter_versions.json is missing or empty (cold-start profile), fall back to writing all paragraphs from scratch using profiles/<id>/cover_letter_template.md as the structure guide. This is the only path where Step 8 generates the full letter.

8b — Pick the base entry.

Match priority, in order — stop at the first hit:

Same company + same role focus — exact company match AND role keywords overlap (e.g. existing entry for Affirm + Capital-focused, current job is Affirm Capital PM → reuse).
Same company, different role — useful when prior letter exists for the company. Replace P3 (why this company) sparingly; P2 proof stays.
Same archetype, different company — match the chosen resumeVer (Step 7) to an existing entry whose role focus aligns with the archetype (e.g. resumeVer = ConsumerLending → look for lendbuzz_creditcard / affirm_capital / similar entries).
Closest archetype — if no archetype-aligned entry exists, pick the most domain-adjacent one (e.g. PaymentsInfra → fall back to PlatformInfra-style entry).

In template-variants shape: defaults.{p2, p3, p4_template} IS the base — every letter reuses them. Only P1 varies, and the letters array is your reference set for tone/length on past P1s.

8c — Rebuild the letter.

P2 (Core proof) — copy verbatim from the base entry. Do NOT paraphrase, reorder facts, or substitute different metrics. The candidate's achievements are stable; rewriting P2 introduces drift and dilutes proof.
P3 (Secondary proof / AI angle / why this company) — copy verbatim from the base entry IF the role focus matches. If the new role is in a clearly different sub-domain (e.g. base is growth-retention, new role is platform-API), regenerate ONLY this paragraph. Apply Humanizer Rules.
P1 (Hook — company-specific) — always regenerate. Use the JD signal + company-specific challenge as anchor. Apply Humanizer Rules. Pattern: "[Company] does [X]. The harder problem is [Y]. That's exactly what I've solved at [previous role]."
P4 (Close) — for template-variants shape, fill p4_template placeholders ({availability}, etc.). For library shape, copy verbatim from base entry.

Output: a 4-paragraph CL where P2 (and usually P3) are exact copies of an existing humanized letter, and only the hook (and rarely P3) is freshly written for this company.
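For the library shape, the rebuild rule reduces to swapping paragraphs in a four-element list. This is a minimal sketch with a hypothetical function name; generating the new P1 (and any new P3) text is outside its scope.

```python
def rebuild_letter(base_paragraphs, new_p1, new_p3=None):
    """Return [P1, P2, P3, P4] with P2 and P4 copied verbatim from the base.

    new_p3 is passed only when the role sits in a clearly different sub-domain;
    otherwise the base P3 is reused unchanged.
    """
    if len(base_paragraphs) != 4:
        raise ValueError("base entry must have exactly 4 paragraphs")
    _, base_p2, base_p3, base_p4 = base_paragraphs
    return [new_p1, base_p2, base_p3 if new_p3 is None else new_p3, base_p4]
```

Keeping P2 as the same string object across a batch is what makes the `clBaseKey` audit meaningful.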

8d — Final humanizer pass.

Apply Humanizer Rules from ## Humanizer Rules below to any newly-written paragraphs (typically P1, sometimes P3). The verbatim-copied paragraphs are already humanized — do NOT re-humanize them, that introduces drift.

8e — Emit paragraphs (engine writes the files).

Do NOT write the CL file yourself. Per RFC 019 / BL-14, the SKILL produces paragraphs and the engine writes both .md and .pdf in the commit phase.

In the results.json entry for this job (Step 10) include:

clParagraphs: string[] — exactly the 4 finalized paragraphs (P1, P2, P3, P4) in order, no headers, no blank-line padding inside the strings. Engine joins them with \n\n for the MD file and feeds them to pdfkit for the PDF.
clKey — filename stem only, no extension and no path. Convention: <Company>_<role-slug>_<YYYYMMDD> (e.g. Affirm_senior-ai-pm_20260505). Engine derives the company sub-folder by slugifying applications.tsv.companyName — you do not need to slugify yourself.
clBaseKey — the base entry key / job_id you reused for proof paragraphs (helps audit batch consistency: if 10 letters share clBaseKey = "affirm_capital", P2 is identical across them, which is the point).

Do NOT set clPath — the engine computes the canonical relative path (cover_letters/<CompanySlug>/<clKey>.pdf) and writes it into TSV cl_path itself. Sending clPath is harmless (it gets overridden), but it's no longer the SKILL's responsibility.
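A `clKey` following the `<Company>_<role-slug>_<YYYYMMDD>` convention could be assembled like this. The slug rule (lowercase, runs of non-alphanumerics become one hyphen) reproduces the two examples in this document; how multi-word company names are handled is an assumption, here simply stripping non-alphanumerics.

```python
import re
from datetime import date


def build_cl_key(company, role_title, on_date):
    """Build a filename stem: no extension, no path."""
    # Assumption: company part keeps its case and drops spaces/punctuation.
    company_part = re.sub(r"[^A-Za-z0-9]+", "", company)
    role_slug = re.sub(r"[^a-z0-9]+", "-", role_title.lower()).strip("-")
    return f"{company_part}_{role_slug}_{on_date.strftime('%Y%m%d')}"
```

For instance, `build_cl_key("Affirm", "Senior AI PM", date(2026, 5, 5))` gives `Affirm_senior-ai-pm_20260505`.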

Files the engine produces on commit:

profiles/<id>/cover_letters_md/<CompanySlug>/<clKey>.md — overwritten on every commit run with the fresh content from this batch.
profiles/<id>/cover_letters/<CompanySlug>/<clKey>.pdf — generated via pdfkit if it doesn't already exist; existing PDFs are preserved (idempotent).

Step 9 — Notion page creation (per job in the final top-N)

This step runs ONCE per prepare invocation, after the autonomous loop (Step B) and weak-fallback (Step C) have settled the verdict buckets. Iterate over the top-30 sorted list (Step D above) — these are the jobs marked decision = "to_apply". Do NOT push during loop iterations; that would create duplicate / over-pushed pages.

For each job in the final top-N:

9.0 Skip-guard. If the matching applications.tsv row already has a non-empty notion_page_id, the page was created in a prior run — record the existing id as notionPageId in results.json and skip 9a–9c (no new page, no duplicate). This makes operator-reruns of the SKILL idempotent.

9a. Resolve Company relation. Query profile.notion.companies_db_id for the company by name (title match). If found — use that page id. If not — create a new Company page with Name = company and Tier from profile.company_tiers[name] (if known), then use the new page id.

Create a Notion page in profile.notion.jobs_pipeline_db_id with ALL required fields (see Notion Field Completeness in Guard Rails):

Title — job title
Company — relation (array with the page id from 9a)
Status — "To Apply"
Fit Score — Strong / Medium / Weak (from Step 4; Weak appears only when weak-fallback filled the final list)
URL — from url
Source — from source
Date Added — today (YYYY-MM-DD)
Work Format — from JD or job listing
City — from JD (or "Remote")
State — from JD (or "Any" when unspecified / remote US-wide)
Notes — fit rationale from Step 4
Salary Expectations — display string like "$140-190K ($165K mid)"
Salary Min — numeric dollar amount (e.g., 140000)
Salary Max — numeric dollar amount (e.g., 190000)
Cover Letter — full PDF filename including the .pdf extension, e.g. Affirm_analyst-ii-credit-risk-analytics_20260420.pdf (i.e. <clKey>.pdf). This is the value users search by in Finder / Drive to locate the actual file. Per BL-14: rich_text field, not file-attachment.
Resume Version — select, from resumeVer

Profile-gated fields (L-5) — push only when profile.notion.property_map declares the field. If the property is absent from the map, do NOT push (back-compat: Jared has no Schedule / Requirements columns; his pages remain unchanged):

Schedule — select, from prepare_context.batch[i].schedule (extracted by the engine from JD text — values like "Full-time" / "Part-time" / "Per Diem" / "PRN" / "Contract"). Skip the field when the entry has no schedule key (extractor returned null).
Requirements — rich_text, from prepare_context.batch[i].requirements (short bulleted summary of education / years experience / language / certifications). Skip when the entry has no requirements key.

Industry is a rollup — do NOT set it. It is inherited automatically from the Company relation.

Record the returned notion_page_id.
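The Salary Expectations display string can be derived from the two numeric fields. This sketch assumes values are truncated to whole thousands and the midpoint is the simple average; `salary_display` is a hypothetical name.

```python
def salary_display(salary_min, salary_max):
    """Format e.g. 140000 and 190000 as '$140-190K ($165K mid)'."""
    mid = (salary_min + salary_max) // 2
    return f"${salary_min // 1000}-{salary_max // 1000}K (${mid // 1000}K mid)"
```

Deriving the string from Salary Min / Salary Max keeps the three Notion fields consistent with each other.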

Step 10 — Write results file

Write profiles/<id>/prepare_results_<YYYYMMDD_HHMMSS>.json ONCE at the end of the run, covering every job evaluated across all loop iterations (not just the final top-N).

Result entries:

decision: "to_apply" — the top-N pushed to Notion in Step 9. Carries clKey, clParagraphs, clBaseKey, resumeVer, notionPageId, salaryMin, salaryMax, plus fitScore / fitRationale. Engine writes the MD + PDF files from clParagraphs on commit (BL-14 / RFC 019); do not include clPath — engine derives it from the company slug + clKey.
decision: "skip" — every other judged row from the loop. Carries fitScore: "Weak" (or rarely Strong/Medium that didn't make the cut) + fitRationale so the engine commit phase persists the verdict and filterAlreadyEvaluated skips them on the next prepare run. (Strong/Medium that didn't make top-30 still get persisted; if they pass filter next time, they'll be picked up again.)

Format:

{
  "profileId": "<id>",
  "generatedAt": "<ISO timestamp>",
  "companyTiers": {
    "<company-name>": "S|A|B|C"
  },
  "results": [
    {
      "key": "<source>:<jobId>",
      "decision": "to_apply",
      "fitScore": "Strong",
      "fitRationale": "...",
      "geo": "us-compatible",
      "clKey": "<Company>_<role-slug>_<YYYYMMDD>",
      "clParagraphs": [
        "P1 — company-specific hook ...",
        "P2 — core proof, verbatim from base entry ...",
        "P3 — secondary proof / why this company ...",
        "P4 — close ..."
      ],
      "clBaseKey": "<reused-entry-key-from-cover_letter_versions.json or null>",
      "resumeVer": "<archetype-key>",
      "notionPageId": "<uuid>",
      "salaryMin": 140000,
      "salaryMax": 190000
    },
    {
      "key": "<source>:<jobId>",
      "decision": "skip",
      "fitScore": "Weak",
      "fitRationale": "...",
      "geo": "us-compatible"
    }
  ]
}

clParagraphs schema notes:

Required for decision: "to_apply". If absent, the engine logs a warning and falls back to legacy behavior (no MD/PDF written this batch); the row still becomes "To Apply" in TSV/Notion but has no file. Always include it.
Absent for decision: "skip" and decision: "archive" — no CL needed.
Must be exactly 4 strings (P1/P2/P3/P4). Each string is one paragraph; no embedded blank lines, no markdown headers, no \n\n inside individual strings (engine inserts the spacing).
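A pre-write check for the `clParagraphs` rules might look like the sketch below. The function name is hypothetical, and "markdown header" is interpreted here as any line starting with `#`.

```python
def cl_paragraph_problems(paragraphs):
    """Return a list of schema violations; an empty list means the value is valid."""
    if not isinstance(paragraphs, list) or len(paragraphs) != 4:
        return ["expected exactly 4 paragraphs (P1, P2, P3, P4)"]
    problems = []
    for number, text in enumerate(paragraphs, start=1):
        if not isinstance(text, str) or not text.strip():
            problems.append(f"P{number}: empty or not a string")
            continue
        if "\n\n" in text:
            problems.append(f"P{number}: contains a blank line")
        if any(line.lstrip().startswith("#") for line in text.splitlines()):
            problems.append(f"P{number}: contains a markdown header")
    return problems
```

Running this on every `to_apply` entry before Step 11 avoids the engine's fallback path where the row becomes "To Apply" without a file.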

companyTiers is required only when prepare_context.unknownTierCompanies was non-empty. List every company from that array with the tier you assigned in Step 5.7. Engine merges this into profile.json.company_tiers on commit; Notion's Companies DB Tier field is updated by the SKILL itself when it creates/updates the company page in Step 9.

Step 11 — Commit phase (CLI)

node engine/cli.js prepare --profile <id> --phase commit \
  --results-file profiles/<id>/prepare_results_<timestamp>.json

This updates applications.tsv: to_apply entries get status="To Apply", cl_key, cl_path, resume_ver, notion_page_id, salary_min, salary_max. Run with --dry-run first to preview.

Step 12 — Report to user

Summarize at the user level (BL-11 — what the user sees, not engine internals):

Headline: pushed N jobs to Notion (Strong: A, Medium: B, Weak: C). If N < 30, lead with "only N candidates found" and the reason (Inbox exhausted or the loop hit its 3-iteration limit).
Loop summary: how many iterations ran, whether weak-fallback triggered. Example: loop: 2 iterations + weak-fallback (15 already-Weak rows reused).
CL reuse breakdown: group by clBaseKey — 8 reused affirm_capital, 3 reused chime_growth, 1 written from scratch.
Pre-phase skips: read from prepare_context.stats.skipReasons and surface the breakdown verbatim (e.g. company_cap: 5, title_blocklist: 2, url_dead: 1). Omit if {}.
Auto-tier assignments: only if Step 5.7 ran — list the company → tier mapping.
Deferred queue: prepare_context.stats.deferred — number of fresh rows that didn't make it into the batch. Mention only if non-zero AND inboxExhausted is false (otherwise queue's empty).
Warnings / anomalies: any invalid resumeVer, invalid tier, fit-validation warnings the engine logged.
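
The CL reuse breakdown in the report can be derived from the results array. A minimal sketch, assuming a null clBaseKey means the letter was written from scratch; `summarizeClReuse` is a hypothetical name.

```javascript
// Hypothetical helper: counts to_apply rows per clBaseKey.
// Assumption: a null / missing clBaseKey means "written from scratch".
function summarizeClReuse(results) {
  const counts = new Map();
  for (const row of results) {
    if (row.decision !== "to_apply") continue;
    const label = row.clBaseKey ? `reused ${row.clBaseKey}` : "written from scratch";
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  // Largest groups first, joined the way the report line reads.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([label, n]) => `${n} ${label}`)
    .join(", ");
}
```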

Failure modes / how to recover (prepare-specific)

prepare_context.json missing — run --phase pre first.
jdText is null for many jobs — Greenhouse / Lever API may have changed; investigate engine/core/jd_cache.js. Geo + fit can still run from the job title + company name.
Notion page creation fails — check JARED_NOTION_TOKEN env var and that the DB id in profile.json is correct. Re-run the SKILL for the failed jobs only (skip already-created ones by key).
Unknown company tier (salary = null AND unknownTier: true) — assign the tier in Step 5.7 and put it in results.companyTiers. The commit phase persists it to profile.json.company_tiers automatically; no need to edit the file by hand.

check

Gmail response checker with two CLI phases (--prepare and --apply) and a Claude-run Gmail read step between them. Reads are delegated to Claude via Gmail MCP; the script never touches OAuth.

Phase 1 — prepare (CLI)

node engine/cli.js check --profile <id> --prepare [--since <ISO>]

Builds a search plan without hitting Gmail:

Loads profiles/<id>/applications.tsv → picks rows where status ∈ {Applied, To Apply, Interview, Offer} AND notion_page_id is set → forms activeJobsMap. (Inbox is excluded — those rows haven't been pushed to Notion, so no email thread can match yet.)
Computes cursor epoch: saved.last_check or --since ISO, clamped to 30 days ago.
Emits Gmail query batches (10 companies/batch + fixed LinkedIn batch + fixed recruiter batch).
Writes profiles/<id>/.gmail-state/check_context.json.
Prints JSON: { epoch, batches, processedIds }.
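
The cursor and batching steps above can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: it assumes --since takes precedence over the saved last_check, that "clamped" means the cursor is never older than 30 days, and that the epoch is in seconds. Both function names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumptions: --since wins over last_check; the cursor is never older than
// 30 days; the result is epoch seconds.
function computeCursorEpoch(lastCheckIso, sinceIso, nowMs) {
  const floorMs = nowMs - 30 * DAY_MS;
  const chosen = sinceIso || lastCheckIso;
  const chosenMs = chosen ? Date.parse(chosen) : NaN;
  const cursorMs = Number.isNaN(chosenMs) ? floorMs : Math.max(chosenMs, floorMs);
  return Math.floor(cursorMs / 1000);
}

// Split company names into groups of `size` (10 per batch in the text above).
function chunkCompanies(companies, size = 10) {
  const batches = [];
  for (let i = 0; i < companies.length; i += size) {
    batches.push(companies.slice(i, i + size));
  }
  return batches;
}
```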

Phase 2 — Gmail reads (Claude via MCP)

Claude executes in parallel:

For each batches[i] → call Gmail MCP search_threads with the query + pageSize: 50.
Collect all messageId values across threads, dedupe, remove any already in processedIds.

For each new messageId → call gmail_read_message in parallel. Build per-email object:

{
  "messageId": "...",
  "subject": "<headers.Subject>",
  "from": "<headers.From>",
  "date": "<headers.Date>",
  "body": "<body>"
}

Write the array to profiles/<id>/.gmail-state/raw_emails.json via Write tool.

If 0 new IDs found → write [] and proceed (Phase 3 still runs to bump last_check).
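
The collect, dedupe, and filter step of Phase 2 can be sketched as below. The thread shape (`{ messages: [{ messageId }] }`) is an assumption about what the Gmail MCP search returns, and `selectNewMessageIds` is a hypothetical name.

```javascript
// Assumed thread shape: { messages: [{ messageId: "..." }, ...] }.
function selectNewMessageIds(threads, processedIds) {
  const seen = new Set(processedIds);
  const fresh = [];
  for (const thread of threads) {
    for (const message of thread.messages || []) {
      if (!seen.has(message.messageId)) {
        seen.add(message.messageId); // also drops repeats across batches
        fresh.push(message.messageId);
      }
    }
  }
  return fresh;
}
```

An empty return value corresponds to the "write [] and proceed" case.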

Phase 3 — apply (CLI)

node engine/cli.js check --profile <id> [--apply]

Default is dry-run (plan only, no Notion writes, no TSV mutations). With --apply:

Reads raw_emails.json + check_context.json.
Filters out messages already in processed_messages.json.
Branches per email:
- LinkedIn job alert (from:jobalerts-noreply@linkedin.com) → append a new TSV row with status="Inbox" (enters the same fresh-discovery lifecycle as scan-derived rows; next prepare will triage it).
- Recruiter outreach (subject matches recruiter keywords): if sender's company is in pipeline → append TSV row with status="Inbox"; otherwise → recruiter_leads.md only.
- Normal pipeline → classifier.js assigns one of: REJECTION, INTERVIEW_INVITE, INFO_REQUEST, ACKNOWLEDGMENT, OTHER. Then email_matcher.js resolves to a pipeline (company, role) tuple.
Plans Notion actions per match:
- REJECTION → Status → Rejected + add comment.
- INTERVIEW_INVITE → Status → Interview + add comment. (Notion DB only has Interview — do NOT push Phone Screen / Onsite.)
- INFO_REQUEST → comment only (no status change).
- Skips any row whose current status is Rejected / Closed.
With --apply: calls updatePageStatus + addPageComment via Notion SDK v5; appends to profiles/<id>/rejection_log.md, recruiter_leads.md, email_check_log.md; writes processed_messages.json; saves TSV.
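
The per-match planning rules above reduce to a small lookup. A minimal sketch: `planNotionAction` and the returned shape are hypothetical, and treating ACKNOWLEDGMENT / OTHER as "no Notion action" is an assumption, since the list above does not say what happens to them.

```javascript
// Mapping copied from the list above; names are hypothetical.
const STATUS_BY_CLASS = new Map([
  ["REJECTION", "Rejected"],
  ["INTERVIEW_INVITE", "Interview"],
]);
const TERMINAL_STATUSES = new Set(["Rejected", "Closed"]);

function planNotionAction(classification, currentStatus) {
  if (TERMINAL_STATUSES.has(currentStatus)) return null; // skip Rejected / Closed rows
  if (STATUS_BY_CLASS.has(classification)) {
    return { status: STATUS_BY_CLASS.get(classification), comment: true };
  }
  if (classification === "INFO_REQUEST") {
    return { status: null, comment: true }; // comment only, status unchanged
  }
  return null; // ACKNOWLEDGMENT / OTHER: assumed to need no Notion action
}
```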

Failure modes (check-specific)

raw_emails.json missing — Phase 2 didn't run or Claude didn't write the file. Re-do Phase 2.
Notion 400 on status push — a status option doesn't exist in the DB (e.g. tried pushing Phone Screen). The mapping lives in engine/commands/check.js — keep it in sync with the DB's Status select options.
Cursor epoch stuck at 30d — last_check was never saved (all prior Phase 3 runs were dry-runs without --apply). Override with --since <ISO> once, then run with --apply so last_check is bumped.

indeed-prep

Three-phase Indeed ingest. Indeed has no public API and Cloudflare blocks scraping; the only reliable path is opening search pages in a browser. The CLI hands Claude a playbook; Claude does the browser work; the next scan ingests the result.

Use when: profile has discovery:indeed in modules and discovery.indeed.keywords in profile.json (currently: lilia).

Phase 1 — playbook (CLI)

node engine/cli.js indeed-prep --profile <id>

Reads profile.discovery.indeed and prints a JSON payload:

scan_urls[] — one entry per keywords[] (Indeed search URL with location/radius/fromage)
extraction_snippet — JS to paste into the browser console; returns pipe-separated rows jk|title|company|location
viewjob_template — https://www.indeed.com/viewjob?jk={jk} (open these to read JD before keeping)
filters.cert_blockers[] — license keywords (CMA / RN / LVN / CPC / RDA / RDH …) that disqualify candidates with no clinical certs
filters.location_whitelist[] / location_blocklist[] — geography gates
ingest_file — absolute path where Phase 2 must write the result
instructions[] — ordered checklist for the browser session

Side effects: creates profiles/<id>/.indeed-state/ and seeds an empty raw_indeed.json if missing. Never overwrites an existing ingest file — re-running indeed-prep is safe and idempotent.

Phase 2 — browser (Claude via Chrome MCP)

Claude executes:

For each scan_urls[].url → open in a Chrome tab (keep to about 2 tabs in parallel to avoid triggering CAPTCHA).
In each tab: paste extraction_snippet into the browser console; copy the pipe-separated rows.
Parse each row into { jk, title, company, location }.
Apply browser-side filters in this order (reject early):
- location_blocklist — skip if location matches any entry.
- location_whitelist (if non-empty) — keep ONLY if location matches.
- Title obvious-noise — driver / warehouse / nurse / therapist / physician (these never match the candidate's seeking intent).
For surviving rows: navigate to viewjob_template with the row's jk, fetch JD body, check for any cert_blockers keyword (single match → reject). Use:
```
document.querySelector('#jobDescriptionText')?.innerText?.substring(0,1500)
```
Capture per surviving entry: { jk, title, company, location, url?, postedAt? }.
Overwrite ingest_file with the JSON array.

If 0 entries survive → write []. The next scan will simply produce zero new applications.
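
The row parsing and the first three browser-side filters can be sketched as below. This is an illustration, not the playbook's code: it assumes every list is matched as a case-insensitive substring and that titles contain no "|" character. `parseRow`, `includesAny`, and `filterRows` are hypothetical names; the cert_blockers check is left out because it needs the JD body.

```javascript
// Split one "jk|title|company|location" line into an object.
function parseRow(line) {
  const [jk, title, company, location] = line.split("|").map((s) => s.trim());
  return { jk, title, company, location };
}

const TITLE_NOISE = ["driver", "warehouse", "nurse", "therapist", "physician"];

// Assumption: every list is matched as a case-insensitive substring.
function includesAny(text, needles) {
  const haystack = (text || "").toLowerCase();
  return needles.some((n) => haystack.includes(n.toLowerCase()));
}

function filterRows(rows, filters) {
  const block = filters.location_blocklist || [];
  const allow = filters.location_whitelist || [];
  return rows.filter((row) => {
    if (includesAny(row.location, block)) return false; // 1. blocklist
    if (allow.length > 0 && !includesAny(row.location, allow)) return false; // 2. whitelist
    return !includesAny(row.title, TITLE_NOISE); // 3. obvious title noise
  });
}
```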

CAPTCHA handling: if "Security Check" / "Один момент" ("One moment") appears, navigate to a different scan_urls[].url. If both tabs are blocked, report to the user; they may need to solve it manually once.

Phase 3 — ingest (CLI)

node engine/cli.js scan --profile <id>

Standard scan flow: the indeed adapter (engine/modules/discovery/indeed.js) reads the ingest file referenced in data/companies.tsv (row: Indeed (<profile>) | indeed | <slug> | {"ingestFile": "profiles/<id>/.indeed-state/raw_indeed.json"}), normalizes entries, dedupes against data/jobs.tsv and applications.tsv, and appends fresh rows with status="Inbox" (RFC 014).

Failure modes (indeed-specific)

Empty payload after Phase 1 — discovery.indeed.keywords is empty or missing. Edit profile.json.
Phase 3 reports "ingest file not found" — Phase 2 didn't write to the same path Phase 1 printed. Re-run Phase 1 (it's idempotent), then verify Claude's write target matches ingest_file.
Phase 3 reports "0 fresh" — either the ingest file is empty ([]) or every entry is a duplicate. Check applications.tsv for prior jobId matches.
Cards extraction returns 0 rows — Indeed changed selectors. Update extraction_snippet in engine/commands/indeed_prepare.js to match current a[data-jk] / [data-testid="company-name"] markup.

answer

Three-phase application Q&A flow (search, generate, push) with reuse-first lookup against a Notion-backed answer bank. Per RFC 009.

Use when: the user invokes /job-pipeline answer with a question + role context, e.g. for an "Additional Information" field, a "Why X?" prompt, motivation/influences/values questions on application forms.

Phase 1 — search (CLI)

node engine/cli.js answer --profile <id> --phase search \
  --company "<Company>" --role "<Role>" --question "<question text>"

Prints JSON to stdout:

{
  "key": "figma||product manager, ai platform||why do you want to join figma?",
  "exact": { "pageId": "...", "question": "...", "answer": "...", "category": "Motivation" } | null,
  "partials": [ /* same shape, same company+role OR same question across companies */ ],
  "schema": { "categories": ["Behavioral","Technical","Culture Fit","Logistics","Salary","Other","Experience","Motivation"] },
  "category_suggestion": "Motivation"
}
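
The key in the sample output looks like the three inputs lowercased and joined with "||". A minimal sketch of such a key builder follows; the trimming, the lowercasing, and the 120-character cut on the question (taken from the "120-char dedup window" mentioned under the answer failure modes) are assumptions, and `buildDedupKey` is a hypothetical name.

```javascript
// Assumptions: parts are trimmed and lowercased; the question is cut to
// 120 characters to mirror the dedup window this document mentions.
function buildDedupKey(company, role, question) {
  const norm = (s) => s.trim().toLowerCase();
  return [norm(company), norm(role), norm(question).slice(0, 120)].join("||");
}
```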

Phase 2 — SKILL (Claude executes)

Step 1 — Parse the user request. Extract <company>, <role>, and <question> from the user input. If any is missing or ambiguous, ask the user once.

Step 2 — Run search phase. Call the CLI Phase 1 above with the three values.

Step 3 — Branch on results.

If exact is non-null → show the existing answer to the user, with category and a clear note ("Found this in your answer bank for this exact role+question. Reuse?"). Offer [reuse] / [regenerate] / [edit]. If reuse — skip to Step 7 with existingPageId = exact.pageId and unchanged answer.
If partials is non-empty → mention them as reference ("Same role, different question: ...; Same question for Stripe: ..."), but proceed to Step 4 unless user asks to reuse one.
Otherwise → go to Step 4.

Step 4 — Load memory. Read paths declared in profile.json.memory:

memory.writing_style_file
memory.resume_key_points_file
Files matching feedback_*.md under memory.feedback_dir

If the memory block is absent or any file is missing, fall back to profiles/<id>/resume_versions.json.

Step 5 — Generate the answer. Apply Humanizer Rules throughout. Default character limit: see feedback_210_char_limit.md (210 chars unless the form specifies otherwise; for essay-type questions like Linear's, ignore the default and write a fuller answer).

Step 6 — Show + categorize. Show the draft to the user. Mention category_suggestion from the search response and confirm or adjust. Wait for approval signals: "пойдет" ("that works") / "ok" / "good" / "submitted" / "залил" ("uploaded").

Step 7 — Write the draft file.

Write profiles/<id>/.answers/draft_<YYYYMMDD_HHMMSS>.json:

{
  "company": "Figma",
  "role": "Product Manager, AI Platform",
  "question": "Why do you want to join Figma?",
  "answer": "<final approved text>",
  "category": "Motivation",
  "notes": "Optional context. E.g. 210-char short version. Field: Additional Information.",
  "existingPageId": null
}

If the user chose to update an existing entry, set existingPageId to the matched pageId from Phase 1.

Phase 3 — push (CLI)

node engine/cli.js answer --profile <id> --phase push \
  --results-file profiles/<id>/.answers/draft_<timestamp>.json

CLI:

If existingPageId is set → updates Answer / Category / Notes on that Notion page.
Otherwise → creates a new page in profile.notion.application_qa_db_id.
Always writes a local .md backup to profiles/<id>/application_answers/<Company>_<role-slug>_<YYYYMMDD>.md. If a file with that name already exists today, suffix _v2, _v3 etc.
Prints JSON: { pageId, action: "created"|"updated", url, backupPath }.
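
The backup naming rule with its _v2, _v3 suffixes can be sketched as below. `nextBackupName` is a hypothetical helper; it takes the role slug as input because the document does not describe how the slug is built.

```javascript
// `existing` is the list of file names already in application_answers/.
function nextBackupName(existing, company, roleSlug, yyyymmdd) {
  const base = `${company}_${roleSlug}_${yyyymmdd}`;
  const taken = new Set(existing);
  if (!taken.has(`${base}.md`)) return `${base}.md`;
  let version = 2;
  while (taken.has(`${base}_v${version}.md`)) version += 1;
  return `${base}_v${version}.md`;
}
```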

Step 8 — Report to user

Summarize:

Action: created or updated.
Notion URL of the page.
Local backup path.
Char count of the saved answer.

Failure modes (answer-specific)

no notion.application_qa_db_id configured — profile.json is missing the field. For jared it's ca4fa9e8-b3a6-4ccb-bcc2-3a13ff6b06ae. For other profiles, create the Q&A DB in Notion first.
missing JARED_NOTION_TOKEN — load it from ~/.bashrc / .env. Same token used by sync and check.
invalid category — the draft includes a category not in the canonical 8. Fix to one of: Behavioral, Technical, Culture Fit, Logistics, Salary, Other, Experience, Motivation. The categorize() helper picks a default automatically.
Notion 400 on create — usually a missing required property or a Category option that doesn't exist in the DB. Categories must already be in the DB schema; do not invent new ones.
Search returns nothing for a clearly recurring question — the question text drift may exceed the 120-char dedup window. Look at partials for near-matches.

Global Guard Rails (prepare / answer modes)

These rules apply whenever Claude generates content or makes pipeline decisions. They reference per-profile config — do not hardcode profile-specific values here.

Level Filter

Single source of truth: profiles/<id>/filter_rules.json → title_blocklist.patterns and location_blocklist.patterns. Applied as case-insensitive substring matches against the full title string. Never hardcode level checks inline — add/remove patterns in filter_rules.json only.

When a new over-level title slips into Inbox after a scan: add the pattern to filter_rules.json → title_blocklist.patterns with a reason, then re-run node engine/cli.js validate --profile <id> to surface existing Inbox rows that match.
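
The matching rule above (case-insensitive substring against the full title) can be sketched as follows. The entry shape `{ pattern, reason }` is an assumption based on "add the pattern ... with a reason", and `matchBlocklist` is a hypothetical name.

```javascript
// Assumed entry shape: { pattern: "director", reason: "over-level" }.
function matchBlocklist(title, patterns) {
  const haystack = title.toLowerCase();
  const hit = patterns.find((p) => haystack.includes(p.pattern.toLowerCase()));
  return hit ? { blocked: true, reason: hit.reason } : { blocked: false, reason: null };
}
```

Because this is a plain substring match, a pattern such as "director" also blocks "Associate Director", so patterns are worth keeping specific.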

Company Cap

Config: profiles/<id>/filter_rules.json → company_cap.max_active (with optional company_cap.overrides per company). Active statuses for the cap: To Apply, Applied, Interview, Offer. (Inbox is intentionally excluded — those rows are pre-triage and may yet be archived.)

Cap is enforced at prepare time only — scan always lets all jobs through. If a company already has ≥ cap active rows, excess Inbox jobs stay as Inbox (not archived); they are skipped for the current prepare run and re-evaluated next time.
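
A minimal sketch of the cap check, assuming applications.tsv rows are already parsed into `{ company, status }` objects and that overrides is a company-to-number map. `companiesAtCap` is a hypothetical name.

```javascript
const ACTIVE_STATUSES = new Set(["To Apply", "Applied", "Interview", "Offer"]);

// capConfig: { max_active: number, overrides?: { [company]: number } } (assumed shape).
function companiesAtCap(rows, capConfig) {
  const counts = new Map();
  for (const row of rows) {
    if (!ACTIVE_STATUSES.has(row.status)) continue; // Inbox rows do not count
    counts.set(row.company, (counts.get(row.company) || 0) + 1);
  }
  const overrides = capConfig.overrides || {};
  const capped = [];
  for (const [company, active] of counts) {
    const hasOverride = Object.prototype.hasOwnProperty.call(overrides, company);
    const cap = hasOverride ? overrides[company] : capConfig.max_active;
    if (active >= cap) capped.push(company);
  }
  return capped;
}
```

Inbox jobs for any company in the returned list would be skipped for the current prepare run and left as Inbox.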

Fit Score (domain fit only)

Level does NOT affect fit score. Evaluate by domain match to the candidate's profile:

Strong — core domain match (see profiles/<id>/memory/user_resume_key_points.md for domain specifics) plus a relevant tech or product component
Medium — adjacent domain, or right domain with lesser location/format fit, or outside core domain but with a key component overlap (AI/ML, data platform, payments)
Weak — outside core domain with no overlapping component
Early-startup modifier (pre-Series B, <50 people): downgrade one level (Strong→Medium, Medium→Weak)

Profile-specific domain criteria: profiles/<id>/memory/user_resume_key_points.md.
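
The early-startup modifier is a one-step downgrade, which can be sketched as below. Leaving Weak at Weak is an assumption (the rule above only lists Strong→Medium and Medium→Weak), and `applyEarlyStartupModifier` is a hypothetical name.

```javascript
const FIT_LEVELS = ["Weak", "Medium", "Strong"];

// One step down for early-stage startups, never below Weak (assumption).
function applyEarlyStartupModifier(fitScore, isEarlyStartup) {
  const index = FIT_LEVELS.indexOf(fitScore);
  if (index === -1) throw new Error(`unknown fit score: ${fitScore}`);
  return isEarlyStartup ? FIT_LEVELS[Math.max(0, index - 1)] : fitScore;
}
```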

Salary Expectations (auto-fill at prepare time)

Determined automatically from Company Tier × Role Level. No JD salary analysis needed.

Level parsing is per-profile. Engine uses profile.salary.level_parser from profile.json:

"pm" (default — Jared / fintech): Lead / Senior / PM. Catches "Lead", "Senior", "Sr.", "Sr ", and Capital One-style "Manager, Product Management" → Senior.
"healthcare" (Lilia): Senior / Coordinator / MedAdmin. Catches "Lead" / "Supervisor" / "Senior" → Senior; "Coordinator" / "Specialist" → Coordinator; everything else → MedAdmin.
"default" (single-row matrix): always returns default.

COL adjustment is per-profile (profile.salary.col_adjustment). Default for pm: +7.5% if hybrid/onsite in SF/NYC. For Lilia (Sacramento metro) the multiplier is 1.0 — no adjustment.
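
The "healthcare" level parser described above can be sketched as follows. This is not the engine's implementation: it assumes the rules are tried in the listed order and matched as case-insensitive substrings, so a title like "Senior Coordinator" resolves to Senior. `parseHealthcareLevel` is a hypothetical name.

```javascript
// Assumption: rules are tried in the listed order, as case-insensitive substrings.
function parseHealthcareLevel(title) {
  const t = title.toLowerCase();
  if (["lead", "supervisor", "senior"].some((k) => t.includes(k))) return "Senior";
  if (["coordinator", "specialist"].some((k) => t.includes(k))) return "Coordinator";
  return "MedAdmin";
}
```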

Salary matrix is per-profile (profile.json.salary.matrix). When the block is omitted the engine falls back to its default fintech-PM matrix (Jared parity). Company Tier values are stored per-company in the profile's Notion Companies DB and in profile.json.company_tiers.

The CLI surfaces the resolved config in prepare_context.salaryConfig — SKILL Step 6 reads it from there, never from disk.

Notion Field Completeness

Every Notion job page MUST have ALL of: Role, Company (relation — not empty), Status, Fit Score, Job URL, Source, Date Added, Work Format, City, State, Notes (fit rationale — never batch labels), Salary expectations.

Per-profile Notion DB id: profile.json → notion.jobs_pipeline_db_id.

Humanizer Rules (prepare / answer modes)

Apply during CL or answer generation — not as a separate post-pass.

Voice calibration

Match the profile's writing style from prepare_context.memory.writingStyle (engine-loaded from profile.json.memory.writing_style_file). When the field is null, fall back to these defaults:

Confident practitioner, not humble applicant. "I built X that delivered Y" — not "I was responsible for X."
7/10 formality: professional with energy and momentum.
Have opinions; react to facts rather than just reporting them.
Use "I" naturally — first person is honest, not unprofessional.
Numbers in every paragraph except the close.
Short paragraphs (2-3 sentences). Vary rhythm: short punchy sentences mixed with longer ones.
Be specific: concrete details over vague claims.

The defaults above describe Jared's tone. Other profiles (e.g. Lilia — warm, 5/10 formality, no metrics-per-paragraph rule) will override them entirely via their writingStyle memory file.

Banned vocabulary (AI tells)

Never use: delve, landscape, foster, underscore, pivotal, crucial, showcase, tapestry, testament, interplay, intricate.

No copula avoidance: use is/are/has instead of serves as/stands as/boasts.

No significance inflation: no "marking a pivotal moment", "reshaping", "setting the stage".

No superficial -ing phrases: no "highlighting", "underscoring", "ensuring", "reflecting".

No em dash overuse: use commas, periods, or parentheses instead.

No rule-of-three: don't force ideas into groups of three.

No negative parallelisms: no "It's not just X, it's Y".

No generic closers: no "exciting times", "the future looks bright".

No hedging: no "potentially", "it could be argued".

No filler: no "in order to", "it is important to note", "due to the fact that".

No opener clichés: no "Dear Hiring Manager, I am writing to express my interest…", no "I am passionate about [mission]", no "excited to".

Final anti-AI check

After writing, ask: "What makes this obviously AI-generated?" — fix any remaining tells before saving.

Memory files (load before generating)

In every prepare / answer session, read the engine-loaded memory from prepare_context.memory:

memory.writingStyle — writing style profile (from profile.json.memory.writing_style_file)
memory.resumeKeyPoints — skills / experience for matching (from profile.json.memory.resume_key_points_file)
memory.feedback[] — array of {file, content} for each feedback_*.md under profile.json.memory.feedback_dir

For the answer mode (no prepare_context.json available), read the same files directly from disk under the paths declared in profile.json.memory.

If a memory entry is null / missing: fall back to profiles/<id>/resume_versions.json as the source of truth for the candidate's experience. Always ask the user which archetype is most relevant rather than improvising facts.

job-pipeline — Multi-profile Job Search Pipeline

Single engine, per-profile data. All commands take --profile <id>. Currently supported profiles: jared.

Commands

/job-pipeline scan — Discover new jobs across configured ATS adapters (greenhouse / lever / ashby / smartrecruiters / workday / calcareers / usajobs / indeed / remoteok). Append to shared pool + per-profile pipeline.
/job-pipeline validate — Pre-flight: TSV hygiene, company-cap check, URL liveness on active applications.
/job-pipeline sync — Reconcile per-profile applications with Notion. Default = dry-run, must pass --apply to commit.
/job-pipeline prepare — Two-phase processing of Inbox jobs (RFC 014): mechanical pre-phase (filter / URL check / JD fetch / salary) + Claude LLM phase (geo check / fit score / CL gen / Notion push). Commit transitions row from Inbox → To Apply (decision=to_apply) or Inbox → Archived (decision=archive).
/job-pipeline check — Two-phase Gmail response check: --prepare builds Gmail search batches for Claude MCP, --apply consumes Claude-written emails and updates Notion + TSV + logs.
/job-pipeline indeed-prep — Phase 1 of Indeed ingest. Prints scan URLs + JS extraction snippet + filter context for the Claude browser MCP session. Phase 2 (browser fetch) is manual via Chrome MCP. Phase 3 = standard scan (the indeed adapter ingests the file Claude wrote).
/job-pipeline answer — Generate or reuse application answers (Why join? / Influences? / Motivation? etc.). Three-phase: search Notion Q&A DB by dedup key → reuse if exact match else generate via Humanizer Rules → push answer back to Notion + write local .md backup. Per-profile DB at profile.notion.application_qa_db_id.

If no mode is specified, show this help and ask which to run.

Required context per session

Before running any command, verify:

Profile id — resolution policy:
- Default profile is jared. If the user invokes /job-pipeline <cmd> cold with no profile hint, use jared and do NOT ask.
- NLP extraction: if the user's phrase mentions another profile by name (e.g. "для Лили" ("for Lilia"), "scan Lilia's pipeline", "Lilia profile"), extract that name, lowercase it, map it to the matching profiles/<id>/ directory, and use it. Valid profiles = subdirectories of profiles/ (excluding _example).
- Session-sticky: once a non-default profile is resolved in a session, keep using it for subsequent commands in the same session. Switch back to jared only on an explicit phrase like "switch to Jared" / "для Джареда" ("for Jared").
- Ask only when ambiguous: if the phrase mentions a name that doesn't match any profile dir, ask once (list valid profiles), then stick.
- Always pass the resolved id to the CLI via --profile <id>.
Working directory = ai-job-searcher/ (public repo clone at ~/Desktop/Claude Code/ai-job-searcher/). All commands resolve data/ and profiles/ relative to cwd.
Secrets in env. For profile <id>, the CLI reads <ID_UPPER>_* env vars only:
- JARED_NOTION_TOKEN (required for sync)
- JARED_USAJOBS_API_KEY, JARED_USAJOBS_EMAIL (required if discovery:usajobs is enabled)
Companies pool. data/companies.tsv must exist. Bootstrap once with:
```
node engine/bin/seed_companies.js
```
(parses 247 targets from the legacy Job Search/find_jobs.js).
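
The per-profile secrets rule above (only <ID_UPPER>_* variables are read) can be illustrated with a small filter over an environment object. `readProfileSecrets` is a hypothetical helper, not the CLI's loader.

```javascript
// Hypothetical helper: collect only the <ID_UPPER>_* variables for one profile.
function readProfileSecrets(profileId, env) {
  const prefix = `${profileId.toUpperCase()}_`;
  const secrets = {};
  for (const [name, value] of Object.entries(env)) {
    if (name.startsWith(prefix)) secrets[name.slice(prefix.length)] = value;
  }
  return secrets;
}
```

Calling it as `readProfileSecrets("jared", process.env)` would return an object keyed by the unprefixed names, such as NOTION_TOKEN, and ignore variables that belong to other profiles.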

Step-by-step

scan

node engine/cli.js scan --profile <id> [--dry-run] [--verbose]

Load profiles/<id>/profile.json → enabled discovery modules + filter rules.
Load data/companies.tsv → group targets by adapter source.
Apply per-profile discovery.companies_whitelist/blacklist.
Invoke each enabled adapter via engine/core/scan.js orchestrator (errors per source isolated, do not block the run).
Dedupe new jobs against data/jobs.tsv master pool by (source, jobId).
Atomically write data/jobs.tsv + append fresh rows to profiles/<id>/applications.tsv with status="Inbox" (RFC 014 — TSV-only state for fresh-after-scan rows; transitions to To Apply after prepare --phase commit decision=to_apply). Notion DBs never see Inbox — they keep the 8-status set.

--dry-run prints planned writes without touching disk.
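
The dedupe step can be sketched as a set lookup on the composite (source, jobId) key. A minimal sketch: `jobKey` and `dedupeAgainstPool` are hypothetical names, and jobs are assumed to be plain objects with source and jobId fields.

```javascript
// Composite key in the "<source>:<jobId>" format used elsewhere in this document.
const jobKey = (job) => `${job.source}:${job.jobId}`;

function dedupeAgainstPool(scanned, pool) {
  const known = new Set(pool.map(jobKey));
  const fresh = [];
  for (const job of scanned) {
    const key = jobKey(job);
    if (known.has(key)) continue;
    known.add(key); // also drops repeats inside the same scan
    fresh.push({ ...job, status: "Inbox" });
  }
  return fresh;
}
```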

validate

node engine/cli.js validate --profile <id> [--dry-run]

Read-only checks. Exit 0 if clean, 1 if any issue:

TSV hygiene — both data/jobs.tsv and profiles/<id>/applications.tsv parse cleanly.
company_cap — counts active applications per company against profile.filter_rules.company_cap.max_active plus per-company overrides. Active = To Apply / Applied / Interview / Offer. (Inbox is intentionally excluded — those rows are pre-triage and may yet be archived.)
URL liveness — HEAD-pings each active application URL with concurrency 8, falls back to GET on 405/501. Reports any 4xx/5xx/timeout. Skipped under --dry-run.

sync

node engine/cli.js sync --profile <id> [--apply] [--verbose]

Reconcile with the profile's notion.jobs_pipeline_db_id. The design is bidirectional, but only the pull direction is active:

Push — bare push from sync is disabled (legacy). Notion pages are created exclusively by prepare --phase commit decision=to_apply, which assembles fit + CL + salary + Notion page atomically. sync is pull-only.
Pull — fetch all Notion pages, match by key field (<source>:<jobId>), apply Notion's status and notion_page_id to local TSV (Notion wins on status).

Default mode prints the plan and runs the read-only pull preview. Pass --apply to actually mutate Notion and TSV.
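
The pull direction can be sketched as a key-based merge in which Notion's status wins. The simplified page shape `{ key, status, pageId }` is an assumption, and `planPull` is a hypothetical name; it returns the planned changes rather than mutating rows, which matches the dry-run default.

```javascript
// pages: simplified Notion rows { key, status, pageId } (assumed shape).
function planPull(rows, pages) {
  const byKey = new Map(pages.map((p) => [p.key, p]));
  const changes = [];
  for (const row of rows) {
    const page = byKey.get(row.key);
    if (!page) continue; // no Notion page for this row yet
    if (row.status !== page.status || row.notion_page_id !== page.pageId) {
      changes.push({ key: row.key, status: page.status, notion_page_id: page.pageId });
    }
  }
  return changes;
}
```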

Failure modes / how to recover

companies pool is empty — run node engine/bin/seed_companies.js once.
missing JARED_NOTION_TOKEN — load it from ~/.bashrc / .env. Token format: ntn_….
adapter source mismatch or no adapter for source X — profile.json references an unknown discovery module. Either remove from modules or add the adapter file.
HTTP 429 from greenhouse/lever — adapters retry with exponential backoff, up to 3 attempts; on persistent 429, the source is reported in the summary and the rest of the run continues.
CalCareers HTML changed — the sanity warning "ResultCount marker missing" means the upstream markup changed; investigate the regexes in engine/modules/discovery/calcareers.js.

Anti-patterns — do NOT

Do not run any command without --profile — the CLI will refuse and print help.
Do not invoke sync without --apply and assume it will write — it always defaults to dry-run.
Do not invoke check --apply without first running check --prepare + the Phase-2 Gmail MCP reads — the script will error on missing raw_emails.json.
Do not edit data/jobs.tsv or profiles/<id>/applications.tsv by hand while a scan is running. Atomic-rename protects against partial writes but not against logical conflicts.
Do not commit data/ or profiles/<id>/ to git — both are in .gitignore for a reason. Only profiles/_example/ is committed.
Do not mix profiles in one process. Each CLI invocation loads exactly one profile's secrets — never load JARED_* and PAT_* together.
Do not generate a new application answer without first running answer --phase search and inspecting matches. Reuse before regenerate is the rule for /job-pipeline answer.
Do not push an answer to the Notion Q&A DB without an explicit user approval signal ("пойдет" ("that works") / "good" / "submitted" / "залил" ("uploaded")). Same shared-state rule as the cover-letter flow.
Do not invent new Q&A categories. Use one of the 8 canonical names from the DB; the categorizer picks a default automatically.

prepare

The pre phase of the loop runs in one of three modes, all sharing the same prepare_context.json file:

| Mode | Source pool | When |
| --- | --- | --- |
| `--mode fresh` | Inbox rows that pass blocklist / cap / URL-check (the standard pipeline) | First iteration of every run |
| `--mode topup` | `deferredQueue` (rows that passed filter but didn't fit the previous batch) | Iterations 2–3 if Strong+Medium < 30 |
| `--mode weak-fallback` | `deferredQueue` + already-Weak Inbox rows in TSV | After 3 iterations if Strong+Medium still < 30 |

Phase 1 — pre (CLI, called by Claude)

node engine/cli.js prepare --profile <id> --phase pre --mode <fresh|topup|weak-fallback> [--batch 30] [--need K] [--dry-run]

Runs without LLM cost. Outputs profiles/<id>/prepare_context.json:

batch[] — jobs ready for SKILL judgement (URL-checked, JD-fetched, salary computed). Topup / weak-fallback APPEND to this array.
batch[i].wasAlreadyWeak: true (weak-fallback only) — row was judged Weak in a prior run; carry over priorFitScore / priorFitRationale instead of re-judging.
deferredQueue[] — TSV keys that passed the filter pipeline but weren't URL-checked yet (used by the next topup / weak-fallback call).
skipped[] — engine-level skip reasons (company_cap / title_blocklist / url_dead / already_evaluated_weak / etc.).
stats.inboxExhausted: bool — true when no more fresh rows remain in TSV outside the current batch (excluding duplicate-flagged rows). The loop must stop iterating when this flips true.
stats.skipReasons / stats.deferred / stats.weakFallbackRuns etc.

Phase 2 — autonomous loop (Claude orchestrates)

When the user invokes /job-pipeline prepare, Claude runs the following loop. Loop logic lives in the SKILL; the engine is stateless.

Step A — initialize
  iteration = 0
  verdicts = { strong: [], medium: [], weak: [] }    # per-job evaluation buckets
  Run: prepare --phase pre --mode fresh --batch 30
  Read: profiles/<id>/prepare_context.json

Step B — per-iteration evaluation loop (chase Strong)
  while True:
    iteration += 1
    new_entries = batch entries we haven't judged yet
    For each entry in new_entries:
      run Steps 1, 3, 4, 5, 5.7, 6, 7, 8 (fit + CL gen, NOT Notion push)
      append to verdicts[<fitScore-bucket>]

    # Stop conditions — keep iterating to chase more Strong even if
    # Medium is plentiful. Selection priority is Strong → Medium → Weak,
    # so 30 pure Strong always beats 15 Strong + 15 Medium.
    if len(verdicts.strong) >= 30: break
    if stats.inboxExhausted: break
    if iteration >= 3: break

    # Topup — try to find more Strong (need = how many Strong we still want)
    Run: prepare --phase pre --mode topup --need (30 - len(verdicts.strong))
    Re-read prepare_context.json → continue

Step C — weak fallback (only if needed)
  strongMedium = len(verdicts.strong) + len(verdicts.medium)
  if strongMedium < 30 AND NOT stats.inboxExhausted:
    Run: prepare --phase pre --mode weak-fallback --need (30 - strongMedium)
    Re-read prepare_context.json
    new_entries = batch entries not yet judged
    For each entry:
      if entry.wasAlreadyWeak:
        # Reuse saved verdict, do NOT spend tokens re-judging
        fitScore = entry.priorFitScore  (always "Weak")
        fitRationale = entry.priorFitRationale
        skip Step 4
        run Steps 7, 8 (CL gen) so the row is push-ready
        append to verdicts.weak
      else:
        run Steps 1, 3, 4, 5, 5.7, 6, 7, 8 as usual

Step D — pick top 30 by sequential priority
  # Sequential fill: take ALL Strong first, then Medium to fill, then Weak.
  # Mediums never bump Strongs; Weaks never bump Mediums.
  candidates = verdicts.strong + verdicts.medium + verdicts.weak  (in that order)
  final = candidates[:30]
  if len(final) < 30:
    warn user: "only N candidates found, push N instead of 30"

Step E — push final to Notion
  For each job in `final`: run Step 9 (Notion page creation)

Step F — write results + commit
  Step 10: write prepare_results_<timestamp>.json
    decision = "to_apply" for each job in `final`
    decision = "skip" for everything else evaluated this run (carries fitScore so the
               row gets persisted to TSV and won't be re-judged next prepare run)
  Step 11: prepare --phase commit --results-file <path>
  Step 12: report to user
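
The stop conditions from Step B and the sequential fill from Step D can be written as two small functions. A minimal sketch: `shouldStop` and `pickFinal` are hypothetical names, and verdicts is assumed to hold three arrays as in Step A.

```javascript
const TARGET = 30;
const MAX_ITERATIONS = 3;

// Step B stop conditions: enough Strong, Inbox exhausted, or iteration cap reached.
function shouldStop(verdicts, iteration, inboxExhausted) {
  return verdicts.strong.length >= TARGET || inboxExhausted || iteration >= MAX_ITERATIONS;
}

// Step D: sequential fill, Strong first, then Medium, then Weak.
function pickFinal(verdicts, target = TARGET) {
  return [...verdicts.strong, ...verdicts.medium, ...verdicts.weak].slice(0, target);
}
```

Note that shouldStop ignores the Medium count on purpose: a run with 8 Strong and 25 Medium after iteration 1 still continues, because only Strong counts toward the stop target.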

Stdout per iteration (mirror this format so the user sees progress):

prepare loop iteration 1/3: 30 candidates → 8 Strong, 12 Medium, 10 Weak
prepare loop iteration 2/3: 27 new candidates → 5 Strong, 9 Medium, 13 Weak
prepare loop iteration 3/3: 21 new candidates → 2 Strong, 9 Medium, 10 Weak
loop done: 15 Strong, 30 Medium, 33 Weak (target 30)
selecting top 30: 15 Strong + 15 Medium → push to Notion

If weak-fallback runs:

strong+medium below target (8+5=13 of 30); falling back to weak…
weak-fallback: pulled 17 entries (12 already-Weak, 5 fresh)
final: 8 Strong + 5 Medium + 17 Weak = 30 → push to Notion

If inboxExhausted:

loop done: 4 Strong, 7 Medium (Inbox exhausted)
warn: only 11 candidates found, push 11 instead of 30
selecting all 11 (4 Strong + 7 Medium) → push to Notion

Anti-patterns — do NOT

Do not push to Notion (Step 9) inside the iteration loop. Push happens ONCE on the top-30 final list (Step E above). Otherwise we'd create 60–90 Notion pages and only some would be marked final.
Do not re-judge entry.wasAlreadyWeak === true rows in Step 4. The verdict is already in TSV; re-asking burns tokens. Carry priorFitScore / priorFitRationale straight into the verdict bucket.
Do not keep iterating past 3 even if Strong < 30 and Inbox isn't exhausted. The cap is by design: it protects the token budget. After 3, the SKILL moves on to Step C, which falls back to weak only if Strong+Medium is still below 30.
Do not stop the loop just because Strong+Medium ≥ 30. Selection priority is Strong → Medium → Weak; we keep iterating to find more Strong (Mediums don't satisfy the target). Stop only on Strong ≥ 30, iter ≥ 3, or inboxExhausted.
Do not terminate the loop early just because iteration N had 0 Strong. Strong from later iterations can still hit the target; keep going until one of the three stop conditions is true.
Do not invoke --mode topup or --mode weak-fallback directly without first running --mode fresh (or another mode) to seed prepare_context.json. Topup / weak-fallback exit 1 if the context file is missing.
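
The three stop conditions named in the anti-patterns above can be written as one predicate. A minimal sketch, assuming the loop tracks strongCount, iteration, and inboxExhausted under those (hypothetical) names:

```javascript
// Hypothetical helper; the field names are assumptions for illustration.
// Returns the reason to stop, or null to keep iterating.
function shouldStopLoop({ strongCount, iteration, inboxExhausted }, target = 30, maxIterations = 3) {
  if (strongCount >= target) return "strong_target_reached";
  if (iteration >= maxIterations) return "iteration_cap";
  if (inboxExhausted) return "inbox_exhausted";
  // Deliberately ignores Medium counts and zero-Strong iterations.
  return null;
}
```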

The reference for each Step (1, 3, 4, 5, 5.7, 6, 7, 8, 9, 10, 11, 12) is below — these describe the per-job evaluation logic that the loop above orchestrates.

After the CLI writes prepare_context.json, Claude then:

Step 1 — Load memory

Read profiles/<id>/prepare_context.json first. The memory block (populated by profile_loader from profile.json.memory) contains:

memory.writingStyle — content of profiles/<id>/memory/user_writing_style.md
memory.resumeKeyPoints — content of profiles/<id>/memory/user_resume_key_points.md
memory.feedback — array of {file, content} for each feedback_*.md under the configured dir

Step 2 — Read prepare_context.json

Read the rest of profiles/<id>/prepare_context.json (batch, skipped, stats, salaryConfig, unknownTierCompanies).

Step 3 — Geo decision (now profile-driven, L-4 / RFC 013)

The engine pre-phase already enforces profile.geo policy and surfaces the result on every batch entry. Read prepare_context.batch[i].geo_decision — do NOT WebFetch for geo.

For each job in batch:

geo_decision === "allowed" → proceed to Step 4. The geo_matched_by field describes WHY it passed ("city:Sacramento" / "remote" / "country:US" / "unrestricted") — useful when generating the fit rationale.
geo_decision === "rejected" → already pruned by engine. You won't see it in batch[] (it's in prepare_context.skipped[] with reason "geo_metro_miss" / "geo_country_miss" / "geo_remote_only_miss" / "geo_blocklist" / "geo_no_location"). The stats.skipReasons breakdown shown in Step 12 includes geo-counters.
geo_decision field absent (legacy prepare_context.json from before L-4 migration, or profile without profile.geo block) → fallback: WebFetch JD location, apply simple US-policy as before. Engine version post-2026-05-04 always populates geo_decision.
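
The three cases above reduce to a small dispatch on the entry. A sketch only: geoAction and its return labels are hypothetical, and treating any other value as an error is an assumption.

```javascript
// Hypothetical helper: decide how to handle one batch entry's geo state.
function geoAction(entry) {
  if (entry.geo_decision == null) return "legacy_fallback"; // field absent: WebFetch the JD location
  if (entry.geo_decision === "allowed") return "proceed"; // continue to Step 4
  if (entry.geo_decision === "rejected") return "skip"; // normally pruned before reaching batch[]
  throw new Error(`unexpected geo_decision: ${entry.geo_decision}`);
}
```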

Step 4 — Fit scoring (per job)

Apply Fit Score rules from ## Global Guard Rails below. Assign one of: Strong / Medium / Weak.

Write a 1-sentence fit rationale (concrete domain overlap, not generic praise). This goes into the Notion Notes field.

Early-startup modifier: if company is pre-Series B or <50 employees — downgrade one level.
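
A sketch of the early-startup modifier, assuming company facts arrive as a preSeriesB flag and an employees count (both hypothetical field names). Leaving Weak at Weak is also an assumption, since the rule only names Strong→Medium and Medium→Weak.

```javascript
const FIT_LEVELS = ["Weak", "Medium", "Strong"];

// Hypothetical helper: downgrade one level for pre-Series B or <50-employee companies.
function applyEarlyStartupModifier(fitScore, company) {
  const idx = FIT_LEVELS.indexOf(fitScore);
  if (idx === -1) throw new Error(`unknown fit score: ${fitScore}`);
  const isEarly =
    company.preSeriesB === true || (company.employees != null && company.employees < 50);
  if (!isEarly) return fitScore;
  return FIT_LEVELS[Math.max(0, idx - 1)]; // Weak has nowhere lower to go
}
```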

Step 5 — Bucket: fit (loop-aware)

Geo filtering already happened in the engine pre-phase (Step 3 is read-only) — only entries with geo_decision === "allowed" reach the batch. The remaining gate is fit.

If the entry has wasAlreadyWeak: true, it goes straight to verdicts.weak with fitScore="Weak" carried over from entry.priorFitScore. No Step 4 re-judging.

Report iteration counts to the user (per the stdout format in Phase 2 above): how many landed in each bucket per iteration, what the running total is.

Step 5.7 — Auto-tier unknown companies

If prepare_context.unknownTierCompanies is non-empty, assign each company a tier (S / A / B / C). Do not prompt the user — the user never tiers companies manually; this is your job.

Tier criteria (use the column that matches the current profile):

Tier	Jared (PM / fintech-leaning)	Lilia (healthcare RN)
S	Public big-tech / top fintech, $10B+ market cap, top brand recognition. AI-native or strong AI investment.	Major regional health systems (Kaiser, UC Davis, Sutter, Dignity), 10K+ employees, full benefits, RN union presence.
A	Late-stage / public mid-cap. Strong funding, well-known brand. AI presence non-trivial.	Large medical foundations / dialysis chains / managed care. 1K+ employees, multi-site.
B	Growth-stage Series C–E, $1–5B valuation. Recognized in their sector.	Specialty clinic chains (eye, dental, dermatology). Multi-location, regional reach.
C	Early/mid-stage Series A–B, <$1B. Less known.	Small private practices, single-clinic operations, local services.

Sources of signal (use freely, in this order):

Company name recognition + general knowledge (size, valuation, funding round).
jdText if present — funding mentions, employee count, "Series X", "publicly traded".
WebFetch the company website / Crunchbase / LinkedIn for ambiguous cases (one extra fetch per unknown company is fine).

After tiering, the engine will persist the assignments to profile.json.company_tiers on commit (one-shot per company). It also uses them to set the Notion Companies DB Tier field on first push.

Step 6 — Salary (auto-fill)

For each remaining job:

If prepare_context.batch[i].salary is non-null: use it as-is.
If salary is null AND the entry has unknownTier: true: look up the tier you just assigned in Step 5.7, then use prepare_context.salaryConfig (per-profile matrix + level parser + COL config from profile.json.salary). When salaryConfig is null the engine's default fintech-PM matrix in engine/core/salary_calc.js applies. Pick the row at Tier × Level (level = engine parseLevel(title, salaryConfig.levelParser)). Compute salaryMin / salaryMax from the matrix and apply the COL adjustment defined in salaryConfig.colAdjustment (defaults: SF/NYC +7.5% unless Remote).
If salary is null AND unknownTier is not true: this means the tier is known but the matrix doesn't cover the level — flag to user with the company name and title, do NOT invent a range.
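
The Tier × Level lookup with COL adjustment might look like the sketch below. It is not the code in engine/core/salary_calc.js: estimateSalary, the matrix layout, the city labels, and the dollar figures are all placeholders chosen for illustration.

```javascript
// Placeholder matrix: the figures are illustrative, not the engine's real matrix.
const EXAMPLE_MATRIX = { A: { Senior: { min: 140000, max: 190000 } } };

// Hypothetical helper: returns null when the matrix has no row, so the caller
// can flag the job to the user instead of inventing a range.
function estimateSalary({ tier, level, city, workFormat }, matrix = EXAMPLE_MATRIX, colPct = 7.5) {
  const row = matrix[tier] && matrix[tier][level];
  if (!row) return null;
  // Default COL rule from the text: SF/NYC get the uplift unless the role is Remote.
  const colApplies = workFormat !== "Remote" && ["SF", "NYC"].includes(city);
  const factor = colApplies ? 1 + colPct / 100 : 1;
  return {
    salaryMin: Math.round(row.min * factor),
    salaryMax: Math.round(row.max * factor),
  };
}
```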

Step 7 — Archetype selection (per job)

Choose the best resume archetype from profiles/<id>/resume_versions.json for this specific role. Prefer the archetype whose domain keywords overlap most with the JD / job title.

Record resumeVer = archetype key (e.g. "fintech-pm-v3").

Step 8 — Cover letter generation (template-first, per job)

8a — Load base template.

Read profiles/<id>/cover_letter_versions.json and detect its shape:

Template-variants shape (Lilia and similar profiles): top-level defaults object with locked-down p2, p3, p4_template, availability, sign strings, plus a letters array of { job_id, role, company, p1 } variants.
Library shape (Jared and similar profiles): top-level keys map to entries shaped as { filename, paragraphs: [p1, p2, p3, p4] }.
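
Shape detection can be done structurally. A minimal sketch, assuming the parsed JSON is passed in; detectCoverLetterShape is a hypothetical name and the checks are only as strict as the two descriptions above.

```javascript
// Hypothetical helper: classify a parsed cover_letter_versions.json.
function detectCoverLetterShape(json) {
  const hasDefaults = json.defaults !== null && typeof json.defaults === "object";
  if (hasDefaults && Array.isArray(json.letters)) return "template-variants";
  // Library shape: every top-level value carries a paragraphs array.
  const entries = Object.values(json);
  const looksLikeLibrary =
    entries.length > 0 && entries.every((e) => e && Array.isArray(e.paragraphs));
  if (looksLikeLibrary) return "library";
  throw new Error("unrecognized cover_letter_versions.json shape");
}
```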

8b — Pick the base entry.

Match priority, in order — stop at the first hit:

Same company + same role focus — exact company match AND role keywords overlap (e.g. existing entry for Affirm + Capital-focused, current job is Affirm Capital PM → reuse).
Same company, different role — useful when prior letter exists for the company. Replace P3 (why this company) sparingly; P2 proof stays.
Same archetype, different company — match the chosen resumeVer (Step 7) to an existing entry whose role focus aligns with the archetype (e.g. resumeVer = ConsumerLending → look for lendbuzz_creditcard / affirm_capital / similar entries).
Closest archetype — if no archetype-aligned entry exists, pick the most domain-adjacent one (e.g. PaymentsInfra → fall back to PlatformInfra-style entry).

In template-variants shape: defaults.{p2, p3, p4_template} IS the base — every letter reuses them. Only P1 varies, and the letters array is your reference set for tone/length on past P1s.

8c — Rebuild the letter.

P2 (Core proof) — copy verbatim from the base entry. Do NOT paraphrase, reorder facts, or substitute different metrics. The candidate's achievements are stable; rewriting P2 introduces drift and dilutes proof.
P3 (Secondary proof / AI angle / why this company) — copy verbatim from the base entry IF the role focus matches. If the new role is in a clearly different sub-domain (e.g. base is growth-retention, new role is platform-API), regenerate ONLY this paragraph. Apply Humanizer Rules.
P1 (Hook — company-specific) — always regenerate. Use the JD signal + company-specific challenge as anchor. Apply Humanizer Rules. Pattern: "[Company] does [X]. The harder problem is [Y]. That's exactly what I've solved at [previous role]."
P4 (Close) — for template-variants shape, fill p4_template placeholders ({availability}, etc.). For library shape, copy verbatim from base entry.

Output: a 4-paragraph CL where P2 (and usually P3) are exact copies of an existing humanized letter, and only the hook (and rarely P3) is freshly written for this company.
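
For the library shape, the rebuild rules in 8c amount to swapping P1 (and occasionally P3) while copying the rest. A sketch under that assumption; rebuildLibraryLetter is a hypothetical name.

```javascript
// Hypothetical helper: baseEntry is a library-shape entry { filename, paragraphs: [p1, p2, p3, p4] }.
// freshP3 is passed only when the role sits in a clearly different sub-domain.
function rebuildLibraryLetter(baseEntry, freshP1, freshP3 = null) {
  const [, p2, p3, p4] = baseEntry.paragraphs; // the base P1 is always discarded
  return [freshP1, p2, freshP3 === null ? p3 : freshP3, p4];
}
```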

8d — Final humanizer pass.

8e — Emit paragraphs (engine writes the files).

Do NOT write the CL file yourself. Per RFC 019 / BL-14, the SKILL produces paragraphs and the engine writes both .md and .pdf in the commit phase.

In the results.json entry for this job (Step 10) include:

clParagraphs: string[] — exactly the 4 finalized paragraphs (P1, P2, P3, P4) in order, no headers, no blank-line padding inside the strings. Engine joins them with \n\n for the MD file and feeds them to pdfkit for the PDF.
clKey — filename stem only, no extension and no path. Convention: <Company>_<role-slug>_<YYYYMMDD> (e.g. Affirm_senior-ai-pm_20260505). Engine derives the company sub-folder by slugifying applications.tsv.companyName — you do not need to slugify yourself.
clBaseKey — the base entry key / job_id you reused for proof paragraphs (helps audit batch consistency: if 10 letters share clBaseKey = "affirm_capital", P2 is identical across them, which is the point).
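
The clKey convention can be produced mechanically. A sketch only: buildClKey is a hypothetical name, and stripping non-alphanumerics from the company part is an assumption (the source shows only the single-word example Affirm).

```javascript
// Hypothetical helper: builds <Company>_<role-slug>_<YYYYMMDD> with the date taken in UTC.
function buildClKey(company, roleTitle, date) {
  const companyPart = company.replace(/[^A-Za-z0-9]+/g, "");
  const roleSlug = roleTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation into single hyphens
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  const yyyymmdd = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `${companyPart}_${roleSlug}_${yyyymmdd}`;
}
```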

Files the engine produces on commit:

profiles/<id>/cover_letters_md/<CompanySlug>/<clKey>.md — overwritten on every commit run with the fresh content from this batch.
profiles/<id>/cover_letters/<CompanySlug>/<clKey>.pdf — generated via pdfkit if it doesn't already exist; existing PDFs are preserved (idempotent).

Step 9 — Notion page creation (per job in the final top-N)

For each job in the final top-N:

Create a Notion page in profile.notion.jobs_pipeline_db_id with ALL required fields (see Notion Field Completeness in Guard Rails):

Title — job title
Company — relation (array with the page id from 9a)
Status — "To Apply"
Fit Score — Strong / Medium / Weak (from Step 4; Weak appears only when weak-fallback filled the list)
URL — from url
Source — from source
Date Added — today (YYYY-MM-DD)
Work Format — from JD or job listing
City — from JD (or "Remote")
State — from JD (or "Any" when unspecified / remote US-wide)
Notes — fit rationale from Step 4
Salary Expectations — display string like "$140-190K ($165K mid)"
Salary Min — numeric dollar amount (e.g., 140000)
Salary Max — numeric dollar amount (e.g., 190000)
Cover Letter — full PDF filename including the .pdf extension, e.g. Affirm_analyst-ii-credit-risk-analytics_20260420.pdf (i.e. <clKey>.pdf). This is the value users search by in Finder / Drive to locate the actual file. Per BL-14: rich_text field, not file-attachment.
Resume Version — select, from resumeVer

Schedule — select, from prepare_context.batch[i].schedule (extracted by the engine from JD text — values like "Full-time" / "Part-time" / "Per Diem" / "PRN" / "Contract"). Skip the field when the entry has no schedule key (extractor returned null).
Requirements — rich_text, from prepare_context.batch[i].requirements (short bulleted summary of education / years experience / language / certifications). Skip when the entry has no requirements key.

Industry is a rollup — do NOT set it. It is inherited automatically from the Company relation.

Record the returned notion_page_id.
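
The Salary Expectations display string follows from Salary Min and Salary Max. A small sketch, assuming rounding to the nearest thousand; formatSalaryExpectation is a hypothetical name.

```javascript
// Hypothetical helper: renders e.g. 140000 / 190000 as "$140-190K ($165K mid)".
function formatSalaryExpectation(salaryMin, salaryMax) {
  const k = (n) => Math.round(n / 1000);
  const mid = (salaryMin + salaryMax) / 2;
  return `$${k(salaryMin)}-${k(salaryMax)}K ($${k(mid)}K mid)`;
}
```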

Step 10 — Write results file

Write profiles/<id>/prepare_results_<YYYYMMDD_HHMMSS>.json ONCE at the end of the run, covering every job evaluated across all loop iterations (not just the final top-N).

Result entries:

decision: "to_apply" — the top-N pushed to Notion in Step 9. Carries clKey, clParagraphs, clBaseKey, resumeVer, notionPageId, salaryMin, salaryMax, plus fitScore / fitRationale. Engine writes the MD + PDF files from clParagraphs on commit (BL-14 / RFC 019); do not include clPath — engine derives it from the company slug + clKey.
decision: "skip" — every other judged row from the loop. Carries fitScore: "Weak" (or rarely Strong/Medium that didn't make the cut) + fitRationale so the engine commit phase persists the verdict and filterAlreadyEvaluated skips them on the next prepare run. (Strong/Medium that didn't make top-30 still get persisted; if they pass filter next time, they'll be picked up again.)

Format:

{
  "profileId": "<id>",
  "generatedAt": "<ISO timestamp>",
  "companyTiers": {
    "<company-name>": "S|A|B|C"
  },
  "results": [
    {
      "key": "<source>:<jobId>",
      "decision": "to_apply",
      "fitScore": "Strong",
      "fitRationale": "...",
      "geo": "us-compatible",
      "clKey": "<Company>_<role-slug>_<YYYYMMDD>",
      "clParagraphs": [
        "P1 — company-specific hook ...",
        "P2 — core proof, verbatim from base entry ...",
        "P3 — secondary proof / why this company ...",
        "P4 — close ..."
      ],
      "clBaseKey": "<reused-entry-key-from-cover_letter_versions.json or null>",
      "resumeVer": "<archetype-key>",
      "notionPageId": "<uuid>",
      "salaryMin": 140000,
      "salaryMax": 190000
    },
    {
      "key": "<source>:<jobId>",
      "decision": "skip",
      "fitScore": "Weak",
      "fitRationale": "...",
      "geo": "us-compatible"
    }
  ]
}

clParagraphs schema notes:

Required for decision: "to_apply". If absent, the engine logs a warning and falls back to legacy behavior (no MD/PDF written this batch); the row still becomes "To Apply" in TSV/Notion but has no file. Always include it.
Absent for decision: "skip" and decision: "archive" — no CL needed.
Must be exactly 4 strings (P1/P2/P3/P4). Each string is one paragraph; no embedded blank lines, no markdown headers, no \n\n inside individual strings (engine inserts the spacing).
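
These schema notes can be checked before writing the results file. A minimal sketch; validateClParagraphs is a hypothetical pre-flight check, not the engine's own validation.

```javascript
// Hypothetical helper: returns a list of problems for one results entry (empty = OK).
function validateClParagraphs(entry) {
  const errors = [];
  const paragraphs = entry.clParagraphs;
  if (entry.decision !== "to_apply") {
    if (paragraphs !== undefined) errors.push("clParagraphs should be absent unless decision is to_apply");
    return errors;
  }
  if (!Array.isArray(paragraphs) || paragraphs.length !== 4) {
    errors.push("clParagraphs must be exactly 4 strings");
    return errors;
  }
  paragraphs.forEach((p, i) => {
    if (typeof p !== "string" || p.trim() === "") errors.push(`P${i + 1} must be a non-empty string`);
    else if (/\n\s*\n/.test(p)) errors.push(`P${i + 1} contains a blank line`);
    else if (/^#{1,6}\s/m.test(p)) errors.push(`P${i + 1} contains a markdown header`);
  });
  return errors;
}
```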

Step 11 — Commit phase (CLI)

node engine/cli.js prepare --profile <id> --phase commit \
  --results-file profiles/<id>/prepare_results_<timestamp>.json

This updates applications.tsv: to_apply entries get status="To Apply", cl_key, cl_path, resume_ver, notion_page_id, salary_min, salary_max. Run with --dry-run first to preview.

Step 12 — Report to user

Summarize at the user level (BL-11 — what the user sees, not engine internals):

Headline: pushed N jobs to Notion (Strong: A, Medium: B, Weak: C). If N < 30, lead with only N candidates found and the reason (Inbox exhausted or loop hit 3-iteration limit).
Loop summary: how many iterations ran, whether weak-fallback triggered. Example: loop: 2 iterations + weak-fallback (15 already-Weak rows reused).
CL reuse breakdown: group by clBaseKey — 8 reused affirm_capital, 3 reused chime_growth, 1 written from scratch.
Pre-phase skips: read from prepare_context.stats.skipReasons and surface the breakdown verbatim (e.g. company_cap: 5, title_blocklist: 2, url_dead: 1). Omit if {}.
Auto-tier assignments: only if Step 5.7 ran — list the company → tier mapping.
Deferred queue: prepare_context.stats.deferred — number of fresh rows that didn't make it into the batch. Mention only if non-zero AND inboxExhausted is false (otherwise the queue is empty).
Warnings / anomalies: any invalid resumeVer, invalid tier, fit-validation warnings the engine logged.
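
The CL reuse breakdown is a count grouped by clBaseKey. A sketch, assuming the results array from Step 10 is in memory; clReuseBreakdown is a hypothetical name.

```javascript
// Hypothetical helper: summarizes to_apply rows by the base entry they reused.
function clReuseBreakdown(results) {
  const counts = new Map(); // Map keeps first-seen order for the report
  for (const r of results) {
    if (r.decision !== "to_apply") continue;
    const label = r.clBaseKey ? `reused ${r.clBaseKey}` : "written from scratch";
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  return [...counts].map(([label, n]) => `${n} ${label}`).join(", ");
}
```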

Failure modes / how to recover (prepare-specific)

prepare_context.json missing — run --phase pre first.
jdText is null for many jobs — Greenhouse / Lever API may have changed; investigate engine/core/jd_cache.js. Geo + fit can still run from the job title + company name.
Notion page creation fails — check JARED_NOTION_TOKEN env var and that the DB id in profile.json is correct. Re-run the SKILL for the failed jobs only (skip already-created ones by key).
Unknown company tier (salary = null AND unknownTier: true) — assign the tier in Step 5.7 and put it in results.companyTiers. The commit phase persists it to profile.json.company_tiers automatically; no need to edit the file by hand.

check

Two-phase Gmail response checker. Reads are delegated to Claude via Gmail MCP — the script never touches OAuth.

Phase 1 — prepare (CLI)

node engine/cli.js check --profile <id> --prepare [--since <ISO>]

Builds a search plan without hitting Gmail:

Loads profiles/<id>/applications.tsv → picks rows where status ∈ {Applied, To Apply, Interview, Offer} AND notion_page_id is set → forms activeJobsMap. (Inbox is excluded — those rows haven't been pushed to Notion, so no email thread can match yet.)
Computes cursor epoch: saved.last_check or --since ISO, clamped to 30 days ago.
Emits Gmail query batches (10 companies/batch + fixed LinkedIn batch + fixed recruiter batch).
Writes profiles/<id>/.gmail-state/check_context.json.
Prints JSON: { epoch, batches, processedIds }.
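
The cursor clamp and the 10-company batching can be sketched as follows. Assumptions: all times are epoch milliseconds, --since takes precedence over the saved last_check, and both function names are hypothetical. The Gmail query syntax itself is left out because the source does not specify it.

```javascript
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Hypothetical helper: never look back further than 30 days.
function computeCursorEpoch({ lastCheck, since, now }) {
  const floor = now - THIRTY_DAYS_MS;
  const candidate = since ?? lastCheck ?? floor; // assumption: --since overrides the saved cursor
  return Math.max(candidate, floor);
}

// Hypothetical helper: split company names into batches of `size`.
function chunkCompanies(companies, size = 10) {
  const batches = [];
  for (let i = 0; i < companies.length; i += size) {
    batches.push(companies.slice(i, i + size));
  }
  return batches;
}
```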

Phase 2 — Gmail reads (Claude via MCP)

Claude executes in parallel:

For each batches[i] → call Gmail MCP search_threads with the query + pageSize: 50.
Collect all messageId values across threads, dedupe, remove any already in processedIds.

For each new messageId → call gmail_read_message in parallel. Build per-email object:

{
  "messageId": "...",
  "subject": "<headers.Subject>",
  "from": "<headers.From>",
  "date": "<headers.Date>",
  "body": "<body>"
}

Write the array to profiles/<id>/.gmail-state/raw_emails.json via Write tool.

If 0 new IDs found → write [] and proceed (Phase 3 still runs to bump last_check).
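
The dedupe step in Phase 2 can be sketched as a set difference. The thread shape used here ({ messages: [{ messageId }] }) is an assumption about what the search returns, and collectNewMessageIds is a hypothetical name.

```javascript
// Hypothetical helper: unique messageIds across threads, minus already-processed ones.
function collectNewMessageIds(threads, processedIds) {
  const processed = new Set(processedIds);
  const fresh = new Set();
  for (const thread of threads) {
    for (const message of thread.messages) {
      if (!processed.has(message.messageId)) fresh.add(message.messageId);
    }
  }
  return [...fresh];
}
```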

Phase 3 — apply (CLI)

node engine/cli.js check --profile <id> [--apply]

Default is dry-run (plan only, no Notion writes, no TSV mutations); pass --apply to perform the writes. In both modes the command:

Reads raw_emails.json + check_context.json.
Filters out messages already in processed_messages.json.
Branches per email:
- LinkedIn job alert (from:jobalerts-noreply@linkedin.com) → append a new TSV row with status="Inbox" (enters the same fresh-discovery lifecycle as scan-derived rows; next prepare will triage it).
- Recruiter outreach (subject matches recruiter keywords): if sender's company is in pipeline → append TSV row with status="Inbox"; otherwise → recruiter_leads.md only.
- Normal pipeline → classifier.js assigns one of: REJECTION, INTERVIEW_INVITE, INFO_REQUEST, ACKNOWLEDGMENT, OTHER. Then email_matcher.js resolves to a pipeline (company, role) tuple.
Plans Notion actions per match:
- REJECTION → Status → Rejected + add comment.
- INTERVIEW_INVITE → Status → Interview + add comment. (Notion DB only has Interview — do NOT push Phone Screen / Onsite.)
- INFO_REQUEST → comment only (no status change).
- Skips any row whose current status is Rejected / Closed.
With --apply: calls updatePageStatus + addPageComment via Notion SDK v5; appends to profiles/<id>/rejection_log.md, recruiter_leads.md, email_check_log.md; writes processed_messages.json; saves TSV.
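
The per-match planning rules can be expressed as a lookup table. A sketch, not the mapping in engine/commands/check.js: planNotionAction is a hypothetical name, and treating ACKNOWLEDGMENT and OTHER as "no Notion action" is an assumption, since the list above does not mention them.

```javascript
const NOTION_ACTIONS = new Map([
  ["REJECTION", { status: "Rejected", comment: true }],
  ["INTERVIEW_INVITE", { status: "Interview", comment: true }],
  ["INFO_REQUEST", { status: null, comment: true }], // comment only, no status change
]);

// Hypothetical helper: returns the planned action, or null when nothing should be pushed.
function planNotionAction(classification, currentStatus) {
  if (currentStatus === "Rejected" || currentStatus === "Closed") return null;
  return NOTION_ACTIONS.get(classification) ?? null;
}
```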

Failure modes (check-specific)

raw_emails.json missing — Phase 2 didn't run or Claude didn't write the file. Re-do Phase 2.
Notion 400 on status push — a status option doesn't exist in the DB (e.g. tried pushing Phone Screen). The mapping lives in engine/commands/check.js — keep it in sync with the DB's Status select options.
Cursor epoch stuck at 30d — last_check was never saved (every prior Phase 3 run was a dry-run, without --apply). Override with --since <ISO> once; the next run with --apply will then bump last_check.

indeed-prep

Use when: profile has discovery:indeed in modules and discovery.indeed.keywords in profile.json (currently: lilia).

Phase 1 — playbook (CLI)

node engine/cli.js indeed-prep --profile <id>

Reads profile.discovery.indeed and prints a JSON payload:

scan_urls[] — one entry per keywords[] (Indeed search URL with location/radius/fromage)
extraction_snippet — JS to paste into the browser console; returns pipe-separated rows jk|title|company|location
viewjob_template — https://www.indeed.com/viewjob?jk={jk} (open these to read JD before keeping)
filters.cert_blockers[] — license keywords (CMA / RN / LVN / CPC / RDA / RDH …) that disqualify a posting, because the candidate holds no clinical certs
filters.location_whitelist[] / location_blocklist[] — geography gates
ingest_file — absolute path where Phase 2 must write the result
instructions[] — ordered checklist for the browser session

Side effects: creates profiles/<id>/.indeed-state/ and seeds an empty raw_indeed.json if missing. Never overwrites an existing ingest file — re-running indeed-prep is safe and idempotent.

Phase 2 — browser (Claude via Chrome MCP)

Claude executes:

For each scan_urls[].url → open in a Chrome tab (keep to 2 tabs in parallel to reduce the chance of a CAPTCHA).
In each tab: paste extraction_snippet into the browser console; copy the pipe-separated rows.
Parse each row into { jk, title, company, location }.
Apply browser-side filters in this order (reject early):
- location_blocklist — skip if location matches any entry.
- location_whitelist (if non-empty) — keep ONLY if location matches.
- Title obvious-noise — skip titles containing driver / warehouse / nurse / therapist / physician (these never match the candidate's seeking intent).
For surviving rows: navigate to viewjob_template with the row's jk, fetch JD body, check for any cert_blockers keyword (single match → reject). Use:
```
document.querySelector('#jobDescriptionText')?.innerText?.substring(0,1500)
```
Capture per surviving entry: { jk, title, company, location, url?, postedAt? }.
Overwrite ingest_file with the JSON array.
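
Row parsing and the three browser-side filters can be sketched as below. Case-insensitive substring matching is an assumption (the source does not say how entries are compared), and parseRow / keepRow are hypothetical names.

```javascript
const TITLE_NOISE = ["driver", "warehouse", "nurse", "therapist", "physician"];

// Hypothetical helper: split one "jk|title|company|location" row.
function parseRow(line) {
  const [jk, title, company, location] = line.split("|").map((s) => s.trim());
  return { jk, title, company, location };
}

// Hypothetical helper: apply blocklist, then whitelist, then title noise, rejecting early.
function keepRow(row, { location_blocklist = [], location_whitelist = [] }) {
  const loc = row.location.toLowerCase();
  const matches = (list) => list.some((entry) => loc.includes(entry.toLowerCase()));
  if (matches(location_blocklist)) return false;
  if (location_whitelist.length > 0 && !matches(location_whitelist)) return false;
  const title = row.title.toLowerCase();
  return !TITLE_NOISE.some((word) => title.includes(word));
}
```

Rows that pass keepRow still need the cert_blockers check against the JD body before they are written to ingest_file.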

If 0 entries survive → write []. The next scan will simply produce zero new applications.

CAPTCHA handling: if "Security Check" / "Один момент" (Russian for "One moment") appears, navigate to a different scan_urls[].url. If both tabs are blocked, report to the user, who may need to solve it manually once.

Phase 3 — ingest (CLI)

node engine/cli.js scan --profile <id>

Failure modes (indeed-specific)

Empty payload after Phase 1 — discovery.indeed.keywords is empty or missing. Edit profile.json.
Phase 3 reports "ingest file not found" — Phase 2 didn't write to the same path Phase 1 printed. Re-run Phase 1 (it's idempotent), then verify Claude's write target matches ingest_file.
Phase 3 reports "0 fresh" — either the ingest file is empty ([]) or every entry is a duplicate. Check applications.tsv for prior jobId matches.
Cards extraction returns 0 rows — Indeed changed selectors. Update extraction_snippet in engine/commands/indeed_prepare.js to match current a[data-jk] / [data-testid="company-name"] markup.

answer

Two-phase application Q&A flow with reuse-first lookup against a Notion-backed answer bank. Per RFC 009.

Phase 1 — search (CLI)

node engine/cli.js answer --profile <id> --phase search \
  --company "<Company>" --role "<Role>" --question "<question text>"

Prints JSON to stdout:

{
  "key": "figma||product manager, ai platform||why do you want to join figma?",
  "exact": { "pageId": "...", "question": "...", "answer": "...", "category": "Motivation" } | null,
  "partials": [ /* same shape, same company+role OR same question across companies */ ],
  "schema": { "categories": ["Behavioral","Technical","Culture Fit","Logistics","Salary","Other","Experience","Motivation"] },
  "category_suggestion": "Motivation"
}
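
The key in the sample output looks like the three inputs lowercased and joined with "||". A sketch of that reading; buildAnswerKey is a hypothetical name, and truncating the question to a 120-character window is an assumption based on the dedup window mentioned under failure modes.

```javascript
// Hypothetical helper: normalize and join company, role, and question.
function buildAnswerKey(company, role, question, window = 120) {
  const norm = (s) => s.trim().toLowerCase().replace(/\s+/g, " ");
  return [norm(company), norm(role), norm(question).slice(0, window)].join("||");
}
```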

Phase 2 — SKILL (Claude executes)

Step 1 — Parse the user request. Extract <company>, <role>, and <question> from the user input. If any is missing or ambiguous, ask the user once.

Step 2 — Run search phase. Call the CLI Phase 1 above with the three values.

Step 3 — Branch on results.

If exact is non-null → show the existing answer to the user, with category and a clear note ("Found this in your answer bank for this exact role+question. Reuse?"). Offer [reuse] / [regenerate] / [edit]. If reuse — skip to Step 7 with existingPageId = exact.pageId and unchanged answer.
If partials is non-empty → mention them as reference ("Same role, different question: ...; Same question for Stripe: ..."), but proceed to Step 4 unless user asks to reuse one.
Otherwise → go to Step 4.

Step 4 — Load memory. Read paths declared in profile.json.memory:

memory.writing_style_file
memory.resume_key_points_file
Files matching feedback_*.md under memory.feedback_dir

If the memory block is absent or any file is missing, fall back to profiles/<id>/resume_versions.json.

Step 7 — Write the draft file.

Write profiles/<id>/.answers/draft_<YYYYMMDD_HHMMSS>.json:

{
  "company": "Figma",
  "role": "Product Manager, AI Platform",
  "question": "Why do you want to join Figma?",
  "answer": "<final approved text>",
  "category": "Motivation",
  "notes": "Optional context. E.g. 210-char short version. Field: Additional Information.",
  "existingPageId": null
}

If the user chose to update an existing entry, set existingPageId to the matched pageId from Phase 1.

Phase 3 — push (CLI)

node engine/cli.js answer --profile <id> --phase push \
  --results-file profiles/<id>/.answers/draft_<timestamp>.json

CLI:

If existingPageId is set → updates Answer / Category / Notes on that Notion page.
Otherwise → creates a new page in profile.notion.application_qa_db_id.
Always writes a local .md backup to profiles/<id>/application_answers/<Company>_<role-slug>_<YYYYMMDD>.md. If a file with that name already exists today, suffix _v2, _v3 etc.
Prints JSON: { pageId, action: "created"|"updated", url, backupPath }.
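
The _v2, _v3 suffix rule for same-day backups can be sketched like this. nextBackupName is a hypothetical name, and the caller is assumed to pass the filenames already present in application_answers/.

```javascript
// Hypothetical helper: pick the first unused name among base.md, base_v2.md, base_v3.md, ...
function nextBackupName(baseName, existingNames) {
  const taken = new Set(existingNames);
  if (!taken.has(`${baseName}.md`)) return `${baseName}.md`;
  let version = 2;
  while (taken.has(`${baseName}_v${version}.md`)) version += 1;
  return `${baseName}_v${version}.md`;
}
```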

Step 8 — Report to user

Summarize:

Action: created or updated.
Notion URL of the page.
Local backup path.
Char count of the saved answer.

Failure modes (answer-specific)

no notion.application_qa_db_id configured — profile.json is missing the field. For jared it's ca4fa9e8-b3a6-4ccb-bcc2-3a13ff6b06ae. For other profiles, create the Q&A DB in Notion first.
missing JARED_NOTION_TOKEN — load it from ~/.bashrc / .env. Same token used by sync and check.
invalid category — the draft includes a category not in the canonical 8. Fix to one of: Behavioral, Technical, Culture Fit, Logistics, Salary, Other, Experience, Motivation. The categorize() helper picks a default automatically.
Notion 400 on create — usually a missing required property or a Category option that doesn't exist in the DB. Categories must already be in the DB schema; do not invent new ones.
Search returns nothing for a clearly recurring question — the question text may have drifted beyond the 120-char dedup window. Look at partials for near-matches.

Global Guard Rails (prepare / answer modes)

These rules apply whenever Claude generates content or makes pipeline decisions. They reference per-profile config — do not hardcode profile-specific values here.

Level Filter

Company Cap

Fit Score (domain fit only)

Level does NOT affect fit score. Evaluate by domain match to the candidate's profile:

Strong — core domain match (see profiles/<id>/memory/user_resume_key_points.md for domain specifics) plus a relevant tech or product component
Medium — adjacent domain, or right domain with lesser location/format fit, or outside core domain but with a key component overlap (AI/ML, data platform, payments)
Weak — outside core domain with no overlapping component
Early-startup modifier (pre-Series B, <50 people): downgrade one level (Strong→Medium, Medium→Weak)

Profile-specific domain criteria: profiles/<id>/memory/user_resume_key_points.md.

Salary Expectations (auto-fill at prepare time)

Determined automatically from Company Tier × Role Level. No JD salary analysis needed.

Level parsing is per-profile. Engine uses profile.salary.level_parser from profile.json:

"pm" (default — Jared / fintech): Lead / Senior / PM. Catches "Lead", "Senior", "Sr.", "Sr ", and Capital One-style "Manager, Product Management" → Senior.
"healthcare" (Lilia): Senior / Coordinator / MedAdmin. Catches "Lead" / "Supervisor" / "Senior" → Senior; "Coordinator" / "Specialist" → Coordinator; everything else → MedAdmin.
"default" (single-row matrix): always returns default.

COL adjustment is per-profile (profile.salary.col_adjustment). Default for pm: +7.5% if hybrid/onsite in SF/NYC. For Lilia (Sacramento metro) the multiplier is 1.0 — no adjustment.

The CLI surfaces the resolved config in prepare_context.salaryConfig — SKILL Step 6 reads it from there, never from disk.
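
The level-parsing rules described above can be sketched as follows. This is not the engine's parseLevel: parseLevelSketch is a hypothetical stand-in, the pattern order is a guess, and mapping "Lead" titles to the Lead level for the pm parser is an assumption.

```javascript
// Hypothetical stand-in for the engine's level parser; rules follow the prose above.
function parseLevelSketch(title, parser = "pm") {
  const t = title.toLowerCase();
  if (parser === "pm") {
    if (/\blead\b/.test(t)) return "Lead";
    const senior =
      /\bsenior\b/.test(t) ||
      t.includes("sr.") ||
      t.includes("sr ") ||
      t.includes("manager, product management"); // Capital One-style title
    return senior ? "Senior" : "PM";
  }
  if (parser === "healthcare") {
    if (/lead|supervisor|senior/.test(t)) return "Senior";
    if (/coordinator|specialist/.test(t)) return "Coordinator";
    return "MedAdmin";
  }
  return "default"; // single-row matrix
}
```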

Notion Field Completeness

Per-profile Notion DB id: profile.json → notion.jobs_pipeline_db_id.

Humanizer Rules (prepare / answer modes)

Apply during CL or answer generation — not as a separate post-pass.

Voice calibration

Match the profile's writing style from prepare_context.memory.writingStyle (engine-loaded from profile.json.memory.writing_style_file). When the field is null, fall back to these defaults:

Confident practitioner, not humble applicant. "I built X that delivered Y" — not "I was responsible for X."
7/10 formality: professional with energy and momentum.
Have opinions; react to facts rather than just reporting them.
Use "I" naturally — first person is honest, not unprofessional.
Numbers in every paragraph except the close.
Short paragraphs (2-3 sentences). Vary rhythm: short punchy sentences mixed with longer ones.
Be specific: concrete details over vague claims.

The defaults above describe Jared's tone. Other profiles (e.g. Lilia — warm, 5/10 formality, no metrics-per-paragraph rule) will override them entirely via their writingStyle memory file.

Banned vocabulary (AI tells)

Never use: delve, landscape, foster, underscore, pivotal, crucial, showcase, tapestry, testament, interplay, intricate.

No copula avoidance: use is/are/has instead of serves as/stands as/boasts.

No significance inflation: no "marking a pivotal moment", "reshaping", "setting the stage".

No superficial -ing phrases: no "highlighting", "underscoring", "ensuring", "reflecting".

No em dash overuse: use commas, periods, or parentheses instead.

No rule-of-three: don't force ideas into groups of three.

No negative parallelisms: no "It's not just X, it's Y".

No generic closers: no "exciting times", "the future looks bright".

No hedging: no "potentially", "it could be argued".

No filler: no "in order to", "it is important to note", "due to the fact that".

No opener clichés: no "Dear Hiring Manager, I am writing to express my interest…", no "I am passionate about [mission]", no "excited to".
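
The "Never use" word list lends itself to a mechanical check before saving a draft. A minimal sketch: findBannedWords is a hypothetical name, it matches any word that starts with a listed stem, and it does not cover the phrase-level rules above.

```javascript
const BANNED = [
  "delve", "landscape", "foster", "underscore", "pivotal", "crucial",
  "showcase", "tapestry", "testament", "interplay", "intricate",
];

// Hypothetical helper: returns the banned stems found in the text, in list order.
function findBannedWords(text) {
  const lower = text.toLowerCase();
  return BANNED.filter((word) => new RegExp(`\\b${word}`).test(lower));
}
```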

Final anti-AI check

After writing, ask: "What makes this obviously AI-generated?" — fix any remaining tells before saving.

Memory files (load before generating)

In every prepare / answer session, read the engine-loaded memory from prepare_context.memory:

memory.writingStyle — writing style profile (from profile.json.memory.writing_style_file)
memory.resumeKeyPoints — skills / experience for matching (from profile.json.memory.resume_key_points_file)
memory.feedback[] — array of {file, content} for each feedback_*.md under profile.json.memory.feedback_dir

For the answer mode (no prepare_context.json available), read the same files directly from disk under the paths declared in profile.json.memory.