원클릭으로 Manus에서 모든 스킬 실행

check-prose-claims

스타11

포크1

업데이트2026년 5월 24일 00:00

Fact-check prose claims in slides, reports, PDFs, and papers — statistics, comparatives, attributions, causal claims, quotes. Two-pass extract-then-verify protocol with strict numerical precision and a doc-only mode. Use when the user asks to "check the claims in this deck", "fact-check this report", "audit this PDF", "verify the numbers in these slides", or before publishing/shipping any externally-facing document with quantitative claims. Complements `check-bib-references` (which handles BibTeX entries) — this skill handles the prose around them.

설치

Codex 또는 Claude로 설치 이 Prompt를 복사해 Codex, Claude 또는 다른 어시스턴트에 붙여 넣으면 Skill 페이지를 검토하고 설치를 진행할 수 있습니다.

Manus에서 실행

출처

yulonglin

yulonglin/dotfiles

GitHub 저장소 열기 Creator 저장소 보기

다운로드

Manus에서 실행

Check prose claims against authoritative sources

When to invoke

Before any external publication — papers, slide decks, blog posts, reports, talks with quantitative or attributed claims
After LLM-assisted drafting — anything Claude/Codex/another LLM wrote that contains numbers, attributions, or comparisons
When a claim smells off — round numbers, suspiciously clean comparatives ("3x faster"), confident attributions to a paper without a page reference
User asks: "check the claims in this deck", "fact-check the report", "audit this PDF", "verify the numbers", "are these statistics right?"
Doc-only audit mode — user says "only use this document" / "verify against the PDF only" / "don't search the web". Everything must trace to provided source

When NOT to use this skill

Situation	Use instead
BibTeX entry verification (titles/authors/arXiv IDs)	`check-bib-references`
Code review / type-checking a script	`code:code-reviewer`
Verifying a single fact mid-conversation	`~/.claude/rules/verify-before-instructing.md` (just WebSearch in main context)
Editing for tone/style/clarity (not factual content)	Writing skills
"Does this paper exist?" (single bib entry)	`check-bib-references/check_bib.py --key <citekey>`
Drafting new claims (you have data, need to write the sentence)	Just write it, then run this skill on the draft

The failure mode this catches

Prose drift between source and claim. The bib entry can be correct and the cited paper real — but the sentence attached to the cite says something the paper doesn't.

Failure	Example
Wrong number — paper says 96.555%, slide says 97%	Source rounded; deck is now technically wrong in academic mode
Wrong direction — paper says X reduces Y, slide says X increases Y	Sign flip — pure hallucination
Wrong comparator — paper compares against baseline, slide compares against SOTA	Same effect size, different (and bigger) claim
Missing context — paper finding holds for subset A, slide drops the qualifier	"Best performance" when paper says "best on the easy subset"
Confidently attributed to nobody — "Studies show X" / "According to research..." with no actual source	Padding that sounds authoritative but verifies to nothing
Causal inflation — paper shows correlation, slide says "X causes Y"	See `~/.claude/rules/research-integrity.md` § Causal Claims
Visual distortion — bar chart with truncated y-axis exaggerates a 2% gap to look like 40%	Chart looks compelling, source data is mundane
Cherry-picked timeframe — chart shows 2022-2024 spike, full series is flat with one anomaly	Misleading by omission

Two-pass architecture (the core protocol)

Never interleave extraction and verification. Discovering new claims mid-verification breaks consistency.

Pass 1 (extract) → freeze list → user confirms → Pass 2 (verify in parallel) → report

Pass 1: extraction

Read the entire document (slides, PDF, report). For PDFs >10 pages, delegate to a subagent — see ~/.claude/rules/context-management.md
Extract every claim matching the IN taxonomy below. No verification yet.
Output a numbered list and present to user for confirmation before moving on. The user often spots extraction errors (missed claims, mis-typed claims) at this gate
Freeze the list to out/claim-check-<UTC-timestamp>/claims.jsonl

Pass 2: verification

Take Pass 1 output as fixed input — no re-extraction
Dispatch claims in parallel batches (see orchestrator.md)
Apply the status decision tree to each claim
Append results to results.jsonl keyed by claim_id
Generate report grouped by status

Claim taxonomy

EXTRACT (the IN list)

Type	Pattern	Example
Statistic	Any number with unit/context (%, $, count, ratio, decimal)	"92.3% accuracy", "$4.7B market"
Comparative	X is [comparative] than Y	"3x faster than baseline"
Temporal	Time-bound assertion	"In 2024, adoption reached…"
Attribution	Claim tied to a source	"According to WHO…", "Smith et al. (2024) found…"
Causal	X causes / leads to / results in Y	"Fine-tuning on X reduces hallucination by Y%"
Existence	Asserts something exists or is true	"500M users", "the model supports tool use"
Ranking	Position claim	"largest", "first", "top 3"
Quote	Direct quotation attributed to a source	Anything in quote marks with an author

SKIP by default (the OUT list)

Type	Example	Reason
Definitions	"Machine learning is a subset of AI"	Definitional, not factual
Marked opinions	"We believe…", "In our view…"	Explicitly subjective
Hypotheticals	"If adoption continues…"	Speculative
Questions	"What drives growth?"	Not an assertion
Future predictions without source	"Will reach $10B by 2030"	Unless citing a forecast
Methodology descriptions	"We used PyTorch 2.0"	Process, not factual claim
Acknowledgments	"Thanks to our collaborators"	Not verifiable

Override for high-stakes audits

For papers, safety-critical reports, and external talks, also extract:

Normally OUT	Why extract anyway
Methodology descriptions ("we used model X with hyperparameter Y")	Wrong model name or wrong hyperparameter is a reproducibility bug. Audit if the document will be published
Hypothetical presuppositions ("If adoption continues at current rates…")	The presupposed rate is a verifiable claim even if the conditional isn't. Extract the embedded statistic
Marked opinions backed by data ("We believe X, given that 70% of users…")	The supporting data inside the opinion is a Statistic — extract it, skip the opinion wrapper

The default OUT list is tuned for slide decks. Ask the user before extraction starts: "Default mode (slide-deck) or high-stakes mode (paper/safety-critical)?"

Numerical precision rules (academic standard)

Default: strict. Exact numbers only.

Source	Claim	Status
96.555%	96.555%	Verified
96.555%	97%	Numerical Error (rounding)
96.555%	96.6%	Numerical Error (rounding)
96.555%	96.5%	Numerical Error (truncation)
0.834	0.83	Numerical Error (sigfigs)
96.555%	0.96555	Numerical Error (unit mismatch)
+12% growth	+15% growth	Hallucination (different value)
+12% growth	−12% growth	Hallucination (wrong direction)
$4.7B	$47B	Hallucination (order of magnitude)

Exception: if the source itself rounds ("96.555% (approximately 97%)"), then "97%" verifies — cite the approximation.

Loosening the rule: if the user is checking a slide deck for a general audience, ask whether to permit 1-sigfig rounding. Default to strict; relax only on explicit request.

Status decision tree

Apply to every claim. No shortcuts.

Is this a CITATION claim (references a paper/report/source)?
├─ YES → CITATION VALIDATION
└─ NO  → STATISTIC/FACT VALIDATION

CITATION VALIDATION
├─ Cited source exists?
│   ├─ NO  → "Citation Not Found"
│   └─ YES → Source contains the claimed topic?
│             ├─ NO  → "Misquoted"
│             └─ YES → Source supports the exact claim?
│                       ├─ YES exact     → Verified (exact)
│                       ├─ YES paraphrase → Verified (paraphrase)
│                       ├─ PARTIAL       → Misleading (note missing context)
│                       └─ NO            → Hallucination (note what source says)

STATISTIC/FACT VALIDATION
├─ Authoritative source found?
│   ├─ NO  → "Unverified" (NOT "Hallucination" — absence ≠ contradiction)
│   └─ YES → Values match exactly?
│             ├─ YES → Verified
│             └─ NO  → Numerical Error (record source value, claimed value, deviation)

DOC-ONLY MODE
└─ Claim traceable to provided source?
    ├─ YES → run normal validation against that source
    └─ NO  → "Not in Source" (likely external knowledge / training-data hallucination)

Visual integrity checklist (charts, graphs, tables)

For every visual:

Extract every data point, axis label, unit, scale, legend
Find source for the underlying numbers
Compare value-by-value (one row per visual element)
Check integrity:

Check	Issue type
Y-axis starts at non-zero (bar chart)	Visual Distortion: axis manipulation
3D effects distort proportions	Visual Distortion: 3D exaggeration
Missing error bars when source has them	Misleading: uncertainty omitted
Different time range than source	Misleading: cherry-picked timeframe
Color/encoding suggests ordering source doesn't support	Misleading: false ranking
Inset zoom without indication	Misleading: hidden truncation

Doc-only mode

Triggers: "only use this document" / "don't search the web" / "verify against the PDF only" / "everything should be from the source"

In this mode:

Build a source index first (page → key facts/numbers/tables/figures)
Pass 1 extraction proceeds normally
Pass 2 verifies only against the index — no web search
Any claim that doesn't trace to the index → Not in Source (regardless of whether it's true elsewhere). Likely external knowledge or training-data leakage

This mode is essential when auditing an LLM-generated summary of a specific document — you want to know what isn't in the source, not what's true in general.

Source authority hierarchy (search mode)

When multiple sources found, prefer in this order:

Rank	Type	Examples
1	Primary source	Original study, official report, raw data
2	Government/institutional	WHO, CDC, World Bank, national stats offices
3	Peer-reviewed	Nature, Science, IEEE, ACM proceedings
4	Named industry report	Gartner, McKinsey, Statista (with methodology)
5	Reputable news citing primary	NYT/Reuters citing original source
6	Secondary compilations	Wikipedia (check their sources)

If only Rank 5–6 sources are found → status = "Unverified", note "only secondary sources found".

Search templates

Run all applicable templates per claim — don't stop at the first hit.

Academic citations

Q1: "<first author last name> <year> <first 3 words of title>"
Q2: "<full paper title>" site:semanticscholar.org OR site:arxiv.org
Q3: "doi:<DOI>"            (if DOI provided)
Q4: "arxiv:<arxiv_id>"     (if arXiv ID provided)

For arXiv-ID-bearing claims, don't WebSearch — use check-bib-references/check_bib.py (batched API, no rate-limit hassle). The script reads a BibTeX file, so write the arXiv IDs to a tiny bib first:

# Write a throwaway bib with just the IDs you need to verify
cat > /tmp/claim-check.bib <<'EOF'
@misc{c1, eprint = {2401.12345}, archivePrefix = {arXiv}}
@misc{c2, eprint = {2403.99999}, archivePrefix = {arXiv}}
EOF
uv run ~/.claude/skills/check-bib-references/check_bib.py /tmp/claim-check.bib

(The orchestrator generates this bib programmatically from claims.jsonl — see orchestrator.md.)

Statistics (market size, usage, etc.)

Q1: "<exact number with unit> <topic> <year>"
Q2: "<topic> <year>" site:statista.com OR site:gartner.com OR site:mckinsey.com
Q3: "<topic> <year>" site:gov OR site:edu
Q4: "<topic> <number> original source"

Company / product claims

Q1: "<company> <claim topic> press release <year>"
Q2: site:<company-domain> "<claim topic>"
Q3: "<company> <metric> official announcement"
Q4: "<company> <claim> SEC filing"   (public companies)

Health / medical

Q1: "<topic>" site:who.int OR site:cdc.gov OR site:nih.gov
Q2: "<claim>" systematic review site:cochrane.org
Q3: "<claim>" meta-analysis pubmed

Government / policy

Q1: "<policy/law name>" site:gov
Q2: "<statistic> official statistics <country>"
Q3: "<claim> <agency name> report"

Tie-breaker rules

Situation	Rule
Missing date on claim	Assume most recent year available; flag "needs date"
Conflicting sources	Most recent authoritative; cite both; note conflict
Source not found after all queries	"Unverified" (not "Hallucination")
Currency/unit conversion ambiguity	Flag "needs clarification: currency/units"
Same org, multiple reports	Use most recent; cite with date
Claim uses "approximately" / "about"	Verify base number is within ±10% of source
Source behind paywall	"Source behind paywall, unable to verify exact text"
Source in different language	Translate and verify; note translation

Integration with sibling skills

Claim contains…	Use…
arXiv ID, DOI, OpenReview forum ID	`check-bib-references/check_bib.py` (batched, rate-limit-aware)
Citation to a paper that has no ID in the bib	WebSearch → WebFetch the abstract — same as Pass 2
Pure prose stat ("70% of users…") with no cite	WebSearch templates above
Visual chart	Vision pass first (extract values), then verify per template
Long PDF (>10 pages)	Delegate Pass 1 extraction to a subagent (`~/.claude/rules/context-management.md`)

Cross-references:

~/.claude/rules/agents-and-delegation.md § Factual Verification — never delegate factual lookup to an agent without tools
~/.claude/rules/verify-before-instructing.md — verify before stating; same epistemic standard
~/.claude/rules/research-integrity.md § Causal Claims Match Evidence — for causal/correlational distinctions

Orchestration

For >20 claims, see orchestrator.md for the batch runtime pattern (parallel agents, persistent state, resumability, arXiv pre-resolve).

For small batches (<20 claims), verify inline in the main session — orchestration overhead exceeds the savings.

Output format

Summary (always first)

## Verification Report

**Mode:** [Search / Doc-Only]
**Document:** <filename>
**Generated:** <UTC timestamp>

### Summary
| Metric | Count |
|--------|-------|
| Total claims extracted | N |
| Verified | … |
| Numerical Error | … |
| Hallucination | … |
| Misleading | … |
| Unverified | … |
| Citation Not Found | … |
| Not in Source (doc-only) | … |

**Overall:** [PASS / FAIL — N issues]

Findings (grouped by status)

### Verified (N)
| ID | Claim | Source | Location | Confidence |

### Numerical Errors (N)
| ID | Claim | Source Value | Claimed Value | Deviation | Fix |

### Hallucinations (N)
| ID | Claim | Issue | Source Says |

### Unverified (N)
| ID | Claim | Issue |

### Misleading (N)
| ID | Claim | Issue | Missing Context |

### Sources Consulted
| ID | Citation | Type | URL | Used For |

Critical rules

Two passes, never one. Extract first, freeze, then verify.
Confirm extraction with the user before Pass 2. They will catch missed claims.
Exact numbers only in default mode. 96.555% ≠ 97%.
Absence ≠ contradiction. No source found → "Unverified", not "Hallucination".
Run ALL search templates per claim, not just the first.
Citations must be real — for anything with an arXiv/DOI/OpenReview ID, defer to check-bib-references/check_bib.py.
Check what sources actually say — a real paper can still be misquoted.
In doc-only mode, flag everything external — even claims that happen to be true.
Conservative when uncertain — "Unverified" beats false "Verified".
Verify agent output on disk, not from agent status — classifyHandoffIfNeeded returns false failures. Read results.jsonl to confirm.

Lessons from designing this

Placeholder — fill in after first real audit.

Initial notes (pre-use):

arXiv pre-resolution in main context (batched) saves ~60s of throttle per agent. Don't let agents hit arXiv individually.
The user-confirmation gate between Pass 1 and Pass 2 is where most of the value lands — extraction quality is what differentiates a useful audit from a noisy one.
Two failure modes from the audit reference that probably hit us too: (a) discovering "new" claims mid-verification (handled by Pass 1 freeze), (b) accepting secondary sources without seeking primaries (handled by authority hierarchy).
The "high-stakes override" of the OUT list is our addition — the reference is tuned for slide decks. For papers, methodology errors are real bugs.

Files

SKILL.md — this protocol
orchestrator.md — batch-orchestration runtime pattern (only needed for >20 claims)

Adapting to other repos

The skill is generic — point it at any document. For per-project use, drop SKILL.md + orchestrator.md into the repo's .claude/skills/check-prose-claims/ if you want project-specific extensions (e.g., a custom IN list for a specific report format).

check-prose-claims

이 저장소의 다른 Skills

이 저장소의 다른 Skills

Check prose claims against authoritative sources

When to invoke

When NOT to use this skill

The failure mode this catches

Two-pass architecture (the core protocol)

Pass 1: extraction

Pass 2: verification

Claim taxonomy

EXTRACT (the IN list)

SKIP by default (the OUT list)

Override for high-stakes audits

Numerical precision rules (academic standard)

Status decision tree

Visual integrity checklist (charts, graphs, tables)

Doc-only mode

Source authority hierarchy (search mode)

Search templates

Academic citations

Statistics (market size, usage, etc.)

Company / product claims

Health / medical

Government / policy

Tie-breaker rules

Integration with sibling skills

Orchestration

Output format

Summary (always first)

Findings (grouped by status)

Critical rules

Lessons from designing this

Files

Adapting to other repos

Check prose claims against authoritative sources

When to invoke

When NOT to use this skill

The failure mode this catches

Two-pass architecture (the core protocol)

Pass 1: extraction

Pass 2: verification

Claim taxonomy

EXTRACT (the IN list)

SKIP by default (the OUT list)

Override for high-stakes audits

Numerical precision rules (academic standard)

Status decision tree

Visual integrity checklist (charts, graphs, tables)

Doc-only mode

Source authority hierarchy (search mode)

Search templates

Academic citations

Statistics (market size, usage, etc.)

Company / product claims

Health / medical

Government / policy

Tie-breaker rules

Integration with sibling skills

Orchestration

Output format

Summary (always first)

Findings (grouped by status)

Critical rules

Lessons from designing this

Files

Adapting to other repos