post-ocr-cleanup
Clean post-OCR text: correction, QA, multilingual handling, provenance.
Based on the SOC occupational classification
Scaffold or audit an entire research project repository organized around its source library. Use whenever the user is starting, structuring, organizing, or reviewing a whole project — "set up a research repo", "how should I structure/organize this project", "initialize my sources folder", "new paper or literature-review project", "audit my repo structure", "is my sources folder set up right", "check my project layout". Builds the full tree from the sources spine outward — sources/{og,md,unprocessed}, references.bib, a PDF→Markdown convert script (OpenDataLoader PDF), a process-source intake command, CLAUDE.md/AGENTS.md, .gitignore, .venv — plus the analysis, manuscript, and review folders; or audits an existing repo and reports what is present, partial, or missing. NOT for intaking or converting a single PDF (use process-source) or building a publication replication package (use replication-package).
LLM token logprobs and calibration: per-decision confidence, ECE, Brier, reliability diagrams, low-confidence triage.
LLM council/panel voting: multi-model coders, consensus rules, inter-rater agreement (kappa, alpha), correlated-error diagnostics.
Compare OCR systems before a bulk run: candidate set, stratified ground truth, CER/WER, normalization, per-language and per-stratum accuracy.
Fact-check a manuscript's claims against the cited sources themselves: locate each source's knowledge-base Markdown file and verify the in-text claim is actually supported. Runs a pre-flight gate that refuses unless a per-source Markdown knowledge base exists and is clean (PDFs converted via process-source); then runs citation-check; then audits claim support, overclaiming, direction, scope, and misattribution.
Audit citation existence and fabrication risk, in-text/reference parity, DOIs, claim support, and style.
| name | post-ocr-cleanup |
| description | Clean post-OCR text: correction, QA, multilingual handling, provenance. |
See `reference/prompt-templates-and-schema.md` for a minimal constrained-decoding-friendly baseline prompt, a Bourne-style socio-cultural-context prompt, and a span-level JSONL provenance schema (per Guo & Wei 2026 §3.2/§3.3).
pre-registration-writing skill).
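To make the idea of span-level JSONL provenance concrete, here is a minimal sketch in Python. It is not the schema from the reference file, which is not reproduced on this page: every field name (`doc_id`, `page`, `start`, `end`, `ocr_text`, `corrected_text`, `method`) and both helper functions are hypothetical, chosen only to illustrate recording one correction per span and one JSON object per line.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SpanCorrection:
    """One correction record. All field names are illustrative assumptions."""
    doc_id: str
    page: int
    start: int           # character offset into the raw OCR text (inclusive)
    end: int             # character offset into the raw OCR text (exclusive)
    ocr_text: str        # what the OCR engine produced for this span
    corrected_text: str  # what the cleanup step replaced it with
    method: str          # e.g. "llm", "dictionary", "manual"


def apply_corrections(raw: str, spans: list[SpanCorrection]) -> str:
    """Apply non-overlapping corrections, checking each against the raw text."""
    out = raw
    # Work right to left so earlier offsets remain valid after each splice.
    for s in sorted(spans, key=lambda s: s.start, reverse=True):
        if raw[s.start:s.end] != s.ocr_text:
            raise ValueError(f"span {s.start}:{s.end} does not match raw text")
        out = out[:s.start] + s.corrected_text + out[s.end:]
    return out


def to_jsonl(spans: list[SpanCorrection]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in spans)


raw = "Tbe quick brovvn fox"
spans = [
    SpanCorrection("doc-001", 1, 0, 3, "Tbe", "The", "llm"),
    SpanCorrection("doc-001", 1, 10, 16, "brovvn", "brown", "llm"),
]
cleaned = apply_corrections(raw, spans)
provenance = to_jsonl(spans)
```

Keeping the original `ocr_text` alongside the offsets lets a later audit verify each record against the raw OCR output and revert any single correction without rerunning the cleanup.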