| name | manual-segmentation-latex-transcription |
| description | Default skill for user-driven manual PDF-to-LaTeX transcription of math PDFs. Provides a portable, self-contained workflow with polygon segment review, audited session manifests, pinned Python dependencies, and reproducible Part A commands (`init --no-transcribe`, `segment-review`, `transcribe-and-build`). |
Manual Segmentation LaTeX Transcription
Use this skill by default whenever a user needs manual or user-driven PDF-to-LaTeX transcription. It packages the manual-segmentation Part A workflow as a portable skill that can be copied into another Codex environment without depending on repository code outside the skill folder.
Scope
- Included:
  - `init --no-transcribe`
  - foreground `segment-review`
  - `transcribe-and-build`
  - `status`
  - `check-env`
  - `smoke-test`
- Not included in this skill version:
  - optional API reformatting
  - final local QA and sectioning pass
  - HTML export or verification workflows
Preconditions
- Request outbound access to `api.openai.com` before any OpenAI call.
- Use `python -u` for API-calling commands so progress lines are not buffered.
- Default to model `gpt-5.4` unless the operator explicitly overrides `--model` or `OPENAI_MODEL`.
- Ensure system binaries are installed:
  - `pdflatex`
  - `pdfinfo`
  - `pdftoppm`
- Ensure Tk is available for the review UI.
- Prefer the bundled lockfile and bootstrap script before running the workflow in a new environment.
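The binary and Tk preconditions above can be checked by hand before starting a session. The following is a minimal sketch of such a preflight check, not the implementation behind `check-env`; the function names `missing_binaries`, `tk_available`, and `preflight_problems` are hypothetical and exist only for illustration.

```python
import shutil

# Binaries named in the preconditions above.
REQUIRED_BINARIES = ("pdflatex", "pdfinfo", "pdftoppm")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def tk_available():
    """Report whether tkinter can be imported (no window is opened)."""
    try:
        import tkinter  # noqa: F401
    except ImportError:
        return False
    return True


def preflight_problems():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = [f"missing binary: {name}" for name in missing_binaries()]
    if not tk_available():
        problems.append("tkinter is not importable; the review UI needs Tk")
    return problems
```

Calling `preflight_problems()` returns an empty list when all three binaries resolve on `PATH` and `tkinter` imports. It does not test network access to `api.openai.com` or open a display, so the skill's own `check-env` command remains the authoritative check.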
Runtime Entry Points
Run from the skill folder:
    python -u scripts/workflow.py check-env
    python -u scripts/workflow.py init /path/to/file.pdf --no-transcribe
    python -u scripts/workflow.py segment-review /path/to/session_dir
    python -u scripts/workflow.py transcribe-and-build /path/to/session_dir
    python -u scripts/workflow.py status /path/to/session_dir
    python -u scripts/workflow.py smoke-test
Use `scripts/bootstrap_venv.py` to create a local `.venv` from `requirements.lock.txt` when the environment is not already prepared.
Session Location
- All intermediate artifacts, checkpoints, audit files, and final outputs are stored in a timestamped session directory next to the source PDF.
- Example: `/repo/path/chapter.pdf` creates `/repo/path/chapter_YYYYMMDD_HHMMSS/`.
- The skill must not save normal workflow sessions under `.agents/` or the skill folder itself.
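The naming and placement rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's own code: the helper names `session_dir_for` and `is_allowed_location` are hypothetical, and the use of local time for the timestamp is an assumption, since the source only gives the `YYYYMMDD_HHMMSS` pattern.

```python
from datetime import datetime
from pathlib import Path


def session_dir_for(pdf_path, now=None):
    """Build <pdf parent>/<pdf stem>_YYYYMMDD_HHMMSS for a source PDF.

    Assumption: the timestamp uses local time when `now` is not given.
    """
    pdf = Path(pdf_path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return pdf.parent / f"{pdf.stem}_{stamp}"


def is_allowed_location(session_dir, skill_root):
    """Reject paths under any `.agents` directory or inside the skill folder."""
    session = Path(session_dir).resolve()
    skill = Path(skill_root).resolve()
    inside_skill = session == skill or skill in session.parents
    return ".agents" not in session.parts and not inside_skill
```

For example, `session_dir_for("/repo/path/chapter.pdf", datetime(2024, 1, 2, 3, 4, 5))` yields the path `/repo/path/chapter_20240102_030405`, which sits next to the source PDF as the first rule requires.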
Audit Contract
Every session writes:
- `workflow_state.json`
- `audit/run_manifest.json`
- `audit/environment.json`
- `audit/artifact_hashes.json`
- `audit/api_calls.jsonl`
- `audit/commands/*.log`

Treat the session directory plus the `audit/` subtree as the source of truth for replay and review.
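A reviewer can confirm that a session directory satisfies this contract before relying on it for replay. The sketch below checks only that the listed paths exist; it does not validate their contents or schema, and the helper name `missing_audit_artifacts` is hypothetical.

```python
from pathlib import Path

# Paths listed in the audit contract, relative to the session directory.
REQUIRED_FILES = (
    "workflow_state.json",
    "audit/run_manifest.json",
    "audit/environment.json",
    "audit/artifact_hashes.json",
    "audit/api_calls.jsonl",
)
COMMAND_LOG_GLOB = "audit/commands/*.log"


def missing_audit_artifacts(session_dir):
    """Return the contract entries that are absent from session_dir."""
    root = Path(session_dir)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    # At least one command log is expected under audit/commands/.
    if not any(root.glob(COMMAND_LOG_GLOB)):
        missing.append(COMMAND_LOG_GLOB)
    return missing
```

An empty return value means every listed artifact is present. Whether a partially completed session (for example, one stopped after `init --no-transcribe`) already contains all of these files is not stated in the source, so treat a non-empty result as a prompt to check `status` rather than as proof of corruption.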
References
- Runtime and portability requirements: `references/runtime_requirements.md`
- Provenance schema and audit expectations: `references/provenance.md`