| name | openai-latex-transcription-sync |
| description | Convert PDF textbook pages into LaTeX by rendering pages to single-page PNGs at 300 DPI and uploading one image per synchronous OpenAI Responses API request. Use when users ask for OCR/transcription of math-heavy pages, full prose-plus-math LaTeX extraction, or reliable one-page-at-a-time processing with ordered snippet assembly and pdflatex compilation. |
OpenAI LaTeX Transcription Sync
Use this skill to run a reliable LaTeX transcription workflow with scripts/transcribe_math_latex_sync.py.
Preconditions
- Ensure OPENAI_API_KEY is available in environment variables or a local .env file.
- Ensure the repo has Python dependencies (openai, python-dotenv) installed.
- Ensure system tools are available:
  - pdftoppm for rendering PDF pages to PNG
  - pdflatex for the final compile
- Run transcription commands with python -u for unbuffered request logs.
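The preconditions above can be checked before any page is rendered. The following is a minimal sketch, not part of the skill's runner: the function name `missing_preconditions` and its parameters are hypothetical, and the .env check only looks for a plain `OPENAI_API_KEY=` line. It does not verify that the Python dependencies are installed.

```python
import os
import shutil
from pathlib import Path


def missing_preconditions(env=None, dotenv_path=".env", which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready to run.

    Hypothetical helper for illustration; `env` and `which` are injectable for testing.
    """
    env = os.environ if env is None else env
    problems = []

    # The key may come from the environment or from a local .env file.
    has_key = bool(env.get("OPENAI_API_KEY"))
    dotenv = Path(dotenv_path)
    if not has_key and dotenv.is_file():
        has_key = any(
            line.strip().startswith("OPENAI_API_KEY=")
            for line in dotenv.read_text(encoding="utf-8").splitlines()
        )
    if not has_key:
        problems.append("OPENAI_API_KEY not found in environment or .env")

    # System tools named in the preconditions.
    for tool in ("pdftoppm", "pdflatex"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_preconditions()` with no arguments checks the real environment and PATH; printing the returned list and stopping when it is non-empty is one reasonable way to use it.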
Core Workflow
- Render source PDF pages to PNG at 300 DPI:
mkdir -p /tmp/latex_ocr/pages /tmp/latex_ocr/logs /tmp/latex_ocr/snippets /tmp/latex_ocr/final
pdftoppm -r 300 -f <start_page> -l <end_page> -png input.pdf /tmp/latex_ocr/pages/page
- Use exactly one full rendered page per transcription image:
- Do not append multiple pages into one image.
- Do not split a page into sub-images, even when the page is dense.
- The runner rejects any single-page image above 50 MB before upload.
- Submit one image per OpenAI call using the skill-local script:
python -u .agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py /tmp/latex_ocr/pages/page-000006.png | tee /tmp/latex_ocr/logs/page-000006.log
python -u .agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py /tmp/latex_ocr/pages/page-000007.png | tee /tmp/latex_ocr/logs/page-000007.log
- Repeat once per page image.
- Keep each request single-image; do not batch multiple images in one request.
- The runner accepts exactly one positional image_path.
- Read the per-page request log:
  - request_start
  - response_complete
  - output_text_missing, if the API marks the response completed but returns no LaTeX body content.
- Treat exit code 0 with response_complete status completed as success.
- Extract transcribed LaTeX between delimiters:
===LATEX_BEGIN===
===LATEX_END===
- Successful runs emit both delimiters.
- If delimiters are absent or output_text_missing appears, retry only that page.
- Assemble extracted snippets in document reading order and compile with pdflatex.
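The extraction and assembly steps above can be sketched as follows. This is an illustrative helper, not the skill's own code: `extract_latex`, `page_number`, and `assemble` are hypothetical names, and the sketch assumes each log is a UTF-8 text file named with a trailing page integer (as in page-000006.log). Pages are sorted by that integer, so the result does not depend on the zero-padding width.

```python
import re
from pathlib import Path

BEGIN = "===LATEX_BEGIN==="
END = "===LATEX_END==="


def extract_latex(log_text):
    """Return the text between the delimiters, or None if either one is missing."""
    start = log_text.find(BEGIN)
    if start == -1:
        return None
    start += len(BEGIN)
    end = log_text.find(END, start)
    if end == -1:
        return None
    return log_text[start:end].strip("\n")


def page_number(path):
    """Sort key: the trailing integer in names like page-000006.log."""
    match = re.search(r"(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else -1


def assemble(log_dir):
    """Return (joined_snippets, log_names_to_retry) for every .log file in log_dir."""
    snippets, retry = [], []
    for log in sorted(Path(log_dir).glob("*.log"), key=page_number):
        body = extract_latex(log.read_text(encoding="utf-8"))
        if not body:
            # Missing delimiters or an empty body: retry only this page.
            retry.append(log.name)
        else:
            snippets.append(body)
    return "\n\n".join(snippets), retry
```

The joined text is only the concatenated snippets. Whether it needs a surrounding preamble and document environment before running pdflatex depends on what the transcription instructions ask the model to emit, which this sketch does not assume.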
API Pattern
Follow this request shape for each image:
# Upload the page image; the context manager closes the file handle after upload.
with open(image_path, "rb") as image_file:
    uploaded = client.files.create(file=image_file, purpose="vision")

response = client.responses.create(
    model="gpt-5.4",
    instructions=TRANSCRIPTION_INSTRUCTIONS,
    reasoning={"effort": "xhigh"},
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": TRANSCRIPTION_PROMPT},
                {"type": "input_image", "file_id": uploaded.id, "detail": "high"},
            ],
        }
    ],
)
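For testing or reuse, the same request shape can be wrapped in a function that takes the client as a parameter. This is a sketch under stated assumptions and not the runner's actual code: `transcribe_page` and `MAX_IMAGE_BYTES` are hypothetical names, and the 50 MB limit is read here as 50 MiB. It relies on the `status` and `output_text` attributes of the SDK's response object.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as 50 MiB


def transcribe_page(client, image_path, *, model, instructions, prompt, reasoning=None):
    """Upload one page image and return (status, output_text) from one request."""
    path = Path(image_path)
    if path.stat().st_size > MAX_IMAGE_BYTES:
        raise ValueError(f"{path.name} exceeds the 50 MB single-page limit")

    # Close the file handle once the upload finishes.
    with path.open("rb") as fh:
        uploaded = client.files.create(file=fh, purpose="vision")

    # Only pass `reasoning` when the caller supplies it.
    kwargs = {"reasoning": reasoning} if reasoning else {}
    response = client.responses.create(
        model=model,
        instructions=instructions,
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": prompt},
                    {"type": "input_image", "file_id": uploaded.id, "detail": "high"},
                ],
            }
        ],
        **kwargs,
    )
    # A completed response can still carry no text; callers should treat that as a failure.
    return response.status, (response.output_text or "")
```

Passing the client in, instead of constructing it inside the function, keeps the single-image constraint checkable with a stub client and no network access.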
Execution Rules
- Use the Responses API (responses.create), not Chat Completions.
- Put durable conversion policy in instructions=...; keep the user input_text short and task-specific.
- Upload and transcribe exactly one PNG per request.
- Render PDF pages at 300 DPI when this skill owns the PDF-to-image step.
- Use exactly one full page image per request; do not append pages and do not split pages.
- The runner rejects any single-page image above 50 MB before upload.
- Keep the content array to one input_text item plus one input_image item; do not send multiple images in a single prompt.
- Preserve visible headers and footers as document content, but omit page numbers.
- Preserve page ordering when assembling the final .tex.
- Keep logs for traceability and delimiter-based extraction.
- Treat completed responses with no emitted LaTeX as failures; the runner logs output_text_missing and exits non-zero.
- On non-zero exit or missing delimiters, retry only affected pages.
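The retry rule above (re-run only the affected pages) can be sketched as a small driver loop. The names `run_with_retries`, `run_runner`, and `max_attempts` are hypothetical, and the sketch assumes the runner writes its log lines and delimiters to stdout, which is consistent with the commands that pipe the runner's output through tee.

```python
import subprocess
import sys

# Path taken from this skill's notes; adjust if the repo layout differs.
RUNNER = ".agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py"


def run_runner(page):
    """Run the transcription script on one page image; return (exit_code, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-u", RUNNER, str(page)],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def run_with_retries(pages, run_one=run_runner, max_attempts=3):
    """Run each page once, then re-run only the pages that failed.

    run_one(page) must return (exit_code, log_text).
    Returns (logs_by_page, pages_still_failing).
    """
    logs, pending = {}, list(pages)
    for _ in range(max_attempts):
        failed = []
        for page in pending:
            code, log = run_one(page)
            logs[page] = log
            # Success needs a zero exit code and both delimiters in the log.
            ok = code == 0 and "===LATEX_BEGIN===" in log and "===LATEX_END===" in log
            if not ok:
                failed.append(page)
        pending = failed
        if not pending:
            break
    return logs, pending
```

Pages are processed sequentially, one request at a time, which matches the synchronous one-image-per-request design. The cap of three attempts is an arbitrary illustrative default, not a value taken from the skill.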
Repo-Specific Notes
- Use .agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py as the canonical transcription runner.
- Use references/workflow-template.md as the canonical sequence and reusable command template.