| name | openai-latex-transcription-background-mode |
| description | Convert PDF textbook pages into LaTeX by rendering pages to PNG at 300 DPI, uploading exactly one full page image per OpenAI Responses API request with background=True, and polling by response id until terminal status. Use when users ask for OCR/transcription of math-heavy pages and want API-side background work with strict one-page-per-request processing. |
OpenAI LaTeX Transcription Background Mode
Use this skill to run a strict one-page-per-request background-mode LaTeX transcription workflow.
The canonical PDF workflow uses scripts/transcribe_math_latex_background_single_page.py, which renders source pages at 300 DPI, submits one background request per page, retries only failed pages once, appends retry output to the same per-page log, and assembles the final LaTeX in reading order.
The lower-level primitive remains scripts/transcribe_math_latex_background_mode.py, which accepts exactly one already-rendered page image and handles one background request plus one polling loop. If you need manifest-driven resume, true simultaneous in-flight job management, or integrated high-concurrency orchestration, use openai-latex-transcription-background-orchestrator instead.
Preconditions
- Ensure OPENAI_API_KEY is available in environment variables or a local .env file.
- Ensure the repo has Python dependencies (openai, python-dotenv) installed.
- Ensure system tools are available:
  - pdftoppm for rendering PDF pages to PNG
  - pdflatex for the final compile
- Run transcription commands with python -u for unbuffered submission and polling logs.
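The preconditions above can be checked before any page is submitted. The following is a minimal sketch, not part of the skill's scripts: the function name `missing_prerequisites` and its injectable `env` and `which` parameters are hypothetical, chosen so the check can be exercised without touching the real environment. It only inspects environment variables, so a key stored in a local .env file would need to be loaded first (for example with python-dotenv's `load_dotenv()`).

```python
import os
import shutil

# System tools named in the preconditions list.
REQUIRED_TOOLS = ("pdftoppm", "pdflatex")


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: `env` and `which` are injectable for testing.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments checks the current process environment and PATH.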
Core Workflow
- Render source PDF pages to PNG at 300 DPI:
mkdir -p /tmp/latex_ocr/pages /tmp/latex_ocr/logs /tmp/latex_ocr/snippets /tmp/latex_ocr/final
pdftoppm -r 300 -f <start_page> -l <end_page> -png input.pdf /tmp/latex_ocr/pages/page
- Use exactly one full rendered page image per background request:
- Do not append multiple pages into one image.
- Do not split a page into sub-images.
- Use one page PNG per request, one per-page log file, and one final snippet per successful page.
- When the canonical helper retries a page, it appends the retry attempt to that page's existing log file instead of creating a second log.
- The runner rejects any single-page image above 50 MB before upload.
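Because the runner rejects oversized images before upload, it can be convenient to spot them right after rendering. This is a minimal sketch under stated assumptions: `oversized_pages` is a hypothetical helper, and "50 MB" is read here as 50 MiB (50 * 1024 * 1024 bytes); the runner's exact threshold is not specified in this document.

```python
from pathlib import Path

# Assumption: "50 MB" is interpreted as 50 MiB; the runner may use 50 * 10**6.
MAX_IMAGE_BYTES = 50 * 1024 * 1024


def oversized_pages(pages_dir, limit=MAX_IMAGE_BYTES):
    """Return the PNG files in pages_dir larger than limit, sorted by name."""
    return sorted(
        p for p in Path(pages_dir).glob("*.png") if p.stat().st_size > limit
    )
```

For example, `oversized_pages("/tmp/latex_ocr/pages")` lists any rendered page that would be rejected under this reading of the limit.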
- Run the canonical helper when starting from a PDF:
python -u .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_single_page.py \
--pdf-path input.pdf \
--start-page <start_page> \
--end-page <end_page> \
--job-dir /tmp/latex_ocr
- The helper retries each failed page once after the initial attempt.
- Or, if you already have prepared single-page PNGs, submit one page image per OpenAI call using the low-level runner:
python -u .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_mode.py /tmp/latex_ocr/pages/page-000006.png | tee /tmp/latex_ocr/logs/page-000006.log
python -u .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_mode.py /tmp/latex_ocr/pages/page-000007.png | tee /tmp/latex_ocr/logs/page-000007.log
- Repeat once per page image.
- Keep each request single-image; do not batch multiple images in one request.
- The runner accepts exactly one positional image_path, and that image must be a single rendered page.
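One caveat with piping the runner into tee: in most shells a pipeline reports the exit status of its last command (here tee) unless an option such as pipefail is set, so the runner's own exit code can be masked. The sketch below is one way to keep both the log and the real exit code; `run_and_log` is a hypothetical wrapper, not one of the skill's scripts.

```python
import subprocess
import sys


def run_and_log(command, log_path, mode="w"):
    """Run command, mirror its output to stdout and log_path, return its exit code.

    Hypothetical wrapper: mode="w" overwrites like plain tee; use "a" to append.
    """
    with open(log_path, mode, encoding="utf-8") as log:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stderr in the same per-page log
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()
```

A call would pass the same arguments as the shell command, for example `[sys.executable, "-u", runner_script, page_png]`, where `runner_script` and `page_png` are the paths shown above.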
- Read the per-page background lifecycle log, which contains these events:
  - submit_start
  - submitted
  - repeated poll
  - final
  - output_text_missing, if the API marks the response completed but returns no LaTeX body content.
- Treat exit code 0 with final status completed as success.
- The canonical helper appends retry attempts to the same page log, separated by ===RETRY_ATTEMPT===.
- Extract transcribed LaTeX between delimiters:
===LATEX_BEGIN===
===LATEX_END===
- Successful runs emit both delimiters.
- If delimiters are absent or output_text_missing appears, retry only that page.
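The delimiter-based extraction described above can be sketched as follows. This is an illustration only: `extract_latex` is a hypothetical function, and it assumes that when a log contains ===RETRY_ATTEMPT===, the text after the last separator is the attempt that matters.

```python
BEGIN = "===LATEX_BEGIN==="
END = "===LATEX_END==="
RETRY = "===RETRY_ATTEMPT==="


def extract_latex(log_text):
    """Return the LaTeX from the last attempt in a page log, or None to retry."""
    # Assumption: the final segment after the retry separator is authoritative.
    last_attempt = log_text.split(RETRY)[-1]
    if "output_text_missing" in last_attempt:
        return None
    start = last_attempt.find(BEGIN)
    end = last_attempt.find(END, start + len(BEGIN))
    if start == -1 or end == -1:
        return None  # a delimiter is absent, so the page should be retried
    return last_attempt[start + len(BEGIN):end].strip("\n")
```

Returning None rather than an empty string keeps "no usable output" distinct from a page that legitimately transcribes to nothing.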
- Assemble extracted snippets by concatenating them in document reading order, then compile with pdflatex.
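Reading order is easiest to guarantee by sorting on the numeric page index rather than on the raw file name, since names without zero padding (page-9 before page-10) sort incorrectly as strings. The sketch below assumes one .tex snippet per page, named with a trailing page number such as page-000006.tex; that naming and the functions `page_number` and `assemble_body` are hypothetical. Whether the assembled body also needs a document preamble depends on what the transcription instructions ask the model to emit, which is not specified here.

```python
import re
from pathlib import Path


def page_number(path):
    """Parse the trailing integer from names like page-000006.tex or page-6.tex."""
    match = re.search(r"(\d+)$", Path(path).stem)
    if match is None:
        raise ValueError(f"no page number in {path}")
    return int(match.group(1))


def assemble_body(snippets_dir):
    """Concatenate all .tex snippets in numeric page order."""
    snippets = sorted(Path(snippets_dir).glob("*.tex"), key=page_number)
    return "\n\n".join(p.read_text(encoding="utf-8").strip() for p in snippets)
```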
API Pattern
Follow this request shape for each image:
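from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the single page image; the with-block closes the file handle afterwards.
with open(image_path, "rb") as image_file:
    uploaded = client.files.create(file=image_file, purpose="vision")

# Submit one background request for that one page.
response = client.responses.create(
    model="gpt-5.4",
    instructions=TRANSCRIPTION_INSTRUCTIONS,
    text={"format": {"type": "text"}, "verbosity": "high"},
    reasoning={"effort": "xhigh"},
    background=True,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": TRANSCRIPTION_PROMPT},
                {"type": "input_image", "file_id": uploaded.id, "detail": "high"},
            ],
        }
    ],
)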
Poll with client.responses.retrieve(response.id) every 10 seconds until terminal status.
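The polling step can be sketched as a small loop. This is a sketch, not the runner's actual code: `poll_until_terminal` is a hypothetical name, and the `retrieve` and `sleep` parameters are injected so the loop can be exercised without network calls. It treats queued and in_progress as the only non-terminal statuses, so any other status (such as completed or failed) ends the loop.

```python
import time

# Statuses that mean the background response is still being worked on.
IN_FLIGHT = {"queued", "in_progress"}


def poll_until_terminal(retrieve, response_id, interval=10.0, sleep=time.sleep):
    """Call retrieve(response_id) until the status leaves queued/in_progress."""
    while True:
        response = retrieve(response_id)
        if response.status not in IN_FLIGHT:
            return response
        sleep(interval)
```

With the request shape above, a call would look like `poll_until_terminal(client.responses.retrieve, response.id)`.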
Execution Rules
- Use the Responses API (responses.create, responses.retrieve), not Chat Completions.
- Put durable conversion policy in instructions=...; keep the user input_text short and task-specific.
- Set text={"format": {"type": "text"}, "verbosity": "high"} on each create call.
- Use background=True for each page request.
- Upload and transcribe exactly one full rendered page PNG per request.
- Render PDF pages at 300 DPI when this skill owns the PDF-to-image step.
- Do not append multiple pages into one image and do not split a page into fragments.
- The runner rejects any single-page image above 50 MB before upload.
- Keep the content array to one input_text item plus one input_image item; do not send multiple images in a single prompt.
- Preserve visible headers and footers as document content, but omit page numbers.
- Preserve page ordering when assembling the final .tex.
- Keep logs for traceability and delimiter-based extraction.
- The low-level runner owns one page image, one background request, one polling loop, and one LaTeX result.
- The helper owns PDF rendering at 300 DPI, one-page job creation, one retry per failed page, assembly, and optional compile.
- Use the orchestrator skill when you want many in-flight jobs, concurrent polling across many response_ids, and manifest-based resume.
- Treat completed responses with no emitted LaTeX as failures; the runner logs output_text_missing and exits non-zero.
- On non-zero exit or missing delimiters, retry only affected pages.
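The success and retry rules above combine into a single per-page decision. The sketch below is illustrative only: `needs_retry` and `pages_to_retry` are hypothetical names, and the `(exit_code, final_status, log_text)` tuple is an assumed way of recording each page's outcome.

```python
def needs_retry(exit_code, final_status, log_text):
    """Apply the rules: success means exit 0, status completed, both delimiters."""
    if exit_code != 0 or final_status != "completed":
        return True
    if "output_text_missing" in log_text:
        return True
    has_both = "===LATEX_BEGIN===" in log_text and "===LATEX_END===" in log_text
    return not has_both


def pages_to_retry(results):
    """results maps page number -> (exit_code, final_status, log_text)."""
    return sorted(page for page, outcome in results.items() if needs_retry(*outcome))
```

Only the pages returned by `pages_to_retry` are resubmitted; successful pages keep their existing snippets.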
Repo-Specific Notes
- Use .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_single_page.py as the canonical PDF-to-LaTeX background workflow.
- Use .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_mode.py as the low-level one-page background runner.
- Use references/workflow-template.md as the canonical background-mode sequence and reusable command template.