| name | openai-latex-transcription-background-orchestrator |
| description | Orchestrate simultaneous in-flight background-mode LaTeX transcription jobs by rendering a PDF to single-page PNGs at 300 DPI or consuming prepared single-page images, submitting many one-image Responses API requests with background=True, persisting response ids in a manifest, polling them to completion, extracting LaTeX snippets in order, and resuming after interruption. Use when users want real parallel background execution inside Codex rather than a single-image runner. |
OpenAI LaTeX Transcription Background Orchestrator
Use this skill when the task requires simultaneous in-flight background jobs, manifest-based resume, ordered extraction across many page images, and integrated local PDF rendering.
This skill sits above the lower-level background-mode runner pattern. It is the right place for multi-job submission, concurrency caps, response-id persistence, polling, retry, and ordered snippet assembly.
Preconditions
- Ensure
OPENAI_API_KEY is available in environment variables or a local .env file.
- Ensure the repo has Python dependencies (
openai, python-dotenv) installed.
- Ensure system tools are available:
pdftoppm for rendering PDF pages to PNG if you need to create page images locally.
pdflatex if you plan to compile the final LaTeX output.
- Use one single-page image per API request.
What This Skill Owns
- manifest generation for many single-page images
- PDF page rendering to single-page PNGs
- bounded parallel submission with
background=True
- response-id persistence
- concurrent polling across many in-flight jobs
- retry handling for failed submissions, failed terminal pages, and completed responses that return no
output_text
- snippet extraction and ordered assembly
- leaving a final
body.tex ready for a separate document wrapper and pdflatex compile step when every page completes successfully
What This Skill Does Not Require
It does not require you to keep one local worker blocked per request. That is the point of using background mode here.
Core Workflow
- For a fresh run, provide exactly one input source:
--pdf-path to render pages locally at 300 DPI inside <job_dir>
--image-glob to consume an existing ordered set of prepared single-page images
- Run the orchestrator on the chosen input source:
python -u .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py \
--pdf-path input.pdf \
--start-page 1 \
--end-page 20 \
--job-dir math_documents/jobs/example_job \
--max-in-flight 12 \
--poll-seconds 10
Prepared-image alternative:
python -u .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py \
--image-glob 'prepared_pages/page-*.png' \
--job-dir math_documents/jobs/example_job \
--max-in-flight 12 \
--poll-seconds 10
- Let the orchestrator:
- render page PNGs locally at 300 DPI when
--pdf-path is used
- create or resume
state/manifest.json
- submit multiple background jobs in parallel
- persist each
response_id immediately
- poll all outstanding jobs
- write one snippet file per completed page
- assemble snippets in order only when all pages complete successfully
Execution Rules
- Use the Responses API (
responses.create, responses.retrieve) with background=True.
- Set
text={"format": {"type": "text"}, "verbosity": "high"} on each create call.
- Use one image per request.
- Use exactly one full page image per request; do not append pages and do not split pages.
- For a fresh run, use exactly one of
--pdf-path or --image-glob. Use --resume by itself to continue an existing manifest.
- Render PDF pages at 300 DPI when this skill owns the PDF-to-image step.
- Preserve ordering in the manifest with explicit
order_index.
- Keep the orchestrator manifest on disk so interrupted sessions can resume.
- Fresh non-resume runs with
--pdf-path rebuild the derived pages/, logs/, responses/, snippets/, final/, and state/ outputs inside <job_dir>.
- Fresh non-resume runs with
--image-glob rebuild the derived logs/, responses/, snippets/, final/, and state/ outputs inside <job_dir>, but leave the provided source page images untouched.
- Retry only failed pages, not the whole job.
- Treat page images above 50 MB as errors.
- Prebuilt
--image-glob inputs are size-checked against the hard limit before manifest creation.
--image-glob inputs are sorted naturally before order_index assignment.
--image-glob inputs are assigned synthetic sequential page_id values such as page-0001, page-0002, and so on after sorting.
- If a completed response returns no
output_text, treat that page as failed and retry within the normal retry budget.
- If polling a live
response_id fails transiently, keep the page submitted and retry polling on the next loop.
--retry-limit counts retries after the initial attempt, so the default --retry-limit 2 means up to 3 total attempts per page.
- On resume, the current
--retry-limit value redefines the total attempt budget for each page, bounded below by attempts already used, and previously exhausted pages reopen automatically when the new budget allows another attempt.
- Exit non-zero if any pages remain unresolved after retries; inspect the manifest and per-page logs before resuming.
Log and Console Events
- Each
logs/<page_id>.log file is JSONL and can contain submit_start, submitted, poll, final, submit_error, poll_error, output_text_missing, retry_budget_extended, resume_state_repaired, and snippet_missing_on_resume.
- The orchestrator writes top-level JSON events to stdout:
orchestrator_start, orchestrator_complete, and orchestrator_incomplete.
Files Produced
Inside <job_dir> the orchestrator writes:
state/manifest.json
pages/page-*.png when --pdf-path is used
logs/<page_id>.log
responses/<page_id>.json
snippets/<page_id>.tex
final/body.tex after all pages complete successfully
Repo-Specific Notes
- Use
.agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py as the canonical multi-job background orchestrator.
- Use
references/workflow-template.md for a concrete execution template.