openai-latex-transcription-background

Convert PDF textbook pages into LaTeX by rendering pages to PNG, uploading one image per OpenAI Responses API request, and polling background jobs to completion. Use when users ask for OCR/transcription of math-heavy pages, full prose-plus-math LaTeX extraction, or reliable background-mode processing with ordered snippet assembly and pdflatex compilation.

Install command

npx skills add https://github.com/EjayNg-AI/llm_training --skill openai-latex-transcription-background

Copy and paste this command into Claude Code to install the skill.

Source

EjayNg-AI/llm_training

Stars: 0

Forks: 0

Updated: February 22, 2026 at 01:12

SKILL.md

name: openai-latex-transcription-background
description: Convert PDF textbook pages into LaTeX by rendering pages to PNG, uploading one image per OpenAI Responses API request, and polling background jobs to completion. Use when users ask for OCR/transcription of math-heavy pages, full prose-plus-math LaTeX extraction, or reliable background-mode processing with ordered snippet assembly and pdflatex compilation.

OpenAI LaTeX Transcription Background

Use this skill to run a reliable LaTeX transcription workflow with transcribe_math_latex_background.py.

Preconditions

Ensure OPENAI_API_KEY is available in environment variables or a local .env file.
Ensure the repo has Python dependencies (openai, python-dotenv) installed.
Ensure these system tools are on PATH: pdftoppm for rendering PDF pages to PNG, convert (ImageMagick) for cropping and appending images, and pdflatex for the final compile.
Run transcription commands with python -u for unbuffered polling logs.
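Before rendering anything, a short preflight check can confirm these preconditions. This is a minimal sketch, not part of the repo: the helper names `missing_tools` and `api_key_available` are hypothetical, and the check only confirms that the tools are on PATH and that the key is set, not that the key is valid.

```python
import os
import shutil

# System tools named in the preconditions.
REQUIRED_TOOLS = ["pdftoppm", "convert", "pdflatex"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that are not found on PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]


def api_key_available(env_file=".env"):
    """Check for OPENAI_API_KEY, loading a local .env file first if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
        load_dotenv(env_file)  # does not override variables already set
    except ImportError:
        pass  # fall back to the process environment only
    return bool(os.environ.get("OPENAI_API_KEY"))


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing system tools:", ", ".join(absent))
    if not api_key_available():
        print("OPENAI_API_KEY is not set")
```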

Core Workflow

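Render the source PDF pages to PNG (pdftoppm does not create missing directories, so run mkdir -p /tmp/latex_ocr/pages /tmp/latex_ocr/sections first):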

pdftoppm -f <start_page> -l <end_page> -png input.pdf /tmp/latex_ocr/pages/page
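The render step can also be driven from Python. A minimal sketch, assuming the same output prefix as the command above; `pdftoppm_command` and `render_pages` are hypothetical helpers. It creates the output directory before calling pdftoppm because pdftoppm will not create it.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, start_page, end_page, out_prefix):
    """Build the pdftoppm argv for rendering a page range to PNG files."""
    return ["pdftoppm", "-f", str(start_page), "-l", str(end_page), "-png",
            str(pdf_path), str(out_prefix)]


def render_pages(pdf_path, start_page, end_page, out_prefix="/tmp/latex_ocr/pages/page"):
    """Create the output directory, then render the requested pages."""
    Path(out_prefix).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_command(pdf_path, start_page, end_page, out_prefix), check=True)
```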

Build section images in reading order:

Crop pages at section boundaries as needed.
When a section spans several pages, stack the contiguous page segments vertically into one image:

convert /tmp/latex_ocr/pages/page-000006.png /tmp/latex_ocr/pages/page-000007.png -append /tmp/latex_ocr/sections/section_1_2.png

Split very long sections into contiguous chunks to improve reliability and keep each request's runtime manageable.
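The append, crop, and split operations can be generated programmatically. This is a hypothetical sketch: `append_command`, `chunk_geometries`, and `crop_command` are illustrative names, and the split uses fixed pixel heights, which can cut through a line of text or a displayed equation. In practice you may want to move each boundary to a nearby whitespace gap before cropping. Image dimensions can be read with ImageMagick's `identify -format "%w %h" image.png`.

```python
def append_command(page_pngs, out_png):
    """ImageMagick argv that stacks pages top-to-bottom into one section image."""
    return ["convert", *map(str, page_pngs), "-append", str(out_png)]


def chunk_geometries(width, height, max_chunk_height):
    """Split an image into contiguous, non-overlapping horizontal bands.

    Returns ImageMagick crop geometries of the form WxH+X+Y, ordered top to bottom.
    """
    if max_chunk_height <= 0:
        raise ValueError("max_chunk_height must be positive")
    geometries = []
    for top in range(0, height, max_chunk_height):
        band = min(max_chunk_height, height - top)  # the last band may be shorter
        geometries.append(f"{width}x{band}+0+{top}")
    return geometries


def crop_command(section_png, geometry, out_png):
    """ImageMagick argv for one chunk; +repage drops the leftover virtual-canvas offset."""
    return ["convert", str(section_png), "-crop", geometry, "+repage", str(out_png)]
```

For example, `chunk_geometries(1000, 2500, 1000)` yields `1000x1000+0+0`, `1000x1000+0+1000`, and `1000x500+0+2000`. Passing each geometry to `crop_command` gives one crop invocation per chunk, which can be run with `subprocess.run(cmd, check=True)`.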
Submit one image per OpenAI call using the repo script:

python -u transcribe_math_latex_background.py /tmp/latex_ocr/sections/section_1_2.png | tee /tmp/latex_ocr/sections/section_1_2.log
python -u transcribe_math_latex_background.py /tmp/latex_ocr/sections/section_1_3_part1.png | tee /tmp/latex_ocr/sections/section_1_3_part1.log

Repeat once per PNG chunk.
Keep each request single-image; do not batch multiple images in one request.
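To process a whole directory of chunks, a small driver can run the repo script once per PNG and mirror its stdout into a per-image log, as the tee commands do. This sketch rests on two assumptions: the section directory holds only the final chunk images, and file names encode reading order (section_1_3_part1.png, section_1_3_part2.png, and so on). A natural sort keeps part10 after part2. `natural_key` and `transcribe_all` are hypothetical names.

```python
import re
import subprocess
import sys
from pathlib import Path


def natural_key(path):
    """Sort key that compares digit runs numerically, so part2 sorts before part10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", Path(path).name)]


def transcribe_all(section_dir, script="transcribe_math_latex_background.py"):
    """Run the script sequentially, one PNG per request; return (png, exit code) pairs."""
    results = []
    for png in sorted(Path(section_dir).glob("*.png"), key=natural_key):
        log_path = png.with_suffix(".log")  # section_1_2.png -> section_1_2.log
        # stderr is not captured, matching `| tee`; sys.executable reuses this interpreter.
        with open(log_path, "w", encoding="utf-8") as log, subprocess.Popen(
            [sys.executable, "-u", script, str(png)],
            stdout=subprocess.PIPE, text=True,
        ) as proc:
            for line in proc.stdout:  # mirror each event to the terminal and the log
                sys.stdout.write(line)
                log.write(line)
        results.append((png, proc.returncode))  # Popen's context manager has waited by now
    return results
```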

Monitor the JSON events until the job reaches a terminal status:

submit_start, submitted, repeated poll events, then final.
Treat a final status of completed as success.
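To check a run after the fact, the log can be scanned for the final event. The script's exact event schema is not given here, so this sketch assumes one JSON object per line with an "event" key (submit_start, submitted, poll, final) and a "status" key on the final event; adjust the key names to match the real output. `final_status` is a hypothetical helper.

```python
import json


def final_status(log_lines):
    """Return the status from the last `final` event, or None if no final event was logged.

    Assumes (hypothetically) one JSON object per line with "event" and "status" keys.
    """
    status = None
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output mixed into the log
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if event.get("event") == "final":
            status = event.get("status")
    return status
```

A log file object can be passed directly, for example `final_status(open("/tmp/latex_ocr/sections/section_1_2.log")) == "completed"`.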

Extract transcribed LaTeX between delimiters:

===LATEX_BEGIN===
===LATEX_END===
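Extraction can be a single regular expression. In this minimal sketch (hypothetical `extract_latex`), the function returns the last complete delimited block, on the assumption that any earlier match in a log (for example, an echoed prompt that mentions the delimiters) precedes the model's actual answer. It also assumes the log contains the model text with real line breaks rather than JSON-escaped `\n` sequences.

```python
import re

BEGIN = "===LATEX_BEGIN==="
END = "===LATEX_END==="

# Non-greedy match so each BEGIN pairs with the nearest following END;
# DOTALL lets the captured LaTeX span multiple lines.
_BLOCK = re.compile(re.escape(BEGIN) + r"\s*(.*?)\s*" + re.escape(END), re.DOTALL)


def extract_latex(text):
    """Return the LaTeX between the last BEGIN/END pair, or None if no complete pair exists."""
    blocks = _BLOCK.findall(text)
    return blocks[-1] if blocks else None
```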

Assemble extracted snippets in document reading order and compile with pdflatex.
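Assembly and compilation might look like the following sketch. It assumes each snippet is body-only LaTeX (no `\documentclass` or `\begin{document}` of its own) and uses a hypothetical minimal preamble. If the transcriptions use theorem-like environments, you would need matching `\newtheorem` declarations. `assemble` and `compile_tex` are illustrative names, not repo functions.

```python
import subprocess
from pathlib import Path

# Hypothetical minimal preamble; extend with whatever packages the snippets need.
PREAMBLE = "\\documentclass{article}\n\\usepackage{amsmath,amssymb,amsthm}\n\\begin{document}\n"
POSTAMBLE = "\\end{document}\n"


def assemble(snippets):
    """Join snippets in the given (reading) order inside a minimal document."""
    body = "\n\n".join(s.strip() for s in snippets)
    return PREAMBLE + body + "\n" + POSTAMBLE


def compile_tex(tex_text, out_dir="/tmp/latex_ocr/build", name="transcription"):
    """Write the .tex file and run pdflatex once; return True if it exits cleanly."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tex_path = out / f"{name}.tex"
    tex_path.write_text(tex_text, encoding="utf-8")
    result = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
         f"-output-directory={out}", str(tex_path)],
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

pdflatex resolves `\ref` and `\label` across runs, so a second compile may be needed when the snippets use cross-references.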

API Pattern

Follow this request shape for each image:

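```python
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # pick up OPENAI_API_KEY from a local .env file, if present
client = OpenAI()  # reads OPENAI_API_KEY from the environment

# image_path and TRANSCRIPTION_PROMPT are assumed to be defined elsewhere
# (for example, in transcribe_math_latex_background.py).
with open(image_path, "rb") as image_file:  # close the handle once the upload finishes
    uploaded = client.files.create(file=image_file, purpose="vision")

# Start a background response: one text prompt plus exactly one image.
response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "xhigh"},
    background=True,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": TRANSCRIPTION_PROMPT},
                {"type": "input_image", "file_id": uploaded.id, "detail": "high"},
            ],
        }
    ],
)
```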

Poll with client.responses.retrieve(response.id) every 10 seconds until terminal status.
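A polling loop that follows this rule is sketched below. It treats `queued` and `in_progress` as the non-terminal statuses of a background response and returns on anything else. The `retrieve` and `sleep` callables are injected (a hypothetical design choice) so the loop can be exercised without network access; `poll_until_terminal` is an illustrative name.

```python
import time

# Statuses a background response reports while it is still running.
PENDING = {"queued", "in_progress"}


def poll_until_terminal(retrieve, response_id, interval=10.0, sleep=time.sleep):
    """Call retrieve(response_id) every `interval` seconds until the status leaves PENDING."""
    while True:
        response = retrieve(response_id)
        if response.status not in PENDING:
            return response
        sleep(interval)
```

With the SDK, call it as `final = poll_until_terminal(client.responses.retrieve, response.id)`, then read `final.output_text` when `final.status == "completed"`.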

Execution Rules

Use the Responses API (responses.create, responses.retrieve), not Chat Completions.
Use background=True for long transcriptions.
Upload and transcribe one PNG per request.
Preserve chunk ordering when assembling final .tex.
Keep logs for traceability and delimiter-based extraction.
On non-completed status (failed, cancelled, expired, incomplete), inspect the final payload and retry only affected chunks.
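Deciding which chunks to rerun can be kept separate from running them. In this hypothetical sketch, `results` maps each chunk path (in reading order, which Python dicts preserve) to its final status and extracted LaTeX. As an extra safeguard beyond the rule above, it also retries a chunk that completed but produced no delimited LaTeX.

```python
def chunks_to_retry(results):
    """Return chunk paths, in their original order, whose transcription should be rerun.

    `results` maps chunk path -> (final_status, extracted_latex). A chunk is retried if its
    status is anything other than "completed" (failed, cancelled, expired, incomplete, or
    None when no final event was logged), or if it completed without a delimited LaTeX block.
    """
    retry = []
    for chunk, (status, latex) in results.items():
        if status != "completed" or not latex:
            retry.append(chunk)
    return retry
```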

Repo-Specific Notes

Use transcribe_math_latex_background.py as the canonical transcription runner.
Follow the README.md section "Latest End-to-End Process (Sections 1.2 and 1.3 from a1.pdf)" for the current canonical sequence.
Use references/workflow-template.md for reusable command templates.
