openai-latex-transcription-background-mode

Convert PDF textbook pages into LaTeX by rendering pages to PNG at 300 DPI, uploading exactly one full page image per OpenAI Responses API request with background=True, and polling by response id until terminal status. Use when users ask for OCR/transcription of math-heavy pages and want API-side background work with strict one-page-per-request processing.

Install command

npx skills add https://github.com/EjayNg-AI/openai-audio-transcribe --skill openai-latex-transcription-background-mode

Copy and paste this command into Claude Code to install the skill.

Source

EjayNg-AI/openai-audio-transcribe

Stars: 0

Forks: 0

Last updated: 15 March 2026 at 12:35

SKILL.md

name	openai-latex-transcription-background-mode
description	Convert PDF textbook pages into LaTeX by rendering pages to PNG at 300 DPI, uploading exactly one full page image per OpenAI Responses API request with background=True, and polling by response id until terminal status. Use when users ask for OCR/transcription of math-heavy pages and want API-side background work with strict one-page-per-request processing.

OpenAI LaTeX Transcription Background Mode

Use this skill to run a strict one-page-per-request background-mode LaTeX transcription workflow.

The canonical PDF workflow uses scripts/transcribe_math_latex_background_single_page.py, which renders source pages at 300 DPI, submits one background request per page, retries only failed pages once, appends retry output to the same per-page log, and assembles the final LaTeX in reading order.

The lower-level primitive remains scripts/transcribe_math_latex_background_mode.py, which accepts exactly one already-rendered page image and handles one background request plus one polling loop. If you need manifest-driven resume, true simultaneous in-flight job management, or integrated high-concurrency orchestration, use openai-latex-transcription-background-orchestrator instead.

Preconditions

Ensure OPENAI_API_KEY is available in environment variables or a local .env file.
Ensure the repo has Python dependencies (openai, python-dotenv) installed.
Ensure system tools are available:
- pdftoppm for rendering PDF pages to PNG
- pdflatex for the final compile
Run transcription commands with python -u for unbuffered submission and polling logs.
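A pre-flight check along these lines can catch missing setup before any page is uploaded. This is a minimal sketch, not part of the skill's scripts: the function name `missing_preconditions` is hypothetical, and it checks only what the list above names (the API key, the two command-line tools, and the import names of the `openai` and `python-dotenv` packages).

```python
import importlib.util
import os
import shutil
from typing import Callable, Mapping, Optional


def missing_preconditions(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems; an empty list means the preconditions hold."""
    problems: list[str] = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # System tools: pdftoppm renders pages, pdflatex compiles the result.
    for tool in ("pdftoppm", "pdflatex"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Import names for the openai and python-dotenv packages.
    for module in ("openai", "dotenv"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python package providing '{module}' is not installed")
    return problems
```

If the key lives in a local .env file, call `dotenv.load_dotenv()` before the check so the key is merged into `os.environ`.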

Core Workflow

Render source PDF pages to PNG at 300 DPI:

mkdir -p /tmp/latex_ocr/pages /tmp/latex_ocr/logs /tmp/latex_ocr/snippets /tmp/latex_ocr/final
pdftoppm -r 300 -f <start_page> -l <end_page> -png input.pdf /tmp/latex_ocr/pages/page
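The same rendering step can be driven from Python, for example inside a larger script. A sketch under two assumptions: pdftoppm's default output naming (prefix-N.png, zero-padded to one width per document) and the /tmp/latex_ocr layout shown above. `pdftoppm_command` and `render_pages` are hypothetical helpers, not functions from the skill's scripts.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path: str, start_page: int, end_page: int,
                     out_prefix: str, dpi: int = 300) -> list[str]:
    """Argument list equivalent to the pdftoppm shell command above."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    return ["pdftoppm", "-r", str(dpi), "-f", str(start_page),
            "-l", str(end_page), "-png", pdf_path, out_prefix]


def render_pages(pdf_path: str, start_page: int, end_page: int,
                 job_dir: str = "/tmp/latex_ocr") -> list[Path]:
    """Create the job directories, render the range, and list the PNGs now in pages/."""
    root = Path(job_dir)
    for sub in ("pages", "logs", "snippets", "final"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    cmd = pdftoppm_command(pdf_path, start_page, end_page, str(root / "pages" / "page"))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pdftoppm fails
    # Lexicographic order equals page order because one run pads names to one width.
    return sorted((root / "pages").glob("page-*.png"))
```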

Use exactly one full rendered page image per background request:

Do not append multiple pages into one image.
Do not split a page into sub-images.
Use one page PNG per request, one per-page log file, and one final snippet per successful page.
When the canonical helper retries a page, it appends the retry attempt to that page's existing log file instead of creating a second log.
The runner rejects any single-page image above 50 MB before upload.
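A pre-upload guard in the spirit of the 50 MB rule might look like this. It is a hypothetical sketch: it assumes the limit means 50 * 1024 * 1024 bytes (the runner may use a different constant), and it adds a PNG-signature check that the source does not mention.

```python
from pathlib import Path

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes.
MAX_PAGE_BYTES = 50 * 1024 * 1024
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_page_image(path: str, max_bytes: int = MAX_PAGE_BYTES) -> int:
    """Validate one rendered page before upload and return its size in bytes."""
    page = Path(path)
    size = page.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{page.name}: {size} bytes exceeds the {max_bytes}-byte limit")
    with page.open("rb") as fh:
        # Every PNG file starts with this fixed 8-byte signature.
        if fh.read(len(PNG_SIGNATURE)) != PNG_SIGNATURE:
            raise ValueError(f"{page.name}: not a PNG file")
    return size
```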

Run the canonical helper when starting from a PDF:

python -u .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_single_page.py \
  --pdf-path input.pdf \
  --start-page <start_page> \
  --end-page <end_page> \
  --job-dir /tmp/latex_ocr

The helper retries each failed page once after the initial attempt.
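The retry policy reduces to a few lines. In this sketch, `transcribe_page` is a hypothetical callback that submits one page, waits for it, and reports success; it does not reproduce the helper's actual internals.

```python
from typing import Callable, Iterable


def transcribe_with_one_retry(pages: Iterable[int],
                              transcribe_page: Callable[[int], bool]) -> dict[int, bool]:
    """Attempt every page once, then retry each failed page exactly once more."""
    results = {page: transcribe_page(page) for page in pages}
    failed = [page for page, ok in results.items() if not ok]
    for page in failed:
        # Second and final attempt; a page that fails again stays False.
        results[page] = transcribe_page(page)
    return results
```

This version finishes the first pass before retrying; retrying each page immediately after its first failure would also satisfy the one-retry rule.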

Or, if you already have prepared single-page PNGs, submit one page image per OpenAI call using the low-level runner:

python -u .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_mode.py /tmp/latex_ocr/pages/page-000006.png | tee /tmp/latex_ocr/logs/page-000006.log
python -u .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_mode.py /tmp/latex_ocr/pages/page-000007.png | tee /tmp/latex_ocr/logs/page-000007.log

Repeat once per page image.
Keep each request single-image; do not batch multiple images in one request.
The runner accepts exactly one positional image_path, and that image must be a single rendered page.
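Instead of typing one shell line per page, a small Python driver can build the same per-page invocations and mirror `| tee` into a per-page log. This is a hypothetical sketch: it assumes the page-*.png naming used in the commands above and that it runs from the repo root. One design difference: in a shell pipeline, `$?` reports tee's exit status unless `pipefail` is set, whereas `run_job` returns the runner's own exit code, which is the value the success rule depends on. It also merges stderr into the log, which a plain `| tee` does not.

```python
import subprocess
import sys
from pathlib import Path

RUNNER = (".agents/skills/openai-latex-transcription-background-mode/"
          "scripts/transcribe_math_latex_background_mode.py")


def runner_jobs(pages_dir: str, logs_dir: str) -> list[tuple[list[str], Path]]:
    """One (argv, log_path) pair per page image: one image, one request, one log."""
    jobs = []
    for image in sorted(Path(pages_dir).glob("page-*.png")):
        argv = [sys.executable, "-u", RUNNER, str(image)]
        jobs.append((argv, Path(logs_dir) / f"{image.stem}.log"))
    return jobs


def run_job(argv: list[str], log_path: Path) -> int:
    """Echo the child's output to the console and the log, then return its exit code."""
    with log_path.open("w", encoding="utf-8") as log, subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:  # stderr is merged into this stream
            sys.stdout.write(line)
            log.write(line)
    return proc.returncode
```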

Read the per-page background lifecycle log:

submit_start
submitted
poll (repeated until the response reaches a terminal status)
final
output_text_missing if the API marks the response completed but returns no LaTeX body content.
Treat exit code 0 with final status completed as success.
The canonical helper appends retry attempts to the same page log, separated by ===RETRY_ATTEMPT===.
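Because retries are appended to the same file, a reader of the log usually cares only about the most recent attempt. This sketch relies only on the marker and the token named above and assumes nothing about the layout of the other log lines; the function names are hypothetical.

```python
RETRY_MARKER = "===RETRY_ATTEMPT==="
MISSING_TOKEN = "output_text_missing"


def latest_attempt(log_text: str) -> str:
    """Text of the most recent attempt; the whole log if no retry was appended."""
    return log_text.split(RETRY_MARKER)[-1]


def latest_attempt_missing_output(log_text: str) -> bool:
    """True if the most recent attempt logged output_text_missing."""
    return MISSING_TOKEN in latest_attempt(log_text)
```

This only inspects the log; the exit-code rule still decides success.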

Extract transcribed LaTeX between delimiters:

===LATEX_BEGIN===
===LATEX_END===
Successful runs emit both delimiters.
If delimiters are absent or output_text_missing appears, retry only that page.
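Delimiter-based extraction fits in one regular expression. In this minimal sketch, the hypothetical `extract_latex` returns the last delimited block, does not let a match span an unterminated earlier ===LATEX_BEGIN=== (for example, a failed first attempt followed by a retry in the same log), and treats an empty body as missing.

```python
import re
from typing import Optional

# Match BEGIN ... END without crossing a second BEGIN, so an unterminated
# block from an earlier attempt cannot swallow a later attempt's output.
LATEX_BLOCK = re.compile(
    r"===LATEX_BEGIN===((?:(?!===LATEX_BEGIN===).)*?)===LATEX_END===",
    re.DOTALL,
)


def extract_latex(log_text: str) -> Optional[str]:
    """Return the last delimited LaTeX body, or None if absent or empty."""
    bodies = LATEX_BLOCK.findall(log_text)
    if not bodies:
        return None
    body = bodies[-1].strip()
    return body or None
```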

Assemble extracted snippets by concatenating them in document reading order, then compile with pdflatex.
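Ordering matters once page numbers reach different widths, since a plain string sort puts page-10 before page-9. This is a sketch under stated assumptions: one snippet file per page whose name ends in the page number (e.g. page-000006.tex, hypothetical naming), and any preamble or \begin{document} wrapper either present in the snippets or added separately; the source does not say which.

```python
import re
import subprocess
from pathlib import Path


def page_number(path: Path) -> int:
    """Numeric page index from a stem ending in digits, e.g. page-000006."""
    match = re.search(r"(\d+)$", path.stem)
    if match is None:
        raise ValueError(f"no trailing page number in {path.name}")
    return int(match.group(1))


def assemble(snippet_paths: list[Path]) -> str:
    """Concatenate snippet files in numeric page order, separated by blank lines."""
    ordered = sorted(snippet_paths, key=page_number)
    return "\n\n".join(p.read_text(encoding="utf-8").strip() for p in ordered) + "\n"


def compile_tex(tex_path: Path) -> None:
    """Run pdflatex in the file's directory; raise CalledProcessError on failure."""
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_path.name],
        cwd=tex_path.parent,
        check=True,
    )
```

The blank-line separator is a choice of this sketch: it starts a new paragraph at every page boundary, which can split a paragraph that continues onto the next page.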

API Pattern

Follow this request shape for each image:

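from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# image_path, TRANSCRIPTION_INSTRUCTIONS and TRANSCRIPTION_PROMPT must already be defined.
with open(image_path, "rb") as image_file:  # closes the file handle after upload
    uploaded = client.files.create(file=image_file, purpose="vision")

response = client.responses.create(
    model="gpt-5.4",
    instructions=TRANSCRIPTION_INSTRUCTIONS,
    text={"format": {"type": "text"}, "verbosity": "high"},
    reasoning={"effort": "xhigh"},
    background=True,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": TRANSCRIPTION_PROMPT},
                {"type": "input_image", "file_id": uploaded.id, "detail": "high"},
            ],
        }
    ],
)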

Poll with client.responses.retrieve(response.id) every 10 seconds until terminal status.
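A polling loop matching that description could look like the following. This is a sketch, not the runner's code. The terminal set assumes the Responses API background statuses, where queued and in_progress are the non-terminal ones. The printed line format is illustrative only, and `sleep` is injectable so the loop can be tested without waiting. `completed_text` mirrors the rule that a completed response with no text counts as a failure.

```python
import time
from typing import Any, Callable

# Background responses start as "queued" or "in_progress"; these do not change again.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled", "incomplete"})


def wait_for_response(client: Any, response_id: str, interval: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call client.responses.retrieve until the status is terminal, then return it."""
    while True:
        response = client.responses.retrieve(response_id)
        # Illustrative progress line; the runner's own log format may differ.
        print(f"poll {response_id} status={response.status}", flush=True)
        if response.status in TERMINAL_STATUSES:
            return response
        sleep(interval)


def completed_text(response: Any) -> str:
    """Return the output text, treating completed-but-empty as a failure."""
    if response.status != "completed":
        raise RuntimeError(f"response ended with status {response.status!r}")
    text = getattr(response, "output_text", None) or ""
    if not text.strip():
        raise RuntimeError("output_text_missing")
    return text
```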

Execution Rules

Use the Responses API (responses.create, responses.retrieve), not Chat Completions.
Put durable conversion policy in instructions=...; keep the user input_text short and task-specific.
Set text={"format": {"type": "text"}, "verbosity": "high"} on each create call.
Use background=True for each page request.
Upload and transcribe exactly one PNG per request.
Use exactly one full rendered page image per request.
Render PDF pages at 300 DPI when this skill owns the PDF-to-image step.
Do not append multiple pages into one image and do not split a page into fragments.
The runner rejects any single-page image above 50 MB before upload.
Keep the content array to one input_text item plus one input_image item; do not send multiple images in a single prompt.
Preserve visible headers and footers as document content, but omit page numbers.
Preserve page ordering when assembling the final .tex.
Keep logs for traceability and delimiter-based extraction.
The low-level runner owns one page image, one background request, one polling loop, and one LaTeX result.
The helper owns PDF rendering at 300 DPI, one-page job creation, one retry per failed page, snippet assembly, and an optional compile.
Use the orchestrator skill when you want many in-flight jobs, concurrent polling across many response_ids, and manifest-based resume.
Treat completed responses with no emitted LaTeX as failures; the runner logs output_text_missing and exits non-zero.
On non-zero exit or missing delimiters, retry only affected pages.
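The one-text, one-image rule for the content array is easy to enforce in code before calling responses.create. The guard below is hypothetical; its function names are illustrative and do not come from the skill's scripts.

```python
from typing import Any


def page_request_content(prompt: str, file_id: str) -> list[dict[str, Any]]:
    """Content array for one page: one input_text item and one input_image item."""
    return [
        {"type": "input_text", "text": prompt},
        {"type": "input_image", "file_id": file_id, "detail": "high"},
    ]


def check_single_image_content(content: list[dict[str, Any]]) -> None:
    """Raise ValueError unless content holds exactly one text and one image item."""
    kinds = sorted(str(item.get("type")) for item in content)
    if kinds != ["input_image", "input_text"]:
        raise ValueError(f"expected one input_text and one input_image, got {kinds}")
```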

Repo-Specific Notes

Use .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_single_page.py as the canonical PDF-to-LaTeX background workflow.
Use .agents/skills/openai-latex-transcription-background-mode/scripts/transcribe_math_latex_background_mode.py as the low-level one-page background runner.
Use references/workflow-template.md as the canonical background-mode sequence and reusable command template.

More from this repository

The following skills also come from EjayNg-AI/openai-audio-transcribe; each is listed as last updated 2026-03-15.

- academic-notes-exercise-writer: Create a single LaTeX worksheet for Singapore secondary school math with concise notes, aligned new questions based on a named reference exercise, and very concise end-of-document solutions.
- apex-pdf-target-image-removal: Detect and remove known APEX targets from born-digital PDFs (logos, cartoons, and Must Do/Challenging badges), whether rendered as image XObjects or vector paths. Use when users want these recurring elements removed while preserving non-target content.
- openai-api-background-mode: Create and run Python scripts that submit OpenAI Responses API requests in background mode, poll response status periodically, and stream progress with unbuffered stdout so Codex can monitor long-running jobs in the harness. Use when tasks require background=True calls, response_id polling, or live progress monitoring from python -u output.
- openai-latex-transcription-background-orchestrator: Orchestrate simultaneous in-flight background-mode LaTeX transcription jobs by rendering a PDF to single-page PNGs at 300 DPI or consuming prepared single-page images, submitting many one-image Responses API requests with background=True, persisting response ids in a manifest, polling them to completion, extracting LaTeX snippets in order, and resuming after interruption. Use when users want real parallel background execution inside Codex rather than a single-image runner.
- openai-latex-transcription-sync: Convert PDF textbook pages into LaTeX by rendering pages to single-page PNGs at 300 DPI and uploading one image per synchronous OpenAI Responses API request. Use when users ask for OCR/transcription of math-heavy pages, full prose-plus-math LaTeX extraction, or reliable one-page-at-a-time processing with ordered snippet assembly and pdflatex compilation.
- pdf-artifact-removal: Detect and remove watermarks, headers, and footers from born-digital PDFs by targeting tagged /Artifact BDC/EMC blocks in the content stream. Use when users ask to remove watermarks, strip headers/footers, clean up PDF page furniture, or identify repeating overlay elements across PDF pages. Requires pikepdf and pymupdf (fitz).
