
openai-latex-transcription-sync

Convert PDF textbook pages into LaTeX by rendering pages to single-page PNGs at 300 DPI and uploading one image per synchronous OpenAI Responses API request. Use when users ask for OCR/transcription of math-heavy pages, full prose-plus-math LaTeX extraction, or reliable one-page-at-a-time processing with ordered snippet assembly and pdflatex compilation.


Install command

npx skills add https://github.com/EjayNg-AI/openai-audio-transcribe --skill openai-latex-transcription-sync

Copy and paste this command into Claude Code to install the skill.

Source

EjayNg-AI/openai-audio-transcribe

Stars: 0

Forks: 0

Last updated: 15 March 2026, 12:35

SKILL.md

name	openai-latex-transcription-sync
description	Convert PDF textbook pages into LaTeX by rendering pages to single-page PNGs at 300 DPI and uploading one image per synchronous OpenAI Responses API request. Use when users ask for OCR/transcription of math-heavy pages, full prose-plus-math LaTeX extraction, or reliable one-page-at-a-time processing with ordered snippet assembly and pdflatex compilation.

OpenAI LaTeX Transcription Sync

Use this skill to run a reliable LaTeX transcription workflow with scripts/transcribe_math_latex_sync.py.

Preconditions

Ensure OPENAI_API_KEY is available in environment variables or a local .env file.
Ensure the repo has Python dependencies (openai, python-dotenv) installed.
Ensure system tools are available:
- pdftoppm for rendering PDF pages to PNG
- pdflatex for final compile
Run transcription commands with python -u for unbuffered request logs.
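
The preconditions above can be checked before any page is rendered. The following is a minimal sketch, not part of the skill itself: the function name `missing_prerequisites` and its parameters are hypothetical, and it only inspects the process environment, so a key that lives solely in a local .env file is not detected until that file has been loaded.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: `env` and `which` are injectable so the check can
    be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    # Only the process environment is inspected here; a key stored in a
    # local .env file is invisible until something loads that file.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set in the environment")
    # System tools named in the preconditions.
    for tool in ("pdftoppm", "pdflatex"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments checks the current machine; printing the returned list gives a quick readiness report.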

Core Workflow

Render source PDF pages to PNG at 300 DPI:

mkdir -p /tmp/latex_ocr/pages /tmp/latex_ocr/logs /tmp/latex_ocr/snippets /tmp/latex_ocr/final
pdftoppm -r 300 -f <start_page> -l <end_page> -png input.pdf /tmp/latex_ocr/pages/page
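
When the rendering step is driven from Python rather than the shell, the same pdftoppm invocation can be built as an argument list. This is a sketch under the assumption that the flags shown above are the only ones needed; `pdftoppm_command` and `render_pages` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, out_dir, first_page, last_page, dpi=300):
    """Build the pdftoppm argument list matching the shell command above."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("page range must satisfy 1 <= first_page <= last_page")
    return [
        "pdftoppm",
        "-r", str(dpi),          # resolution in DPI
        "-f", str(first_page),   # first page to render
        "-l", str(last_page),    # last page to render
        "-png",
        str(pdf_path),
        str(Path(out_dir) / "page"),  # output prefix; pdftoppm appends -<number>.png
    ]


def render_pages(pdf_path, out_dir, first_page, last_page):
    """Create the output directory and render the page range to PNG files."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        pdftoppm_command(pdf_path, out_dir, first_page, last_page),
        check=True,  # raise if pdftoppm exits non-zero
    )
```

Passing a list to `subprocess.run` avoids shell quoting problems when the PDF path contains spaces.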

Use exactly one full rendered page per transcription image:

Do not append multiple pages into one image.
Do not split a page into sub-images, even when the page is dense.
The runner rejects any single-page image above 50 MB before upload.
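
The 50 MB limit can be checked locally before any runner call, so oversized pages are caught early. This is a sketch, not the runner's actual check: it assumes "50 MB" means 50 × 1024 × 1024 bytes, which the source does not specify, and the helper names are hypothetical.

```python
import os

# Assumption: "50 MB" is read as 50 MiB; the runner's exact threshold may differ.
MAX_IMAGE_BYTES = 50 * 1024 * 1024


def is_within_size_limit(size_bytes, limit=MAX_IMAGE_BYTES):
    """Return True when a file of `size_bytes` would pass the size check."""
    return size_bytes <= limit


def oversized_images(paths, limit=MAX_IMAGE_BYTES):
    """Return the subset of `paths` whose on-disk size exceeds the limit."""
    return [
        path for path in paths
        if not is_within_size_limit(os.path.getsize(path), limit)
    ]
```

A page flagged here should not be split or tiled, since the workflow requires one full page per image; the rule stated above leaves no in-workflow remedy other than producing a smaller single-page file.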

Submit one image per OpenAI call using the skill-local script:

python -u .agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py /tmp/latex_ocr/pages/page-000006.png | tee /tmp/latex_ocr/logs/page-000006.log
python -u .agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py /tmp/latex_ocr/pages/page-000007.png | tee /tmp/latex_ocr/logs/page-000007.log

Repeat once per page image.
Keep each request single-image; do not batch multiple images in one request.
The runner accepts exactly one positional image_path.
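
The per-page commands above can be generated in a loop so that every rendered page gets exactly one runner call and one log file. This sketch assumes zero-padded page file names as in the examples, so that a lexicographic sort matches page order; `runner_command`, `transcribe_page`, and `transcribe_all` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

RUNNER = ".agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py"


def runner_command(image_path, python=sys.executable):
    """Build the command for one page: unbuffered Python, one positional image path."""
    return [python, "-u", RUNNER, str(image_path)]


def transcribe_page(image_path, log_dir):
    """Run the runner for one page, save its stdout as the page log, return the exit code."""
    log_path = Path(log_dir) / (Path(image_path).stem + ".log")
    result = subprocess.run(runner_command(image_path), capture_output=True, text=True)
    log_path.write_text(result.stdout)
    return result.returncode


def transcribe_all(pages_dir, log_dir):
    """Process pages one at a time, in sorted order; map file name -> exit code."""
    codes = {}
    for image in sorted(Path(pages_dir).glob("page-*.png")):
        codes[image.name] = transcribe_page(image, log_dir)
    return codes
```

One difference from the shell form: a pipeline ending in `tee` reports the exit status of `tee` unless `pipefail` is enabled, whereas `transcribe_page` returns the runner's own exit code.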

Read the per-page request log:

request_start
response_complete
output_text_missing if the API marks the response completed but returns no LaTeX body content.
Treat a run as successful only when the exit code is 0 and the response_complete entry reports status completed.
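
The success rule can be expressed as a small predicate over the exit code and the log text. The exact log line format is not shown in this document, so the sketch below assumes that the event name and the status appear on the same line; the sample lines in the usage note are hypothetical.

```python
def page_succeeded(exit_code, log_text):
    """Apply the success rule: exit code 0, no output_text_missing,
    and a response_complete line that mentions status completed.

    Assumption: the event name and its status share one log line.
    """
    if exit_code != 0 or "output_text_missing" in log_text:
        return False
    return any(
        "response_complete" in line and "completed" in line
        for line in log_text.splitlines()
    )
```

For example, `page_succeeded(0, "request_start\nresponse_complete status=completed\n")` is true under these assumptions, while any log containing output_text_missing is treated as a failure regardless of the exit code.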

Extract transcribed LaTeX between delimiters:

===LATEX_BEGIN===
===LATEX_END===
Successful runs emit both delimiters.
If delimiters are absent or output_text_missing appears, retry only that page.
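
Delimiter-based extraction can be sketched as a plain string search over the page log. The function name `extract_latex` is hypothetical; it returns None when either delimiter is missing, which signals that the page should be retried.

```python
BEGIN = "===LATEX_BEGIN==="
END = "===LATEX_END==="


def extract_latex(log_text):
    """Return the text between the two delimiters, or None if either is absent."""
    start = log_text.find(BEGIN)
    if start == -1:
        return None
    start += len(BEGIN)
    end = log_text.find(END, start)  # search only after the opening delimiter
    if end == -1:
        return None
    # Drop the newlines that directly follow BEGIN and precede END.
    return log_text[start:end].strip("\n")
```

Saving each non-None result to a per-page file under /tmp/latex_ocr/snippets keeps extraction separate from assembly.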

Assemble extracted snippets in document reading order and compile with pdflatex.
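
Assembly and compilation might look like the sketch below. It rests on assumptions the source does not confirm: that each snippet is a body fragment without its own \documentclass, that snippets are stored as page-*.tex files whose sorted names follow reading order, and that a minimal article preamble with amsmath and amssymb is sufficient. All names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed minimal wrapper; real documents may need more packages.
PREAMBLE = "\\documentclass{article}\n\\usepackage{amsmath,amssymb}\n\\begin{document}\n"
POSTAMBLE = "\\end{document}\n"


def assemble_document(snippets):
    """Join body snippets in the given order inside the assumed wrapper."""
    body = "\n\n".join(snippet.strip() for snippet in snippets)
    return PREAMBLE + body + "\n" + POSTAMBLE


def compile_snippets(snippet_dir, final_dir, name="document"):
    """Write <name>.tex from sorted snippet files and run pdflatex on it."""
    paths = sorted(Path(snippet_dir).glob("page-*.tex"))  # hypothetical naming
    tex_path = Path(final_dir) / f"{name}.tex"
    tex_path.write_text(assemble_document(path.read_text() for path in paths))
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_path.name],
        cwd=final_dir,  # keep auxiliary files next to the .tex file
        check=True,
    )
    return tex_path
```

Running pdflatex with `-halt-on-error` makes a bad snippet fail fast, which helps trace the problem back to a single page.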

API Pattern

Follow this request shape for each image:

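from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the page image; the context manager closes the file handle afterwards.
with open(image_path, "rb") as image_file:
    uploaded = client.files.create(file=image_file, purpose="vision")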

response = client.responses.create(
    model="gpt-5.4",
    instructions=TRANSCRIPTION_INSTRUCTIONS,
    reasoning={"effort": "xhigh"},
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": TRANSCRIPTION_PROMPT},
                {"type": "input_image", "file_id": uploaded.id, "detail": "high"},
            ],
        }
    ],
)
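
The one-text-plus-one-image shape of the request can be captured in a small builder, which makes the single-image rule easy to verify before a call is made. This is a sketch: `build_page_input` and `count_images` are hypothetical helpers, not part of the runner.

```python
def build_page_input(file_id, prompt_text):
    """Build the `input` list for one page: one input_text and one input_image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt_text},
                {"type": "input_image", "file_id": file_id, "detail": "high"},
            ],
        }
    ]


def count_images(input_messages):
    """Count input_image items across all messages; the workflow expects exactly 1."""
    return sum(
        1
        for message in input_messages
        for item in message["content"]
        if item["type"] == "input_image"
    )
```

The result can be passed as `input=build_page_input(uploaded.id, TRANSCRIPTION_PROMPT)` in the call above. In the OpenAI Python SDK the transcribed text is then typically read from `response.output_text`.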

Execution Rules

Use the Responses API (responses.create), not Chat Completions.
Put durable conversion policy in instructions=...; keep the user input_text short and task-specific.
Upload and transcribe exactly one PNG per request.
Render PDF pages at 300 DPI when this skill owns the PDF-to-image step.
Use exactly one full page image per request; do not append pages and do not split pages.
The runner rejects any single-page image above 50 MB before upload.
Keep the content array to one input_text item plus one input_image item; do not send multiple images in a single prompt.
Preserve visible headers and footers as document content, but omit page numbers.
Preserve page ordering when assembling final .tex.
Keep logs for traceability and delimiter-based extraction.
Treat completed responses with no emitted LaTeX as failures; the runner logs output_text_missing and exits non-zero.
On non-zero exit or missing delimiters, retry only affected pages.
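
The retry rule, rerunning only the pages that failed, can be sketched as a loop over a shrinking list. The attempt cap `max_attempts` is an assumption, since the source does not state a retry limit, and `run_page` is a hypothetical callable that returns True when a page succeeds.

```python
def retry_failed_pages(pages, run_page, max_attempts=3):
    """Run each page, then rerun only the failures, up to `max_attempts` rounds.

    `run_page(page)` must return True on success. Returns the pages that
    still fail after the final round, in their original order.
    """
    pending = list(pages)
    for _ in range(max_attempts):
        # Keep only the pages whose run did not succeed this round.
        pending = [page for page in pending if not run_page(page)]
        if not pending:
            break
    return pending
```

Pages that already succeeded are never resubmitted, so their logs and snippets stay untouched while the failures are retried.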

Repo-Specific Notes

Use .agents/skills/openai-latex-transcription-sync/scripts/transcribe_math_latex_sync.py as the canonical transcription runner.
Use references/workflow-template.md as the canonical sequence and reusable command template.

More from this repository

academic-notes-exercise-writer

EjayNg-AI/openai-audio-transcribe

Create a single LaTeX worksheet for Singapore secondary school math with concise notes, aligned new questions based on a named reference exercise, and very concise end-of-document solutions.

2026-03-15

apex-pdf-target-image-removal

EjayNg-AI/openai-audio-transcribe

Detect and remove known APEX targets from born-digital PDFs — logos, cartoons, and Must Do/Challenging badges — whether rendered as image XObjects or vector paths. Use when users want these recurring elements removed while preserving non-target content.

2026-03-15

openai-api-background-mode

EjayNg-AI/openai-audio-transcribe

Create and run Python scripts that submit OpenAI Responses API requests in background mode, poll response status periodically, and stream progress with unbuffered stdout so Codex can monitor long-running jobs in the harness. Use when tasks require background=True calls, response_id polling, or live progress monitoring from python -u output.

2026-03-15

openai-latex-transcription-background-mode

EjayNg-AI/openai-audio-transcribe

Convert PDF textbook pages into LaTeX by rendering pages to PNG at 300 DPI, uploading exactly one full page image per OpenAI Responses API request with background=True, and polling by response id until terminal status. Use when users ask for OCR/transcription of math-heavy pages and want API-side background work with strict one-page-per-request processing.

2026-03-15

openai-latex-transcription-background-orchestrator

EjayNg-AI/openai-audio-transcribe

Orchestrate simultaneous in-flight background-mode LaTeX transcription jobs by rendering a PDF to single-page PNGs at 300 DPI or consuming prepared single-page images, submitting many one-image Responses API requests with background=True, persisting response ids in a manifest, polling them to completion, extracting LaTeX snippets in order, and resuming after interruption. Use when users want real parallel background execution inside Codex rather than a single-image runner.

2026-03-15

pdf-artifact-removal

EjayNg-AI/openai-audio-transcribe

Detect and remove watermarks, headers, and footers from born-digital PDFs by targeting tagged /Artifact BDC/EMC blocks in the content stream. Use when users ask to remove watermarks, strip headers/footers, clean up PDF page furniture, or identify repeating overlay elements across PDF pages. Requires pikepdf and pymupdf (fitz).

2026-03-15
