Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

openai-latex-transcription-background-orchestrator

Orchestrate simultaneous in-flight background-mode LaTeX transcription jobs by rendering a PDF to single-page PNGs at 300 DPI or consuming prepared single-page images, submitting many one-image Responses API requests with background=True, persisting response ids in a manifest, polling them to completion, extracting LaTeX snippets in order, and resuming after interruption. Use when users want real parallel background execution inside Codex rather than a single-image runner.

In Manus ausführen

Überblick

Installationsbefehl

npx skills add https://github.com/EjayNg-AI/openai-audio-transcribe --skill openai-latex-transcription-background-orchestrator

Kopieren Sie diesen Befehl und fügen Sie ihn in Claude Code ein, um den Skill zu installieren

Quelle

EjayNg-AI/openai-audio-transcribe

Sterne0

Forks0

Aktualisiert15. März 2026 um 12:35

Datei-Explorer

4 Dateien

SKILL.md

readonly

name

openai-latex-transcription-background-orchestrator

description

OpenAI LaTeX Transcription Background Orchestrator

Use this skill when the task requires simultaneous in-flight background jobs, manifest-based resume, ordered extraction across many page images, and integrated local PDF rendering.

This skill sits above the lower-level background-mode runner pattern. It is the right place for multi-job submission, concurrency caps, response-id persistence, polling, retry, and ordered snippet assembly.

Preconditions

Ensure OPENAI_API_KEY is available in environment variables or a local .env file.
Ensure the repo has Python dependencies (openai, python-dotenv) installed.
Ensure system tools are available:
- pdftoppm for rendering PDF pages to PNG if you need to create page images locally.
- pdflatex if you plan to compile the final LaTeX output.
Use one single-page image per API request.

What This Skill Owns

manifest generation for many single-page images
PDF page rendering to single-page PNGs
bounded parallel submission with background=True
response-id persistence
concurrent polling across many in-flight jobs
retry handling for failed submissions, failed terminal pages, and completed responses that return no output_text
snippet extraction and ordered assembly
leaving a final body.tex ready for a separate document wrapper and pdflatex compile step when every page completes successfully

What This Skill Does Not Require

It does not require you to keep one local worker blocked per request. That is the point of using background mode here.

Core Workflow

For a fresh run, provide exactly one input source:

--pdf-path to render pages locally at 300 DPI inside <job_dir>
--image-glob to consume an existing ordered set of prepared single-page images

Run the orchestrator on the chosen input source:

python -u .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py \
  --pdf-path input.pdf \
  --start-page 1 \
  --end-page 20 \
  --job-dir math_documents/jobs/example_job \
  --max-in-flight 12 \
  --poll-seconds 10

Prepared-image alternative:

python -u .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py \
  --image-glob 'prepared_pages/page-*.png' \
  --job-dir math_documents/jobs/example_job \
  --max-in-flight 12 \
  --poll-seconds 10

Let the orchestrator:

render page PNGs locally at 300 DPI when --pdf-path is used
create or resume state/manifest.json
submit multiple background jobs in parallel
persist each response_id immediately
poll all outstanding jobs
write one snippet file per completed page
assemble snippets in order only when all pages complete successfully

Execution Rules

Use the Responses API (responses.create, responses.retrieve) with background=True.
Set text={"format": {"type": "text"}, "verbosity": "high"} on each create call.
Use one image per request.
Use exactly one full page image per request; do not append pages and do not split pages.
For a fresh run, use exactly one of --pdf-path or --image-glob. Use --resume by itself to continue an existing manifest.
Render PDF pages at 300 DPI when this skill owns the PDF-to-image step.
Preserve ordering in the manifest with explicit order_index.
Keep the orchestrator manifest on disk so interrupted sessions can resume.
Fresh non-resume runs with --pdf-path rebuild the derived pages/, logs/, responses/, snippets/, final/, and state/ outputs inside <job_dir>.
Fresh non-resume runs with --image-glob rebuild the derived logs/, responses/, snippets/, final/, and state/ outputs inside <job_dir>, but leave the provided source page images untouched.
Retry only failed pages, not the whole job.
Treat page images above 50 MB as errors.
Prebuilt --image-glob inputs are size-checked against the hard limit before manifest creation.
--image-glob inputs are sorted naturally before order_index assignment.
--image-glob inputs are assigned synthetic sequential page_id values such as page-0001, page-0002, and so on after sorting.
If a completed response returns no output_text, treat that page as failed and retry within the normal retry budget.
If polling a live response_id fails transiently, keep the page submitted and retry polling on the next loop.
--retry-limit counts retries after the initial attempt, so the default --retry-limit 2 means up to 3 total attempts per page.
On resume, the current --retry-limit value redefines the total attempt budget for each page, bounded below by attempts already used, and previously exhausted pages reopen automatically when the new budget allows another attempt.
Exit non-zero if any pages remain unresolved after retries; inspect the manifest and per-page logs before resuming.

Log and Console Events

Each logs/<page_id>.log file is JSONL and can contain submit_start, submitted, poll, final, submit_error, poll_error, output_text_missing, retry_budget_extended, resume_state_repaired, and snippet_missing_on_resume.
The orchestrator writes top-level JSON events to stdout: orchestrator_start, orchestrator_complete, and orchestrator_incomplete.

Files Produced

Inside <job_dir> the orchestrator writes:

state/manifest.json
pages/page-*.png when --pdf-path is used
logs/<page_id>.log
responses/<page_id>.json
snippets/<page_id>.tex
final/body.tex after all pages complete successfully

Repo-Specific Notes

Use .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py as the canonical multi-job background orchestrator.
Use references/workflow-template.md for a concrete execution template.

Mehr aus diesem Repository

gleiches Repository

academic-notes-exercise-writer

EjayNg-AI/openai-audio-transcribe

Create a single LaTeX worksheet for Singapore secondary school math with concise notes, aligned new questions based on a named reference exercise, and very concise end-of-document solutions.

2026-03-150

apex-pdf-target-image-removal

EjayNg-AI/openai-audio-transcribe

Detect and remove known APEX targets from born-digital PDFs — logos, cartoons, and Must Do/Challenging badges — whether rendered as image XObjects or vector paths. Use when users want these recurring elements removed while preserving non-target content.

2026-03-150

openai-api-background-mode

EjayNg-AI/openai-audio-transcribe

Create and run Python scripts that submit OpenAI Responses API requests in background mode, poll response status periodically, and stream progress with unbuffered stdout so Codex can monitor long-running jobs in the harness. Use when tasks require background=True calls, response_id polling, or live progress monitoring from python -u output.

2026-03-150

openai-latex-transcription-background-mode

EjayNg-AI/openai-audio-transcribe

Convert PDF textbook pages into LaTeX by rendering pages to PNG at 300 DPI, uploading exactly one full page image per OpenAI Responses API request with background=True, and polling by response id until terminal status. Use when users ask for OCR/transcription of math-heavy pages and want API-side background work with strict one-page-per-request processing.

2026-03-150

openai-latex-transcription-sync

EjayNg-AI/openai-audio-transcribe

Convert PDF textbook pages into LaTeX by rendering pages to single-page PNGs at 300 DPI and uploading one image per synchronous OpenAI Responses API request. Use when users ask for OCR/transcription of math-heavy pages, full prose-plus-math LaTeX extraction, or reliable one-page-at-a-time processing with ordered snippet assembly and pdflatex compilation.

2026-03-150

pdf-artifact-removal

EjayNg-AI/openai-audio-transcribe

Detect and remove watermarks, headers, and footers from born-digital PDFs by targeting tagged /Artifact BDC/EMC blocks in the content stream. Use when users ask to remove watermarks, strip headers/footers, clean up PDF page furniture, or identify repeating overlay elements across PDF pages. Requires pikepdf and pymupdf (fitz).

2026-03-150

Quelle

EjayNg-AI

EjayNg-AI/openai-audio-transcribe

GitHub-Repository öffnen Creator-Repositorys ansehen

Installationsbefehl

Download

In Manus ausführen

Nützlich fürSOC

SoftwareentwicklerInformatik- und Mathematikberufe15-1252L4

name

openai-latex-transcription-background-orchestrator

description

OpenAI LaTeX Transcription Background Orchestrator

Use this skill when the task requires simultaneous in-flight background jobs, manifest-based resume, ordered extraction across many page images, and integrated local PDF rendering.

Preconditions

Ensure OPENAI_API_KEY is available in environment variables or a local .env file.
Ensure the repo has Python dependencies (openai, python-dotenv) installed.
Ensure system tools are available:
- pdftoppm for rendering PDF pages to PNG if you need to create page images locally.
- pdflatex if you plan to compile the final LaTeX output.
Use one single-page image per API request.

What This Skill Owns

manifest generation for many single-page images
PDF page rendering to single-page PNGs
bounded parallel submission with background=True
response-id persistence
concurrent polling across many in-flight jobs
retry handling for failed submissions, failed terminal pages, and completed responses that return no output_text
snippet extraction and ordered assembly
leaving a final body.tex ready for a separate document wrapper and pdflatex compile step when every page completes successfully

What This Skill Does Not Require

It does not require you to keep one local worker blocked per request. That is the point of using background mode here.

Core Workflow

For a fresh run, provide exactly one input source:

--pdf-path to render pages locally at 300 DPI inside <job_dir>
--image-glob to consume an existing ordered set of prepared single-page images

Run the orchestrator on the chosen input source:

python -u .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py \
  --pdf-path input.pdf \
  --start-page 1 \
  --end-page 20 \
  --job-dir math_documents/jobs/example_job \
  --max-in-flight 12 \
  --poll-seconds 10

Prepared-image alternative:

python -u .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py \
  --image-glob 'prepared_pages/page-*.png' \
  --job-dir math_documents/jobs/example_job \
  --max-in-flight 12 \
  --poll-seconds 10

Let the orchestrator:

render page PNGs locally at 300 DPI when --pdf-path is used
create or resume state/manifest.json
submit multiple background jobs in parallel
persist each response_id immediately
poll all outstanding jobs
write one snippet file per completed page
assemble snippets in order only when all pages complete successfully

Execution Rules

Use the Responses API (responses.create, responses.retrieve) with background=True.
Set text={"format": {"type": "text"}, "verbosity": "high"} on each create call.
Use one image per request.
Use exactly one full page image per request; do not append pages and do not split pages.
For a fresh run, use exactly one of --pdf-path or --image-glob. Use --resume by itself to continue an existing manifest.
Render PDF pages at 300 DPI when this skill owns the PDF-to-image step.
Preserve ordering in the manifest with explicit order_index.
Keep the orchestrator manifest on disk so interrupted sessions can resume.
Fresh non-resume runs with --pdf-path rebuild the derived pages/, logs/, responses/, snippets/, final/, and state/ outputs inside <job_dir>.
Fresh non-resume runs with --image-glob rebuild the derived logs/, responses/, snippets/, final/, and state/ outputs inside <job_dir>, but leave the provided source page images untouched.
Retry only failed pages, not the whole job.
Treat page images above 50 MB as errors.
Prebuilt --image-glob inputs are size-checked against the hard limit before manifest creation.
--image-glob inputs are sorted naturally before order_index assignment.
--image-glob inputs are assigned synthetic sequential page_id values such as page-0001, page-0002, and so on after sorting.
If a completed response returns no output_text, treat that page as failed and retry within the normal retry budget.
If polling a live response_id fails transiently, keep the page submitted and retry polling on the next loop.
--retry-limit counts retries after the initial attempt, so the default --retry-limit 2 means up to 3 total attempts per page.
On resume, the current --retry-limit value redefines the total attempt budget for each page, bounded below by attempts already used, and previously exhausted pages reopen automatically when the new budget allows another attempt.
Exit non-zero if any pages remain unresolved after retries; inspect the manifest and per-page logs before resuming.

Log and Console Events

Each logs/<page_id>.log file is JSONL and can contain submit_start, submitted, poll, final, submit_error, poll_error, output_text_missing, retry_budget_extended, resume_state_repaired, and snippet_missing_on_resume.
The orchestrator writes top-level JSON events to stdout: orchestrator_start, orchestrator_complete, and orchestrator_incomplete.

Files Produced

Inside <job_dir> the orchestrator writes:

state/manifest.json
pages/page-*.png when --pdf-path is used
logs/<page_id>.log
responses/<page_id>.json
snippets/<page_id>.tex
final/body.tex after all pages complete successfully

Repo-Specific Notes

Use .agents/skills/openai-latex-transcription-background-orchestrator/scripts/orchestrate_background_latex_transcription.py as the canonical multi-job background orchestrator.
Use references/workflow-template.md for a concrete execution template.