| name | document-formula-bridge |
| description | Use when working on Windows with Word, MathType, PDF, DOCX, Markdown, and LaTeX files, especially when formulas must move between these formats without losing editability or visual fidelity. |
Document Formula Bridge
Use this skill on Windows when formulas must move safely between PDF, DOCX, Markdown, and LaTeX. It combines Word + MathType automation with academic PDF extraction.
Quick Start
Main scripts:
- DOCX LaTeX -> MathType: scripts/convert-docx-latex-to-formulas.ps1
- DOCX audit: scripts/audit-docx-formulas.ps1
- DOCX MathType OLE -> Markdown: scripts/export-docx-to-md.ps1
- DOCX MathType OLE -> editable TeX DOCX: scripts/convert-docx-mathtype-to-latex.ps1
- PDF -> Markdown with Marker: scripts/pdf2md_marker.py
- PDF -> Markdown/LaTeX with Nougat: scripts/pdf2latex.py
- Marker OpenAI-compatible helper: scripts/marker_openai_compat_service.py
- DOCX export notes: references/docx-to-markdown.md
- PDF extraction notes: references/pdf-to-markdown.md
- Word/MathType troubleshooting: references/troubleshooting.md
Supported Workflows
1. LaTeX text in DOCX -> MathType formulas
Use this when a .docx contains $...$ or $$...$$ text that should become native MathType objects.
$SKILL_DIR = "C:\path\to\document-formula-bridge"
powershell -NoProfile -ExecutionPolicy Bypass -File `
"$SKILL_DIR\scripts\convert-docx-latex-to-formulas.ps1" `
-InputPath "C:\path\to\document.docx"
Optional commands:
# Best-effort residual cleanup
powershell -NoProfile -ExecutionPolicy Bypass -File `
"$SKILL_DIR\scripts\convert-docx-latex-to-formulas.ps1" `
-InputPath "C:\path\to\document.docx" `
-AggressiveCleanup
# Preflight only
powershell -NoProfile -ExecutionPolicy Bypass -File `
"$SKILL_DIR\scripts\convert-docx-latex-to-formulas.ps1" `
-InputPath "C:\path\to\document.docx" `
-PreflightOnly
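To get a rough sense of how many $...$ and $$...$$ spans a document contains before converting, the delimiters can be scanned in plain text. This is a minimal sketch, not the logic of convert-docx-latex-to-formulas.ps1: it assumes the document text has already been extracted to a string, and the name find_latex_spans is hypothetical.

```python
import re

# Display math is matched first so "$$...$$" is not misread as inline spans.
DISPLAY = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# Inline math: a single "$" that is neither escaped ("\$") nor part of "$$".
INLINE = re.compile(r"(?<![\\$])\$(?!\$)(.+?)(?<![\\$])\$(?!\$)")


def find_latex_spans(text):
    """Return (display, inline) lists of LaTeX bodies found in plain text."""
    display = [m.group(1).strip() for m in DISPLAY.finditer(text)]
    # Blank out display spans so their contents are not counted again as inline.
    remainder = DISPLAY.sub(" ", text)
    inline = [m.group(1).strip() for m in INLINE.finditer(remainder)]
    return display, inline
```

An escaped dollar sign such as \$5 is skipped by this sketch. Real documents may split a formula across several Word runs, so treat the counts as an estimate only.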
2. DOCX (MathType OLE) -> Markdown with formula images
Use formula-preserved when formulas must stay visually exact in Markdown.
$SKILL_DIR = "C:\path\to\document-formula-bridge"
powershell -NoProfile -ExecutionPolicy Bypass -File `
"$SKILL_DIR\scripts\export-docx-to-md.ps1" `
-InputPath "C:\path\to\document.docx" `
-Mode formula-preserved
Default outputs:
{basename}_formula-preserved.md
{basename}_formula-preserved_assets\
Requirements:
3. DOCX (MathType OLE) -> editable TeX DOCX copy -> Markdown with raw TeX
Use latex-raw when the formulas should become editable TeX text in a copied .docx and in the exported Markdown.
$SKILL_DIR = "C:\path\to\document-formula-bridge"
powershell -NoProfile -ExecutionPolicy Bypass -File `
"$SKILL_DIR\scripts\export-docx-to-md.ps1" `
-InputPath "C:\path\to\document.docx" `
-Mode latex-raw
Default outputs:
{basename}_latex.docx
{basename}_latex_raw.md
{basename}_latex_raw_assets\
Requirements:
- Microsoft Word
- MathType
- Python 3
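The default output names for the two export modes can be predicted from the input basename. The sketch below only restates the naming patterns listed under "Default outputs"; the function name default_outputs is hypothetical, and placing the outputs next to the input file is an assumption, not confirmed behavior of export-docx-to-md.ps1.

```python
from pathlib import PureWindowsPath

# Name patterns copied from the "Default outputs" lists for each mode.
OUTPUT_PATTERNS = {
    "formula-preserved": [
        "{b}_formula-preserved.md",
        "{b}_formula-preserved_assets",
    ],
    "latex-raw": [
        "{b}_latex.docx",
        "{b}_latex_raw.md",
        "{b}_latex_raw_assets",
    ],
}


def default_outputs(input_path, mode):
    """List the expected output paths for one export mode."""
    if mode not in OUTPUT_PATTERNS:
        raise ValueError(f"unknown mode: {mode!r}")
    src = PureWindowsPath(input_path)
    # Assumption: outputs are written beside the input document.
    return [src.with_name(p.format(b=src.stem)) for p in OUTPUT_PATTERNS[mode]]
```

For example, default_outputs(r"C:\docs\thesis.docx", "latex-raw") yields thesis_latex.docx, thesis_latex_raw.md, and thesis_latex_raw_assets under C:\docs.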
4. Direct MathType OLE -> TeX conversion on a working copy
Use this lower-level helper when only the copied .docx is needed.
$SKILL_DIR = "C:\path\to\document-formula-bridge"
powershell -NoProfile -ExecutionPolicy Bypass -File `
"$SKILL_DIR\scripts\convert-docx-mathtype-to-latex.ps1" `
-SourcePath "C:\path\to\document.docx" `
-DestinationPath "C:\path\to\document_latex.docx"
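When this helper is driven from Python, the command line can be assembled with a guard that keeps the original file untouched. This is a sketch under assumptions: build_mathtype_to_latex_command is a hypothetical wrapper, and defaulting the destination to {basename}_latex.docx beside the source mirrors the latex-raw naming rather than any documented default of the script.

```python
from pathlib import PureWindowsPath


def build_mathtype_to_latex_command(skill_dir, source_path, destination_path=None):
    """Build the PowerShell argument list for the working-copy conversion."""
    src = PureWindowsPath(source_path)
    # Assumption: default the copy to "<basename>_latex.docx" next to the source.
    if destination_path:
        dst = PureWindowsPath(destination_path)
    else:
        dst = src.with_name(src.stem + "_latex.docx")
    # Windows paths compare case-insensitively, so this also catches case variants.
    if dst == src:
        raise ValueError("destination must differ from source")
    script = PureWindowsPath(skill_dir) / "scripts" / "convert-docx-mathtype-to-latex.ps1"
    return [
        "powershell", "-NoProfile", "-ExecutionPolicy", "Bypass",
        "-File", str(script),
        "-SourcePath", str(src),
        "-DestinationPath", str(dst),
    ]
```

On a Windows machine the returned list can be passed to subprocess.run(cmd, check=True).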
5. Academic PDF -> Markdown with formulas using Marker
Use Marker by default for Chinese + English papers, scanned PDFs, or formula-heavy academic PDFs.
$SKILL_DIR = "C:\path\to\document-formula-bridge"
$PDF_PYTHON = "D:\anaconda3\envs\pdf-extractor\python.exe" # adjust to your machine
& $PDF_PYTHON "$SKILL_DIR\scripts\pdf2md_marker.py" `
"C:\path\to\paper.pdf" `
-o "C:\path\to\paper_output\paper.md"
Optional commands:
# Page-0 smoke test
& $PDF_PYTHON "$SKILL_DIR\scripts\pdf2md_marker.py" `
"C:\path\to\paper.pdf" `
--page-range "0" `
-o "C:\path\to\paper_output\page_01.md"
# Force OCR for scanned PDFs
& $PDF_PYTHON "$SKILL_DIR\scripts\pdf2md_marker.py" `
"C:\path\to\paper.pdf" `
--force-ocr `
-o "C:\path\to\paper_output\paper.md"
# Use current Codex-compatible provider settings for Marker LLM mode
& $PDF_PYTHON "$SKILL_DIR\scripts\pdf2md_marker.py" `
"C:\path\to\paper.pdf" `
--codex-gpt-5-2 `
-o "C:\path\to\paper_output\paper.llm.md"
6. English-first academic PDF -> Markdown/LaTeX with Nougat
Use Nougat when the paper is English-only or English-first and you want a faster OCR path.
$SKILL_DIR = "C:\path\to\document-formula-bridge"
$PDF_PYTHON = "D:\anaconda3\envs\pdf-extractor\python.exe" # adjust to your machine
& $PDF_PYTHON "$SKILL_DIR\scripts\pdf2latex.py" `
"C:\path\to\paper.pdf" `
-o "C:\path\to\paper_output\paper.mmd"
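The choice between workflows 5 and 6 can be written down as a small decision function. This is one possible reading of the guidance in this document, not code shipped with the skill; choose_extractor and its parameter names are hypothetical.

```python
def choose_extractor(primary_language, contains_chinese=False, scanned=False,
                     math_heavy_layout=False, speed_matters=False):
    """Return "marker" or "nougat" for a given PDF profile."""
    english_first = primary_language.strip().lower() in {"en", "english"}
    # Marker is the default for Chinese or mixed-language papers,
    # scanned PDFs, and formula-heavy layouts.
    if contains_chinese or scanned or math_heavy_layout or not english_first:
        return "marker"
    # Nougat is chosen only for English-first PDFs when speed is the priority.
    return "nougat" if speed_matters else "marker"
```

Under this reading, an English-only paper falls back to Marker unless speed is explicitly requested.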
Guardrails
- Do not use WPS as a substitute for Microsoft Word when Word COM automation is required.
- Keep the original file untouched unless the script explicitly documents in-place behavior.
- Treat formula-preserved and latex-raw as different outputs for different needs. One prioritizes visual fidelity; the other prioritizes editability.
- When a document carries revision meaning through red text, keep those color cues in the Markdown export.
- For MathType-heavy .docx sources that will later be extracted as PDFs, prefer Microsoft Word native PDF export before running Marker or Nougat.
- Prefer Marker for Chinese or mixed-language papers, scanned PDFs, and complex math-heavy layouts.
- Prefer Nougat only for English-first PDFs when speed matters more than multilingual robustness.
- Keep PDF extraction self-contained. Use an existing Python environment with marker-pdf or nougat-ocr; do not start installing packages mid-task.
- For PDF output, prefer one dedicated output folder per source PDF so the .md file and image assets stay together.
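The one-folder-per-PDF guardrail can be sketched as a path helper. The {basename}_output folder name follows the paper_output examples in the commands above; output_layout is a hypothetical name, and creating the folder is left to the caller.

```python
from pathlib import PureWindowsPath


def output_layout(pdf_path, output_root, extension=".md"):
    """Return (folder, output_file) for one source PDF."""
    pdf = PureWindowsPath(pdf_path)
    # One dedicated folder per PDF keeps the output file and its assets together.
    folder = PureWindowsPath(output_root) / f"{pdf.stem}_output"
    return folder, folder / f"{pdf.stem}{extension}"
```

Passing extension=".mmd" gives the matching path for the Nougat workflow.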
Known Good Baseline
- Word.Application should identify itself as Microsoft Word, not WPS.
- A validated MathType template path on this machine was C:\Program Files (x86)\MathType\Office Support\32\MathType Commands 2016.dotm.
- Forward bulk conversion worked reliably through reflection-based InvokeMember('Run', ...).
- Reverse OLE-to-TeX extraction worked reliably by selecting Equation.DSMT4 inline shapes and running MathTypeCommands.UILib.MTCommand_TeXToggle.
- A validated Windows PDF extraction environment on this machine was D:\anaconda3\envs\pdf-extractor\python.exe.
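A quick way to see whether a machine still matches this baseline is to check the two recorded paths and the name the Word automation object reports. The sketch below is illustrative only: check_baseline and is_microsoft_word are hypothetical helpers, the paths are the ones recorded above and will differ on other machines, and the application name must be supplied by the caller.

```python
import os

# Paths recorded in the baseline above; adjust them for other machines.
BASELINE_PATHS = {
    "mathtype_template": r"C:\Program Files (x86)\MathType\Office Support\32\MathType Commands 2016.dotm",
    "pdf_python": r"D:\anaconda3\envs\pdf-extractor\python.exe",
}


def is_microsoft_word(app_name):
    """True when the automation object's reported name is Microsoft Word."""
    return app_name.strip().lower() == "microsoft word"


def check_baseline(paths=BASELINE_PATHS, exists=os.path.exists):
    """Map each baseline label to whether its path is present."""
    return {label: exists(path) for label, path in paths.items()}
```

If pywin32 is already installed in the environment, the Name property of the object returned by win32com.client.Dispatch("Word.Application") is one way to obtain app_name; that dependency is an assumption and is not required by the skill.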
Read references/docx-to-markdown.md, references/pdf-to-markdown.md, and references/troubleshooting.md when the environment drifts away from that baseline.