| Field | Value |
|---|---|
| name | publisher-docx-skill |
| description | Converts bidirectionally between DOCX/PDF and Markdown, edits existing DOCX files programmatically, or prepares a journal-ready manuscript. TRIGGER when: the user needs DOCX↔Markdown conversion, wants to create a styled Word document from Markdown, needs to batch-edit or modify an existing DOCX file, or wants to produce a publisher-styled manuscript (e.g. Elsevier, ACS, IEEE, Nature, or a Korean generic profile). DO NOT TRIGGER when: the user only needs to convert non-DOCX formats (images, HTML, etc.) to Markdown; use markitdown instead. |
| license | MIT |
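# Document Conversion & Editing Toolkit

This skill converts between DOCX/PDF and Markdown in both directions, edits
existing DOCX files, and prepares publisher-styled manuscripts:

- DOCX/PDF → Markdown: convert documents to clean, reader-friendly Markdown.
- Markdown → DOCX: create properly styled Word documents, optionally with a publisher profile.
- Markdown → PDF (optional): build a submission-style PDF via pandoc and XeLaTeX.
- DOCX editing: modify existing DOCX files (find/replace, append, insert).
- Manuscript lint: run a non-destructive prose check before submission.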
> [!IMPORTANT]
> Deep-interview intake comes first. When this skill is used, do not begin
> conversion, editing, linting, or template selection until you have asked the
> user one concise question that resolves the main ambiguities. The question
> must clarify:
>
> 1. the document's purpose and audience;
> 2. the desired output files (`.docx`, `.pdf`, Markdown, lint report, edited DOCX);
> 3. the styling target or preset (`elsevier`, `acs`, `ieee`, `nature`,
>    `korean-generic`, `ko-executive-report-*`, or custom);
> 4. where the source files live and where outputs should be saved;
> 5. hard constraints such as language/locale, journal or executive-report
>    tone, deadline, font/color preferences, and what must stay out of scope.

Suggested first question:

"Before I touch the files, what is the document's purpose/audience, which
output files do you want, which style or preset should it follow, where are the
source/output paths, and are there any hard constraints on language, fonts,
colors, publisher/report tone, deadline, or out-of-scope changes?"

If the user has already provided all of this information explicitly, restate
the resolved brief in one sentence and proceed. If the user explicitly asks you
to skip clarification, proceed, but state the assumed preset and output choices
before running any commands.
> [!IMPORTANT]
> If the user asks for an Elsevier, ACS, IEEE, Nature, `korean-generic`, or
> `ko-executive-report-*` manuscript or report, do not start with the basic
> Markdown→DOCX path. Start with:
>
> ```bash
> python scripts/convert_md_to_docx.py draft.md out.docx --publisher <profile>
> python scripts/slop_lint.py draft.md --publisher <profile> --report report.md
> ```
>
> Use the plain `convert_md_to_docx.py input.md output.docx` command only when
> no publisher-specific workflow is requested.
> [!CAUTION]
> Do not write your manuscript files into this skill folder. Your input `.md`
> and output `.docx`/`.pdf` files belong in your working directory. Run the
> scripts by their absolute paths inside this skill folder, e.g.:
>
> ```bash
> python /path/to/publisher-docx-skill/scripts/convert_md_to_docx.py ./input.md ./output.docx
> ```
>
> The `scripts/...` commands elsewhere in this file are written relative to the
> skill folder for brevity.
>
> The `templates/`, `references/`, and `scripts/` subdirectories inside this
> skill are intentionally extensible but are maintained by the skill author.
> Running `scripts/generate_docx_templates.py` regenerates
> `templates/docx/*.docx` from `templates/registry.yaml`; this is the supported
> way to update publisher styles.
## When to Use

- You need production-grade Markdown from DOCX or PDF for docs, LLM context, or web publishing.
- You want to create DOCX documents from Markdown with proper styling.
- You need to edit existing DOCX files programmatically (batch replacements, content insertion).
- The documents include tables, images, footnotes, math, or tracked changes.
- You are preparing a publisher-styled manuscript from Markdown: prefer
  `convert_md_to_docx.py --publisher <profile>` over the plain legacy Markdown→DOCX path.
- You want a non-destructive manuscript quality check before submission: run
  `scripts/slop_lint.py` after the publisher DOCX workflow.
## Recommended workflow for submissions

If the user asks for a journal-ready manuscript, use this order:

1. Build the publisher-styled DOCX:
   ```bash
   python scripts/convert_md_to_docx.py draft.md out.docx --publisher <profile>
   ```
2. Lint the Markdown source:
   ```bash
   python scripts/slop_lint.py draft.md --publisher <profile> --report report.md
   ```
3. Optional TeX-enabled PDF path:
   ```bash
   python scripts/convert_md_to_pdf.py draft.md out.pdf --publisher <profile>
   ```

Use the plain `convert_md_to_docx.py input.md output.docx` flow only when no publisher-specific styling is requested.
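## Prerequisites

- Core: `python3`
- DOCX → MD: `pandoc`
- PDF → MD: `pymupdf` (`pip install pymupdf`)
- MD → DOCX: `python-docx`, `markdown`, `beautifulsoup4` (`pip install python-docx markdown beautifulsoup4`)
- MD → PDF (optional): `pandoc`, plus `xelatex`, `kpsewhich`, and the publisher's LaTeX class from TeX Live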
## DOCX → Markdown Workflow

1. Convert with pandoc (stable tables, extracted media):
   ```bash
   pandoc "input.docx" \
     -f docx -t gfm \
     --wrap=none \
     --extract-media="media" \
     --reference-links \
     --markdown-headings=atx \
     --output "draft.md"
   ```
   - Tracked changes: pandoc accepts them by default. Add `--track-changes=all`
     to keep insertions and deletions, or accept/reject them in Word first.
   - If table rows wrap, add `--columns=200` and re-run.
2. Clean the Markdown:
   ```bash
   python scripts/clean_markdown.py draft.md output.md
   ```
3. Review the result with the checklist in `references/markdown_quality_checklist.md`.
4. (Optional) Inline images for a self-contained Markdown file:
   ```bash
   python scripts/embed_images.py output.md output.inline.md --root media
   ```
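This section does not describe how `embed_images.py` works, so the code below is only a sketch of the general idea. It uses a hypothetical `inline_images` helper that rewrites inline `![alt](path)` images into base64 `data:` URIs, resolving each path against a base directory. How the real script uses `--root` is an assumption. Reference-style images are not handled, and the `--reference-links` pandoc option above produces exactly that style.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Inline image syntax only: ![alt](target), where target has no spaces or ")".
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def inline_images(markdown: str, base_dir: Path) -> str:
    """Replace local inline image links with base64 data URIs (sketch)."""

    def to_data_uri(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        if target.startswith(("http://", "https://", "data:")):
            return match.group(0)  # remote or already inlined: leave as-is
        path = base_dir / target
        if not path.is_file():
            return match.group(0)  # missing file: keep the original link
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"![{alt}](data:{mime};base64,{payload})"

    return IMAGE_RE.sub(to_data_uri, markdown)
```

For example, `inline_images(Path("output.md").read_text(encoding="utf-8"), Path("."))` returns the rewritten Markdown, which you would then write to `output.inline.md`. Remote links and missing files are left untouched rather than raising an error.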
## PDF → Markdown Workflow

```bash
pip install pymupdf
python scripts/convert_pdf.py input.pdf output.md
```

`convert_pdf.py` handles multi-column layouts, paragraph merging, and header/footer removal.
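This section does not document how `convert_pdf.py` removes headers and footers. One common heuristic is shown below as a minimal sketch with a hypothetical `strip_header_footer` helper. It drops text blocks that fall entirely inside a band at the top or bottom of the page, and the 6% band is an assumed default. The helper works on the tuples that PyMuPDF's `page.get_text("blocks")` returns, `(x0, y0, x1, y1, text, block_no, block_type)`, where `block_type` 0 is text and 1 is an image.

```python
from typing import Sequence


def strip_header_footer(blocks: Sequence[tuple], page_height: float,
                        margin_ratio: float = 0.06) -> list[str]:
    """Return body text blocks, dropping those inside the top/bottom bands."""
    top_limit = page_height * margin_ratio
    bottom_limit = page_height * (1 - margin_ratio)
    body = []
    for _x0, y0, _x1, y1, text, _block_no, block_type in blocks:
        if block_type != 0:
            continue  # skip image blocks
        if y1 <= top_limit or y0 >= bottom_limit:
            continue  # block sits entirely in the header or footer band
        body.append(text.strip())
    return body
```

With PyMuPDF installed, you would call it once per page as `strip_header_footer(page.get_text("blocks"), page.rect.height)`. A band-based filter is cheap but blunt, because it also drops body text that happens to sit in the margins. Detecting text that repeats across pages is a common complement.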
## Journal Manuscript Workflow (NEW)

Generate a publisher-styled manuscript directly from a Markdown draft using the
`--publisher` flag. The skill ships a data-driven style registry at
`templates/registry.yaml` and a generated DOCX template per profile at
`templates/docx/<publisher>.docx`. Supported publisher profiles:

- `elsevier`, `acs`, `ieee`, `nature`
- `korean-generic`
- `ko-executive-report-core`, `ko-executive-report-navy`, `ko-executive-report-forest`

```bash
python scripts/convert_md_to_docx.py input.md output.docx --publisher elsevier
python scripts/convert_md_to_docx.py input.md output.docx --publisher acs
python scripts/convert_md_to_docx.py input.md output.docx --publisher ieee
python scripts/convert_md_to_docx.py input.md output.docx --publisher nature
python scripts/convert_md_to_docx.py input.md output.docx --publisher korean-generic
python scripts/convert_md_to_docx.py input.md output.docx --publisher ko-executive-report-core
python scripts/convert_md_to_docx.py input.md output.docx --publisher ko-executive-report-navy
python scripts/convert_md_to_docx.py input.md output.docx --publisher ko-executive-report-forest
```
### YAML frontmatter contract

Put a YAML block at the top of your Markdown. The converter uses it to build
the title, author, affiliation, abstract, and keyword block according to the
publisher's `design.title_block` rules (full schema in
`references/frontmatter_schema.md`).

```yaml
---
title: "Catalytic CO2 reduction on Cu/ZnO: a mechanistic study"
authors:
  - { name: "Woojin Go", affiliation: 1, corresponding: true }
  - { name: "Jane Doe", affiliation: [1, 2] }
affiliations:
  1: "Department of Chemical Engineering, University X"
  2: "Advanced Materials Institute, University Y"
abstract: "We report ..."
keywords: [CO2 reduction, Cu/ZnO, DFT, operando XPS]
corresponding_email: "wj@example.ac.kr"
---
```
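As a rough illustration of how the `authors` and `affiliations` fields above could drive the title block, here is a sketch with hypothetical helpers `author_markers` and `affiliation_lines`. It assumes the frontmatter has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. It accepts `affiliation` as either a single number or a list, and it uses `*` as the corresponding-author marker. The converter's actual markers may differ.

```python
from typing import Any


def author_markers(frontmatter: dict[str, Any]) -> list[tuple[str, str]]:
    """Pair each author name with the text of its superscript marker."""
    pairs = []
    for author in frontmatter.get("authors", []):
        aff = author.get("affiliation", [])
        ids = aff if isinstance(aff, list) else [aff]  # int or list of ints
        marker = ",".join(str(i) for i in ids)
        if author.get("corresponding"):
            marker += "*"  # assumed corresponding-author marker
        pairs.append((author["name"], marker))
    return pairs


def affiliation_lines(frontmatter: dict[str, Any]) -> list[str]:
    """Render affiliations in numeric order, e.g. '1 Department of ...'."""
    affs = frontmatter.get("affiliations", {})
    return [f"{key} {affs[key]}" for key in sorted(affs, key=int)]
```

A DOCX writer would then emit each name as a normal run, followed by its marker as a superscript run.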
## Markdown → DOCX Workflow (Basic / non-publisher)

Use this path only when the user does not need publisher-specific styling.

```bash
python scripts/convert_md_to_docx.py input.md output.docx
```

Features:

- Headings (H1–H6 → Word heading styles)
- Bold, italic, and inline code formatting
- Bullet and numbered lists
- Tables with proper styling
- Code blocks (monospace font)
- Links (as hyperlinks)
- Local images

With a template:

```bash
python scripts/convert_md_to_docx.py input.md output.docx --template template.docx
```
## What `--publisher` does automatically

When `--publisher` is set, the following anti-slop filters run on your text
before and after conversion:

- Micro-typography: straight `"..."` → curly `“...”`; `10-15 mg` → `10–15 mg`
  (en dash, plus a non-breaking space before the unit); `word - word` →
  `word — word`; `...` → `…`; `Fig. 1`, `Table 1`, and `Eq. 1` get a
  non-breaking space between the label and the number.
- Heading auto-numbering: Heading 1 → `1.`, Heading 2 → `1.1.`, and so on.
- Booktabs tables: python-docx's `Table Grid` style (vertical rules) is
  replaced with top, header-bottom, and bottom horizontal rules only.
- Title block injection from the YAML frontmatter (superscript affiliations,
  corresponding-author marker, abstract label, keywords line).
- CJK font pairing: every run gets `w:eastAsia=Pretendard`, so Hangul in the
  source renders as Pretendard rather than the default system CJK font.
- Banned-font / banned-style stripping: the fonts Calibri, Cambria, and Aptos
  and the styles Table Grid, Intense Quote, List Paragraph, and Subtle
  Emphasis are removed if they leak in.
- Fenced code blocks and inline backtick code are preserved byte-for-byte.
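Two of these filters are easy to sketch in plain Python. The code below is not the converter's implementation. The regexes, the deliberately short unit list for ranges such as `10-15 mg`, and the helper names `typeset` and `number_headings` are all assumptions. In keeping with the byte-for-byte rule above, the sketch skips inline code spans, but it does not handle fenced blocks or nested quotes.

```python
import re

NBSP = "\u00a0"  # non-breaking space
EN_DASH = "\u2013"
EM_DASH = "\u2014"
ELLIPSIS = "\u2026"

# Capturing group, so re.split keeps code spans at odd indices.
CODE_SPAN = re.compile(r"(`[^`]*`)")
# Assumed, deliberately short unit list for "10-15 mg"-style ranges.
RANGE_WITH_UNIT = re.compile(r"(\d)-(\d+) (mg|g|kg|mL|L|K|nm|h|min)\b")
SPACED_HYPHEN = re.compile(r"(?<=\w) - (?=\w)")
LABEL_NUMBER = re.compile(r"\b(Fig\.|Table|Eq\.) (\d)")
STRAIGHT_QUOTES = re.compile(r'"([^"]*)"')


def _typeset_prose(text: str) -> str:
    text = text.replace("...", ELLIPSIS)
    text = RANGE_WITH_UNIT.sub(rf"\1{EN_DASH}\2{NBSP}\3", text)
    text = SPACED_HYPHEN.sub(f" {EM_DASH} ", text)
    text = LABEL_NUMBER.sub(rf"\1{NBSP}\2", text)
    text = STRAIGHT_QUOTES.sub(lambda m: f"\u201c{m.group(1)}\u201d", text)
    return text


def typeset(text: str) -> str:
    """Apply the prose filters everywhere except inside `inline code`."""
    parts = CODE_SPAN.split(text)
    return "".join(p if i % 2 else _typeset_prose(p) for i, p in enumerate(parts))


def number_headings(levels: list[int]) -> list[str]:
    """Turn heading levels (1-6) into labels such as '1.', '1.1.', '2.'."""
    counters = [0] * 6
    labels = []
    for level in levels:
        counters[level - 1] += 1
        for i in range(level, 6):
            counters[i] = 0  # reset all deeper levels
        labels.append(".".join(str(c) for c in counters[:level]) + ".")
    return labels
```

For example, `number_headings([1, 2, 2, 1])` yields `["1.", "1.1.", "1.2.", "2."]`, which matches the numbering scheme described above.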
## Locale auto-detection

```bash
python scripts/convert_md_to_docx.py input.md output.docx --publisher elsevier --locale auto
```

`--locale` defaults to `auto`, which classifies the text by its ratio of Hangul
codepoints: above 5% → `ko`, 1–5% → `mixed`, otherwise `en`. Korean-specific
lint rules are not available yet; they are scheduled for milestone M6 of the
journal enhancement work. The `korean-generic` profile is already available for
mixed Korean/English manuscripts that need explicit Pretendard-first
typography.
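The thresholds above can be sketched as a small classifier. Several details are assumptions: which codepoint ranges count as Hangul, the denominator (here, all non-whitespace characters), whether the 1% and 5% boundaries are inclusive, and the function names themselves.

```python
# Assumed Hangul ranges: Jamo, Compatibility Jamo, precomposed syllables.
HANGUL_RANGES = ((0x1100, 0x11FF), (0x3130, 0x318F), (0xAC00, 0xD7A3))


def is_hangul(ch: str) -> bool:
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in HANGUL_RANGES)


def hangul_ratio(text: str) -> float:
    """Share of non-whitespace characters that are Hangul."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_hangul(ch) for ch in chars) / len(chars)


def detect_locale(text: str) -> str:
    ratio = hangul_ratio(text)
    if ratio > 0.05:
        return "ko"
    if ratio >= 0.01:
        return "mixed"
    return "en"
```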
## Regenerating templates

After editing `templates/registry.yaml`, rebuild the generated `.docx` files:

```bash
# Rebuild every profile
python scripts/generate_docx_templates.py

# Or rebuild a single profile
python scripts/generate_docx_templates.py --publisher elsevier
python scripts/generate_docx_templates.py --publisher acs
python scripts/generate_docx_templates.py --publisher ieee
python scripts/generate_docx_templates.py --publisher nature
python scripts/generate_docx_templates.py --publisher korean-generic
python scripts/generate_docx_templates.py --publisher ko-executive-report-core
python scripts/generate_docx_templates.py --publisher ko-executive-report-navy
python scripts/generate_docx_templates.py --publisher ko-executive-report-forest
```

Adding a new publisher is a data-only change: copy an existing block in
`templates/registry.yaml`, edit the values to match the publisher's author
guide, and rerun the generator.
## LaTeX / PDF path

Generate a submission-style PDF via pandoc and XeLaTeX:

```bash
python scripts/convert_md_to_pdf.py input.md output.pdf --publisher elsevier
python scripts/convert_md_to_pdf.py input.md output.pdf --publisher korean-generic
```

This path uses the publisher-specific templates under `templates/latex/<publisher>/`
and checks for the required TeX Live class before compiling. If `xelatex`,
`kpsewhich`, or the required class is missing, the script exits non-zero with a
clear install hint instead of failing partway through compilation.
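A preflight check of this kind can be sketched with `shutil.which` for the binaries and `kpsewhich <class>.cls` for the class. `kpsewhich` prints the file's path when it finds the file and prints nothing otherwise. The helper name is hypothetical, and so are the injectable `which`/`run` parameters, which exist only so the check can be tested without TeX installed. The skill's templates define which class each publisher needs, so that is not settled here.

```python
import shutil
import subprocess
from typing import Callable, Optional


def missing_tex_prereqs(
    class_file: str,
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Return what is missing for a XeLaTeX build; an empty list means ready."""
    missing = [tool for tool in ("xelatex", "kpsewhich") if which(tool) is None]
    if "kpsewhich" in missing:
        # Without kpsewhich the class cannot be looked up; treat it as missing too.
        missing.append(class_file)
        return missing
    result = run(["kpsewhich", class_file], capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout.strip():
        missing.append(class_file)
    return missing
```

For instance, `missing_tex_prereqs("elsarticle.cls")` returns an empty list on a machine that is ready to build. Elsevier's LaTeX class is used here only as an example. A non-empty list is the condition that would trigger the non-zero exit and install hint.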
## Anti-slop lint

Run a non-destructive prose lint pass before submission:

```bash
python scripts/slop_lint.py input.md --publisher elsevier --report report.md
python scripts/slop_lint.py input.md --publisher nature --report report.md --fix-whitespace
```

Current rules cover banned AI-tell phrases, hedging pileups, stock openers and
closers, citation-density heuristics, section order, heading case, and
qualifier/adjective stacks. `--fix-whitespace` enables a safe
whitespace/typography autofix that writes a `.fixed.md` sibling instead of
modifying the input.
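The rules themselves are defined in `references/slop_rules.md`, and this section does not document the exact fixes `slop_lint.py` applies. As a sketch of what a safe whitespace autofix might do, the hypothetical helpers below strip trailing spaces and cap runs of blank lines at two, the limit given in the Quality Checklist. They leave fenced code untouched and write the result to a `draft.fixed.md`-style sibling without modifying the input.

```python
from pathlib import Path


def fix_whitespace(text: str) -> str:
    """Strip trailing spaces and keep at most 2 consecutive blank lines."""
    out: list[str] = []
    blank_run = 0
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            out.append(line)
            blank_run = 0
            continue
        if in_fence:
            out.append(line)  # code inside fences is left byte-for-byte
            continue
        line = line.rstrip()
        if line == "":
            blank_run += 1
            if blank_run > 2:
                continue  # drop the third and later consecutive blank lines
        else:
            blank_run = 0
        out.append(line)
    return "\n".join(out) + "\n"


def write_fixed_sibling(path: Path) -> Path:
    """Write draft.md's fixed text to draft.fixed.md next to it."""
    target = path.with_name(path.stem + ".fixed.md")
    fixed = fix_whitespace(path.read_text(encoding="utf-8"))
    target.write_text(fixed, encoding="utf-8")
    return target
```

Writing to a sibling keeps the pass non-destructive. You can diff `draft.md` against `draft.fixed.md` before accepting any change.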
## Publisher Registry

The publisher registry lives in `templates/registry.yaml`.

- Edit the YAML to add or refine a publisher profile.
- Regenerate the `.docx` templates with `scripts/generate_docx_templates.py`.
- The DOCX, lint, and optional PDF paths all consume the same registry.
- This keeps publisher support data-driven rather than hard-coded.
## Edit Existing DOCX (NEW)

`scripts/edit_docx.py` supports these common operations on existing DOCX files.

### Find & Replace

```bash
python scripts/edit_docx.py input.docx --replace "OLD_TEXT" "NEW_TEXT" -o output.docx
```

Add `--case-insensitive` for case-insensitive matching.
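This section does not document how `edit_docx.py` handles matching internally. Any DOCX find/replace, though, has to cope with Word splitting one visible phrase across several runs, for example `OLD_` and `TEXT` in separate runs after an edit. Below is a minimal sketch using a hypothetical `replace_across_runs` helper that works on the run texts of a single paragraph.

```python
import re


def replace_across_runs(runs: list[str], old: str, new: str,
                        case_insensitive: bool = False) -> list[str]:
    """Replace `old` in a paragraph whose text is split over several runs.

    If a match is found, the rewritten paragraph text goes into the first
    run and the other runs are emptied, so formatting differences between
    runs are lost. Paragraphs without a match are returned unchanged.
    """
    if not old:
        return list(runs)
    flags = re.IGNORECASE if case_insensitive else 0
    pattern = re.compile(re.escape(old), flags)
    joined = "".join(runs)
    if not pattern.search(joined):
        return list(runs)
    # A function replacement avoids treating backslashes in `new` as group refs.
    replaced = pattern.sub(lambda _m: new, joined)
    return [replaced] + [""] * (len(runs) - 1)
```

With python-docx you would pass in `[run.text for run in paragraph.runs]` and assign the results back through each run's `text` property. Merging into the first run makes cross-run matches work, at the cost of discarding per-run formatting in the paragraphs that matched.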
### Append Markdown Content

```bash
python scripts/edit_docx.py input.docx --append-md extra_content.md -o output.docx
```

### Insert at Placeholder

```bash
python scripts/edit_docx.py input.docx --insert-at "{{MARKER}}" "Replacement text" -o output.docx
```

### Extract Text

```bash
python scripts/edit_docx.py input.docx --extract-text
```

### Show Document Structure

```bash
python scripts/edit_docx.py input.docx --show-structure
```
## Quality Checklist

- Headings: correct hierarchy, a single blank line before each, none empty.
- Tables: blank line before and after; merged cells rewritten if needed.
- Lists: consistent indentation; no hard wraps mid-bullet.
- Images: paths relative to `media/`; alt text present; width attributes removed.
- Math/footnotes: equations render, footnote references resolve, and every definition is present.
- Spacing: no trailing spaces; at most 2 consecutive blank lines.
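The Headings item lends itself to an automated check. The sketch below, `check_headings`, is hypothetical and is not one of the skill's scripts. It flags empty ATX headings, level jumps such as H1 straight to H3, and headings with no blank line before them, and it skips fenced code. It does not flag extra blank lines before a heading.

```python
import re

# "#" to "######" followed by whitespace or end of line; "#tag" is not a heading.
ATX_HEADING = re.compile(r"^(#{1,6})(\s.*)?$")


def check_headings(markdown: str) -> list[str]:
    """Return human-readable problems with the document's ATX headings."""
    problems = []
    previous_level = 0
    in_fence = False
    lines = markdown.splitlines()
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = ATX_HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        text = (match.group(2) or "").strip().rstrip("#").strip()
        if not text:
            problems.append(f"line {number}: empty heading")
        if previous_level and level > previous_level + 1:
            problems.append(f"line {number}: jumps from H{previous_level} to H{level}")
        if number > 1 and lines[number - 2].strip():
            problems.append(f"line {number}: no blank line before heading")
        previous_level = level
    return problems
```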
## References

| File | Purpose |
|---|---|
| `scripts/clean_markdown.py` | Post-conversion Markdown cleanup |
| `scripts/convert_pdf.py` | PDF → Markdown conversion |
| `scripts/embed_images.py` | Inline images as base64 |
| `scripts/convert_md_to_docx.py` | Markdown → DOCX |
| `scripts/convert_md_to_pdf.py` | Markdown → PDF (optional) |
| `scripts/slop_lint.py` | Anti-slop prose lint |
| `scripts/visual_audit.py` | Visual audit render smoke test |
| `scripts/edit_docx.py` | Edit existing DOCX |
| `references/markdown_quality_checklist.md` | Quality review guide |
| `references/journal_style_spec.md` | Cross-publisher style matrix |
| `references/frontmatter_schema.md` | YAML frontmatter contract |
| `references/slop_rules.md` | Lint rule definitions |