| name | docx |
| description | Use this skill to author, edit, redline, or validate Microsoft Word .docx files. Covers creating new documents from markdown or templates, editing existing documents in place, generating tracked-changes redlines, adding comments, and accepting/rejecting revisions. For READING existing .docx files, use the harness `read` tool — do not invoke this skill. Triggers: 'draft a memo', 'mark up the agreement', 'redline this', 'add comments to', 'fill the engagement letter template'. Does NOT apply to .pdf, .xlsx, .pptx, or .doc (legacy Word). |
DOCX authoring, editing, redlining
Reading is not in scope. To read an existing .docx, use the harness read tool. It already returns structured text via pandoc. This skill is for writing, editing, and validating.
Quick reference
| Goal | Use |
|---|
| Generate a new doc from markdown | scripts/generate_from_md.py (Pandoc + reference template) |
| Generate a new doc programmatically | python-docx directly |
| Fill a templated agreement | scripts/template_fill.py (docxtpl / Jinja) |
| Edit an existing doc | scripts/unpack.py → mutate XML → scripts/pack.py |
| Produce a tracked-changes redline | scripts/redline.py |
| Add comments to a passage | scripts/comments_add.py |
| Accept all redlines | scripts/accept_changes.py |
| Validate a docx before delivery | scripts/validate.py (mandatory final step) |
All scripts live in workspace/skills/docx/scripts/ once the harness has set up the workspace. Invoke them via bash.
Creating a new document
Pick by what you have:
- Markdown content + a styled firm template →
generate_from_md.py input.md out.docx template.docx. Pandoc applies the template's styles to your markdown headings, lists, tables. Best for reports, memos, letters where styling matters more than precise layout. Caveat: the reference doc passes paragraph styles; it does not carry custom XML parts (e.g., comment threads).
- A template with named placeholders →
template_fill.py template.docx context.json out.docx. docxtpl renders Jinja2 expressions inside the template. Best for engagement letters, NDAs, structured agreements.
- Programmatic build → write Python using
python-docx. Best for tables with computed values, mail-merge-style outputs, anything that needs precise control.
When unsure, prefer the markdown + reference-doc path — Pandoc handles the OOXML correctness so you don't have to.
Editing an existing document
Three-step pattern:
python scripts/unpack.py input.docx workdir/
python scripts/pack.py workdir/ output.docx
python scripts/validate.py output.docx
Key files inside the unpacked tree:
word/document.xml — body content (paragraphs, runs, tables)
word/styles.xml — paragraph + run style definitions
word/numbering.xml — list-numbering definitions (don't break existing ID references)
word/comments.xml — comment thread content (created by comments_add.py)
word/header*.xml, word/footer*.xml — running headers/footers
[Content_Types].xml — MIME registration for every part. Edit when you add a new part type.
_rels/ and word/_rels/ — relationships between parts. Edit when you reference a new image, comment, or external resource.
Run-merging gotcha
Word writes adjacent runs (<w:r>) with identical formatting separately. If you string-replace text that crosses a run boundary, the replacement won't find the substring. unpack.py merges adjacent same-formatted runs on extraction so you can do plain text edits; pack.py is permissive about whatever run structure you write back.
Smart-quote escaping
Microsoft Word uses smart quotes (" " ' ') which must be XML-escaped or written as the actual code points. unpack.py substitutes them with XML entities for editing safety; pack.py reverses the substitution before zipping. Don't manually re-escape — let the scripts handle it.
Whitespace preservation
Trailing whitespace inside <w:t> elements is significant. Both unpack.py and pack.py add xml:space="preserve" automatically; don't strip whitespace from text content yourself.
Redlines (tracked changes)
python scripts/redline.py original.docx revised.docx redlined.docx \
--author "Reviewer" --date "2026-04-30"
Default mode shells out to Python-Redlines (MIT) which compares the two documents and emits proper <w:ins>/<w:del> revision elements. Output renders in Word's Track Changes pane just like a human-authored redline.
Manual mode
If --mode=manual is passed, the script falls back to a paragraph SequenceMatcher + word-level diff-match-patch pass. Use this when:
- Python-Redlines fails on a particular doc structure (rare).
- You want to control which paragraphs are diffed.
- The change is purely formatting (
<w:rPrChange> runs) rather than text.
What gets tracked
- Insertions and deletions of text
- Insertions and deletions of paragraphs (deleted-whole-paragraph case requires
<w:del/> inside <w:pPr><w:rPr> or you get an empty paragraph after acceptance)
- Run-property changes via
<w:rPrChange> (formatting-only revisions)
What doesn't get tracked
- Table cell additions/removals (Python-Redlines emits the cells as plain edits)
- Style-definition changes (changes to
styles.xml aren't revision-trackable)
- Image swaps
Comments
python scripts/comments_add.py document.docx comments.json
comments.json is a list of {anchor_text, author, comment} objects. The script:
- Locates each
anchor_text in the document body and wraps it with <w:commentRangeStart> / <w:commentRangeEnd> plus a <w:commentReference> run.
- Creates or appends to
word/comments.xml (with proper id assignment).
- Patches
[Content_Types].xml and word/_rels/document.xml.rels if commenting is being added for the first time.
Anchor matching is exact-string. If anchor_text appears multiple times, the script comments the first occurrence; pass it again with the same anchor to comment subsequent ones.
<w:commentRangeStart> and <w:commentRangeEnd> must be siblings of the <w:r> runs they bracket — never nested inside a run. If you write comments by hand, follow this rule.
Accept / reject changes
python scripts/accept_changes.py redlined.docx accepted.docx
Uses LibreOffice headless via a documented StarBasic macro:
Sub AcceptAllRedlines() ThisComponent.AcceptAllRedlines() ThisComponent.store() End Sub
LibreOffice must be installed and on PATH (soffice binary). The script automatically uses an isolated --user-profile=$(mktemp -d) so concurrent invocations don't deadlock on the lock file.
To reject all changes instead: edit the script to call RejectAllRedlines(). To accept selectively: this isn't supported — open the doc in Word.
Validation gate
Always run validate.py before declaring the task complete.
python scripts/validate.py output.docx
Checks:
- Round-trip ZIP integrity
- XML well-formedness for every part
- Schema validation against ECMA-376 (WordprocessingML) XSDs
- Content-type registration for every referenced part
- Relationship consistency (no dangling rIds)
Exit code 0 = valid. Non-zero exit code with line-number diagnostics = fix and re-pack.
Common pitfalls
- Legacy
.doc (binary) is not supported. Convert with soffice --convert-to docx input.doc first.
- List numbering breaks after edits. Numbering lives in
word/numbering.xml keyed by numId. If you delete a list, also delete its numId reference; if you reorder, don't change IDs.
- Headers and footers are separate parts (
word/header1.xml, etc.). Edits to body content don't touch them.
- Pandoc reference-doc passes paragraph styles only. Custom XML parts (comments, tracked changes baseline) are not carried over.
- Don't pretty-print whitespace inside
<w:t> elements. Pretty-printing breaks runs that depend on exact spacing.
- Tables: dual width specs. Each cell needs both
columnWidths (in <w:tblGrid>) and per-cell <w:tcW w:type="dxa">. Percentages (pct) render fine in Word but break in Google Docs.
Out of scope
- Reading: use the
read tool.
- Producing PDFs from .docx: pipe through
soffice --convert-to pdf after this skill is done.
- Signing / encryption / DRM.
- Word macros (
.docm with VBA).