| name | officecli |
| description | Create, analyze, proofread, and modify Office and PDF documents (.docx, .xlsx, .pptx, .pdf) using the officecli CLI tool. Use when the user wants to create, inspect, check formatting, find issues, add charts, or modify Office or PDF documents. |
officecli
AI-friendly CLI for .docx, .xlsx, .pptx, .pdf. Single binary (pure Rust), no dependencies, no Office installation needed.
Install
If officecli is not installed:
curl -fsSL https://d.officecli.ai/install.sh | bash
irm https://d.officecli.ai/install.ps1 | iex
Verify with officecli --version. If still not found after install, open a new terminal.
Strategy
L1 (read) → L2 (DOM edit) → L3 (raw XML). Always prefer the lower-numbered layer: drop to L3 only when L2 cannot express the change. Add --json for structured output.
Before doc work, check Specialized Skills (bottom of this file). Fundraising decks, academic papers, financial models, dashboards, and Morph animations need their own skill loaded first — load_skill once, then proceed.
Help System (IMPORTANT)
When unsure about property names, value formats, or command syntax, ALWAYS run help instead of guessing. One help query beats guess-fail-retry loops.
officecli help ≡ officecli --help, and officecli <cmd> --help ≡ officecli help <cmd> — same content.
officecli help
officecli help docx
officecli help docx paragraph
officecli help docx set paragraph
officecli help docx paragraph --json
Format aliases: word→docx, excel→xlsx, ppt/powerpoint→pptx. Verbs: add, set, get, query, remove. MCP exposes the same schema via {"command":"help","format":"docx","type":"paragraph"}.
Performance: Resident Mode
Every command auto-starts a resident on first access (60s idle timeout) — file-lock conflicts are automatically avoided. Explicit open/close is still recommended for longer sessions (12min idle):
officecli open report.docx
officecli set report.docx ...
officecli close report.docx
Opt out of auto-start: OFFICECLI_NO_AUTO_RESIDENT=1.
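A minimal sketch of wrapping a longer editing session in the explicit open/close pair described above, so the resident is closed even when a step fails. The names with_session and edit_report are hypothetical helpers, not part of officecli; only the open, add, and close commands come from this document.

```shell
# Hypothetical helper: run a command inside an explicit resident session.
# Usage: with_session <file> <command> [args...]
with_session() {
  ws_file=$1
  shift
  officecli open "$ws_file" || return 1
  "$@"                          # run the caller's edits while the resident is open
  ws_status=$?
  officecli close "$ws_file"    # always close, even if the edits failed
  return "$ws_status"
}

# Hypothetical group of edits that share one session.
edit_report() {
  officecli add report.docx /body --type paragraph \
    --prop text="Executive Summary" --prop style=Heading1 &&
  officecli add report.docx /body --type paragraph \
    --prop text="Revenue increased by 25% year-over-year."
}
```

Under those assumptions, with_session report.docx edit_report runs both edits between one open and one close and returns the exit status of the edits.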
Quick Start
PPT:
officecli create slides.pptx
officecli add slides.pptx / --type slide --prop title="Q4 Report" --prop background=1A1A2E
officecli add slides.pptx '/slide[1]' --type shape --prop text="Revenue grew 25%" --prop x=2cm --prop y=5cm --prop font=Arial --prop size=24 --prop color=FFFFFF
Word:
officecli create report.docx
officecli add report.docx /body --type paragraph --prop text="Executive Summary" --prop style=Heading1
officecli add report.docx /body --type paragraph --prop text="Revenue increased by 25% year-over-year."
Excel:
officecli create data.xlsx
officecli set data.xlsx /Sheet1/A1 --prop value="Name" --prop bold=true
officecli set data.xlsx /Sheet1/A2 --prop value="Alice"
PDF:
officecli view report.pdf --mode text
officecli view report.pdf --mode outline
officecli get report.pdf '/page[1]' --json
officecli extract-text report.pdf --with-offsets --json
L1: Create, Read & Inspect
officecli create <file>
officecli view <file> <mode>
officecli get <file> <path> --depth N
officecli extract-text <file> --with-offsets --json
officecli query <file> <selector>
officecli validate <file>
view modes
| Mode | Description | Useful flags |
|---|---|---|
| outline | Document structure | |
| stats | Statistics (pages, words, shapes) | |
| issues | Formatting/content/structure problems | --type format\|content\|structure, --limit N |
| text | Plain text extraction | --start N --end N, --max-lines N |
| annotated | Text with formatting annotations | |
| html | Static HTML snapshot; same renderer as watch, no server needed | --browser, --page N (pdf/docx), --start N --end N (pptx) |
| screenshot / svg / pdf / forms | PNG via headless browser / SVG (pptx/pdf) / PDF via exporter plugin / form-fields JSON via format-handler plugin | -o, --screenshot-width/-height, pptx --grid N |
Use view html for one-shot snapshots (CI artifacts, archival, diffing); use watch when you need live refresh or browser-side click-to-select.
PDF Support
PDF is a first-class format — read, view, modify (text replace, page delete), and extract text with offsets.
officecli view report.pdf --mode text
officecli view report.pdf --mode outline
officecli view report.pdf --mode stats
officecli view report.pdf --mode issues
officecli get report.pdf '/page[1]' --json
officecli set report.pdf '/page[1]' --prop text="New content"
officecli remove report.pdf '/page[3]'
officecli save report.pdf
PDF path system: /page[N], /page[N]/text[N], /page[N]/image[N], /page[N]/annotation[N].
extract-text — Text/Offset → Path Mapping
Every format (docx, xlsx, pptx, pdf) can emit a TextOffsetMap — full text plus character-offset→path-ID mapping. AI agents read the map, find text to change, get the exact document path, and use set to modify precisely. No guessing, no regex parsing.
officecli extract-text report.docx --with-offsets --json
officecli extract-text budget.xlsx --with-offsets --json
officecli extract-text deck.pptx --with-offsets --json
officecli extract-text report.pdf --with-offsets --json
Output includes full_text, spans (each span has start, end, path, text, element_type), and meta.
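As a sketch of how the map might be consumed, the helper below looks up which path owns a given character offset. It relies on the span fields listed above (start, end, path) and makes two assumptions that are not confirmed here: the spans array sits at the top level of the JSON, and end is exclusive. The name path_at_offset is hypothetical, and jq must be installed.

```shell
# Hypothetical helper: print the path of the span that covers a character offset.
# Assumption: spans is a top-level key; if your output wraps it in an envelope,
# adjust the jq filter (for example to .data.spans[]).
# Assumption: each span covers start <= offset < end.
# Usage: officecli extract-text report.docx --with-offsets --json | path_at_offset 42
path_at_offset() {
  jq -r --argjson off "$1" \
    '.spans[] | select(.start <= $off and $off < .end) | .path'
}
```

The printed path can then be passed, quoted, to set.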
get
Any XML path via element localName. Use --depth N to expand children. Add --json for structured output. Default text output is grep-friendly: path (type) "text" key=val key=val ...
officecli get report.docx '/body/p[3]' --depth 2 --json
officecli get slides.pptx '/slide[1]' --depth 1
officecli get data.xlsx '/Sheet1/B2' --json
Stable ID Addressing
Elements with stable IDs return @attr=value paths instead of positional indices. Prefer these in multi-step workflows — positional indices shift on insert/delete, stable IDs do not.
/slide[1]/shape[@id=550950021] # PPT shape
/slide[1]/table[@id=1388430425]/tr[1]/tc[2] # PPT table
/body/p[@paraId=1A2B3C4D] # Word paragraph
/comments/comment[@commentId=1] # Word comment
PPT also accepts @name= (e.g. shape[@name=Title 1]), with morph !! prefix awareness. Elements without stable IDs (slide, run, tr/tc, row) fall back to positional indices.
query
CSS-like selectors: [attr=value], [attr!=value], [attr~=text], [attr>=value], [attr<=value], :contains("text"), :empty, :has(formula), :no-alt.
officecli query report.docx 'paragraph[style=Normal] > run[font!=Arial]'
officecli query slides.pptx 'shape[fill=FF0000]'
Watch & Interactive Selection
Live HTML preview that auto-refreshes on every file change. Browsers can click / shift-click / box-drag to select shapes; the CLI can read the current browser selection and act on it.
officecli watch <file> [--port N]
officecli unwatch <file>
officecli goto <file> <path>
Open the printed http://localhost:N URL. Click to select; shift/cmd/ctrl+click to multi-select; drag from empty space to box-select. PPT/Word use blue outline; Excel uses native-style green selection (double-click cell to edit inline; drag a chart to reposition).
get <file> selected — read what the user clicked
officecli get <file> selected [--json]
Returns DocumentNodes for whatever is currently selected. Empty result if nothing selected. Exit code != 0 if no watch is running.
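# Read one path per line so paths containing spaces or brackets are not split or glob-expanded.
officecli get deck.pptx selected --json | jq -r '.data.Results[].path' |
  while IFS= read -r p; do officecli set deck.pptx "$p" --prop fill=FF0000 </dev/null; done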
Key properties
- Selection survives file edits. Paths use stable @id= form.
- All connected browsers share one selection. Last-write-wins.
- Same-file single-watch. A given file can have only one watch process at a time.
- Group shapes select as a whole. Drilling into individual children of a group is not supported in v1.
- Coverage: .pptx shapes/pictures/tables/charts/connectors/groups; .docx top-level paragraphs and tables. Inherited layout/master decorations and Word nested elements (table cells, run-level) are not addressable. .xlsx does not emit data-path, so mark/selection on xlsx always resolve stale=true (v2 candidate).
Marks — edit proposals waiting for review
Use mark when changes need human review BEFORE they hit the file. Marks live in the watch process only; a separate set pipeline applies accepted ones. For one-shot changes use set directly; for permanent file annotations use add --type comment (Word native).
officecli mark <file> <path> [--prop find=... color=... note=... tofix=... regex=true] [--json]
officecli unmark <file> [--path <p> | --all] [--json]
officecli get-marks <file> [--json]
Props: find (literal, or regex when regex=true; raw form find='r"[abc]"'), color (hex, rgb(...), or one of 22 whitelisted names), note, tofix (drives the apply pipeline). Path must be in the data-path format from the watch HTML; see subskills for the full pipeline.
L2: DOM Operations
set — modify properties
officecli set <file> <path> --prop key=value [--prop ...]
Any XML attribute is settable via element path (found via get --depth N) — even attributes not currently present. Without find=, set applies format to the entire element.
Value formats:
| Type | Format | Examples |
|---|---|---|
| Colors | Hex (with/without #), named, RGB, theme | FF0000, #FF0000, red, rgb(255,0,0), accent1..accent6 |
| Spacing | Unit-qualified | 12pt, 0.5cm, 1.5x, 150% |
| Dimensions | EMU or suffixed | 914400, 2.54cm, 1in, 72pt, 96px |
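The dimension examples in the table all describe the same length. A small sketch of the arithmetic, using the standard EMU definitions (914400 per inch, 360000 per cm, 12700 per pt) and assuming 96 px per inch for the px suffix. The helper to_emu is hypothetical and only illustrates the equivalence; officecli accepts the suffixed forms directly.

```shell
# Hypothetical helper: convert a suffixed dimension to EMU.
# Assumption: px is measured at 96 DPI (9525 EMU per px).
to_emu() {
  case $1 in
    *cm) te_factor=360000; te_num=${1%cm} ;;
    *in) te_factor=914400; te_num=${1%in} ;;
    *pt) te_factor=12700;  te_num=${1%pt} ;;
    *px) te_factor=9525;   te_num=${1%px} ;;
    *)   printf '%s\n' "$1"; return ;;   # bare numbers are already EMU
  esac
  # awk handles fractional inputs such as 2.54; %.0f rounds to a whole EMU.
  awk -v n="$te_num" -v f="$te_factor" 'BEGIN { printf "%.0f\n", n * f }'
}

for dim in 914400 2.54cm 1in 72pt 96px; do
  printf '%s = %s EMU\n' "$dim" "$(to_emu "$dim")"
done
```

Each of the five inputs resolves to 914400 EMU.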
Dotted-attr aliases — font.<attr> forms accepted on shape/run/paragraph/table/row/cell/section/styles, e.g. --prop font.color=red --prop font.bold=true --prop font.size=14pt. Run officecli help <fmt> <element> for the full list.
find — format or replace matched text
Use find= with set to target specific text for formatting or replacement. Format props are separate --prop flags — do NOT nest them.
officecli set doc.docx '/body/p[1]' --prop find=weather --prop bold=true --prop color=red
officecli set doc.docx '/body/p[1]' --prop 'find=\d+%' --prop regex=true --prop color=red
officecli set doc.docx / --prop find=draft --prop replace=final
officecli set slides.pptx / --prop find=draft --prop replace=final
Path controls search scope: / = whole document, /body/p[1] or /slide[N]/shape[M] = specific element, /header[1] / /footer[1] = headers/footers.
Notes:
- Case-sensitive by default. Case-insensitive: --prop 'find=(?i)error' --prop regex=true
- Matches work across run boundaries.
- No match = silent success. --json includes "matched": N
- Excel: only find + replace supported (no find + format props)
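Because a find with no match still succeeds silently, a script may want to check the match count itself. The sketch below is one way to do that; replace_checked is a hypothetical helper, and it assumes only that the --json output contains a field spelled "matched": N somewhere, without relying on the rest of the JSON shape.

```shell
# Hypothetical helper: replace text and fail loudly when nothing matched.
# Usage: replace_checked <file> <path> <find> <replace>
replace_checked() {
  rc_out=$(officecli set "$1" "$2" --prop "find=$3" --prop "replace=$4" --json) || return 1
  # Assumption: the match count appears as "matched": N in the JSON text.
  rc_n=$(printf '%s\n' "$rc_out" |
    sed -n 's/.*"matched"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
    head -n 1)
  if [ "${rc_n:-0}" -eq 0 ]; then
    printf 'no match for "%s" in %s\n' "$3" "$1" >&2
    return 2
  fi
  printf '%s\n' "$rc_n"    # number of replacements reported by the tool
}
```

For example, replace_checked doc.docx / draft final prints the reported count, or returns 2 when the word draft does not occur.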
add — add elements or clone
officecli add <file> <parent> --type <type> [--prop ...]
officecli add <file> <parent> --type <type> --after <path> [--prop ...]
officecli add <file> <parent> --type <type> --before <path> [--prop ...]
officecli add <file> <parent> --type <type> --index N [--prop ...]
officecli add <file> <parent> --from <path>
--after, --before, --index are mutually exclusive. No position flag = append to end.
Element types (with aliases):
| Format | Types |
|---|---|
| pptx | slide (incl. hidden), shape (font.latin/ea/cs, direction=rtl, underline.color, effective.X+effective.X.src; arrow alias for rightArrow; slideMaster/slideLayout typed add/set/remove), picture (SVG, brightness/contrast/glow/shadow, rotation, link, tooltip), chart (direction=rtl, pieOfPie, barOfPie, axisLine/gridline per-attr setters, animation+chartBuild=byCategory |
| docx | paragraph (direction/font.latin/ea/cs, bold.cs/italic.cs/size.cs, lang.latin/ea/cs, wordWrap, framePr.*, tabs shorthand), run (lang slots, direction, underline.color, position half-pts, trackChange=ins|del|format with .author/.date), table (direction=rtl, hMerge, virtual column ops: add/remove/move/copyfrom on /body/tbl[N]/col), row (tr), cell (td), image, header/footer (direction), section (pageNumFmt full enum, direction=rtl, rtlGutter, pgBorders=box), bookmark, comment, footnote, endnote, formfield, sdt, chart, equation, field (28 types), hyperlink, style (direction, indents, pbdr, lineSpacing on Add/Set), toc, watermark, break, ole, num/abstractNum/lvl, tab, textbox/shape (full Add+Get; geometry, fill, line, wrap, alt, anchor). docDefaults.rtl, autoHyphenation, get / exposes locale + /comments /footnotes /endnotes. create --minimal for raw OOXML scaffolding. |
| xlsx | sheet (visible/hidden/veryHidden, print margins, printTitleRows/Cols, rightToLeft sheetView, cascade-aware rename), row (c{N}= cell-content shorthand; add accepts --from /Sheet/col[L]; formula-ref rewrite on insert), col (formula-ref rewrite, named-range follow on move), cell (type=richtext+runs, merge=range/sweep, direction=rtl, phonetic; --shift left|up on remove, shift=right|down on add — Excel UI dialog parity; formula auto-detect; OFFSET/INDIRECT in calc), chart (per-axis RTL/title, anchor=x,y,w,h, pareto), image (SVG), comment (direction=rtl), table (listobject), namedrange (definedname, volatile, [@name=X]; formula-body inlined at parse), pivottable (cache CoW + cross-pivot sharing, labelFilter, topN, fillDownLabels, calculatedField), sparkline, validation, autofilter, shape, textbox, CF (databar/colorscale/iconset/formulacf/cellIs/topN/aboveAverage), ole, csv. Query supports merge/mergedrange. Workbook: password. Shape selector enumerates leaves inside grpSp. |
Pivot tables (xlsx)
officecli add data.xlsx /Sheet1 --type pivottable \
--prop source="Sheet1!A1:E100" --prop rows=Region,Category \
--prop cols=Year --prop values="Sales:sum,Qty:count" \
--prop grandTotals=rows --prop subtotals=off --prop sort=asc
Key props: rows, cols, values (Field:func[:showDataAs]), filters, source, position, layout (compact/outline/tabular), repeatLabels, blankRows, aggregate, showDataAs (percent_of_total/row/col, running_total), grandTotals, subtotals, sort. Aggregators: sum, count, average, max, min, product, stdDev, stdDevp, var, varp, countNums. Date columns auto-group. Run officecli help xlsx pivottable for full schema.
Document-level properties (all formats)
officecli set doc.docx / --prop docDefaults.font=Arial --prop docDefaults.fontSize=11pt
officecli set doc.docx / --prop protection=forms --prop evenAndOddHeaders=true
officecli set data.xlsx / --prop calc.mode=manual --prop calc.refMode=r1c1
officecli set slides.pptx / --prop defaultFont=Arial --prop show.loop=true --prop print.what=handouts
Run officecli help <format> / for all document-level properties (docDefaults, docGrid, CJK spacing, calc, print, show, theme, extended).
Sort (xlsx)
officecli set data.xlsx /Sheet1 --prop sort="C desc" --prop sortHeader=true
officecli set data.xlsx '/Sheet1/A1:D100' --prop sort="A asc" --prop sortHeader=true
Format: COL DIR[, COL DIR ...]. Rejects ranges with merged cells or formulas. Sidecar metadata (hyperlinks, comments, conditional formatting, drawings) follows rows automatically.
Text-anchored insert (--after find:X / --before find:X)
Locate an insertion point by text match within a paragraph. Inline types (run, picture, hyperlink) insert within the paragraph; block types (table, paragraph) auto-split it. PPT only supports inline.
officecli add doc.docx '/body/p[1]' --type run --after find:weather --prop text=" (sunny)"
officecli add doc.docx '/body/p[1]' --type table --after "find:First sentence." --prop rows=2 --prop cols=2
Clone
officecli add <file> / --from '/slide[1]' — copies with all cross-part relationships.
move, swap, remove
officecli move <file> <path> [--to <parent>] [--index N] [--after <path>] [--before <path>]
officecli swap <file> <path1> <path2>
officecli remove <file> '/body/p[4]'
When using --after or --before, --to can be omitted — the target container is inferred from the anchor.
batch — multiple operations in one save cycle
Continues on error by default (returns exit 1 if any item fails). Use --stop-on-error to abort on the first failure. --force is the docx-protection bypass.
officecli dump <file> [<path>] emits a replayable batch JSON for round-trip — .docx (full coverage) and .pptx (text/tables/pictures/charts/notes/theme + OLE/3D/video/audio/SmartArt/morph/p15 transitions via raw-set passthrough). Path defaults to / (whole document); pass a subtree path (/body, /body/p[N], /body/tbl[N], /theme, /settings, /numbering, /styles) to scope the dump. officecli refresh <file.docx> recalculates TOC page numbers / PAGE / cross-references after replay (Word backend on Windows; headless-HTML fallback elsewhere). officecli plugins list extends support to .doc, .hwpx, .pdf export.
echo '[
{"command":"set","path":"/Sheet1/A1","props":{"value":"Name","bold":"true"}},
{"command":"set","path":"/Sheet1/B1","props":{"value":"Score","bold":"true"}}
]' | officecli batch data.xlsx --json
officecli batch data.xlsx --commands '[{"op":"set","path":"/Sheet1/A1","props":{"value":"Done"}}]' --json
officecli batch data.xlsx --input updates.json --force --json
Supports: add, set, get, query, remove, move, swap, view, raw, raw-set, validate. Fields: command (or op), path, parent, type, from, to, index, after, before, props, selector, mode, depth, part, xpath, action, xml.
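When the edits come from tabular data, the batch array can be generated rather than typed by hand. A minimal sketch, assuming each input line is a cell reference followed by its value and that values contain no double quotes or backslashes (no JSON escaping is done). The helper cells_to_batch is hypothetical; the command, path, and props fields are the ones listed above.

```shell
# Hypothetical helper: turn "CELL VALUE" lines on stdin into a batch JSON array.
# Assumption: values contain no double quotes or backslashes.
# Usage: cells_to_batch <sheet-name> < cells.txt
cells_to_batch() {
  cb_sheet=$1
  cb_sep=''
  printf '['
  while read -r cb_cell cb_value; do
    [ -n "$cb_cell" ] || continue          # skip blank lines
    printf '%s{"command":"set","path":"/%s/%s","props":{"value":"%s"}}' \
      "$cb_sep" "$cb_sheet" "$cb_cell" "$cb_value"
    cb_sep=','
  done
  printf ']\n'
}
```

The result can be piped straight into batch on stdin, for example: printf 'A1 Name\nB1 Score\n' | cells_to_batch Sheet1 | officecli batch data.xlsx --json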
L3: Raw XML
Use when L2 cannot express what you need. No xmlns declarations needed — prefixes auto-registered.
officecli raw <file> <part>
officecli raw-set <file> <part> --xpath "..." --action replace --xml '<w:p>...</w:p>'
officecli add-part <file> <parent>
raw-set actions: append, prepend, insertbefore, insertafter, replace, remove, setattr. Run officecli help <format> raw for available parts.
Common Pitfalls
| Pitfall | Correct Approach |
|---|---|
| --name "foo" | Use --prop name="foo"; all attributes go through --prop |
| Unquoted [N] paths in zsh/bash | Always quote: '/slide[1]' or "/slide[1]" (shell glob-expands brackets) |
| PPT shape[1] for content | shape[1] is typically the title placeholder. Use shape[2]+ for content shapes |
| /shape[myname] | Name indexing not supported. Use numeric index or @name= (PPT only) |
| Guessing property names | Run officecli help <format> <element> to see exact names |
| Modifying an open file | Close the file in PowerPoint/WPS first |
| \n in shell strings | Use \\n for newlines in --prop text="..." |
| $ in shell text | --prop text="$15M" is mangled because the shell expands $1 (bash) or $15 (zsh) as a positional parameter inside double quotes. Use single quotes: --prop text='$15M', or heredoc batch |
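The $ pitfall can be reproduced without officecli at all. This sketch compares the two quoting styles; show_dollar_pitfall is a hypothetical name. The exact text left over in the double-quoted case depends on the shell, so only the single-quoted result is predictable.

```shell
# Hypothetical demo: compare double and single quotes around a dollar amount.
show_dollar_pitfall() {
  set --                  # clear positional parameters inside this function
  sd_double="$15M"        # double quotes: the shell expands a positional parameter
  sd_single='$15M'        # single quotes: the text is passed through literally
  printf 'double-quoted: %s\n' "$sd_double"
  printf 'single-quoted: %s\n' "$sd_single"
}

show_dollar_pitfall
```

Only the single-quoted line still contains the full $15M.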
Specialized Skills
officecli load_skill <name> — output is a SKILL.md, follow its rules.
Loading rule:
- Pick the most specific match in "When to use"; if none fits, load the format default (word / pptx / excel / pdf).
- Scenes already contain the format default's rules — load one skill per artifact, never stack.
- Loaded rules persist across turns; don't re-load each reply.
- Two distinct artifacts → two separate loads.
Word (.docx)
| Name | When to use |
|---|---|
| word | Reports, letters, memos, proposals, generic documents |
| academic-paper | Journal / conference / thesis: APA / Chicago / IEEE / MLA citations, equations, SEQ + PAGEREF cross-refs, multi-column journal layout, bibliography. NOT for business reports or letters (route those to word) |
| word-form | Fillable forms, contracts, SOW templates, intake forms, checklists with content controls (SDT), form fields, or document protection. NOT for normal reports or proposals (route those to word) |
PowerPoint (.pptx)
| Name | When to use |
|---|---|
| pptx | Generic decks: board reviews, sales decks, all-hands, product launches |
| pitch-deck | Fundraising only: seed / Series A-C / SAFE / convertible / strategic raise. NOT for sales / product / board decks (route those to pptx) |
| morph-ppt | Cinematic Morph-animated presentations. NOT for static decks (route those to pptx) |
| morph-ppt-3d | 3D Morph: GLB models, camera moves, depth. NOT for 2D-only Morph (route those to morph-ppt) |
Excel (.xlsx)
| Name | When to use |
|---|---|
| excel | Generic workbooks, formulas, pivots, trackers |
| financial-model | Financial models, scenarios, projections. NOT for general data analysis (route those to excel) |
| data-dashboard | CSV/tabular data → KPI / analytics / executive dashboards with charts and sparklines. NOT for raw data tracking (route those to excel) |
PDF (.pdf)
| Name | When to use |
|---|---|
| pdf | Reading, parsing, or extracting text; viewing outlines, stats, or formatting issues; rendering PDF pages to HTML or SVG; applying highlights, text color changes, or modifications on specific pages; deleting pages; replacing content stream text |
Example: a fundraising deck task → officecli load_skill pitch-deck → use the printed rules.
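The fallback half of the loading rule (no specific scene fits, so load the format default) can be sketched as a lookup on the file extension. The helper default_skill_for is hypothetical; the skill names are the format defaults from the tables above.

```shell
# Hypothetical helper: map a file name to its format-default skill.
default_skill_for() {
  case $1 in
    *.docx) echo word ;;
    *.pptx) echo pptx ;;
    *.xlsx) echo excel ;;
    *.pdf)  echo pdf ;;
    *)      echo "unsupported file type: $1" >&2; return 1 ;;
  esac
}
```

A generic report task would then run: officecli load_skill "$(default_skill_for report.docx)"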
Notes
- Paths are 1-based (XPath convention): '/body/p[3]' = third paragraph
- --index is 0-based (array convention): --index 0 = first position
- Excel exception: for add --type row and add --type col, --index N is 1-based (matches OOXML RowIndex / column letter index). --index 5 inserts at row 5 / column 5.
- After modifications, verify with validate and/or view issues
- When unsure, run officecli help <format> <element> instead of guessing
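The two indexing conventions meet when an element is inserted and then read back. A minimal sketch for Word, assuming the inserted paragraph becomes the first child of the body: --index 0 (0-based) places it where the path '/body/p[1]' (1-based) will find it. The helper prepend_paragraph is hypothetical; the commands and flags are the ones documented above.

```shell
# Hypothetical helper: insert a paragraph at the top of a .docx body,
# read it back through its 1-based path, then validate the file.
# Usage: prepend_paragraph <file.docx> <text>
prepend_paragraph() {
  officecli add "$1" /body --type paragraph --index 0 --prop "text=$2" || return 1
  officecli get "$1" '/body/p[1]' --json || return 1
  officecli validate "$1"
}
```

For example: prepend_paragraph report.docx 'Draft, do not circulate'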