odt-convert

Updated May 9, 2026 at 18:48

Convert ODT (OpenDocument Text) files to Markdown, with a separate threaded comments file. Extracts document body via pandoc and comment threads (with anchor text and reply grouping) via Python XML parsing. Also extracts embedded images and Visio diagrams (with PNG export). Triggers include 'convert odt', 'extract odt comments', 'odt to markdown', or when working with .odt files.

Source

lossyrob

lossyrob/skills

SKILL.md

name	odt-convert
description	Convert ODT (OpenDocument Text) files to Markdown, with a separate threaded comments file. Extracts document body via pandoc and comment threads (with anchor text and reply grouping) via Python XML parsing. Also extracts embedded images and Visio diagrams (with PNG export). Triggers include 'convert odt', 'extract odt comments', 'odt to markdown', or when working with .odt files.
compatibility	Requires pandoc, Python 3 with standard library. Optional: olefile (pip install olefile) for Visio extraction, libreoffice (headless) for EMF-to-PNG conversion.

ODT to Markdown + Comments Skill

Convert an .odt file into companion output files:

<name>.md — The document body, converted via pandoc, with image refs updated to point at exported artifacts.
<name>-comments.md — All document comments, grouped into threads with anchor text and chronological reply ordering.
<name>-embedded/ — Subdirectory for all extracted media (images, Visio diagrams with PNG previews).

When to Use

User asks to convert an .odt file to Markdown
User asks to extract or review comments from an .odt file
User provides an .odt file path and wants readable output

Workflow

Throughout this workflow, let <dir> be the directory containing the ODT file, <name> be its base name (without extension), and <path> be the full path without extension.

Step 1: Validate Input

Confirm the .odt file exists and is a valid OpenDocument file:

file <path>.odt
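The same check can be sketched in Python for callers that prefer not to shell out. This is a minimal illustration, not part of the skill's scripts: it relies on the fact that an ODT file is a ZIP package whose `mimetype` entry declares the OpenDocument Text type. The function name `is_odt` is hypothetical.

```python
import zipfile

# MIME type that an OpenDocument Text package declares in its `mimetype` entry.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def is_odt(path):
    """Return True if `path` is a ZIP package whose `mimetype` entry declares ODT."""
    # is_zipfile() returns False for missing files and non-ZIP data.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        try:
            declared = package.read("mimetype").decode("ascii", errors="replace").strip()
        except KeyError:
            return False  # a ZIP archive without a mimetype entry
    return declared == ODT_MIMETYPE
```

A `False` result is a reasonable point to stop and report the problem before invoking pandoc.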

Step 2: Convert Document Body

pandoc <path>.odt -t markdown -o <path>.md --wrap=none --extract-media=<dir>/<name>-embedded

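--wrap=none stops pandoc from hard-wrapping paragraphs, so each paragraph stays on a single line. --extract-media writes the images pandoc recognizes into <name>-embedded/ and rewrites the image references in the Markdown to point there. If the directory is still empty once Step 3 has run (no images were extracted and no OLE objects exist), remove it.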
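For orchestration from Python, the command above could be assembled as follows. This is a sketch under assumptions: the helper names `pandoc_command`, `convert_body`, and `remove_if_empty` are hypothetical and do not come from the skill's scripts; only the pandoc flags are taken from the command shown in this step.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(odt_path):
    """Build the Step 2 pandoc invocation for the given .odt path."""
    odt = Path(odt_path)
    media_dir = odt.parent / f"{odt.stem}-embedded"
    return [
        "pandoc", str(odt),
        "-t", "markdown",
        "-o", str(odt.with_suffix(".md")),
        "--wrap=none",
        f"--extract-media={media_dir}",
    ]


def convert_body(odt_path):
    """Run pandoc, failing early with a clear message if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_command(odt_path), check=True)


def remove_if_empty(directory):
    """Delete `directory` only if it exists and has no entries; report whether it was removed."""
    target = Path(directory)
    if target.is_dir() and not any(target.iterdir()):
        target.rmdir()
        return True
    return False
```

Building the argument list separately from running it keeps the command easy to log and to inspect without pandoc installed.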

Step 3: Extract Embedded Images and Visio Diagrams

ODT files are ZIP archives that can contain:

Pictures/ or media/ — Inline images. Pandoc handles Pictures/ but may fail on media/ paths, emitting []{.image} placeholders.
Object N — OLE-embedded objects (Visio, Excel, etc.). NOT handled by pandoc.
ObjectReplacements/Object N — EMF/WMF preview renderings of embedded objects.
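To see which of these entries a given file actually contains, the archive listing can be bucketed by prefix. The sketch below is illustrative only: `classify_entries` and its group names are hypothetical, and it treats both a plain `Object N` entry and anything under an `Object N/` folder as an embedded object.

```python
import re
import zipfile

# Matches "Object 1", "Object 12", and entries inside an "Object N/" folder.
OBJECT_RE = re.compile(r"^Object \d+(/|$)")


def classify_entries(names):
    """Group ZIP entry names into inline images, embedded objects, and previews."""
    groups = {"images": [], "objects": [], "previews": []}
    for name in names:
        if name.endswith("/"):
            continue  # skip directory entries
        if name.startswith(("Pictures/", "media/")):
            groups["images"].append(name)
        elif name.startswith("ObjectReplacements/"):
            groups["previews"].append(name)
        elif OBJECT_RE.match(name):
            groups["objects"].append(name)
    return groups


def list_embedded(odt_path):
    """Classify the entries of an .odt package on disk."""
    with zipfile.ZipFile(odt_path) as package:
        return classify_entries(package.namelist())
```

If all three groups come back empty, Step 3 has nothing to do for that document.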

Run these scripts in sequence:

Extract inline images (fixes []{.image} placeholders if pandoc missed them):

python scripts/extract_images.py <path>.odt <path>.md <dir>/<name>-embedded <name>

Extract OLE objects (Visio diagrams with PNG previews):

python scripts/extract_ole_objects.py <path>.odt <dir>/<name>-embedded

Fix Visio references in the markdown (replaces ObjectReplacements/ refs with extracted files):

python scripts/fix_visio_refs.py <path>.md <name>-embedded

| Object Type | Output Files (in `<name>-embedded/`) |
| --- | --- |
| Visio .vsdx diagram | `object-<N>.vsdx` + `object-<N>.png` |
| Legacy Visio .vsd | `object-<N>.vsd` (no PNG; would need full Visio) |
| Other OLE objects | Skipped with a log message |
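The PNG previews depend on headless LibreOffice. How `extract_ole_objects.py` invokes it is not shown here, so the following is only a sketch of one plausible invocation using LibreOffice's `--headless`, `--convert-to`, and `--outdir` options. The helper names and the choice to look for both `soffice` and `libreoffice` on the PATH are assumptions.

```python
import shutil


def find_libreoffice():
    """Return the path of a LibreOffice launcher on PATH, or None if absent."""
    for candidate in ("soffice", "libreoffice"):
        found = shutil.which(candidate)
        if found:
            return found
    return None


def png_preview_command(binary, preview_path, out_dir):
    """Build a headless LibreOffice command that converts a preview file to PNG."""
    return [
        binary,
        "--headless",
        "--convert-to", "png",
        "--outdir", str(out_dir),
        str(preview_path),
    ]
```

When `find_libreoffice()` returns `None`, the caller would print a warning and keep the extracted diagram file without a PNG, which matches the edge case listed later in this document.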

Step 4: Extract Threaded Comments

python scripts/extract_comments.py <path>.odt <path>-comments.md

Comments are grouped into threads by anchor text. Within each thread, comments are sorted chronologically — the first is the opener (💬), subsequent ones are replies (↩️).
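The grouping and ordering rule can be sketched as below. This is not the code from `extract_comments.py`; the `Comment` record and `group_into_threads` are hypothetical names, and the sort assumes every timestamp uses the same ISO 8601 layout and time zone, so that string order equals chronological order.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    anchor: str  # highlighted document text the comment is attached to
    author: str
    date: str    # ISO 8601 timestamp, e.g. "2026-02-06T15:16:00"
    text: str


def group_into_threads(comments):
    """Group comments by anchor text, then sort each thread oldest first."""
    threads = {}
    for comment in comments:
        # dicts keep insertion order, so threads appear in first-seen order
        threads.setdefault(comment.anchor, []).append(comment)
    for thread in threads.values():
        thread.sort(key=lambda c: c.date)
    return list(threads.values())
```

In each returned thread, element 0 is the opener and the remaining elements are the replies.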

Step 5: Report Results

After all files are created, report:

Path to the body Markdown file and its size
Path to the comments Markdown file, total comment count, and thread count
Any extracted images (count and directory)
Any extracted Visio diagrams (.vsdx paths and PNG preview paths)
Any issues encountered (e.g., no comments found, pandoc warnings, olefile not installed)

Output Format

Body Markdown (`<name>.md`)

Standard pandoc Markdown output with --wrap=none.

Comments Markdown (`<name>-comments.md`)

# Comments from <filename>.odt

**Total:** N comments in M threads

---

## Thread 1 (K replies)

> **Anchor:** <highlighted text in document>

💬 **Author Name** — 2026-02-06T15:16:00

Opening comment text here.

↩️ **Reply Author** — 2026-02-06T15:46:00

> Reply text is blockquoted for visual distinction.

---
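A renderer for this layout might look like the sketch below. It is an illustration rather than the skill's implementation: `render_comments` and the dictionary keys (`anchor`, `author`, `date`, `text`) are hypothetical, and the heading wording follows the template literally without singular or plural handling.

```python
SEP = " \u2014 "  # the dash separator the template places between author and timestamp


def blockquote(text):
    """Prefix every line of `text` with '> ' so multi-line replies stay quoted."""
    return "\n".join("> " + line for line in text.splitlines())


def render_comments(filename, threads):
    """Render threads (lists of comment dicts, opener first) as Markdown."""
    total = sum(len(thread) for thread in threads)
    lines = [f"# Comments from {filename}", ""]
    if total == 0:
        lines.append("No comments found.")
        return "\n".join(lines) + "\n"
    lines += [f"**Total:** {total} comments in {len(threads)} threads", "", "---", ""]
    for number, thread in enumerate(threads, start=1):
        opener, replies = thread[0], thread[1:]
        heading = f"## Thread {number}"
        if replies:  # single-comment threads get no reply count
            heading += f" ({len(replies)} replies)"
        anchor = opener["anchor"] or "_(no anchor text)_"
        lines += [heading, "", f"> **Anchor:** {anchor}", ""]
        lines += [f"💬 **{opener['author']}**{SEP}{opener['date']}", "", opener["text"], ""]
        for reply in replies:
            lines += [f"↩️ **{reply['author']}**{SEP}{reply['date']}", "", blockquote(reply["text"]), ""]
        lines += ["---", ""]
    return "\n".join(lines)
```

Returning a string instead of writing a file keeps the formatting logic easy to check against the template.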

Edge Cases

No comments: Still generate the body .md. In the comments file, write a note saying "No comments found."
Comments without anchor text: Show _(no anchor text)_ in place of the anchor quote.
Single-comment threads: Display without reply count or reply formatting.
Nested annotations: The comment extraction script strips nested <office:annotation> elements from anchor text to avoid duplication.
No embedded objects: Skip Step 3 silently — only report images/Visio if they exist.
olefile not installed: Print a warning and skip Visio extraction. The body and comments conversion still works.
LibreOffice not available: Extract the .vsdx file but skip PNG conversion. Print a warning.
Multiple embedded objects: Each gets a sequential number (object-1, object-2, etc.) matching the ODT's internal naming.
Non-Visio OLE objects: Log the CLSID and skip. Don't attempt extraction of unknown object types.
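The nested-annotation case can be illustrated with a small text collector that skips `<office:annotation>` subtrees while keeping the text that follows them. This is a sketch, not the code in `extract_comments.py`; `text_without_annotations` is a hypothetical name, and only the OpenDocument `office` namespace URI is taken as given.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
ANNOTATION_TAG = f"{{{OFFICE_NS}}}annotation"


def text_without_annotations(elem):
    """Concatenate the text under `elem`, ignoring any annotation subtrees."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != ANNOTATION_TAG:
            parts.append(text_without_annotations(child))
        # Text after a child belongs to the parent, even when the child is skipped.
        parts.append(child.tail or "")
    return "".join(parts)
```

Applied to a paragraph parsed with `ET.fromstring`, this yields the visible document text without any comment author, date, or body mixed in.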
