Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

$pwd:

document-converter

Name: Document Converter
Author: wentorai

// Convert Office documents (PPTX, DOCX, XLSX, PDF, HTML, CSV, JSON, XML, images) to Markdown using Microsoft MarkItDown. Provides the agent with conversion strategies for academic and research workflows.

In Manus ausführen

$ git log --oneline --stat

stars:828

forks:111

updated:18. April 2026 um 02:20

SKILL.md

readonly

name	document-converter
description	Convert Office documents (PPTX, DOCX, XLSX, PDF, HTML, CSV, JSON, XML, images) to Markdown using Microsoft MarkItDown. Provides the agent with conversion strategies for academic and research workflows.
tags	["document","conversion","markdown","office","pdf","pptx","docx"]
version	1.0.0
requirements	["markitdown","markitdown-mcp"]
install	pip install 'markitdown[all]' markitdown-mcp

Document Converter (MarkItDown)

Convert Office and structured documents to Markdown for LLM consumption using Microsoft MarkItDown.

Overview

MarkItDown is a Microsoft open-source tool that converts a wide range of document formats into clean Markdown text. This is essential for academic research workflows where source materials arrive in diverse formats (PDFs from journals, PPTX from conferences, DOCX from collaborators, XLSX data tables).

Supported Formats

Format	Extensions	Notes
PDF	`.pdf`	Text extraction; scanned PDFs require OCR
Word	`.docx`	Paragraphs, tables, lists, headings preserved
PowerPoint	`.pptx`	Slide-by-slide with speaker notes
Excel	`.xlsx`, `.xls`	Sheet-by-sheet, tables as Markdown tables
HTML	`.html`, `.htm`	Cleaned content extraction
CSV	`.csv`	Converted to Markdown table
JSON	`.json`	Pretty-printed structured output
XML	`.xml`	Structured text extraction
Images	`.jpg`, `.png`, `.gif`, `.bmp`, `.tiff`	OCR text extraction (requires optional deps)
Audio	`.mp3`, `.wav`	Transcription (requires optional deps)
ZIP	`.zip`	Recursively converts contained files

MCP Tool Usage

When the markitdown-mcp server is configured, use the convert_to_markdown tool:

Tool: convert_to_markdown
Parameters:
  uri: "file:///absolute/path/to/document.pptx"

The uri parameter accepts:

Local files: file:///path/to/file.docx
HTTP URLs: https://example.com/paper.pdf

Important

Always use absolute paths with the file:// URI scheme for local files.
The tool returns the full Markdown text content of the converted document.
For large documents, the output may be substantial. Consider summarizing or extracting relevant sections after conversion.

CLI Fallback

If the MCP server is not available, use the CLI directly via shell:

# Convert a single file
markitdown document.pptx

# Save output to file
markitdown document.pptx > output.md

# Convert from URL
markitdown https://example.com/paper.pdf

Academic Research Workflows

1. Conference Presentation Analysis

Convert a PPTX to Markdown, then analyze the content:

Convert the presentation with convert_to_markdown
The output preserves slide structure and speaker notes
Summarize key findings, methodology, and conclusions

2. Paper Review from PDF

Convert the PDF to Markdown
Extract sections: Abstract, Introduction, Methods, Results, Discussion
Cross-reference with literature library

3. Data Table Extraction

Convert XLSX to Markdown tables
Each sheet becomes a separate section
Use the extracted tables for analysis or comparison

4. Batch Literature Processing

When processing multiple documents:

Convert each document individually
Extract metadata (title, authors, date) from the content
Add to the literature library with library_add_paper

Output Quality Notes

Format	Quality	Caveats
DOCX	Excellent	Complex layouts may lose formatting
PPTX	Good	Diagrams/charts become text descriptions
XLSX	Good	Merged cells may not render perfectly
PDF	Variable	Depends on PDF type (text vs scanned)
HTML	Good	JavaScript-rendered content not captured
CSV	Excellent	Direct table conversion

When to Use This Skill

User uploads or references an Office document (PPTX, DOCX, XLSX)
User wants to analyze content from a PDF
User needs to extract text from images (OCR)
User wants to convert a document for inclusion in notes or workspace
User asks to read, summarize, or analyze a non-Markdown file
Batch processing of research materials in mixed formats

When NOT to Use

File is already Markdown or plain text
User wants to edit the original document format (use appropriate editor)
User needs pixel-perfect layout preservation (use the original format)
File is a scanned PDF without OCR support configured

Troubleshooting

Issue	Solution
`markitdown` not found	Run `pip install 'markitdown[all]'`
MCP tool not available	Ensure `markitdown-mcp` is installed and configured in `openclaw.json`
Empty output from PDF	PDF may be scanned/image-only; needs OCR dependencies
Encoding errors	Ensure file is not corrupted; try re-saving from source application
Large file timeout	Convert via CLI instead of MCP for very large files

Installation

Local (macOS/Linux)

pip install 'markitdown[all]' markitdown-mcp

Verify

markitdown --help

Docker

MarkItDown is pre-installed in the Research-Claw Docker image.

related-skills.json

gleiches Repository

coding-sop.md

from "wentorai/Research-Claw"

Standard operating procedure for research experiments, data analysis, and visualization. Covers Python/R script execution, statistical analysis, data wrangling (pandas/tidyverse), publication-quality figures (matplotlib/plotly/ggplot2), and experiment reproducibility.

2026-04-10828

workspace-sop.md

from "wentorai/Research-Claw"

Workspace file management, git versioning, directory conventions, and CLI execution safety rules for the Research-Claw workspace.

2026-04-10828

writing-sop.md

from "wentorai/Research-Claw"

Four-phase iterative protocol for academic writing and document production. Covers outline → draft → self-review → polish with Stop Points, material supplementation loops, version comparison, and mechanical self-check. Includes IMRaD, LaTeX, docx/markdown, citation formatting, and md2pdf.

2026-04-05828

plotting-sop.md

from "wentorai/Research-Claw"

Standard operating procedure for academic figure generation. Four rendering engines: Python (data viz), Mermaid (flowcharts), AI Image via NanoBanana/OpenRouter (complex diagrams), SVG (vector). Includes engine selection decision tree, ReAct self-correction, NanoBanana configuration, environment detection, and academic style rules.

2026-04-05828

search-sop.md

from "wentorai/Research-Claw"

Standard operating procedure for academic literature search. Covers local library search, 18 academic database APIs (L1), web_fetch direct access (L1.5), browser RPA for CNKI/Google Scholar (L2), and optional API services (L3). Includes domain routing, recency protocol, search iteration protocol (evaluate → refine → re-search up to 3 rounds), and Zotero/EndNote import bridges.

2026-04-05828

output-cards.md

from "wentorai/Research-Claw"

JSON schema reference for 6 structured output card types used by the Research-Claw dashboard: paper_card, task_card, progress_card, approval_card, file_card, monitor_digest. Read this before outputting any structured card for the first time in a session.

2026-03-30828

package.json

"author": "wentorai"

"repository": "wentorai/Research-Claw"

GitHub-Repository öffnen Creator-Repositorys ansehen

$ install --global

$ download --local

In Manus ausführen

$ useful --forSOC

Allgemeine BürokräfteBüro- und Verwaltungsberufe43-9061L4

name	document-converter
description	Convert Office documents (PPTX, DOCX, XLSX, PDF, HTML, CSV, JSON, XML, images) to Markdown using Microsoft MarkItDown. Provides the agent with conversion strategies for academic and research workflows.
tags	["document","conversion","markdown","office","pdf","pptx","docx"]
version	1.0.0
requirements	["markitdown","markitdown-mcp"]
install	pip install 'markitdown[all]' markitdown-mcp

Document Converter (MarkItDown)

Convert Office and structured documents to Markdown for LLM consumption using Microsoft MarkItDown.

Overview

Supported Formats

Format	Extensions	Notes
PDF	`.pdf`	Text extraction; scanned PDFs require OCR
Word	`.docx`	Paragraphs, tables, lists, headings preserved
PowerPoint	`.pptx`	Slide-by-slide with speaker notes
Excel	`.xlsx`, `.xls`	Sheet-by-sheet, tables as Markdown tables
HTML	`.html`, `.htm`	Cleaned content extraction
CSV	`.csv`	Converted to Markdown table
JSON	`.json`	Pretty-printed structured output
XML	`.xml`	Structured text extraction
Images	`.jpg`, `.png`, `.gif`, `.bmp`, `.tiff`	OCR text extraction (requires optional deps)
Audio	`.mp3`, `.wav`	Transcription (requires optional deps)
ZIP	`.zip`	Recursively converts contained files

MCP Tool Usage

When the markitdown-mcp server is configured, use the convert_to_markdown tool:

Tool: convert_to_markdown
Parameters:
  uri: "file:///absolute/path/to/document.pptx"

The uri parameter accepts:

Local files: file:///path/to/file.docx
HTTP URLs: https://example.com/paper.pdf

Important

Always use absolute paths with the file:// URI scheme for local files.
The tool returns the full Markdown text content of the converted document.
For large documents, the output may be substantial. Consider summarizing or extracting relevant sections after conversion.

CLI Fallback

If the MCP server is not available, use the CLI directly via shell:

# Convert a single file
markitdown document.pptx

# Save output to file
markitdown document.pptx > output.md

# Convert from URL
markitdown https://example.com/paper.pdf

Academic Research Workflows

1. Conference Presentation Analysis

Convert a PPTX to Markdown, then analyze the content:

Convert the presentation with convert_to_markdown
The output preserves slide structure and speaker notes
Summarize key findings, methodology, and conclusions

2. Paper Review from PDF

Convert the PDF to Markdown
Extract sections: Abstract, Introduction, Methods, Results, Discussion
Cross-reference with literature library

3. Data Table Extraction

Convert XLSX to Markdown tables
Each sheet becomes a separate section
Use the extracted tables for analysis or comparison

4. Batch Literature Processing

When processing multiple documents:

Convert each document individually
Extract metadata (title, authors, date) from the content
Add to the literature library with library_add_paper

Output Quality Notes

Format	Quality	Caveats
DOCX	Excellent	Complex layouts may lose formatting
PPTX	Good	Diagrams/charts become text descriptions
XLSX	Good	Merged cells may not render perfectly
PDF	Variable	Depends on PDF type (text vs scanned)
HTML	Good	JavaScript-rendered content not captured
CSV	Excellent	Direct table conversion

When to Use This Skill

User uploads or references an Office document (PPTX, DOCX, XLSX)
User wants to analyze content from a PDF
User needs to extract text from images (OCR)
User wants to convert a document for inclusion in notes or workspace
User asks to read, summarize, or analyze a non-Markdown file
Batch processing of research materials in mixed formats

When NOT to Use

File is already Markdown or plain text
User wants to edit the original document format (use appropriate editor)
User needs pixel-perfect layout preservation (use the original format)
File is a scanned PDF without OCR support configured

Troubleshooting

Issue	Solution
`markitdown` not found	Run `pip install 'markitdown[all]'`
MCP tool not available	Ensure `markitdown-mcp` is installed and configured in `openclaw.json`
Empty output from PDF	PDF may be scanned/image-only; needs OCR dependencies
Encoding errors	Ensure file is not corrupted; try re-saving from source application
Large file timeout	Convert via CLI instead of MCP for very large files

Installation

Local (macOS/Linux)

pip install 'markitdown[all]' markitdown-mcp

Verify

markitdown --help

Docker

MarkItDown is pre-installed in the Research-Claw Docker image.

document-converter

Document Converter (MarkItDown)

Overview

Supported Formats

MCP Tool Usage

Important

CLI Fallback

Academic Research Workflows

1. Conference Presentation Analysis

2. Paper Review from PDF

3. Data Table Extraction

4. Batch Literature Processing

Output Quality Notes

When to Use This Skill

When NOT to Use

Troubleshooting

Installation

Local (macOS/Linux)

Verify

Docker

Mehr aus diesem Repository

Mehr aus diesem Repository

Document Converter (MarkItDown)

Overview

Supported Formats

MCP Tool Usage

Important

CLI Fallback

Academic Research Workflows

1. Conference Presentation Analysis

2. Paper Review from PDF

3. Data Table Extraction

4. Batch Literature Processing

Output Quality Notes

When to Use This Skill

When NOT to Use

Troubleshooting

Installation

Local (macOS/Linux)

Verify

Docker