markitdown

Convert PDFs, Office files, images, HTML, EPUB/ZIP/audio, or other docs to LLM-friendly Markdown with MarkItDown; include install/CLI/API/batch/plugin help.

Install command

npx skills add https://github.com/Eckii24/dotfiles --skill markitdown

Copy and paste this command into Claude Code to install the skill

Source

Eckii24/dotfiles

Stars: 0

Forks: 0

Updated: May 30, 2026 at 21:54

3 files

SKILL.md

name: markitdown
description: Convert PDFs, Office files, images, HTML, EPUB/ZIP/audio, or other docs to LLM-friendly Markdown with MarkItDown; include install/CLI/API/batch/plugin help.
compatibility: {"tools":"bash, read, write, edit","dependencies":"Python 3.10+, markitdown"}

MarkItDown

Use MarkItDown when the task is about converting existing files into structured Markdown for LLM or text-analysis workflows. Prefer it over ad hoc parsing when the source is a document format that MarkItDown already supports.

MarkItDown is a lightweight converter, not a pixel-perfect document renderer. It is a strong default when the user wants headings, lists, tables, links, and readable structure preserved in Markdown. It is a weaker fit when the user wants layout-faithful reproduction or visual formatting that must match the original exactly.

Supported inputs

MarkItDown currently supports common conversions including:

PDF
PowerPoint
Word
Excel
Images
Audio
HTML
Text-based formats such as CSV, JSON, and XML
ZIP archives
YouTube URLs
EPUBs

If the user asks about a format that might depend on optional extras, verify the needed dependency and install only what is necessary unless they explicitly ask for markitdown[all].
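To install only what is necessary, the extras can be derived from the input files. The sketch below assumes an extension-to-extra mapping based on MarkItDown's published extras (pdf, docx, pptx, xlsx, xls, outlook, audio-transcription); the mapping and the helper name install_command are illustrative, so check the extras list for the installed MarkItDown version before relying on it.

```python
from pathlib import Path

# Assumed mapping from file extension to MarkItDown optional extra.
# Verify against the extras declared by your MarkItDown version.
EXTRA_FOR_SUFFIX = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".xls": "xls",
    ".msg": "outlook",
    ".mp3": "audio-transcription",
    ".wav": "audio-transcription",
}


def install_command(paths: list[str]) -> str:
    """Return a pip command that installs only the extras these files need."""
    suffixes = (Path(p).suffix.lower() for p in paths)
    extras = sorted({EXTRA_FOR_SUFFIX[s] for s in suffixes if s in EXTRA_FOR_SUFFIX})
    if not extras:
        # Assumption: HTML, CSV, JSON, XML and plain text work with the base package.
        return "pip install markitdown"
    return f"pip install 'markitdown[{','.join(extras)}]'"


print(install_command(["report.pdf", "deck.PPTX", "notes.csv"]))
```

Sorting the extras keeps the command stable, which makes it easy to compare against what is already installed.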

Default workflow

Identify the source input, desired output path, and whether the user wants you to actually run the conversion or just explain it.
Check whether markitdown is already available before suggesting installation.
Match the install to the task:
- broad coverage: pip install 'markitdown[all]'
- narrower installs: e.g. pip install 'markitdown[pdf,docx,pptx]'
Prefer the CLI for one-off conversions and simple shell workflows.
Prefer the Python API for loops, custom automation, app integration, or when the user wants a reusable script.
Save the Markdown to a sensible output file, usually next to the source file unless the user asked for another location.
Briefly sanity-check the result and call out likely limitations, especially for OCR-heavy, scanned, or layout-sensitive documents.
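Two of these steps, checking for an existing install and choosing an output path next to the source, can be sketched in a few lines of Python. The helper names markitdown_status and default_output_path are hypothetical; the check only looks for a markitdown executable on PATH and an importable markitdown package.

```python
from __future__ import annotations

import importlib.util
import shutil
from pathlib import Path


def markitdown_status() -> dict[str, bool]:
    """Report whether the markitdown CLI and Python package are already available."""
    return {
        "cli": shutil.which("markitdown") is not None,
        "python_package": importlib.util.find_spec("markitdown") is not None,
    }


def default_output_path(source: str, out_dir: str | None = None) -> Path:
    """Place the Markdown next to the source unless another directory is requested."""
    src = Path(source)
    target_dir = Path(out_dir) if out_dir else src.parent
    return target_dir / (src.stem + ".md")


print(markitdown_status())
print(default_output_path("docs/report.pdf"))
```

If both entries are False, suggest an install; if only one is True, the package may live in a different environment than the one on PATH.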

Installation guidance

Start with a virtual environment unless the user clearly wants a global install.

Example:

python3 -m venv .venv
source .venv/bin/activate
pip install 'markitdown[all]'
markitdown --version

If the user uses uv, prefer:

uv venv --python=3.12 .venv
source .venv/bin/activate
uv pip install 'markitdown[all]'

Ask before installing packages if the environment might be shared or managed.
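Before installing, it can help to confirm the interpreter version and whether a virtual environment is active. This is a minimal sketch: it treats sys.prefix differing from sys.base_prefix as "inside a venv", which holds for venv and uv-created environments but is an assumption for other environment managers.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from the base interpreter)."""
    return sys.prefix != sys.base_prefix


def python_supported(minimum: tuple[int, int] = (3, 10)) -> bool:
    """The skill's compatibility metadata lists Python 3.10+."""
    return sys.version_info[:2] >= minimum


if not python_supported():
    print("Python 3.10+ is required; create the venv with a newer interpreter.")
elif not in_virtualenv():
    print("Not in a virtual environment; ask before installing globally.")
else:
    print("OK to install inside", sys.prefix)
```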

CLI usage

Use the CLI for straightforward conversions.

Basic conversion

markitdown input.pdf > output.md

or:

markitdown input.pdf -o output.md
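When a script needs to call the CLI rather than the Python API, passing an argument list to subprocess avoids shell-quoting problems with spaces in filenames. The helpers cli_command and run_cli are hypothetical names; they only assemble and run the documented markitdown SOURCE -o OUTPUT form.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def cli_command(source: str, output: str | None = None) -> list[str]:
    """Build argv for `markitdown SOURCE -o OUTPUT`, defaulting OUTPUT to SOURCE with .md."""
    out = output or str(Path(source).with_suffix(".md"))
    return ["markitdown", source, "-o", out]


def run_cli(source: str, output: str | None = None) -> None:
    # check=True raises CalledProcessError if markitdown exits with a non-zero status.
    subprocess.run(cli_command(source, output), check=True)
```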

Reading from stdin

MarkItDown can read binary data from stdin. When the file type is ambiguous, provide an extension hint.

cat input.pdf | markitdown -x pdf > output.md

Useful hints:

-x, --extension for file extension hints
-m, --mime-type for MIME type hints
-c, --charset for text encodings
--keep-data-uris when the user explicitly wants embedded data URIs preserved instead of truncated
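The stdin path and these hints can also be driven from Python, for example when the bytes come from a download or a database rather than a file. This sketch uses only the -x, -m, -c and --keep-data-uris flags listed above; the helper names are hypothetical, and it assumes the CLI writes UTF-8 Markdown to stdout.

```python
from __future__ import annotations

import subprocess


def stdin_command(extension: str | None = None, mime_type: str | None = None,
                  charset: str | None = None, keep_data_uris: bool = False) -> list[str]:
    """Build argv for reading from stdin with optional type and encoding hints."""
    argv = ["markitdown"]
    if extension:
        argv += ["-x", extension]
    if mime_type:
        argv += ["-m", mime_type]
    if charset:
        argv += ["-c", charset]
    if keep_data_uris:
        argv.append("--keep-data-uris")
    return argv


def convert_bytes(data: bytes, **hints) -> str:
    """Pipe raw bytes to markitdown on stdin and return the Markdown it prints."""
    proc = subprocess.run(stdin_command(**hints), input=data, capture_output=True, check=True)
    return proc.stdout.decode("utf-8", errors="replace")
```

Usage, assuming input.pdf exists: convert_bytes(Path("input.pdf").read_bytes(), extension="pdf").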

Plugins

Plugins are disabled by default.

List installed plugins:

markitdown --list-plugins

Enable them for a run:

markitdown --use-plugins input.pdf -o output.md

Only turn plugins on when they are relevant and installed.

Azure Document Intelligence

Use this only when the user specifically wants Azure Document Intelligence or needs a cloud-based extraction path:

markitdown input.pdf -d -e "<document_intelligence_endpoint>" -o output.md

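Do not assume this is configured: ask for the endpoint if the user has not provided one, and make sure the Azure extra is installed (pip install 'markitdown[az-doc-intel]').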
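To avoid guessing an endpoint, a script can refuse to build the command until one is supplied. This is a sketch around the documented -d and -e flags; the helper name docintel_command is hypothetical, and the https check is an added sanity check, not something MarkItDown requires.

```python
from __future__ import annotations

from urllib.parse import urlparse


def docintel_command(source: str, endpoint: str | None, output: str) -> list[str]:
    """Build `markitdown SOURCE -d -e ENDPOINT -o OUTPUT`, refusing to guess the endpoint."""
    if not endpoint:
        raise ValueError("Ask the user for their Document Intelligence endpoint.")
    if urlparse(endpoint).scheme != "https":
        # Sanity check only: Azure endpoints are normally https URLs.
        raise ValueError(f"Endpoint does not look like an https URL: {endpoint!r}")
    return ["markitdown", source, "-d", "-e", endpoint, "-o", output]
```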

Python API usage

Use Python when the user wants scripts, batch conversion, or integration into a larger pipeline.

Minimal example

from markitdown import MarkItDown

md = MarkItDown(enable_plugins=False)
result = md.convert("report.pdf")
print(result.markdown)

result.text_content still exists as a soft-deprecated alias, but prefer result.markdown in new code.
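If a script may run against an older MarkItDown release that predates result.markdown, a small accessor can prefer the new attribute and fall back to text_content. The helper name markdown_of is hypothetical; the stand-in objects below only let it run without MarkItDown installed.

```python
from types import SimpleNamespace


def markdown_of(result) -> str:
    """Prefer result.markdown; fall back to text_content on older releases."""
    text = getattr(result, "markdown", None)
    if text is None:
        text = result.text_content
    return text


# Stand-ins for conversion results so the helper can run without MarkItDown.
new_style = SimpleNamespace(markdown="# Report", text_content="# Report")
old_style = SimpleNamespace(text_content="# Legacy")
print(markdown_of(new_style), markdown_of(old_style))
```

With MarkItDown installed, call it as markdown_of(MarkItDown().convert("report.pdf")).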

With plugins enabled

from markitdown import MarkItDown

md = MarkItDown(enable_plugins=True)
result = md.convert("slides.pptx")
print(result.markdown)

With Azure Document Intelligence

from markitdown import MarkItDown

md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
result = md.convert("scan.pdf")
print(result.markdown)

With an LLM client for image descriptions

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("image.jpg")
print(result.markdown)

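Use this path only when the task specifically needs image descriptions and the required client is available: the openai package must be installed, and OpenAI() reads its API key from the OPENAI_API_KEY environment variable by default. MarkItDown applies the LLM client to image files and PowerPoint decks.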
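A script can decide up front whether the LLM path applies, so it falls back to a plain MarkItDown() instead of failing on a missing key. This sketch assumes the LLM client is only used for common image types and .pptx files; the suffix set and the helper name should_use_llm are illustrative.

```python
import os
from pathlib import Path

# Assumed set of inputs for which MarkItDown uses the LLM client.
LLM_SUFFIXES = {".jpg", ".jpeg", ".png", ".pptx"}


def should_use_llm(source: str, env=None) -> bool:
    """True only for supported inputs when an OpenAI API key is configured."""
    env = os.environ if env is None else env
    return Path(source).suffix.lower() in LLM_SUFFIXES and bool(env.get("OPENAI_API_KEY"))
```

When it returns True, construct MarkItDown(llm_client=OpenAI(), llm_model="gpt-4o"); otherwise use MarkItDown() and mention that image descriptions were skipped.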

Batch conversion pattern

For a directory of files, prefer a small script or shell loop instead of repeating one-off commands.

Shell example:

for f in docs/*.pdf; do
  # Skip the literal pattern when no PDFs match the glob.
  [ -e "$f" ] || continue
  markitdown "$f" -o "${f%.pdf}.md"
done

Python example:

from pathlib import Path
from markitdown import MarkItDown

md = MarkItDown()  # reuse one converter for every file
for path in Path("docs").glob("*.docx"):
    result = md.convert(str(path))
    # Write docs/name.docx to docs/name.md next to the source.
    path.with_suffix(".md").write_text(result.markdown, encoding="utf-8")
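For larger batches, it helps to keep going when one file fails, skip files that are already converted, and report what happened. The sketch below takes the conversion step as a callable so it can be tested without MarkItDown; batch_convert is a hypothetical helper, and it uses rglob, so unlike the example above it also descends into subdirectories.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def batch_convert(root: str, patterns: tuple[str, ...],
                  convert: Callable[[str], str]) -> dict[str, list[str]]:
    """Convert matching files under root, writing NAME.md next to each source."""
    report: dict[str, list[str]] = {"converted": [], "skipped": [], "failed": []}
    for pattern in patterns:
        for src in sorted(Path(root).rglob(pattern)):
            out = src.with_suffix(".md")
            # Skip sources whose Markdown output is already at least as new.
            if out.exists() and out.stat().st_mtime >= src.stat().st_mtime:
                report["skipped"].append(str(src))
                continue
            try:
                out.write_text(convert(str(src)), encoding="utf-8")
                report["converted"].append(str(src))
            except Exception as exc:  # record the failure and continue with the rest
                report["failed"].append(f"{src}: {exc}")
    return report
```

With MarkItDown installed, pass the converter as: md = MarkItDown(); batch_convert("docs", ("*.pdf", "*.docx"), lambda p: md.convert(p).markdown).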

Troubleshooting

If conversion fails:

Check whether the relevant optional dependency is installed.
Confirm the file is actually the format it claims to be.
If reading from stdin, add -x and possibly -m.
For scanned PDFs or image-heavy documents, explain that default offline extraction may be limited and suggest Azure Document Intelligence or an OCR-capable plugin when appropriate.
For plugin-based behavior, confirm the plugin is installed and that --use-plugins or enable_plugins=True is set.
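To confirm that a file really is the format its extension claims, its leading bytes can be compared against well-known signatures. This is a minimal sketch with a hypothetical sniff helper; note that DOCX, PPTX, XLSX and EPUB are all ZIP containers, so a ZIP signature only narrows the type down to that family.

```python
# Leading "magic" bytes for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-based (docx/pptx/xlsx/epub/zip)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 (legacy doc/xls/ppt/msg)"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]


def sniff(head: bytes) -> str:
    """Return a format label for the first bytes of a file, or 'unknown'."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    with open(path, "rb") as fh:
        return sniff(fh.read(16))
```

A report.pdf that sniffs as "unknown" is often an HTML error page saved under the wrong name; in that case retry with -x html or fix the download.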

Response style

When doing the work for the user:

Be explicit about the exact command or script you used.
Save outputs to clear file paths.
Mention any install step separately from the conversion step.
Keep the explanation concise unless the user asked for a deeper walkthrough.

When the user only wants guidance:

Give the shortest working command first.
Then add only the relevant variants: install, stdin, batch, Python API, plugins, or troubleshooting.
