office-to-md

Name: Office To Md
Author: shuyu-labs

Description: Convert Office documents (Word, Excel, PowerPoint, PDF) to Markdown format. ONLY use this skill when the user explicitly requests to CONVERT, TRANSFORM or PARSE a specific office file into Markdown. Do NOT trigger for general questions, documentation reading, or discussions about files.


Stars: 276

Forks: 42

Updated: January 14, 2026 at 15:23


Office Document to Markdown Converter

Convert Office documents (Word, Excel, PowerPoint) and PDFs to structured Markdown, extracting text, tables, and images.

File Description

enhanced_parser.py - Core document parser
doc_converter.py - DOC to DOCX converter (requires LibreOffice)
requirements.txt - Python dependencies

Install Dependencies

pip install -r requirements.txt

Additional Dependencies for DOC Format

The legacy .doc format requires LibreOffice, which doc_converter.py uses to convert .doc files to .docx:

# Windows: download and install LibreOffice from the official website
# https://www.libreoffice.org/download/

# Linux (Debian/Ubuntu)
sudo apt install libreoffice

# macOS (Homebrew)
brew install --cask libreoffice

Quick Start

Python Code

from enhanced_parser import EnhancedDocumentParser

# Initialize parser
parser = EnhancedDocumentParser(
    image_base_url="http://localhost:5000",
    image_save_dir="./static/images",
    filter_headers_footers=True  # Filter headers and footers
)

# Parse document
result = parser.parse_document("document.docx")

if result["success"]:
    print(result["markdown"])
    print(f"Extracted {result['images_count']} images")

Start API Service

# Start the service with app.py from the project root
python app.py

# Then open http://localhost:5000/analyzer in a browser to upload files

Supported Formats

Format	Extensions	Notes
Word	.docx, .doc	.doc requires LibreOffice
Excel	.xlsx, .xls	Supports multiple worksheets and date formats
PowerPoint	.pptx	Extracts slide text and images
PDF	.pdf	Auto-detects tables and images

Features

Word Documents

Automatic heading level detection
Convert tables to Markdown tables
Extract inline images
Filter headers and footers
Preserve list formatting
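Converting a table to Markdown comes down to rendering rows of cell text as pipe-delimited lines with a separator row under the header. A minimal sketch, assuming the parser has already extracted each table as a list of rows with the first row as the header; rows_to_markdown is a hypothetical helper for illustration, not a function taken from enhanced_parser.py.

```python
def rows_to_markdown(rows: list[list[object]]) -> str:
    """Render rows (first row = header) as a Markdown table (hypothetical helper)."""
    if not rows:
        return ""

    def cell(value: object) -> str:
        # None becomes an empty cell; pipes are escaped and newlines flattened
        # so that a single cell cannot break the table layout.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ").strip()

    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [[cell(v) for v in row] + [""] * (width - len(row)) for row in rows]

    header, *body = padded
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

The same rendering applies to Excel worksheets once each sheet is read into rows.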

Excel Workbooks

Support for multiple worksheets
Automatic date format detection (keeps dates from appearing as raw serial numbers)
Convert to Markdown tables
Extract embedded images

PowerPoint Presentations

Extract content by slide
Extract images and text boxes
Preserve slide order

PDF Documents

Auto-detect tables (via ruling-line detection and text-position analysis)
Extract page images
Detect headings and lists
Output content in its original reading order
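Keeping content in its original order means sorting extracted elements by page, then top to bottom, then left to right. A minimal sketch, assuming each extracted element (text, rendered table, or image link) is available with its page number and coordinates; the Block type and the line_tolerance parameter are hypothetical, and a simple sort like this suits single-column pages rather than multi-column layouts.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical extracted PDF element with its position on the page."""
    page: int      # 1-based page number
    top: float     # distance from the top of the page, in points
    left: float    # distance from the left edge, in points
    markdown: str  # already-rendered Markdown for this element


def in_reading_order(blocks: list[Block], line_tolerance: float = 3.0) -> str:
    """Join blocks page by page, top to bottom, then left to right."""
    def key(b: Block) -> tuple[int, int, float]:
        # Bucket 'top' into bands of line_tolerance points so that elements on
        # the same visual line (small vertical jitter) are ordered by x position.
        return (b.page, int(b.top // line_tolerance), b.left)

    return "\n\n".join(b.markdown for b in sorted(blocks, key=key))
```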

Advanced Options

DOC Conversion

# Test LibreOffice configuration
python doc_converter.py
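The internals of doc_converter.py are not shown here. For orientation, a minimal sketch of how a .doc to .docx conversion is commonly driven through LibreOffice's headless command line (soffice --headless --convert-to docx --outdir ...). The helper names are hypothetical; on Windows, soffice.exe is usually not on PATH unless you add LibreOffice's program folder yourself.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def find_soffice() -> Optional[str]:
    """Return the path of the LibreOffice executable, or None if it is not on PATH."""
    # 'soffice' is the usual name; many Linux packages also provide 'libreoffice'.
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_convert_command(soffice: str, src: Path, out_dir: Path) -> list[str]:
    # Headless mode runs without a GUI; LibreOffice writes <stem>.docx into out_dir.
    return [soffice, "--headless", "--convert-to", "docx", "--outdir", str(out_dir), str(src)]


def convert_doc_to_docx(src: str, out_dir: str, timeout: int = 120) -> Path:
    """Convert one .doc file and return the path of the resulting .docx."""
    soffice = find_soffice()
    if soffice is None:
        raise RuntimeError("LibreOffice not found; install it and put 'soffice' on PATH")
    src_path, out_path = Path(src), Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(soffice, src_path, out_path),
                   check=True, capture_output=True, timeout=timeout)
    result = out_path / (src_path.stem + ".docx")
    if not result.exists():
        raise RuntimeError(f"conversion produced no file: {result}")
    return result
```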

PDF Table Strategy

parser = EnhancedDocumentParser(
    # Options:
    #   "lines_strict" (default): strict line detection, fastest
    #   "lines": normal line detection
    #   "text": based on text position; more accurate but slower
    pdf_table_strategy="lines_strict"
)

Image Processing

parser = EnhancedDocumentParser(
    image_base_url="https://your-domain.com",  # Image access URL
    image_save_dir="./static/images"           # Image save directory
)
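Judging from the sample URL in Return Format (image_base_url "http://localhost:5000" with image_save_dir "./static/images" yields http://localhost:5000/static/images/uuid.png), each image appears to be saved under image_save_dir with a UUID filename and published at image_base_url plus the same relative path. A minimal sketch of that mapping, assuming image_save_dir is a relative path that your web server serves at the same location and that "size" is the byte count; save_image is hypothetical.

```python
import uuid
from pathlib import Path


def save_image(data: bytes, ext: str, save_dir: str, base_url: str) -> dict:
    """Write image bytes under save_dir and return a record like the 'images' entries."""
    filename = f"{uuid.uuid4().hex}.{ext.lstrip('.')}"
    target = Path(save_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    # Assumption: save_dir is relative and served at the same path under base_url,
    # e.g. ./static/images -> http://localhost:5000/static/images/<file>.
    rel = Path(save_dir).as_posix().strip("/")
    url = f"{base_url.rstrip('/')}/{rel}/{filename}"
    return {"filename": filename, "url": url, "size": len(data)}
```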

Return Format

{
  "success": true,
  "markdown": "# Document Title\n\nContent...",
  "images_count": 2,
  "images": [
    {
      "filename": "uuid.png",
      "url": "http://localhost:5000/static/images/uuid.png",
      "size": 12345
    }
  ],
  "file_type": "docx",
  "file_info": {
    "name": "document.docx",
    "size": 45678,
    "paragraphs": 50,
    "tables": 3
  }
}
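A caller typically checks "success" and then persists the Markdown. A minimal sketch that consumes a dictionary shaped like the one above, writing <original name>.md plus a small JSON summary of image URLs; write_result and the summary layout are hypothetical, and the shape of a failed result is not documented here.

```python
import json
from pathlib import Path


def write_result(result: dict, out_dir: str) -> Path:
    """Save result['markdown'] as <original name>.md and a JSON image summary."""
    if not result.get("success"):
        # The failure payload is not documented; surface whatever came back.
        raise ValueError(f"conversion failed: {result}")
    stem = Path(result["file_info"]["name"]).stem
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    md_path = out / f"{stem}.md"
    md_path.write_text(result["markdown"], encoding="utf-8")
    summary = {
        "file_type": result.get("file_type"),
        "images": [img["url"] for img in result.get("images", [])],
    }
    (out / f"{stem}.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return md_path
```

With the Quick Start code, this would be called as write_result(result, "./output") after parse_document returns.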

Common Issues

DOC Conversion Failed

Ensure LibreOffice is installed
Run python doc_converter.py to test the configuration

Dates Display as Numbers

Excel parsing automatically handles date formats
Ensure you're using the latest version of enhanced_parser.py

PDF Table Recognition Inaccurate

Try different pdf_table_strategy values
Use "lines_strict" for standard tables
Use "text" for complex tables
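Trying strategies can be automated by starting with the fastest one and falling back to slower ones until a table appears. A minimal sketch, assuming the parse step is passed in as a function taking (path, strategy) and returning the documented result dictionary, and using the presence of a Markdown separator row as a heuristic for "a table was found"; both helper names are hypothetical.

```python
from typing import Callable, Iterable

# (path, pdf_table_strategy) -> result dict in the documented return format
ParseFn = Callable[[str, str], dict]


def has_markdown_table(markdown: str) -> bool:
    """Heuristic: look for a table separator row such as '| --- | --- |'."""
    for line in markdown.splitlines():
        s = line.strip()
        # Require a pipe so a plain '---' horizontal rule does not count.
        if "|" in s and "---" in s and set(s) <= set("|-: "):
            return True
    return False


def parse_pdf_with_fallback(parse: ParseFn, path: str,
                            strategies: Iterable[str] = ("lines_strict", "lines", "text")) -> dict:
    """Try strategies from fastest to slowest; stop at the first that finds a table."""
    result: dict = {"success": False}
    for strategy in strategies:
        result = parse(path, strategy)
        if result.get("success") and has_markdown_table(result.get("markdown", "")):
            # Record which strategy worked so it can be reused for similar files.
            return {**result, "pdf_table_strategy": strategy}
    # No strategy produced a table: return the last attempt unchanged.
    return result
```

With the real parser, parse could be lambda p, s: EnhancedDocumentParser(pdf_table_strategy=s).parse_document(p).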

File Limitations

Maximum file size: 160 MB
Supported extensions: docx, doc, pdf, xlsx, xls, pptx
Automatic cleanup of temporary files
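These limits can be checked before a file is sent for conversion. A minimal sketch of a pre-flight check, assuming "160 MB" means 160 MiB and that extensions are matched case-insensitively; the service may count either differently, and validate_upload is hypothetical.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {"docx", "doc", "pdf", "xlsx", "xls", "pptx"}
# Assumption: the 160 MB limit is read as 160 MiB.
MAX_FILE_SIZE = 160 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Assumption: extensions are compared case-insensitively (".DOCX" == ".docx").
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: .{ext}" if ext else "missing file extension")
    if size_bytes > MAX_FILE_SIZE:
        problems.append(f"file is {size_bytes / 1024 / 1024:.1f} MiB; limit is 160 MiB")
    return problems
```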

Repository: shuyu-labs/WebCode
