paddleocr-doc-parsing
Advanced document parsing with PaddleOCR. Returns complete document structure including text, tables, formulas, charts, and layout information. The AI agent extracts relevant content based on user needs.
| name | paddleocr-doc-parsing |
| description | Advanced document parsing with PaddleOCR. Returns complete document structure including text, tables, formulas, charts, and layout information. The AI agent extracts relevant content based on user needs. |
| metadata | {"openclaw":{"requires":{"env":["PADDLEOCR_DOC_PARSING_API_URL","PADDLEOCR_ACCESS_TOKEN","PADDLEOCR_DOC_PARSING_TIMEOUT"],"bins":["python"]},"primaryEnv":"PADDLEOCR_ACCESS_TOKEN","emoji":"📄","homepage":"https://github.com/PaddlePaddle/PaddleOCR/tree/main/skills/paddleocr-doc-parsing"}} |
Use Document Parsing for:
Use Text Recognition instead for:
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
python scripts/vl_caller.py
If the script execution fails (API not configured, network error, etc.):
Execute document parsing:
python scripts/vl_caller.py --file-url "URL provided by user" --pretty
Or for local files:
python scripts/vl_caller.py --file-path "file path" --pretty
Optional: explicitly set file type:
python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0 --pretty
- --file-type 0: PDF
- --file-type 1: image
Default behavior: save raw JSON to a temp file:
- If --output is omitted, the script saves automatically under the system temp directory: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
- If --output is provided, it overrides the default temp-file destination
- If --stdout is provided, JSON is printed to stdout and no file is saved
- When a file is saved, the script prints its path: Result saved to: /absolute/path/...
- Use --stdout only when you explicitly want to skip file persistence
The output JSON contains COMPLETE content with all document data:
Input type note:
Extract what the user needs from the output JSON using these fields:
- text
- result[n].markdown
- result[n].prunedResult
CRITICAL: You must display the COMPLETE extracted content to the user based on their needs.
- text field
What this means:
- text, result[n].markdown, and result[n].prunedResult
Example - Correct:
User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect:
User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"
The output JSON uses an envelope wrapping the raw API result:
{
  "ok": true,
  "text": "Full markdown/HTML text extracted from all pages",
  "result": { ... }, // raw provider response
  "error": null
}
Key fields:
- text: extracted markdown text from all pages (use this for quick text display)
- result: raw provider response object
- result[n].prunedResult: structured parsing output for each page (layout/content/confidence and related metadata)
- result[n].markdown: full rendered page output in markdown/HTML
Raw result location (default): the temp-file path printed by the script on stderr
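The envelope fields above can be read with a few lines of Python. This is a minimal sketch, not part of the skill's scripts: the function name `summarize_envelope` is hypothetical, and it assumes that `result` is a list of page objects, as the `result[n]` notation suggests. If the provider response has a different shape, the page list simply stays empty. Each page's `markdown` value is returned as-is.

```python
import json
from typing import Any


def summarize_envelope(raw: str) -> dict[str, Any]:
    """Parse the envelope JSON and return the fields an agent would display."""
    envelope = json.loads(raw)
    if not envelope.get("ok"):
        # Pass the script's error through unchanged instead of paraphrasing it.
        return {"ok": False, "error": envelope.get("error"), "text": "", "pages": []}

    result = envelope.get("result")
    # Assumption: result[n] indexes a list of page objects.
    pages = result if isinstance(result, list) else []
    return {
        "ok": True,
        "error": None,
        "text": envelope.get("text") or "",
        "pages": [page.get("markdown") for page in pages if isinstance(page, dict)],
    }


# Hand-written sample envelope, used only for illustration.
sample = json.dumps({
    "ok": True,
    "text": "# Title\n\nBody",
    "result": [{"markdown": "# Title\n\nBody", "prunedResult": {}}],
    "error": None,
})
summary = summarize_envelope(sample)
print(summary["text"])
```

In practice the input would be the contents of the saved result file, or the output captured from a run with --stdout.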
Example 1: Extract Full Document Text
python scripts/vl_caller.py \
  --file-url "https://example.com/paper.pdf" \
  --pretty
Then use:
- text for quick full-text output
- result[n].markdown when page-level output is needed
Example 2: Extract Structured Page Data
python scripts/vl_caller.py \
  --file-path "./financial_report.pdf" \
  --pretty
Then use:
- result[n].prunedResult for structured parsing data (layout/content/confidence)
- result[n].markdown for rendered page content
Example 3: Print JSON Without Saving
python scripts/vl_caller.py \
  --file-url "URL" \
  --stdout \
  --pretty
Then return:
- text when user asks for full document content
- result[n].prunedResult and result[n].markdown when user needs complete structured page data
When API is not configured:
The error will show:
PADDLEOCR_DOC_PARSING_API_URL not configured. Get your API at: https://paddleocr.com
Configuration workflow:
Show the exact error message to the user (including the URL).
Guide the user to configure securely:
- PADDLEOCR_DOC_PARSING_API_URL
- PADDLEOCR_ACCESS_TOKEN
- Optional: PADDLEOCR_DOC_PARSING_TIMEOUT
If the user provides credentials in chat anyway (accept any reasonable format):
- PADDLEOCR_DOC_PARSING_API_URL=https://xxx.paddleocr.com/layout-parsing, PADDLEOCR_ACCESS_TOKEN=abc123...
- Here's my API: https://xxx and token: abc123
Parse and validate the values:
- PADDLEOCR_DOC_PARSING_API_URL (look for URLs with paddleocr.com or similar)
- PADDLEOCR_DOC_PARSING_API_URL is a full endpoint ending with /layout-parsing
- PADDLEOCR_ACCESS_TOKEN (long alphanumeric string, usually 40+ chars)
Ask the user to confirm the environment is configured:
- configure.py or create a local .env file by default if the skill is installed under a host application directory (for example, ~/.claude/skills)
Retry only after confirmation:
IMPORTANT: The error message format is STRICT and must be shown exactly as provided by the script. Do not modify or paraphrase it.
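The "parse and validate" step in the workflow above could be sketched as follows. This is an illustration under stated assumptions rather than the skill's own logic: `check_config` is a hypothetical helper, and the 40-character threshold is only the heuristic mentioned above ("usually 40+ chars"), so a shorter token is reported as suspicious, not as definitely wrong.

```python
from urllib.parse import urlparse


def check_config(api_url: str, token: str) -> list[str]:
    """Return human-readable problems; an empty list means the values look plausible."""
    problems = []

    parsed = urlparse(api_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("API URL must be a full http(s) URL")
    elif not parsed.path.rstrip("/").endswith("/layout-parsing"):
        problems.append("API URL should end with /layout-parsing")

    cleaned = token.strip()
    if not cleaned:
        problems.append("access token is empty")
    elif not cleaned.isalnum():
        problems.append("access token should be alphanumeric")
    elif len(cleaned) < 40:
        # Heuristic only: tokens are described as *usually* 40+ characters.
        problems.append("access token is shorter than the usual 40+ characters")
    return problems


print(check_config("https://xxx.paddleocr.com", "abc123"))
```

The helper returns messages instead of raising, so an agent can report every problem at once, and it never echoes the token itself.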
There is no file size limit for the API. For PDFs, the maximum is 100 pages per request.
Tips for large files:
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
python scripts/vl_caller.py --file-url "https://your-server.com/large_file.pdf"
If you only need certain pages from a large PDF, extract them first:
# Extract pages 1-5
python scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"
# Mixed ranges are supported
python scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"
# Then process the smaller file
python scripts/vl_caller.py --file-path "pages_1_5.pdf"
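Because of the 100-pages-per-request limit, a longer PDF has to be processed in several passes. The sketch below computes page-range strings in the same syntax as the --pages values shown above. `page_specs` is a hypothetical helper, and it assumes the total page count is already known (obtaining it requires a PDF library, which is not shown here).

```python
def page_specs(total_pages: int, limit: int = 100) -> list[str]:
    """Split pages 1..total_pages into consecutive range specs of at most `limit` pages."""
    if total_pages < 1 or limit < 1:
        raise ValueError("total_pages and limit must be positive")
    specs = []
    for start in range(1, total_pages + 1, limit):
        end = min(start + limit - 1, total_pages)
        # A one-page chunk is written as a single number, e.g. "201".
        specs.append(str(start) if start == end else f"{start}-{end}")
    return specs


print(page_specs(250))  # ['1-100', '101-200', '201-250']
```

Each returned string can be passed to split_pdf.py as the --pages value, and each resulting file then parsed separately with vl_caller.py.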
Authentication failed (403):
error: Authentication failed
→ Token is invalid, reconfigure with correct credentials
API quota exceeded (429):
error: API quota exceeded
→ Daily API quota exhausted, inform user to wait or upgrade
Unsupported format:
error: Unsupported file format
→ File format not supported, convert to PDF/PNG/JPG
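The three cases above map naturally onto a small lookup table. The following is a sketch only: it assumes the error reaches the agent as plain text containing one of the phrases listed above, and `advise` is a hypothetical name, not a function from the skill's scripts.

```python
from typing import Optional

# Error phrases and follow-up actions, taken from the cases listed above.
ADVICE = {
    "Authentication failed": "Token is invalid; reconfigure with correct credentials.",
    "API quota exceeded": "Daily API quota exhausted; wait or upgrade.",
    "Unsupported file format": "File format not supported; convert to PDF/PNG/JPG.",
}


def advise(error_message: str) -> Optional[str]:
    """Return the suggested follow-up for a known error, or None if unrecognized."""
    lowered = error_message.lower()
    for marker, advice in ADVICE.items():
        if marker.lower() in lowered:
            return advice
    return None


print(advise("error: API quota exceeded"))
```

Returning None for unrecognized errors leaves the caller free to show the original message unchanged.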
- references/output_schema.md - Output format specification
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_DOC_PARSING_API_URL).
Load these reference documents into context when:
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and optionally API connectivity.
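For a quick look at the configuration half of that check, the environment variables named earlier can be inspected directly. This sketch is not the logic of smoke_test.py, which is not shown here. `missing_env` is a hypothetical helper, and it treats PADDLEOCR_DOC_PARSING_TIMEOUT as optional, as the configuration workflow describes it.

```python
import os
from typing import Mapping

# Variables the configuration workflow asks the user to set.
REQUIRED = ("PADDLEOCR_DOC_PARSING_API_URL", "PADDLEOCR_ACCESS_TOKEN")


def missing_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env()
    # Only names are printed, never values, so the token is not leaked.
    print("missing:", ", ".join(missing) if missing else "none")
```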