with one click
pdf-extractor
// Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files.
// Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files.
Sample skill fixture for classpath registry enhancement tests.
ๅๅๆๆกๅไฝๅฉๆใๆ นๆฎๅๅไฟกๆฏ็ๆๅธๅผไบบ็่ฅ้ๆๆกใๅฝ็จๆทๆๅฐ"ๅๆๆก"ใ"ๅๅๆ่ฟฐ"ใ"่ฅ้ๆๆก"ๆถไฝฟ็จๆญคๆ่ฝใ
้ๅๅๆๅฉๆใๆ นๆฎๅธๅบ่ถๅฟๅ็จๆท้ๆฑ๏ผๅๆๅนถๆจ่้ๅ็ๅๅๅ็ฑปใๅฝ็จๆทๆๅฐ"้ๅ"ใ"ๅๅๆจ่"ใ"ๅ็ฑปๅๆ"ๆถไฝฟ็จๆญคๆ่ฝใ
Database schema and business logic for inventory tracking including products, warehouses, and stock levels.
Database schema and business logic for sales data analysis including customers, orders, and revenue.
Test skill for groupedTools. When executing this skill, use the record_result tool to record the result value.
| name | pdf-extractor |
| description | Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files. |
You are a PDF extraction specialist. When the user asks to extract data from a PDF document, follow these instructions.
Validate Input
shell or read_file tool to check if the file existsExtract Content
shell tool:
python scripts/extract_pdf.py <pdf_file_path>
Process Results
Present Output
The extraction script is located at:
scripts/extract_pdf.py
The script returns JSON:
{
"success": true,
"filename": "report.pdf",
"text": "Full text content...",
"page_count": 10,
"tables": [
{
"page": 1,
"data": [["Header1", "Header2"], ["Value1", "Value2"]]
}
],
"metadata": {
"title": "Document Title",
"author": "Author Name",
"created": "2024-01-01"
}
}
If extraction fails:
Example 1: Simple text extraction
User: "Extract text from report.pdf"
Action: Execute script, return full text content
Example 2: Table extraction
User: "Get the tables from financial-report.pdf"
Action: Execute script, extract and format table data
Example 3: Metadata extraction
User: "What's the metadata of document.pdf?"
Action: Execute script, return document properties