pdf-to-llm

Use when the user wants to read, summarize, analyze, compare, or ask questions about a PDF and the built-in PDF reader produces noisy or incomplete results. Also use when the user mentions PDF parsing, Markdown extraction, OCR, scanned PDFs, or preparing documents for LLM input. Triggers: "PDF 변환" (PDF conversion), "PDF 읽어줘" (read this PDF), "PDF 분석" (PDF analysis), "스캔 PDF" (scanned PDF), "OCR", "PDF to markdown", "pdf-to-llm", "PDF 파싱" (PDF parsing), "문서 변환" (document conversion), "convert PDF", "extract text from PDF", "PDF 텍스트 추출" (PDF text extraction).

Installation command

npx skills add https://github.com/onejaejae/skills --skill pdf-to-llm

Copy and paste this command into Claude Code to install the skill.

Source: onejaejae/skills

Stars: 2 · Forks: 0

Updated: April 17, 2026, 05:19

SKILL.md

name: pdf-to-llm

pdf-to-llm

Convert PDFs into clean Markdown and structured JSON using opendataloader-pdf, so you can work with the content instead of fighting layout noise.

Why this skill exists

The built-in PDF reader (Read tool) works for simple PDFs but struggles with complex layouts, tables, multi-column documents, and scanned pages. opendataloader-pdf handles these cases by producing structured Markdown (for reading) and JSON (for page-level citations and coordinates). The difference matters most for documents where layout carries meaning — research papers, financial reports, contracts, forms.

Preflight

Before converting, check dependencies in this order:

Confirm the input PDF path or URL exists.
Check java -version — Java 11+ is required. If missing, stop and tell the user.
Check if opendataloader-pdf is installed: pip show opendataloader-pdf
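The preflight checks above can be scripted. The following is a minimal sketch using only the Python standard library (Python 3.10 or later assumed); the helper names are hypothetical, only local paths are checked (a remote PDF is downloaded first, per Failure handling), and the version regex assumes the usual `version "17.0.2"` style banner that `java -version` prints to stderr.

```python
import re
import shutil
import subprocess
from importlib import metadata
from pathlib import Path
from typing import Optional

MIN_JAVA_MAJOR = 11  # from the skill: Java 11+ is required


def parse_java_major(banner: str) -> Optional[int]:
    """Return the major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_301" means Java 8.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def java_major_version() -> Optional[int]:
    """Run `java -version`; return None if Java is not on PATH."""
    if shutil.which("java") is None:
        return None
    # The version banner is written to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


def package_installed(dist_name: str = "opendataloader-pdf") -> bool:
    """Rough equivalent of `pip show opendataloader-pdf`."""
    try:
        metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return True


def preflight(pdf_path: str) -> list[str]:
    """Run the checks in order; an empty list means it is safe to convert."""
    problems = []
    if not Path(pdf_path).is_file():
        problems.append(f"input PDF not found: {pdf_path}")
    major = java_major_version()
    if major is None or major < MIN_JAVA_MAJOR:
        problems.append("Java 11+ is required: stop and tell the user")
    if not package_installed():
        problems.append("opendataloader-pdf is not installed")
    return problems
```

The sketch collects every problem instead of stopping at the first, so they can be reported together; a Java problem still means stopping and telling the user.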

If not installed:

pip install -U opendataloader-pdf

Only install the heavier hybrid package when the PDF is scanned, image-based, or OCR-dependent:

pip install -U "opendataloader-pdf[hybrid]"
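When the installs are driven from a script, a sketch like the following (helper names hypothetical) keeps the rule of installing the hybrid extra only for OCR-dependent PDFs, and uses `python -m pip` so the package lands in the interpreter that runs the script.

```python
import subprocess
import sys


def pip_install_args(needs_ocr: bool) -> list[str]:
    """Build the pip command from the preflight section.

    The hybrid extra is requested only for scanned, image-based,
    or otherwise OCR-dependent PDFs.
    """
    spec = "opendataloader-pdf[hybrid]" if needs_ocr else "opendataloader-pdf"
    # `python -m pip` installs into the same interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", "-U", spec]


def install(needs_ocr: bool = False) -> None:
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_args(needs_ocr), check=True)
```

Because the arguments are passed as a list, no shell is involved, so the quotes needed around `opendataloader-pdf[hybrid]` on the command line (which stop shells such as zsh from treating the brackets as a glob) are not needed here.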

Conversion modes

Fast mode (default)

Use this first for normal digital PDFs. It handles most cases well.

opendataloader-pdf INPUT.pdf \
  --output-dir OUTPUT_DIR \
  --format markdown,json \
  --use-struct-tree \
  --quiet

--use-struct-tree is safe to enable on the first attempt: tagged PDFs benefit from their embedded structure tree, and untagged ones fall back to visual heuristics.
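A thin wrapper around the fast-mode command can be sketched as below. It passes exactly the flags shown above; since the skill does not name the output files, the hypothetical `convert_fast` helper simply collects whatever `.md` and `.json` files appear in the output directory, so use a fresh directory per document.

```python
import subprocess
from pathlib import Path


def fast_mode_args(pdf_path: str, output_dir: str) -> list[str]:
    """Mirror the fast-mode command, flag for flag."""
    return [
        "opendataloader-pdf", pdf_path,
        "--output-dir", output_dir,
        "--format", "markdown,json",
        "--use-struct-tree",
        "--quiet",
    ]


def convert_fast(pdf_path: str, output_dir: str) -> tuple[list[Path], list[Path]]:
    """Run fast mode and return the .md and .json files found in output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(fast_mode_args(pdf_path, output_dir), check=True)
    # Output file names are not assumed; just collect what was written.
    return sorted(out.rglob("*.md")), sorted(out.rglob("*.json"))
```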

Hybrid mode (OCR / scanned PDFs)

Escalate to hybrid only when fast mode fails or produces clearly degraded output:

No selectable text in the PDF
Scanned or image-only pages
Badly broken tables or reading order
Multilingual OCR (e.g., Korean + English)
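Two of these signals can be checked mechanically on the fast-mode Markdown before escalating. This is a rough, hypothetical heuristic: the character threshold and the allowed share of U+FFFD replacement characters are guesses, not values from opendataloader-pdf, and broken tables or reading order still have to be judged by reading the output.

```python
def degradation_signals(markdown: str, min_chars: int = 200,
                        max_replacement_ratio: float = 0.01) -> list[str]:
    """Return reasons to escalate to hybrid mode (empty list = keep fast output).

    Both thresholds are illustrative guesses for a whole document.
    """
    reasons = []
    visible = [ch for ch in markdown if not ch.isspace()]
    if len(visible) < min_chars:
        reasons.append("almost no extracted text (scanned or image-only pages?)")
    if visible:
        # U+FFFD marks characters the extractor could not decode.
        bad = sum(ch == "\ufffd" for ch in visible)
        if bad / len(visible) > max_replacement_ratio:
            reasons.append("many U+FFFD replacement characters (garbled text)")
    return reasons
```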

Start the hybrid server:

opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "ko,en"

Then convert:

opendataloader-pdf INPUT.pdf \
  --output-dir OUTPUT_DIR \
  --format markdown,json \
  --hybrid docling-fast \
  --quiet

For formulas or image descriptions, add --hybrid-mode full on the client side.
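The two hybrid steps can be orchestrated from Python as sketched below, using only the commands and flags shown above. The skill does not say how the client locates the server, so this sketch assumes the client's default target matches port 5002; the 60-second startup timeout and all helper names are likewise assumptions.

```python
import socket
import subprocess
import time

HYBRID_PORT = 5002  # assumed to match the client's default server address


def hybrid_server_args(port: int = HYBRID_PORT, ocr_lang: str = "ko,en") -> list[str]:
    """Mirror the server command."""
    return ["opendataloader-pdf-hybrid", "--port", str(port),
            "--force-ocr", "--ocr-lang", ocr_lang]


def hybrid_client_args(pdf_path: str, output_dir: str, full: bool = False) -> list[str]:
    """Mirror the hybrid conversion command; `full` adds --hybrid-mode full."""
    args = ["opendataloader-pdf", pdf_path,
            "--output-dir", output_dir,
            "--format", "markdown,json",
            "--hybrid", "docling-fast",
            "--quiet"]
    if full:  # formulas or image descriptions
        args += ["--hybrid-mode", "full"]
    return args


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 60.0) -> bool:
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def convert_hybrid(pdf_path: str, output_dir: str, full: bool = False) -> None:
    server = subprocess.Popen(hybrid_server_args())
    try:
        if not wait_for_port(HYBRID_PORT):
            raise RuntimeError("hybrid server is not listening on port 5002")
        subprocess.run(hybrid_client_args(pdf_path, output_dir, full), check=True)
    finally:
        server.terminate()  # stop the server once conversion is done
        server.wait()
```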

Working with the output

Prefer Markdown for reading, JSON for structure.

Read the generated .md file first — it's the primary artifact.
Use .json only when page numbers, element types, or bounding boxes matter (citations, coordinates).
Don't paste the whole converted file into chat. Quote only the relevant section and keep the full file on disk.
Ignore repeated headers/footers unless the user asks for them.
Summarize from Markdown, not from raw OCR output.
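To quote only the relevant part of the Markdown, a small helper can cut out one section by heading. This hypothetical `extract_section` recognizes ATX headings (`#` through `######`) only, matches the title case-insensitively, and would misread `#` lines inside fenced code blocks, so treat it as a sketch.

```python
import re
from typing import Optional

# ATX heading with an optional closing run of '#' characters.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def extract_section(markdown: str, title: str) -> Optional[str]:
    """Return the section whose heading text equals `title`, up to the next
    heading of the same or higher level, or None if there is no such heading."""
    lines = markdown.splitlines()
    start = level = None
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if not m:
            continue
        depth = len(m.group(1))
        if start is None:
            if m.group(2).strip().lower() == title.strip().lower():
                start, level = i, depth
        elif depth <= level:
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return None
    return "\n".join(lines[start:]).strip()
```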

After conversion, always report

Where the .md and .json files were written
Whether fast or hybrid mode was used
Whether OCR was required
Any obvious caveats (broken tables, missing text, garbled sections)
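Keeping the four report items in one structure makes it hard to skip one. A minimal sketch; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConversionReport:
    markdown_path: str
    json_path: str
    mode: str  # "fast" or "hybrid"
    ocr_used: bool
    caveats: list[str] = field(default_factory=list)

    def render(self) -> str:
        """One line per required report item."""
        lines = [
            f"Markdown: {self.markdown_path}",
            f"JSON: {self.json_path}",
            f"Mode: {self.mode}",
            f"OCR required: {'yes' if self.ocr_used else 'no'}",
            "Caveats: " + ("; ".join(self.caveats) if self.caveats else "none observed"),
        ]
        return "\n".join(lines)
```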

Common tasks

Task	Approach
Summarize a PDF	Convert to Markdown → read → summarize by section (not by page)
Compare two PDFs	Convert both in one batch → compare headings, sections, and tables from the Markdown
Scanned PDF	Install hybrid deps → start the server with OCR flags → convert with `--hybrid docling-fast`
Specific pages only	Convert the full PDF first, then read only the relevant section from the Markdown
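For the comparison task, a quick first pass is to diff the heading outlines of the two converted Markdown files with the standard library's `difflib`. The helper names are hypothetical, setext headings are not detected, and sections and tables still need a side-by-side read.

```python
import difflib
import re

ATX_HEADING = re.compile(r"^#{1,6}\s+\S")


def headings(markdown: str) -> list[str]:
    """ATX headings in document order."""
    return [line.strip() for line in markdown.splitlines() if ATX_HEADING.match(line)]


def heading_diff(md_a: str, md_b: str, name_a: str = "a.md", name_b: str = "b.md") -> str:
    """Unified diff of the two documents' heading outlines ('' if identical)."""
    diff = difflib.unified_diff(headings(md_a), headings(md_b),
                                fromfile=name_a, tofile=name_b, lineterm="")
    return "\n".join(diff)
```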

Common mistakes

Mistake	Fix
Dumping the entire converted file into chat	Quote only relevant sections — the full file stays on disk
Using JSON as the reading format	JSON is for structure/citations. Read from Markdown.
Installing hybrid deps for a normal digital PDF	Try fast mode first. Only escalate when output is clearly degraded.
Skipping the preflight check	A missing Java runtime causes cryptic errors downstream. Always verify first.
Running OCR without specifying language	Set `--ocr-lang` explicitly for better accuracy, especially with Korean.

Failure handling

Java missing: Stop immediately, tell the user Java 11+ is required.
Fast mode output is degraded: Retry with hybrid before concluding the PDF is unreadable.
Hybrid deps unavailable: Fall back to fast mode and note the caveats explicitly.
Remote PDF: Download to a temp file first, then convert.
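The remote-PDF rule needs nothing beyond the standard library. In this sketch, `download_pdf` and `looks_like_pdf` are hypothetical helpers, and the `%PDF-` header check is an extra sanity test that the skill does not mention.

```python
import shutil
import tempfile
import urllib.request


def download_pdf(url: str, timeout: float = 60.0) -> str:
    """Download a remote PDF to a temporary file and return its local path.

    The caller is responsible for deleting the file after conversion.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
            return tmp.name


def looks_like_pdf(path: str) -> bool:
    """PDF files start with the bytes '%PDF-'."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"
```

Pass the returned path to the conversion command, then delete the temporary file.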


