
pdf-markitdown

Convert PDF to structured Markdown via markitdown and ingest into the knowledge graph. Preserves tables, headings, and formatting better than plain-text extraction.


Install command

npx skills add https://github.com/okikusan-public/knowledge_graph --skill pdf-markitdown

Copy and paste this command into Claude Code to install the skill.

Source

okikusan-public/knowledge_graph

Stars: 6

Forks: 0

Updated: April 11, 2026, 02:02

SKILL.md


name: pdf-markitdown
description: Convert PDF to structured Markdown via markitdown and ingest into the knowledge graph. Preserves tables, headings, and formatting better than plain-text extraction.
disable-model-invocation: false
argument-hint: [filename.pdf] [--project project_name] [--auto]
allowed-tools: Bash, Read, Write, Glob

PDF Markitdown Ingestion

Convert PDF documents to structured Markdown using Microsoft markitdown (preserving tables, headings, and formatting), then ingest into the knowledge graph with entity extraction.

Usage

/pdf-markitdown report.pdf
/pdf-markitdown report.pdf --project project_a
/pdf-markitdown report.pdf --auto

Mode Selection

Parse arguments from $ARGUMENTS:

--auto flag present → Automated Mode (skip to the Automated Mode section below)
No --auto flag → Interactive Mode (default)
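The mode selection above can be sketched as a small parser. This is only an illustration of the rule, assuming the raw argument string looks like the Usage examples; `SkillArgs` and `parse_skill_arguments` are hypothetical names, not part of the repository.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillArgs:
    filename: Optional[str]
    project: Optional[str]
    auto: bool


def parse_skill_arguments(raw: str) -> SkillArgs:
    """Split the raw argument string and pick out the filename and flags."""
    tokens = shlex.split(raw)
    filename = None
    project = None
    auto = False
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--auto":
            auto = True
        elif token == "--project" and i + 1 < len(tokens):
            project = tokens[i + 1]
            i += 1  # skip the project name that was just consumed
        elif filename is None and not token.startswith("--"):
            filename = token
        i += 1
    return SkillArgs(filename=filename, project=project, auto=auto)


args = parse_skill_arguments("report.pdf --project project_a --auto")
# --auto present selects Automated Mode; otherwise Interactive Mode is the default
mode = "automated" if args.auto else "interactive"
```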

Interactive Mode (default)

1. Find and Convert PDF

Search for the file in docs/ (Glob docs/**/$FILENAME). If not found, list files in docs/ and prompt the user.
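In the skill this lookup is done with the Glob tool. An equivalent search in plain Python might look like the sketch below; `find_in_docs` and `list_docs` are hypothetical helpers, and the `docs` default simply mirrors the directory named above.

```python
from pathlib import Path
from typing import List


def find_in_docs(filename: str, docs_dir: str = "docs") -> List[Path]:
    """Return every file under docs_dir whose name matches filename."""
    root = Path(docs_dir)
    if not root.is_dir():
        return []
    # rglob(filename) behaves like the pattern docs/**/filename
    return sorted(p for p in root.rglob(filename) if p.is_file())


def list_docs(docs_dir: str = "docs") -> List[Path]:
    """List all files under docs_dir, for prompting the user when nothing matches."""
    root = Path(docs_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())
```

If `find_in_docs` returns an empty list, the output of `list_docs` can be shown to the user as the set of candidates.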

Convert to structured Markdown:

python ${CLAUDE_SKILL_DIR}/../../../scripts/pdf_markitdown.py "<file_path>"

Capture the output path (last line of stdout). The output is saved as {stem}_markitdown.md alongside the input.
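The naming rule and the "last line of stdout" convention can be expressed as two small helpers. This is a sketch under the assumptions stated in the paragraph above; both function names are hypothetical, and the behavior of `pdf_markitdown.py` beyond that paragraph is not known here.

```python
from pathlib import Path


def expected_markdown_path(pdf_path: str) -> Path:
    """Apply the {stem}_markitdown.md rule, keeping the input's directory."""
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_markitdown.md")


def output_path_from_stdout(stdout: str) -> str:
    """Take the last non-empty line of the converter's stdout as the output path."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("converter produced no output")
    return lines[-1]
```

Comparing the two values is a cheap sanity check that the conversion wrote the file where the next step expects it.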

2. Ingest Markdown into Graph

python ${CLAUDE_SKILL_DIR}/../../../scripts/auto_ingest.py upsert "<md_file_path>" $1 $2

Note the chunk count from the output.

3. Extract Entities (performed by Claude Code)

Read the generated markdown file using the Read tool. Analyze the structured content and extract entities and relationships as JSON.

Entity types:

PERSON, ORGANIZATION, TECHNOLOGY, REQUIREMENT, SCHEDULE, BUDGET, RISK
PROPOSAL_PATTERN, EVALUATION_CRITERIA, DELIVERABLE, SECURITY, DOMAIN, CONCEPT

Extraction rules:

Extract concrete, specific entities (not generic terms like "system" or "data")
Normalize entity names (consistent casing, resolve abbreviations)
Each entity needs a brief description (1-2 sentences)
Relationships should capture meaningful connections between entities
Relationship types should be descriptive verbs/phrases (e.g., "uses", "manages", "depends_on")
Leverage the structured Markdown (tables, headings) for more precise extraction
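A few of these rules are mechanical enough to check before saving. The sketch below assumes each entity is a dict with `name`, `type`, and `description` keys (the keys printed by the conflict check in step 4); the `GENERIC_TERMS` set and the helper names are illustrative assumptions, not part of the repository.

```python
from typing import Dict, List

ENTITY_TYPES = {
    "PERSON", "ORGANIZATION", "TECHNOLOGY", "REQUIREMENT", "SCHEDULE",
    "BUDGET", "RISK", "PROPOSAL_PATTERN", "EVALUATION_CRITERIA",
    "DELIVERABLE", "SECURITY", "DOMAIN", "CONCEPT",
}

# Illustrative only: the two generic terms named in the rules above
GENERIC_TERMS = {"system", "data"}


def normalize_name(name: str) -> str:
    """Collapse repeated whitespace and trim the ends."""
    return " ".join(name.split())


def validate_entities(entities: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for entity in entities:
        name = normalize_name(entity.get("name", ""))
        if not name:
            problems.append("entity with empty name")
        elif name.lower() in GENERIC_TERMS:
            problems.append(f"{name}: too generic")
        if entity.get("type") not in ENTITY_TYPES:
            problems.append(f"{name}: unknown type {entity.get('type')!r}")
        if not entity.get("description", "").strip():
            problems.append(f"{name}: missing description")
    return problems
```

Checks like abbreviation resolution or whether a relationship is meaningful still need judgment and are left to the extraction step itself.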

4. Detect Entity Conflicts

Before saving, check for conflicts with existing entities:

python -c "
import sys, json
# Make the repository root importable so config and scripts resolve
sys.path.insert(0, '.')
from neo4j import GraphDatabase
from config import get_config
from scripts.save_entities import query_existing_entities
cfg = get_config()
driver = GraphDatabase.driver(cfg.neo4j_uri, auth=cfg.neo4j_auth)
# Placeholder: replace with the quoted names of the extracted entities
names = [COMMA_SEPARATED_ENTITY_NAMES]
try:
    result = query_existing_entities(driver, names)
finally:
    # Close the driver even if the query fails
    driver.close()
# Print one JSON object per entity that already exists in the graph
for name, info in result.items():
    print(json.dumps({'name': name, 'type': info['type'], 'description': info['description'], 'sources': info['sources']}, ensure_ascii=False))
"

For each entity that already exists:

No conflict (additive or equivalent): Proceed. Use the more comprehensive description.
Conflict detected (contradicting facts): Present both to the user and let them decide.
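The two outcomes can be sketched as a small decision function. Whether two descriptions contradict each other is a judgment made during extraction, so it is passed in as a flag here; using length as a stand-in for "more comprehensive" is a crude assumption for illustration, and both function names are hypothetical.

```python
from typing import Dict


def merge_description(existing: str, new: str) -> str:
    """Keep the longer description as a rough proxy for 'more comprehensive'."""
    existing, new = existing.strip(), new.strip()
    return new if len(new) > len(existing) else existing


def resolve_entity(existing: str, new: str, contradicts: bool) -> Dict[str, object]:
    """Return the action to take for an entity that already exists in the graph."""
    if contradicts:
        # Contradicting facts: show both versions and let the user decide
        return {"action": "ask_user", "options": [existing, new]}
    return {"action": "proceed", "description": merge_description(existing, new)}
```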

5. Save Entities to Graph

cat <<'ENTITIES_JSON' | python ${CLAUDE_SKILL_DIR}/../../../scripts/save_entities.py --source-path "<md_file_path>" $1 $2
{"entities": [...], "relationships": [...]}
ENTITIES_JSON

Note: Community detection and relationship discovery run automatically via the post-entity-save hook.
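The same hand-off can be done from Python by writing the JSON payload to the script's stdin. This is a sketch only: the path layout and the `--source-path` flag come from the command above, `extra_args` is passed through verbatim without assuming which flags the script accepts, and both helper names are hypothetical.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Sequence


def build_save_command(skill_dir: str, md_path: str,
                       extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv for save_entities.py, three levels above the skill directory."""
    script = Path(skill_dir) / ".." / ".." / ".." / "scripts" / "save_entities.py"
    return [sys.executable, str(script), "--source-path", md_path, *extra_args]


def save_entities(payload: Dict[str, list], skill_dir: str, md_path: str,
                  extra_args: Sequence[str] = ()) -> subprocess.CompletedProcess:
    """Pipe the payload to the script on stdin, like the heredoc above."""
    cmd = build_save_command(skill_dir, md_path, extra_args)
    return subprocess.run(
        cmd,
        input=json.dumps(payload, ensure_ascii=False),
        text=True,
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

Passing the command as a list avoids shell quoting problems when a file path contains spaces.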

6. Report Results

Report: original PDF filename, markdown output path, chunk count, entity count, relationship count, and any conflicts resolved.
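One way to keep the report fields together is a small record type. The `IngestReport` class and its layout are hypothetical; only the field list comes from the sentence above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IngestReport:
    pdf_filename: str
    markdown_path: str
    chunk_count: int
    entity_count: int
    relationship_count: int
    conflicts_resolved: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the report as one labelled line per field."""
        conflicts = ", ".join(self.conflicts_resolved) or "none"
        return "\n".join([
            f"PDF: {self.pdf_filename}",
            f"Markdown: {self.markdown_path}",
            f"Chunks: {self.chunk_count}",
            f"Entities: {self.entity_count}",
            f"Relationships: {self.relationship_count}",
            f"Conflicts resolved: {conflicts}",
        ])
```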

Automated Mode (`--auto`)

Run the full pipeline script for headless/batch processing:

${CLAUDE_SKILL_DIR}/../../../scripts/ingest_pipeline.sh "<file_path>" $1 $2

This executes: markitdown conversion → auto_ingest → entity extraction (via claude --print) → save_entities → community detection → relationship discovery.
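The stage order can be pictured as a simple sequential runner. This sketch does not reproduce `ingest_pipeline.sh`; the stage names come from the sentence above, while the stop-on-first-failure behavior and the stand-in steps are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

STAGES = [
    "markitdown conversion",
    "auto_ingest",
    "entity extraction",
    "save_entities",
    "community detection",
    "relationship discovery",
]


def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run stages in order and stop at the first one that reports failure."""
    completed = []
    for name, step in stages:
        if not step():
            break
        completed.append(name)
    return completed


# Stand-in steps that always succeed, purely for illustration
demo = [(name, lambda: True) for name in STAGES]
completed = run_pipeline(demo)
```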

Report the pipeline output when complete.

Comparison with Other Skills

Feature	/ingest	/pdf-markitdown	/visual-extract
PDF handling	pymupdf (plain text)	markitdown (structured MD)	render → Claude vision
Table preservation	No	Yes	Yes (via image)
Heading structure	No	Yes	Yes (via image)
Diagram extraction	No	No	Yes
OCR (scanned PDF)	No	No	Yes
Automated mode	No	Yes (`--auto`)	No
Best for	General docs	Structured PDFs	Visual/diagram-heavy PDFs
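Read as a decision rule, the table suggests something like the following. The function and its three boolean inputs are hypothetical; how a document gets classified as scanned, diagram-heavy, or structured is left to the caller.

```python
def choose_skill(scanned: bool, diagram_heavy: bool, structured: bool) -> str:
    """Pick a skill following the 'Best for' row of the comparison table."""
    if scanned or diagram_heavy:
        # Only /visual-extract covers OCR and diagram extraction
        return "/visual-extract"
    if structured:
        # Tables and headings are preserved by markitdown
        return "/pdf-markitdown"
    return "/ingest"
```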

Notes

Requires: pip install 'markitdown[pdf]'
The _markitdown.md file is saved alongside the original for reference and re-use
For scanned PDFs or diagram-heavy documents, use /visual-extract instead
Entity extraction is performed by Claude Code itself (interactive) or claude --print (automated)
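A quick pre-flight check for the dependency could look like this. It only tests whether the `markitdown` package is importable, not whether the `[pdf]` extra is installed; `module_available` is a hypothetical helper.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if not module_available("markitdown"):
    # Same install command as in the Notes above
    print("markitdown is missing; install it with: pip install 'markitdown[pdf]'")
```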

More from this repository

agentic-search

okikusan-public/knowledge_graph

Autonomous multi-tool search agent that dynamically selects search strategies, evaluates results, and iterates until sufficient context is gathered to answer complex questions

2026-04-14, 6 stars

x-search

okikusan-public/knowledge_graph

Search X (Twitter) posts via Grok API and ingest results into the knowledge graph.

2026-04-12, 6 stars

youtube-markitdown

okikusan-public/knowledge_graph

Convert YouTube video to structured Markdown via markitdown (metadata + transcript) and ingest into the knowledge graph.

2026-04-12, 6 stars

add-knowledge

okikusan-public/knowledge_graph

Add knowledge directly to the graph from free-form text input without requiring a document file. Supports Note, WebSource, Conversation, or any custom source type.

2026-04-11, 6 stars

quiz

okikusan-public/knowledge_graph

Spaced repetition quiz using GraphRAG entities. Generates questions, grades answers, and tracks learning progress with optimal review intervals.

2026-03-22, 6 stars

graph-search

okikusan-public/knowledge_graph

Hybrid search combining vector similarity with graph traversal for multi-hop reasoning over the knowledge graph

2026-03-20, 6 stars


