document-indexing

Name: Document Indexing
Author: boringdata

// Extract structured metadata from documents using AI. Classify content types, extract topics and tools. Supports async batch processing.

stars: 2

forks: 0

updated: October 29, 2025 at 20:09

SKILL.md

name: document-indexing
description: Extract structured metadata from documents using AI. Classify content types, extract topics and tools. Supports async batch processing.

Document Indexing

Overview

Extract structured metadata from fetched documents using an LLM:

Content type: blog, tutorial, guide, reference, etc.
Topics & Tools: Main subjects and technologies
Structure: Code examples, procedures, narrative

Creates DocumentMetadata records for search and clustering.

Quick Start

# Index single document
kurt index 5494cc13

# Batch index (async, 5-10x faster)
kurt index --url-prefix https://example.com/

# Re-index with custom concurrency
kurt index --url-prefix https://example.com/ --force --max-concurrent 10

Prerequisites: Documents must be FETCHED (kurt content fetch)

Commands

# Single
kurt index <doc-id>
kurt index <doc-id> --force

# Batch (async parallel)
kurt index --url-prefix <url>
kurt index --url-contains <string>
kurt index --max-concurrent 10     # Default: 5

# Filters
kurt index --status FETCHED --url-prefix <url>
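
When indexing is driven from a script, it can help to assemble the command line from the flags listed above instead of concatenating strings by hand. The helper below is a minimal sketch: `build_index_command` is a hypothetical name, not part of Kurt, and it only uses flags that appear in this section. It builds the argument list and does not run anything.

```python
from typing import List, Optional


def build_index_command(
    doc_id: Optional[str] = None,
    url_prefix: Optional[str] = None,
    url_contains: Optional[str] = None,
    status: Optional[str] = None,
    force: bool = False,
    max_concurrent: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for `kurt index` (hypothetical helper)."""
    if max_concurrent is not None and max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    cmd = ["kurt", "index"]
    if doc_id is not None:
        cmd.append(doc_id)  # positional document ID
    if url_prefix is not None:
        cmd += ["--url-prefix", url_prefix]
    if url_contains is not None:
        cmd += ["--url-contains", url_contains]
    if status is not None:
        cmd += ["--status", status]
    if force:
        cmd.append("--force")
    if max_concurrent is not None:
        cmd += ["--max-concurrent", str(max_concurrent)]
    return cmd


reindex_cmd = build_index_command(
    url_prefix="https://example.com/", force=True, max_concurrent=10
)
```

The resulting list can be passed to `subprocess.run(reindex_cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with URLs.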

Content Types

Extracted Metadata

{
  "content_type": "TUTORIAL",
  "extracted_title": "Machine Learning Guide",
  "primary_topics": ["Machine Learning", "Python"],
  "tools_technologies": ["TensorFlow", "Pandas"],
  "has_code_examples": true,
  "has_step_by_step_procedures": true,
  "has_narrative_structure": false
}
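
For downstream code, the JSON above could be loaded into a typed record. This is a sketch only: `ExtractedMetadata` and `parse_metadata` are hypothetical names for illustration and are not Kurt's own `DocumentMetadata` model, whose exact fields are not shown here. The field names are copied from the sample.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractedMetadata:
    """Hypothetical container mirroring the keys in the sample JSON."""
    content_type: str
    extracted_title: str
    primary_topics: List[str] = field(default_factory=list)
    tools_technologies: List[str] = field(default_factory=list)
    has_code_examples: bool = False
    has_step_by_step_procedures: bool = False
    has_narrative_structure: bool = False


def parse_metadata(raw: str) -> ExtractedMetadata:
    """Parse a JSON string; raises TypeError if it contains unknown keys."""
    return ExtractedMetadata(**json.loads(raw))


SAMPLE = """
{
  "content_type": "TUTORIAL",
  "extracted_title": "Machine Learning Guide",
  "primary_topics": ["Machine Learning", "Python"],
  "tools_technologies": ["TensorFlow", "Pandas"],
  "has_code_examples": true,
  "has_step_by_step_procedures": true,
  "has_narrative_structure": false
}
"""

meta = parse_metadata(SAMPLE)
```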

Performance

Sequential: ~3-5s per document
Parallel (5 concurrent): ~1s per document avg
Example: 92 docs in 30s (parallel) vs 5 mins (sequential)
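
The speedup comes from keeping several requests in flight at once while capping how many run simultaneously. The sketch below illustrates that pattern with `asyncio.Semaphore`; it is not Kurt's implementation, `process_all` is a hypothetical name, and a short sleep stands in for the LLM call.

```python
import asyncio
from typing import List


async def process_all(
    doc_ids: List[str], max_concurrent: int = 5, delay: float = 0.01
) -> int:
    """Run one simulated task per document; return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def worker(doc_id: str) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # at most max_concurrent workers pass this point
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)  # stand-in for the LLM request
            in_flight -= 1

    await asyncio.gather(*(worker(d) for d in doc_ids))
    return peak


peak = asyncio.run(process_all([f"doc-{i}" for i in range(20)], max_concurrent=5))
```

Raising the cap shortens wall-clock time only until the provider's rate limit or latency becomes the bottleneck.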

Python API

import asyncio

from kurt.indexing import extract_document_metadata, batch_extract_document_metadata

# Single document
result = extract_document_metadata("abc-123")

# Batch: the batch function is a coroutine, so drive it with asyncio.run
results = asyncio.run(batch_extract_document_metadata(
    ["abc-123", "def-456"],
    max_concurrent=5
))

Troubleshooting

Issue	Solution
"Document not FETCHED"	Run `kurt content fetch <id>` first
"Content file not found"	Re-fetch document
Slow batch	Increase `--max-concurrent`
Rate limits	Reduce `--max-concurrent`
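
The last two rows pull in opposite directions, so one option is to start high and step the concurrency down when a rate limit is hit. The sketch below shows that idea under stated assumptions: `RateLimitError` and `run_with_backoff` are hypothetical names, and how Kurt or the LLM provider actually reports rate limits is not specified here. The caller supplies `run_batch`, a function that takes a concurrency value and performs the batch.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error signals a rate limit."""


def run_with_backoff(
    run_batch: Callable[[int], T], max_concurrent: int = 10, floor: int = 1
) -> T:
    """Call run_batch(concurrency), halving concurrency after each rate limit."""
    concurrency = max_concurrent
    while True:
        try:
            return run_batch(concurrency)
        except RateLimitError:
            if concurrency <= floor:
                raise  # already at the minimum, so give up
            concurrency = max(floor, concurrency // 2)


# Demo with a fake batch that is rate limited above 3 concurrent requests.
attempts: List[int] = []


def fake_batch(concurrency: int) -> str:
    attempts.append(concurrency)
    if concurrency > 3:
        raise RateLimitError("too many requests")
    return f"ok at {concurrency}"


outcome = run_with_backoff(fake_batch, max_concurrent=10)
```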

Next Steps

ingest-content-skill - Fetch documents first
document-management-skill - Query and manage documents

related-skills.json

Same repository

intelligence.md

from "boringdata/kurt-demo"

Information gathering utilities (analytics, research, content analysis) (general)

2025-11-04 (stars: 2)

project-management.md

from "boringdata/kurt-demo"

Manage Kurt projects - add sources/targets, update project.md, detect missing content, track progress. (project)

2025-11-04 (stars: 2)

onboarding.md

from "boringdata/kurt-demo"

One-time team setup that creates Kurt profile and foundation rules

2025-11-04 (stars: 2)

feedback.md

from "boringdata/kurt-demo"

Collect content feedback and identify patterns for rule updates

2025-11-04 (stars: 2)

writing-rules.md

from "boringdata/kurt-demo"

Extract and manage writing rules (style, structure, persona, publisher, custom) (project)

2025-11-04 (stars: 2)

cms-interaction.md

from "boringdata/kurt-demo"

Configure CMS connections and perform ad-hoc content searches (Sanity, Contentful, WordPress)

2025-11-03 (stars: 2)

package.json

"author": "boringdata"

"repository": "boringdata/kurt-demo"


