document-indexing
Extract structured metadata from documents using AI. Classify content types, extract topics and tools. Supports async batch processing.
| Field | Value |
|---|---|
| name | document-indexing |
| description | Extract structured metadata from documents using AI. Classify content types, extract topics and tools. Supports async batch processing. |
Extract structured metadata from fetched documents using an LLM. Indexing creates DocumentMetadata records that are used for search and clustering.
```bash
# Index a single document
kurt index 5494cc13

# Batch index (async, 5-10x faster)
kurt index --url-prefix https://example.com/

# Re-index with custom concurrency
kurt index --url-prefix https://example.com/ --force --max-concurrent 10
```
Prerequisite: documents must be in the FETCHED state (run `kurt content fetch` first).
```bash
# Single
kurt index <doc-id>
kurt index <doc-id> --force

# Batch (async parallel)
kurt index --url-prefix <url>
kurt index --url-contains <string>
kurt index --max-concurrent 10  # Default: 5

# Filters
kurt index --status FETCHED --url-prefix <url>
```
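The filter flags above narrow which documents a batch run touches. The sketch below models that selection in plain Python. It assumes the flags combine with AND semantics and that each document carries a URL and a status string; `Doc`, `select_docs`, and the `NOT_FETCHED` status value are hypothetical illustrations, not part of Kurt.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a stored document (not Kurt's model)."""
    doc_id: str
    url: str
    status: str


def select_docs(
    docs: Iterable[Doc],
    status: Optional[str] = None,
    url_prefix: Optional[str] = None,
    url_contains: Optional[str] = None,
) -> list[Doc]:
    """Return docs matching every filter that is set (AND semantics is an assumption)."""
    selected = []
    for doc in docs:
        if status is not None and doc.status != status:
            continue
        if url_prefix is not None and not doc.url.startswith(url_prefix):
            continue
        if url_contains is not None and url_contains not in doc.url:
            continue
        selected.append(doc)
    return selected


docs = [
    Doc("a1", "https://example.com/blog/intro", "FETCHED"),
    Doc("b2", "https://example.com/docs/api", "NOT_FETCHED"),  # hypothetical status
    Doc("c3", "https://other.org/blog/post", "FETCHED"),
]
# Roughly: kurt index --status FETCHED --url-prefix https://example.com/
ready = select_docs(docs, status="FETCHED", url_prefix="https://example.com/")
```

With no filters set, every document is returned; each added filter can only shrink the selection.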
Content types: `BLOG`, `TUTORIAL`, `GUIDE`, `REFERENCE`, `WHITEPAPER`, `CASE_STUDY`, `FAQ`, `CHANGELOG`, `MARKETING`, `OTHER`.
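If you consume these labels in your own code, a string-valued enum keeps them in one place. The sketch below is hypothetical (`ContentType` and its `parse` method are not Kurt's API). It assumes a raw label should be normalized to upper snake case and that anything unrecognized falls back to `OTHER`.

```python
from enum import Enum


class ContentType(str, Enum):
    """The content-type labels listed above (illustrative, not Kurt's class)."""
    BLOG = "BLOG"
    TUTORIAL = "TUTORIAL"
    GUIDE = "GUIDE"
    REFERENCE = "REFERENCE"
    WHITEPAPER = "WHITEPAPER"
    CASE_STUDY = "CASE_STUDY"
    FAQ = "FAQ"
    CHANGELOG = "CHANGELOG"
    MARKETING = "MARKETING"
    OTHER = "OTHER"

    @classmethod
    def parse(cls, label: str) -> "ContentType":
        """Normalize a raw label; unknown labels map to OTHER (assumed policy)."""
        normalized = label.strip().upper().replace(" ", "_").replace("-", "_")
        try:
            return cls(normalized)
        except ValueError:
            return cls.OTHER
```

For example, `ContentType.parse("case study")` yields `ContentType.CASE_STUDY`, while an unlisted label such as `"podcast"` yields `ContentType.OTHER`.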
Example extracted metadata:

```json
{
  "content_type": "TUTORIAL",
  "extracted_title": "Machine Learning Guide",
  "primary_topics": ["Machine Learning", "Python"],
  "tools_technologies": ["TensorFlow", "Pandas"],
  "has_code_examples": true,
  "has_step_by_step_procedures": true,
  "has_narrative_structure": false
}
```
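To work with this output in Python, a dataclass with the same field names gives typed access. `ExtractedMetadata` and `from_json` are hypothetical helpers, not Kurt's DocumentMetadata model. They assume the record arrives as a JSON string shaped like the example, ignore undeclared keys, raise `ValueError` when a list or boolean field has the wrong type, and let the dataclass raise `TypeError` if `content_type` or `extracted_title` is missing.

```python
import json
from dataclasses import dataclass, field, fields

LIST_FIELDS = ("primary_topics", "tools_technologies")
BOOL_FIELDS = ("has_code_examples", "has_step_by_step_procedures", "has_narrative_structure")


@dataclass
class ExtractedMetadata:
    """Hypothetical typed view of the example JSON (not Kurt's DocumentMetadata)."""
    content_type: str
    extracted_title: str
    primary_topics: list[str] = field(default_factory=list)
    tools_technologies: list[str] = field(default_factory=list)
    has_code_examples: bool = False
    has_step_by_step_procedures: bool = False
    has_narrative_structure: bool = False

    @classmethod
    def from_json(cls, raw: str) -> "ExtractedMetadata":
        data = json.loads(raw)
        # Check list and boolean fields before building the record.
        for key in LIST_FIELDS:
            value = data.get(key, [])
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                raise ValueError(f"{key} must be a list of strings")
        for key in BOOL_FIELDS:
            if not isinstance(data.get(key, False), bool):
                raise ValueError(f"{key} must be a boolean")
        # Keep only declared fields; a missing required field raises TypeError.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


sample = """{
  "content_type": "TUTORIAL",
  "extracted_title": "Machine Learning Guide",
  "primary_topics": ["Machine Learning", "Python"],
  "tools_technologies": ["TensorFlow", "Pandas"],
  "has_code_examples": true,
  "has_step_by_step_procedures": true,
  "has_narrative_structure": false
}"""
meta = ExtractedMetadata.from_json(sample)
```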
```python
import asyncio

from kurt.indexing import extract_document_metadata, batch_extract_document_metadata

# Single document
result = extract_document_metadata("abc-123")

# Batch: the coroutine is driven by asyncio.run;
# max_concurrent limits how many documents are processed at once
results = asyncio.run(batch_extract_document_metadata(
    ["abc-123", "def-456"],
    max_concurrent=5,
))
```
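This page does not describe how `batch_extract_document_metadata` enforces `max_concurrent`. A common way to cap in-flight async calls is an `asyncio.Semaphore`, and the sketch below shows that pattern. `index_one`, `batch_index`, and `ConcurrencyProbe` are hypothetical stand-ins (the sleep simulates model latency); this is not Kurt's implementation.

```python
import asyncio


class ConcurrencyProbe:
    """Tracks how many calls run at once (for demonstration only)."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0


async def index_one(doc_id: str, probe: ConcurrencyProbe) -> dict:
    """Hypothetical stand-in for one LLM extraction call."""
    probe.in_flight += 1
    probe.peak = max(probe.peak, probe.in_flight)
    try:
        await asyncio.sleep(0.01)  # simulate model latency
        return {"doc_id": doc_id}
    finally:
        probe.in_flight -= 1


async def batch_index(doc_ids, max_concurrent=5, probe=None):
    if probe is None:
        probe = ConcurrencyProbe()
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(doc_id):
        async with semaphore:  # waits while max_concurrent calls are active
            return await index_one(doc_id, probe)

    # gather returns results in the same order as doc_ids
    return await asyncio.gather(*(guarded(d) for d in doc_ids))


probe = ConcurrencyProbe()
results = asyncio.run(
    batch_index([f"doc-{i}" for i in range(12)], max_concurrent=5, probe=probe)
)
```

A higher cap overlaps more waiting time and speeds up a batch, while a lower cap sends fewer simultaneous requests, which is the trade-off behind the slow-batch and rate-limit rows in the troubleshooting table.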
| Issue | Solution |
|---|---|
| "Document not FETCHED" | Run `kurt content fetch <id>` first |
| "Content file not found" | Re-fetch the document |
| Slow batch | Increase --max-concurrent |
| Rate limits | Reduce --max-concurrent |