Name: Extract Structured Data From Unstructured Files Pdf Pptx Docx
Author: run-llama

// Invoke this skill BEFORE implementing any structured data extraction from documents to learn the correct llama_cloud_services API usage. Required reading before writing extraction code. Requires llama_cloud_services package and LLAMA_CLOUD_API_KEY as an environment variable.

stars: 175

forks: 26

updated: October 23, 2025 at 10:10

SKILL.md

name	Extract structured data from unstructured files (PDF, PPTX, DOCX...)
description	Invoke this skill BEFORE implementing any structured data extraction from documents to learn the correct llama_cloud_services API usage. Required reading before writing extraction code. Requires llama_cloud_services package and LLAMA_CLOUD_API_KEY as an environment variable.

Structured Data Extraction

Quick start

Define a schema for the data you would like to extract:

from pydantic import BaseModel, Field


class Resume(BaseModel):
    name: str = Field(description="Full name of candidate")
    email: str = Field(description="Email address")
    skills: list[str] = Field(description="Technical skills and technologies")

NOTE: Use basic types when possible. Avoid nested dictionaries. Lists are ok.
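
The note above can be checked mechanically. What follows is a minimal sketch, assuming the schema is available as a JSON Schema dictionary (for a pydantic v2 model, `Resume.model_json_schema()` produces one). `find_nested_fields` is a hypothetical helper, not part of llama_cloud_services. Flagging lists of objects as nested is an assumption about how to read the note.

```python
def find_nested_fields(schema: dict) -> list[str]:
    """Return names of top-level properties that hold nested objects."""
    nested = []
    for name, prop in schema.get("properties", {}).items():
        # For lists, inspect the item type (assumption: a list of objects counts as nested).
        target = prop.get("items", prop) if prop.get("type") == "array" else prop
        if target.get("type") == "object" or "$ref" in target:
            nested.append(name)
    return nested


# Hand-written schema standing in for the output of Resume.model_json_schema().
flat_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
}

# Same schema with a hypothetical nested "address" dictionary added.
nested_schema = {
    "type": "object",
    "properties": {**flat_schema["properties"], "address": {"type": "object"}},
}

print(find_nested_fields(flat_schema))    # []
print(find_nested_fields(nested_schema))  # ['address']
```

Running a check like this before submitting a schema gives an early hint that a field may be worth flattening into basic types.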

Create a LlamaExtract instance:

from llama_cloud_services import LlamaExtract

# Initialize client
extractor = LlamaExtract(
    show_progress=True,
    check_interval=5,
    # Optional: pass the API key explicitly; otherwise it is read from the environment
    # api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # needs `import os`
)

Define the extraction configuration:

from llama_cloud import ExtractConfig, ExtractMode, ExtractTarget

# Configure extraction settings
extract_config = ExtractConfig(
    # Basic options
    extraction_mode=ExtractMode.MULTIMODAL,  # FAST, BALANCED, MULTIMODAL, PREMIUM
    extraction_target=ExtractTarget.PER_DOC,  # PER_DOC, PER_PAGE
    system_prompt="<Insert relevant context for extraction>",  # set system prompt - can leave blank
    # Advanced options
    high_resolution_mode=True,  # Enable for better OCR
    invalidate_cache=False,  # Set to True to bypass cache
    # Extensions
    cite_sources=True,  # Enable citations
    use_reasoning=True,  # Enable reasoning (not available in FAST mode)
    confidence_scores=True,  # Enable confidence scores (MULTIMODAL/PREMIUM only)
)
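
The comments in the configuration above imply two constraints: reasoning is not available in FAST mode, and confidence scores require MULTIMODAL or PREMIUM. A minimal sketch of checking these before submitting a job follows. `check_extension_flags` is a hypothetical helper written for illustration. It takes plain mode-name strings rather than the `ExtractMode` enum so that it runs without the package installed, and it encodes only the constraints stated in those comments.

```python
def check_extension_flags(mode: str, use_reasoning: bool, confidence_scores: bool) -> list[str]:
    """Return human-readable problems for a mode/extension combination."""
    problems = []
    mode = mode.upper()
    if use_reasoning and mode == "FAST":
        problems.append("use_reasoning is not available in FAST mode")
    if confidence_scores and mode not in {"MULTIMODAL", "PREMIUM"}:
        problems.append("confidence_scores requires MULTIMODAL or PREMIUM mode")
    return problems


# Try the same extension flags against three modes.
for mode in ("FAST", "BALANCED", "MULTIMODAL"):
    issues = check_extension_flags(mode, use_reasoning=True, confidence_scores=True)
    print(mode, "->", issues or "ok")
```

Returning a list of problems instead of raising on the first one lets the caller report every conflicting option at once.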

Extract the data from the document:

result = extractor.extract(Resume, extract_config, "resume.pdf")

# result.data holds the extracted fields as a Python dict
print(Resume.model_validate(result.data))
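
Because `result.data` is a plain dict, it can be inspected before validation. The sketch below assumes a dict shaped like the `Resume` schema and reports fields that came back absent or empty. `missing_or_empty` is a hypothetical helper and the sample values are made up for illustration, not real extraction output.

```python
import json

EXPECTED_FIELDS = ("name", "email", "skills")  # fields of the Resume schema above


def missing_or_empty(data: dict, fields=EXPECTED_FIELDS) -> list[str]:
    """List expected fields that are absent, None, or empty in an extraction dict."""
    return [f for f in fields if data.get(f) in (None, "", [])]


# Placeholder dict standing in for result.data; the values are made up.
sample = {"name": "Jane Doe", "email": "", "skills": ["Python"]}

print(missing_or_empty(sample))      # ['email']
print(json.dumps(sample, indent=2))  # a plain dict serializes directly to JSON
```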

For more detailed code implementations, see REFERENCE.md.

Requirements

The llama_cloud_services package must be installed in your environment; the pydantic and llama_cloud packages are installed with it:

pip install llama_cloud_services

The LLAMA_CLOUD_API_KEY must also be available as an environment variable:

export LLAMA_CLOUD_API_KEY="..."
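
Both requirements can be verified before any extraction code runs. The following is a minimal sketch using only the standard library. `preflight` is a hypothetical helper, not something shipped with llama_cloud_services, and the optional `env` parameter exists only to make the check easy to test.

```python
import importlib.util
import os


def preflight(env=None) -> list[str]:
    """Return a list of unmet requirements; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec("llama_cloud_services") is None:
        problems.append("package llama_cloud_services is not installed")
    if not env.get("LLAMA_CLOUD_API_KEY"):
        problems.append("LLAMA_CLOUD_API_KEY is not set")
    return problems


for problem in preflight():
    print("missing:", problem)
```

Calling `preflight()` at startup surfaces a missing package or key as a readable message rather than as a failure in the middle of an extraction job.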

related-skills.json

Same repository

pdf-processing.md

from "run-llama/vibe-llama"

Invoke this skill BEFORE implementing any text extraction/parsing logic to learn how to use LlamaParse to process any document accurately. Requires llama_cloud_services package and LLAMA_CLOUD_API_KEY as an environment variable.

2025-10-23 · 175 stars

classify-files-according-to-specific-rules.md

from "run-llama/vibe-llama"

Invoke this skill BEFORE implementing any text/document classification task to learn the correct llama_cloud_services API usage. Required reading before writing classification code. Requires the llama_cloud_services package and LLAMA_CLOUD_API_KEY as an environment variable.

2025-10-23 · 175 stars

retrieve-relevant-information-through-rag.md

from "run-llama/vibe-llama"

Leverage Retrieval Augmented Generation to retrieve relevant information from a LlamaCloud Index. Requires the llama_cloud_services package and LLAMA_CLOUD_API_KEY as an environment variable.

2025-10-22 · 175 stars

use-llamactl-a-cli-tool-for-llamaagents.md

from "run-llama/vibe-llama"

Use llamactl to initialize, locally preview, deploy and manage LlamaIndex workflows as LlamaAgents. Requires llama-index-workflows and llamactl to be installed in the environment.

2025-10-22 · 175 stars

package.json

"author": "run-llama"

"repository": "run-llama/vibe-llama"
