document-extraction-api

Updated: 2026-05-29 01:37

Parse documents into a structural model or whole-document Markdown via the Nutrient Data Extraction API (`/extraction/parse`). Use when the user wants to extract layout, tables, key-value pairs, formulas, or images with bounding boxes; build a RAG ingestion pipeline; produce Markdown for search indexing or content migration; or run layout-aware document understanding. Triggers include parse this document, extract layout, RAG pipeline, document understanding, form/invoice extraction, layout analysis, or whole-document Markdown.

Source

PSPDFKit-labs

PSPDFKit-labs/nutrient-skills

name	document-extraction-api
description	Parse documents into a structural model or whole-document Markdown via the Nutrient Data Extraction API (`/extraction/parse`). Use when the user wants to extract layout, tables, key-value pairs, formulas, or images with bounding boxes; build a RAG ingestion pipeline; produce Markdown for search indexing or content migration; or run layout-aware document understanding. Triggers include parse this document, extract layout, RAG pipeline, document understanding, form/invoice extraction, layout analysis, or whole-document Markdown.
license	MIT
metadata	{"author":"nutrient-sdk","version":"1.0","homepage":"https://www.nutrient.io/api/","repository":"https://github.com/PSPDFKit-labs/nutrient-skills","compatibility":"Requires Python 3.10+, uv, and internet. Works with Claude Code, Codex CLI, Gemini CLI, OpenCode, Cursor, Windsurf, GitHub Copilot, Amp, or any Agent Skills-compatible product.","short-description":"Parse documents into a structural model or Markdown via Nutrient Data Extraction"}

Nutrient Data Extraction

Use Nutrient DWS Extract for document-understanding workflows where you need typed elements (paragraphs, tables, formulas, pictures, key-value regions, handwriting) with bounding boxes — or a clean Markdown representation of the whole document.

When to use

- Build a RAG ingestion pipeline: PDF -> Markdown -> chunks -> embeddings.
- Index content for search or migrate documents into a new CMS.
- Extract structured fields from forms and invoices (key/value pairs, tables, semantic regions).
- Reconstruct page layout for downstream rendering or comparison.
- Run layout-aware document understanding (semantic paragraph roles, table cell spans, formulas in LaTeX, picture classification and alt descriptions).

This skill is only for /extraction/parse. For PDF generation, conversion, OCR, redaction, signing, watermarking, or any /build-based workflow, use the sibling document-processor-api skill.

Setup

DWS Extract is a separate product from DWS Processor and has its own API key.

Get a Nutrient DWS Extract API key at https://dashboard.nutrient.io/.

Export it as NUTRIENT_EXTRACT_API_KEY:

export NUTRIENT_EXTRACT_API_KEY="pdf_live_..."

Scripts live in scripts/ relative to this SKILL.md. Use the directory containing this SKILL.md as the working directory:
```
cd <directory containing this SKILL.md> && uv run scripts/<script>.py --help
```

Calling /extraction/parse with a DWS Processor key returns 403. If your tenant has been migrated to global DWS API keys, a single key set as either NUTRIENT_EXTRACT_API_KEY or NUTRIENT_API_KEY will work for both products.
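
As a minimal sketch of the key handling described above, a caller could resolve the key like this. The function name `resolve_extract_key` and the lookup order (Extract-specific variable first, then the global one) are assumptions for illustration; the bundled scripts may resolve the key differently.

```python
import os


def resolve_extract_key(env=os.environ):
    """Return the first non-empty API key found in the environment mapping.

    Assumption: prefer NUTRIENT_EXTRACT_API_KEY, then fall back to
    NUTRIENT_API_KEY (only useful on tenants with global DWS API keys).
    """
    for name in ("NUTRIENT_EXTRACT_API_KEY", "NUTRIENT_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "Set NUTRIENT_EXTRACT_API_KEY (or NUTRIENT_API_KEY on a global-key tenant)."
    )
```

Failing early with a clear message is cheaper than discovering a missing or wrong key through a 403 from the endpoint.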

`/extraction/parse` — one primitive, two output shapes

One call returns the full structural document model — typed elements with bounding boxes, confidence scores, and reading order — or a whole-document Markdown string. You always receive all element types in a single call.

Picking a mode

Choose based on the user's intent and acceptable credit cost. All costs are extraction credits per page — a separate billing bucket from the processor API credits consumed by /build, /sign, OCR, and other DWS Processor endpoints.

Principle — decide from the request alone; do not ask the user clarifying questions. Walk the checks below in order. Each rule that fires sets a minimum mode — the final pick is the highest minimum across all rules that fired. If none fired, use the default (rule 5).

1. Explicit features named in the request are non-negotiable.
   - Key-value pairs, form fields, semantic role classification (Title / SectionHeader / etc.), formulas, or handwriting → at minimum `understand` (9 cr/pg).
   - Alt text on pictures, charts, or diagrams → `agentic` (18 cr/pg).
2. Document type implied by the request or filename.
   - form, invoice, receipt, application, claim → likely contains key-value pairs → `understand`.
   - chart, infographic, or diagram-heavy doc + the user wants descriptions → `agentic`.
3. OCR signal from filename or request (scanned, image-based, photographed, handwritten, screenshot) → `structure` minimum; `text` mode silently fails on image-only input.
4. Output format from intent. RAG, search indexing, embeddings, or content migration → `markdown`. Layout overlay, per-element processing, or bounded extraction → `spatial`.
5. No cues match anything above → documented default `structure` + `spatial` (1.5 cr/pg). Handles both born-digital and scanned documents, gives bounded typed elements with table cells, and never silently drops content.
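
The "highest minimum wins" principle can be sketched as follows. This is an illustration only: `MODE_RANK` and `pick_mode` are hypothetical names, and ranking the modes by their per-page cost is an assumption drawn from the costs quoted in this section.

```python
# Assumed ordering: a higher rank means a more capable and more expensive mode.
MODE_RANK = {"text": 0, "structure": 1, "understand": 2, "agentic": 3}


def pick_mode(fired_minimums, default="structure"):
    """Return the highest minimum mode among the rules that fired.

    fired_minimums: the minimum mode set by each rule that matched.
    With no matches, fall back to the documented default.
    """
    if not fired_minimums:
        return default
    return max(fired_minimums, key=MODE_RANK.__getitem__)


# A scanned invoice: the OCR cue asks for "structure", the document type for "understand".
print(pick_mode(["structure", "understand"]))  # understand
```

The output format is decided separately from intent, so it is left out of this sketch.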

| User intent | Mode | Output format | Cost | Notes |
| --- | --- | --- | --- | --- |
| RAG / search indexing / content migration (born-digital PDF) | `text` | `markdown` | 1 cr/pg | Cheapest path; no OCR or AI needed |
| RAG / search indexing (scanned or image-based PDF) | `structure` | `markdown` | 1.5 cr/pg | OCR required before Markdown assembly |
| Form / invoice extraction | `understand` | `spatial` | 9 cr/pg | AI classification for reliable key-value and table detection |
| Layout-aware document understanding | `understand` | `spatial` | 9 cr/pg | Semantic paragraph roles (Title, SectionHeader, etc.) |
| Deep visual understanding (charts, diagrams, alt text) | `agentic` | `spatial` | 18 cr/pg | VLM adds alt descriptions on every picture element |
| Default / ambiguous intent | `structure` | `spatial` | 1.5 cr/pg | Good balance: OCR + spatial elements, low cost |

Confirm before running when the estimated cost exceeds 200 extraction credits — roughly 11 pages of agentic, 22 of understand, 133 of structure, or 200 of text. Surface the estimate (pages × cost_per_page) and ask the operator to confirm before invoking. Under that threshold, just run.
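
The confirmation rule is simple arithmetic, sketched below. The per-page costs and the 200-credit threshold come from this section; the names `COST_PER_PAGE`, `estimate_credits`, and `needs_confirmation` are hypothetical and not part of the bundled scripts.

```python
# Extraction credits per page, as listed in the mode table.
COST_PER_PAGE = {"text": 1.0, "structure": 1.5, "understand": 9.0, "agentic": 18.0}
CONFIRM_THRESHOLD = 200  # extraction credits


def estimate_credits(pages, mode):
    """Estimated cost of one parse call: pages x cost_per_page."""
    return pages * COST_PER_PAGE[mode]


def needs_confirmation(pages, mode):
    """True when the estimate exceeds the threshold and the operator should confirm."""
    return estimate_credits(pages, mode) > CONFIRM_THRESHOLD


print(estimate_credits(11, "agentic"), needs_confirmation(11, "agentic"))  # 198.0 False
print(estimate_credits(12, "agentic"), needs_confirmation(12, "agentic"))  # 216.0 True
```

The page count has to be known before the call, for example from the PDF itself; how to obtain it is outside this sketch.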

mode='text' is incompatible with output_format='spatial'; the client rejects the combination before the network call.
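
A pre-flight check along these lines would reject the incompatible combination before any network call. This is a sketch, not the client's actual code: `validate_request` is a hypothetical name, and the sets of valid values are taken from the modes and formats named in this section.

```python
VALID_MODES = ("text", "structure", "understand", "agentic")
VALID_FORMATS = ("spatial", "markdown")


def validate_request(mode, output_format):
    """Raise ValueError for unknown values or the text + spatial combination."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in VALID_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    if mode == "text" and output_format == "spatial":
        raise ValueError("mode='text' cannot produce output_format='spatial'")
```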

Invocation

# Default: structure mode, spatial output
uv run scripts/parse.py --input doc.pdf --out out.json

# Markdown for RAG (text mode — cheapest)
uv run scripts/parse.py --input doc.pdf --out out.md --output-format markdown --mode text

# Form extraction (understand mode)
uv run scripts/parse.py --input doc.pdf --out out.json --mode understand

# Agentic (VLM alt text on pictures)
uv run scripts/parse.py --input doc.pdf --out out.json --mode agentic

The script prints extraction-credit usage after each run so you can verify the cost.

Downstream consumption

After a single /extraction/parse call, slice the response for common needs:

- Reading-order plain text: walk output.elements sorted by (page.pageIndex, readingOrder) and join the text fields of the paragraph and handwriting elements.
- Tables: project cells[] on each table element into rows and columns using cell.row and cell.column.
- Key-value pairs: read pairs[] on each keyValueRegion element; each pair has .key.value and .value.value.
- Formulas: read latex on each formula element.
- Pictures: read classification and altDescription (populated by agentic mode) on each picture element.
- Markdown output: call with --output-format markdown; the script writes the Markdown string directly.
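
The first three slices could look roughly like this in Python. The field names `output.elements`, `page.pageIndex`, `readingOrder`, `cells`, `row`, `column`, `pairs`, `key.value`, and `value.value` come from the list above. Everything else is an assumption for illustration: the `type` discriminator, the `text` field on cells, zero-based row and column indices, and the sample data itself. Check the canonical schema before relying on any of it.

```python
def reading_order_text(response):
    """Join paragraph and handwriting text in (pageIndex, readingOrder) order."""
    elements = response["output"]["elements"]
    ordered = sorted(elements, key=lambda e: (e["page"]["pageIndex"], e["readingOrder"]))
    return "\n".join(
        e["text"] for e in ordered if e.get("type") in ("paragraph", "handwriting")
    )


def table_rows(table):
    """Project a table's cells into a row-major grid (cell spans are ignored here)."""
    cells = table["cells"]
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["column"] for c in cells) + 1
    grid = [[""] * n_cols for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["column"]] = c.get("text", "")
    return grid


def key_value_pairs(region):
    """Map each pair's key text to its value text."""
    return {p["key"]["value"]: p["value"]["value"] for p in region["pairs"]}


# Hypothetical response fragment, invented only to exercise the helpers.
sample = {"output": {"elements": [
    {"type": "paragraph", "page": {"pageIndex": 1}, "readingOrder": 0, "text": "Page two."},
    {"type": "paragraph", "page": {"pageIndex": 0}, "readingOrder": 1, "text": "Second."},
    {"type": "handwriting", "page": {"pageIndex": 0}, "readingOrder": 0, "text": "First."},
    {"type": "table", "page": {"pageIndex": 0}, "readingOrder": 2, "cells": [
        {"row": 0, "column": 0, "text": "Item"}, {"row": 0, "column": 1, "text": "Qty"},
        {"row": 1, "column": 0, "text": "Bolt"}, {"row": 1, "column": 1, "text": "4"},
    ]},
    {"type": "keyValueRegion", "page": {"pageIndex": 0}, "readingOrder": 3, "pairs": [
        {"key": {"value": "Invoice No"}, "value": {"value": "42"}},
    ]},
]}}

print(reading_order_text(sample))
```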

For the canonical response schema and per-mode field availability, see the official docs linked from references/parse-output-filtering.md; that file also lists the tools we suggest for filtering and reshaping the response.

Input constraint

parse.py only accepts local file paths — the underlying API endpoint is multipart-only. For remote inputs, download the file first.
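
One way to handle a remote input is a caller-side download step that leaves parse.py untouched, sketched below. The helper names `download_to_local` and `build_parse_command` are hypothetical; the `--input` and `--out` flags are the ones shown under Invocation. The sketch has no timeout, retry, or size limit.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to_local(url, suffix=".pdf"):
    """Stream a remote file to a temporary local path and return that path."""
    with urllib.request.urlopen(url) as resp, tempfile.NamedTemporaryFile(
        delete=False, suffix=suffix
    ) as tmp:
        shutil.copyfileobj(resp, tmp)
        local_path = Path(tmp.name)
    return local_path


def build_parse_command(local_path, out_path):
    """Build the argument list for the bundled script; run it from the skill directory."""
    return ["uv", "run", "scripts/parse.py", "--input", str(local_path), "--out", str(out_path)]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; the caller is responsible for deleting the temporary file afterwards.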

Rules

- Always preserve the printed credit-usage summary in script output so the operator can observe per-call cost.
- Do not add a URL-fetch shortcut; the endpoint is multipart-only.
