
pdf-to-text

Stars: 12

Forks: 1

Last updated: 17 June 2026 at 14:41

Extract text from PDFs as layout-preserving plain text. Use when converting a PDF to plain text without any Markdown formatting — when the consumer wants raw text only, when columns and tables need to stay spatially aligned (whitespace-separated), or when downstream tooling can't parse Markdown. Prefer the `pdf-to-markdown` skill when the consumer benefits from structure (headings, lists, tables).


Source: PSPDFKit-labs/nutrient-skills


SKILL.md


name	pdf-to-text
description	Extract text from PDFs as layout-preserving plain text. Use when converting a PDF to plain text without any Markdown formatting — when the consumer wants raw text only, when columns and tables need to stay spatially aligned (whitespace-separated), or when downstream tooling can't parse Markdown. Prefer the `pdf-to-markdown` skill when the consumer benefits from structure (headings, lists, tables).
license	Proprietary

PDF to Text

Convert PDFs into layout-preserving plain text. Each word is placed on a character grid that mirrors its on-page position, so columns, indentation, and tabular alignment survive the conversion. The result keeps far more of the page layout than reading a PDF directly with the read tool, which extracts only loose text without spatial fidelity.

When to use this vs. pdf-to-markdown

Use pdf-to-text when the downstream consumer is plain-text only (a non-Markdown LLM, a grep/awk pipeline, a CSV-style table extractor that cares about column alignment).
Use pdf-to-markdown when the consumer benefits from semantic structure (headings, lists, tables, reading order). Most RAG and LLM-context pipelines fall here.

Usage

Before running any commands, set SKILL_DIR to the absolute path of the directory containing this SKILL.md file. Use $SKILL_DIR/bin/pdf-to-text in all commands below.
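One way to set SKILL_DIR is sketched below. The helper name `resolve_skill_dir` and the example path are hypothetical; substitute the real location of this SKILL.md.

```shell
# Hypothetical helper: print the absolute directory that contains a given SKILL.md.
resolve_skill_dir() {
    skill_md=$1
    # cd into the file's directory in a subshell and print its absolute path
    (cd "$(dirname "$skill_md")" && pwd -P)
}

# Example (the path is an assumption, not a real install location):
# SKILL_DIR=$(resolve_skill_dir /path/to/pdf-to-text/SKILL.md)
# export SKILL_DIR
```

Resolving the path once up front means later commands keep working even if the working directory changes.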

The $SKILL_DIR/bin/pdf-to-text wrapper automatically downloads the platform-specific binary from the CDN and installs it into ~/.local/share/nutrient/cli/. It caches the binary and checks for updates at most once every 6 hours, so subsequent runs are fast. The same binary backs pdf-to-markdown, pdf-to-text, and self-update, so installing either skill produces the same ~/.local/share/nutrient/cli/ install.
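The wrapper's own code is not shown here, so the following is only an illustration of how a 6-hour update throttle could be built with a timestamp file. The function name `update_check_due` and the stamp-file approach are assumptions, not the wrapper's actual implementation.

```shell
# Hypothetical sketch: succeed (exit 0) when an update check is due.
# $1 is a stamp file touched after each check; $2 is the maximum age in minutes.
update_check_due() {
    stamp=$1
    max_age_min=${2:-360}   # 360 minutes = 6 hours
    if [ ! -e "$stamp" ]; then
        return 0            # never checked before
    fi
    # find prints the path only when the stamp is older than max_age_min minutes
    [ -n "$(find "$stamp" -mmin +"$max_age_min" 2>/dev/null)" ]
}
```

A caller would run the check only when `update_check_due` succeeds and then `touch` the stamp file, so repeated runs inside the window skip the network round trip.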

Single file

$SKILL_DIR/bin/pdf-to-text INPUT.pdf OUTPUT.txt

If OUTPUT.txt is omitted, the converter writes the text to stdout instead.
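Because the text goes to stdout when OUTPUT.txt is omitted, the converter can feed a pipeline directly. A minimal sketch, assuming SKILL_DIR is already set; `count_matches` is a hypothetical helper name.

```shell
# Hypothetical helper: count the lines of a PDF's extracted text that match
# a pattern, without writing an intermediate file.
count_matches() {
    pdf=$1
    pattern=$2
    # No OUTPUT argument, so the converter writes to stdout
    "$SKILL_DIR/bin/pdf-to-text" "$pdf" | grep -c -- "$pattern"
}
```

For example, `count_matches INPUT.pdf Invoice` prints the number of matching lines.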

Batch directory (2+ files)

For multiple files, pass directories instead of individual files. The converter processes all PDFs in the input directory in parallel, which is much faster than converting one at a time.

$SKILL_DIR/bin/pdf-to-text INPUT_DIR/ OUTPUT_DIR/

Workflow

1. Choose mode: use batch directory mode for 2+ files, single file mode otherwise.
2. Run the converter: $SKILL_DIR/bin/pdf-to-text INPUT [OUTPUT]
3. Check the exit code: exit 0 means success. On failure, read stderr for the error message.
4. Validate the output: if the output file is empty or near-empty, the PDF is likely image-only (see Troubleshooting below).
5. Report the output path: tell the user where the converted file(s) are. Do NOT read the text back into context by default, because converted documents can be very large and will fill the context window. Only read the output if the user's task specifically requires analyzing or summarizing the content.
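The single-file workflow above can be sketched as one shell function. The name `convert_and_check`, the `MIN_BYTES` threshold for "near-empty", and the return codes are assumptions chosen for illustration; SKILL_DIR is assumed to be set.

```shell
# Assumed threshold for "near-empty" output; tune it for your documents.
MIN_BYTES=${MIN_BYTES:-32}

# Hypothetical wrapper: returns 0 on success, 1 on converter failure,
# 2 when the output looks empty or near-empty.
convert_and_check() {
    input=$1
    output=$2
    errlog=$(mktemp) || return 1

    # Run the converter, capturing stderr for the failure case
    if ! "$SKILL_DIR/bin/pdf-to-text" "$input" "$output" 2>"$errlog"; then
        # Check the exit code: on failure, surface the converter's own message
        echo "conversion failed: $(cat "$errlog")" >&2
        rm -f "$errlog"
        return 1
    fi
    rm -f "$errlog"

    # Validate the output: a tiny file suggests an image-only PDF
    size=0
    if [ -f "$output" ]; then
        size=$(wc -c < "$output" | tr -d ' ')
    fi
    if [ "$size" -lt "$MIN_BYTES" ]; then
        echo "warning: $output is only $size bytes; the PDF may be image-only" >&2
        return 2
    fi

    # Report the output path only; the text itself is not read back
    echo "converted: $output ($size bytes)"
}
```

Printing only the path and size keeps large documents out of the context window, and the separate return codes let a caller tell a hard failure from a likely scanned PDF.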

Troubleshooting

Empty or minimal output: The PDF is most likely scanned/image-only and contains no extractable text. This skill does not perform OCR; run the PDF through a vision-capable tool first.
Non-zero exit code: Read stderr for the specific error. Common causes: corrupted PDF, unsupported encryption, or network issues during first-run binary download.
First run is slow: The wrapper downloads the platform binary on first use (typically a few seconds). Subsequent runs use the cached binary.
Columns look wrong: The extractor mirrors spatial layout exactly, so unusual PDF page geometry (e.g. rotated pages, two-column reflows) can produce surprising alignment. Try pdf-to-markdown if the document has a regular structure the Markdown exporter can recognize.
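After a batch run, the empty-output case can be spotted without opening any file. The sketch below lists suspiciously small files in an output directory; the helper name `list_near_empty` and the 32-byte default are assumptions.

```shell
# Hypothetical helper: list files under an output directory that are smaller
# than a byte threshold, which may indicate scanned/image-only source PDFs.
list_near_empty() {
    out_dir=$1
    min_bytes=${2:-32}   # assumed threshold for "near-empty"
    # -size -Nc matches files strictly smaller than N bytes
    find "$out_dir" -type f -size -"${min_bytes}"c
}
```

Running `list_near_empty OUTPUT_DIR/` prints one path per flagged file, and no output means every file met the threshold.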

License

Free for processing up to 1,000 documents per calendar month.

Commercial license required for:

processing over 1,000 documents/month
redistributing the binary
OEM/white-label use

Contact sales@nutrient.io for commercial licensing.

