mistral-pdf-to-markdown

Convert PDFs to Markdown using Mistral OCR API with image extraction. Use when you need to extract structured text and images from PDFs, especially for scanned documents or documents with complex formatting. Outputs Markdown with embedded images.

Overview

Install command

npx skills add https://github.com/FuZhiyu/AgentContract --skill mistral-pdf-to-markdown

Copy and paste this command into Claude Code to install the skill

Source

FuZhiyu/AgentContract

Stars: 0

Forks: 0

Updated: April 7, 2026 at 18:50

SKILL.md

name: mistral-pdf-to-markdown
description: Convert PDFs to Markdown using Mistral OCR API with image extraction. Use when you need to extract structured text and images from PDFs, especially for scanned documents or documents with complex formatting. Outputs Markdown with embedded images.
user-invocable: true

Mistral PDF to Markdown Converter

Convert PDF documents to Markdown format using Mistral's OCR API. Automatically extracts text, formatting, and images.

When to Use

Converting research papers or documents to Markdown
Extracting text from scanned PDFs (OCR capability)
Preserving document structure with headers and formatting
Extracting embedded images from PDFs

Quick Start

Use the conversion script from this skill's directory:

# Convert entire PDF
uv run python <skill-dir>/scripts/convert_pdf_to_markdown.py input.pdf output.md

# Convert specific pages
uv run python <skill-dir>/scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1-5"
uv run python <skill-dir>/scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1,3,5"

Replace <skill-dir> with the directory that contains this SKILL.md.
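
The two --pages forms shown above (a range such as "1-5" and a comma-separated list such as "1,3,5") can be previewed before calling the API. The following is a minimal sketch of how such a spec might expand into page numbers. The helper name expand_pages is hypothetical, and the 1-based numbering and support for mixed specs like "1-3,7" are assumptions; the script's own parser is not shown in this document and may behave differently.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-5" or "1,3,5" into sorted, unique page numbers.

    Assumption: pages are 1-based and ranges are inclusive on both ends.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_pages("1-5"))    # [1, 2, 3, 4, 5]
print(expand_pages("1,3,5"))  # [1, 3, 5]
```

Expanding the spec first makes it easy to compare the requested pages against the PDF's page count before spending an API request.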

Output Structure

Output/PDFConversions/
├── document.md          # Markdown with text and image references
└── images/
    ├── img-0.jpeg      # Extracted images
    ├── img-1.jpeg
    └── ...

Usage in Code

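import subprocess
import sys

# Run the conversion script.
# Replace <skill-dir> with the directory that contains this SKILL.md.
result = subprocess.run([
    "uv", "run", "python",
    "<skill-dir>/scripts/convert_pdf_to_markdown.py",
    "input.pdf",
    "Output/PDFConversions/output.md",
    "--pages", "1-10",
], capture_output=True, text=True)

print(result.stdout)

# Surface any error output rather than silently ignoring a failed run
if result.returncode != 0:
    print(result.stderr, file=sys.stderr)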

Key Features

Markdown formatting: Preserves headers, lists, and structure
Image extraction: Saves images to images/ subfolder automatically
Page selection: Extract specific pages or ranges
Scanned PDF support: True OCR capability for image-based PDFs
Relative paths: Image references use ![...](images/img-X.jpeg)
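
Because image references are relative paths of the form images/img-X.jpeg, a converted file can be checked for broken references after it is moved or copied. The sketch below is one way to do that with the standard library only. The function name missing_images is hypothetical, and it assumes every image reference in the Markdown is a local relative path (a remote URL would be reported as missing).

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](target) and captures the target
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(markdown_path: Path) -> list[str]:
    """Return image targets referenced in the Markdown file that do not exist on disk."""
    text = markdown_path.read_text(encoding="utf-8")
    base = markdown_path.parent  # references are resolved relative to the .md file
    return [ref for ref in IMAGE_REF.findall(text) if not (base / ref).is_file()]


if __name__ == "__main__":
    target = Path("Output/PDFConversions/document.md")
    if target.exists():
        print(missing_images(target))
```

An empty list means every referenced image sits where the Markdown expects it.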

Requirements

The script requires:

Mistral API key (see API Key Setup below)
Python packages: mistralai, python-dotenv, pypdf

API Key Setup

The script checks these locations in order (first match wins):

1. Environment variable MISTRAL_API_KEY: recommended for personal use (e.g., add export MISTRAL_API_KEY=your-key to secrets.sh)
2. Shared config: .claude/agent-contract.yaml or ~/.config/agent-contract/config.yaml, under the key paper-reader.mistral_api_key
3. Notes/.env: add MISTRAL_API_KEY=your-key. This file is gitignored but Dropbox-synced, which makes it convenient for teams sharing a project folder

Never commit API keys to git. Use environment variables or Dropbox-synced Notes/.env instead.
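
The "first match wins" lookup described above can be illustrated with a short sketch. It is not the script's actual code: the function names are hypothetical, the .env parsing is a simplified stand-in for python-dotenv, and the shared-config step is left out because it needs a YAML parser. A source that reads paper-reader.mistral_api_key would slot in between the two shown here.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, Optional


def from_env() -> Optional[str]:
    """Source 1: the MISTRAL_API_KEY environment variable."""
    return os.environ.get("MISTRAL_API_KEY") or None


def from_dotenv(path: Path = Path("Notes/.env")) -> Optional[str]:
    """Source 3: a KEY=value line in Notes/.env (simplified parsing)."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "MISTRAL_API_KEY":
            return value.strip().strip("'\"") or None
    return None


def resolve_api_key(sources: Iterable[Callable[[], Optional[str]]]) -> str:
    """Try each source in order and return the first key found."""
    for source in sources:
        key = source()
        if key:
            return key
    raise RuntimeError("Mistral API key not found")


if __name__ == "__main__":
    try:
        resolve_api_key([from_env, from_dotenv])
        print("API key found")
    except RuntimeError as err:
        print(f"Error: {err}")
```

Keeping each location behind its own small function makes the precedence explicit: the order of the list is the order of the lookup.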

Common Use Cases

Convert Research Paper

uv run python <skill-dir>/scripts/convert_pdf_to_markdown.py \
  "Data/papers/research.pdf" \
  "Notes/Paper Markdown/research.md"

Extract Specific Sections

# Extract pages 10-20 (introduction and methods)
uv run python <skill-dir>/scripts/convert_pdf_to_markdown.py \
  "paper.pdf" \
  "Notes/Paper Markdown/intro_methods.md" \
  --pages "10-20"

Extract Figures Only

# Extract pages with figures
uv run python <skill-dir>/scripts/convert_pdf_to_markdown.py \
  "paper.pdf" \
  "Notes/Paper Markdown/figures.md" \
  --pages "25,27,30,35"

Error Handling

API Key Not Found:

Error: Mistral API key not found

→ See API Key Setup above for three ways to configure it

Page Out of Range:

Warning: Page 100 out of range, skipping

→ Check PDF page count and adjust page selection

API Rate Limit:

→ Wait a moment and retry, or reduce page count per request
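
The "wait a moment and retry" advice can be automated with a small backoff wrapper. This is a sketch under stated assumptions: the helper retry_with_backoff is hypothetical, it retries on any exception because this document does not say how the script signals a rate limit, and the delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

One way to use it is to pass a function that runs the conversion command with subprocess.run(..., check=True), so that a non-zero exit raises and triggers a retry. That only helps if the script exits non-zero on a rate limit, which is an assumption here.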

Notes

Images are saved as JPEG files in the images/ subfolder
Markdown image references are automatically rewritten to point at images/img-X.jpeg
Large PDFs take longer to process and may run into API rate limits; if that happens, convert fewer pages per request with --pages
For simple text extraction without OCR, consider using the pdf skill instead
Scanned PDFs benefit most from this skill's OCR capability
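
For large PDFs, one way to keep each request small is to split the document into consecutive page ranges and convert them one at a time. The sketch below only generates the --pages values. The helper name page_batches is hypothetical, the page count is supplied by the caller, and 1-based inclusive ranges are assumed.

```python
def page_batches(total_pages: int, batch_size: int) -> list[str]:
    """Split pages 1..total_pages into consecutive --pages specs of at most batch_size pages."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    specs = []
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        # A single-page batch is written as "21" rather than "21-21"
        specs.append(str(start) if start == end else f"{start}-{end}")
    return specs


print(page_batches(25, 10))  # ['1-10', '11-20', '21-25']
```

Each spec can then be passed to a separate run of the conversion script. Writing each batch to its own output folder is a cautious choice, since this sketch assumes image file names such as img-0.jpeg could repeat between runs.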



