mit einem Klick
markdrop
// Professional AI skill and usage instructions for the Markdrop package, a Python tool for converting PDFs to Markdown/HTML with AI-powered image/table descriptions.
// Professional AI skill and usage instructions for the Markdrop package, a Python tool for converting PDFs to Markdown/HTML with AI-powered image/table descriptions.
| name | markdrop |
| description | Professional AI skill and usage instructions for the Markdrop package, a Python tool for converting PDFs to Markdown/HTML with AI-powered image/table descriptions. |
Welcome to the markdrop skill. markdrop is a powerful Python package and CLI tool used to convert PDF documents into structured Markdown and interactive HTML, while natively leveraging AI vision models to interpret and describe extracted images and tables.
If you are an AI agent or a user aiming to process PDFs and augment them with text or image descriptions, this document serves as your complete guide on utilizing markdrop efficiently and accurately.
Before using AI features, API keys must be available in the root .env file or environment variables.
If deploying programmatically, you can run the built-in CLI command, or inject them into os.environ:
markdrop setup gemini # -> GEMINI_API_KEY
markdrop setup openai # -> OPENAI_API_KEY
markdrop setup anthropic # -> ANTHROPIC_API_KEY
markdrop setup groq # -> GROQ_API_KEY
markdrop setup openrouter # -> OPENROUTER_API_KEY
markdrop setup litellm # -> LITELLM_API_KEY
The Python API is the recommended way to embed markdrop into applications.
Use markdrop function combined with add_downloadable_tables:
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging
# Configuration block
config = MarkDropConfig(
image_resolution_scale=2.0,
download_button_color='#444444',
log_level=logging.INFO,
log_dir='logs',
excel_dir='markdrop-excel-tables',
)
# 1. Convert PDF to base HTML (and markdown locally). URL supported here too: "https://url.to/pdf"
html_path = markdrop("path/to/document.pdf", "output_directory", config)
# 2. Enrich HTML to allow downloading tables as Excel sheets
enhanced_html_path = add_downloadable_tables(html_path, config)
If you have a Markdown file containing image/table links, process_markdown automatically routes vision requests to the chosen provider and inserts contextual descriptions.
from markdrop import process_markdown, ProcessorConfig, AIProvider
config = ProcessorConfig(
input_path="output_directory/document.md",
output_dir="output_directory",
ai_provider=AIProvider.GEMINI, # Available: GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, LITELLM
# Target configurations
remove_images=False,
remove_tables=False,
table_descriptions=True,
image_descriptions=True,
# Provider-Specific overrides (Optional)
# Allows granular decoupling of vision parsing vs table text-parsing
model_name_override="gemini-2.0-flash", # Primary vision analysis model
text_model_name_override="gemini-2.0-flash" # Lean text-only model for generic parsing
)
# Executes AI processing and saves the enriched document
output_path = process_markdown(config)
For standalone image directories or files:
from markdrop import generate_descriptions
generate_descriptions(
input_path='images_folder/',
output_dir='descriptions_output/',
prompt='Analyze this image and describe all textual and structural elements.',
llm_client=['gemini', 'openai'],
)
As an agent, you can also trigger markdrop workflows via Bash.
Convert PDF to MD/HTML (including tables):
markdrop convert <input_path_or_url> --output_dir <dir> --add_tables
Run AI Provider over the Markdown Output with exact models:
markdrop describe <markdown_file> \
--ai_provider anthropic \
--model claude-opus-4-6 \
--text-model claude-sonnet-4-5 \
--remove_images
Only Analyze / Extract Images:
# Also accepts URLs directly
markdrop analyze https://domain.com/report.pdf --output_dir pdf_analysis --save_images
Batch Image Description:
markdrop generate images/ --output_dir descriptions/ \
--prompt "Describe in detail." \
--llm_client gemini openai
gemini (Gemini 2.0 Flash) is frequently the fastest and cheapest for large scale document evaluation.anthropic with the latest Claude models (claude-opus-4-6 or claude-sonnet-4-5) excel in reasoning and formatting.groq using LLaMA models.Whenever instantiating ProcessorConfig, be exact about paths—use absolute paths if the current working directory is dynamically changing.