ocr

Automatically extract text from images using Tesseract when the model doesn't support images. This is a pi extension that hooks into before_agent_start and runs OCR transparently. No manual triggering needed. Only activates on vision-free models. Do NOT trigger manually — the extension handles it automatically.

Install command

npx skills add https://github.com/leandronsp/dotfiles --skill ocr

Copy and paste this command into Claude Code to install the skill.

Source

leandronsp/dotfiles

Stars: 32

Forks: 0

Updated: April 26, 2026 at 02:00

SKILL.md

name	ocr
description	Automatically extract text from images using Tesseract when the model doesn't support images. This is a pi extension that hooks into before_agent_start and runs OCR transparently. No manual triggering needed. Only activates on vision-free models. Do NOT trigger manually — the extension handles it automatically.

OCR Extension (Auto-Vision-Fallback)

This runs automatically. No manual triggering needed.

This is a pi extension at ~/.pi/agent/extensions/ocr/ that hooks into before_agent_start. When the active model doesn't support images but the user attaches one, the extension automatically:

Runs Tesseract OCR on each image
Cleans up the output (noise removal, bracket fixes)
Injects the extracted text as a context message into the conversation
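
The cleanup step might look roughly like the sketch below. The extension's actual rules are not documented here, so the 30% alphanumeric threshold and the function name `clean_ocr_text` are assumptions made for illustration. The sketch does not attempt the "bracket fixes", since the source does not say what they involve.

```python
import re


def clean_ocr_text(raw: str, min_alnum_ratio: float = 0.3) -> str:
    """Drop lines that look like OCR noise and collapse runs of blank lines.

    Hypothetical heuristic: a line counts as noise when fewer than
    `min_alnum_ratio` of its non-space characters are letters or digits.
    """
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            kept.append("")  # keep blank lines for now; collapsed below
            continue
        non_space = [ch for ch in stripped if not ch.isspace()]
        alnum = sum(ch.isalnum() for ch in non_space)
        if alnum / len(non_space) >= min_alnum_ratio:
            kept.append(line.rstrip())
    text = "\n".join(kept)
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()
```

A heuristic like this would also discard punctuation-only lines such as a lone closing brace, which is one reason dense code tends to come out approximate.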

How it works

User pastes image
  └─► before_agent_start hook
        ├── Model supports images? → Do nothing. Model sees it natively.
        └── Model text-only? → Auto-run Tesseract
              ├── PSM 3 (default)
              ├── PSM 6 (if short output — for terminal/code)
              ├── Enhancement pipeline (if still short — scale 3x, contrast, sharpen)
              └── Inject OCR text as context message
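
The fallback chain in the diagram can be sketched in Python as follows. This is not the extension's code (which lives in index.ts). The `MIN_CHARS` cutoff for "short output", the contrast factor, the grayscale conversion, and the choice of PSM 6 for the enhanced image are all assumptions, and the function names are hypothetical.

```python
import subprocess
import tempfile

MIN_CHARS = 20  # assumed cutoff for "short output"; the real value is not documented


def run_tesseract(image_path: str, psm: int = 3) -> str:
    """Run the tesseract CLI and return its stdout, with stderr discarded."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", str(psm)],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def enhance(image_path: str) -> str:
    """Scale 3x, raise contrast, sharpen; return the path of the new PNG."""
    from PIL import Image, ImageEnhance, ImageFilter  # requires Pillow

    with Image.open(image_path) as original:
        img = original.convert("L")  # grayscale is an assumption
    img = img.resize((img.width * 3, img.height * 3), Image.LANCZOS)
    img = ImageEnhance.Contrast(img).enhance(2.0)  # factor is a guess
    img = img.filter(ImageFilter.SHARPEN)
    out = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    out.close()
    img.save(out.name)
    return out.name


def ocr_with_fallback(image_path, ocr=run_tesseract, enhancer=enhance,
                      min_chars=MIN_CHARS):
    """Try PSM 3, then PSM 6, then PSM 6 on an enhanced copy."""
    text = ocr(image_path, 3)
    if len(text) >= min_chars:
        return text
    text = max(text, ocr(image_path, 6), key=len)  # keep the longer result
    if len(text) >= min_chars:
        return text
    return max(text, ocr(enhancer(image_path), 6), key=len)
```

Passing `ocr` and `enhancer` as parameters keeps the control flow testable without Tesseract or Pillow installed.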

No manual action needed

The extension is auto-discovered from ~/.pi/agent/extensions/ocr/index.ts. It runs on every prompt that includes images. If the model supports images (Claude, GPT-4o, Gemini), it does nothing. If the model is text-only (GLM, DeepSeek text modes), it automatically extracts and injects text.
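
The gating decision described above amounts to a small predicate. The sketch below is illustrative only: the `Model` class and its `input_types` field are hypothetical stand-ins, not pi's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical model descriptor; pi's real type may differ."""
    name: str
    input_types: tuple = ("text",)


def needs_ocr(model: Model, image_paths: list) -> bool:
    """OCR is needed only when images are attached and the model cannot read them."""
    return bool(image_paths) and "image" not in model.input_types
```

With this predicate, a vision-capable model short-circuits before any OCR work starts, which matches the "do nothing" branch.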

Limitations

Image type	Tesseract quality	What happens
Screenshots, terminal, docs	✅ Good	Text injected, LLM can work with it
Landing pages, UI	⚠️ Gets gist, noisy	Partial text injected with cleanup
Dense math/code	⚠️ Structure ok, details mangled	Approximate text, may have errors
Drawings, diagrams, photos	❌ Empty output	Message says "no text detected, use vision model"
Handwriting	❌ Unreliable	Likely garbage text or empty
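
The empty-output row of the table could be handled as in the sketch below when building the injected context message. Only the quoted "no text detected, use vision model" wording comes from the table. The `[OCR <name>]` prefix and the function name are assumptions.

```python
NO_TEXT_MESSAGE = "no text detected, use vision model"


def build_context_message(image_name: str, text: str) -> str:
    """Return the message to inject for one image (format is hypothetical)."""
    body = text.strip()
    if not body:
        # Drawings, diagrams and photos typically yield no OCR text at all.
        return f"[OCR {image_name}] {NO_TEXT_MESSAGE}"
    return f"[OCR {image_name}]\n{body}"
```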

Manual fallback

If the auto-OCR isn't enough, you can still run Tesseract manually:

tesseract <image_path> stdout 2>/dev/null
tesseract <image_path> stdout --psm 6 2>/dev/null  # terminal/code

Installation

Requires Tesseract and Pillow (the commands below assume macOS with Homebrew):

brew install tesseract
pip3 install pillow
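
To confirm both dependencies are visible from Python after installing, a quick check along these lines may help. It is a minimal sketch, and the function name `missing_dependencies` is made up for illustration.

```python
import importlib.util
import shutil


def missing_dependencies() -> list:
    """Return the names of required tools that cannot be found."""
    missing = []
    if shutil.which("tesseract") is None:  # CLI binary must be on PATH
        missing.append("tesseract")
    if importlib.util.find_spec("PIL") is None:  # Pillow installs the PIL package
        missing.append("pillow")
    return missing


if __name__ == "__main__":
    print(missing_dependencies() or "all dependencies found")
```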

