image-to-text
// Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design mockup.
| Key | Value |
|---|---|
| name | image-to-text |
| description | Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design mockup. |
| metadata | {"author":"pascalorg","version":"1.0.0"} |
Extract all readable text from an image using OCR (Tesseract). Returns the full text content along with word-level bounding boxes and confidence scores.
bash <skill-path>/scripts/image-to-text.sh <image-path> [language]
Arguments:
- image-path: Path to the image file (required)
- language: OCR language code (optional, defaults to eng). Common: eng, fra, deu, spa, chi_sim, jpn

Examples:
# Extract text from a screenshot
bash <skill-path>/scripts/image-to-text.sh ./screenshot.png
# Extract French text
bash <skill-path>/scripts/image-to-text.sh ./mockup.png fra
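If the script needs to be called from another program, the invocation above could be wrapped along these lines. This is a minimal sketch, not part of the skill: `build_command` and `run_ocr` are hypothetical helper names, and it assumes the script writes only the JSON result to stdout.

```python
import json
import subprocess
from pathlib import Path


def build_command(skill_path: str, image_path: str, language: str = "eng") -> list[str]:
    """Build the argv list for the documented image-to-text.sh invocation."""
    script = Path(skill_path) / "scripts" / "image-to-text.sh"
    return ["bash", str(script), image_path, language]


def run_ocr(skill_path: str, image_path: str, language: str = "eng") -> dict:
    """Run the script and parse its output.

    Assumption: the JSON result is the only thing written to stdout.
    """
    completed = subprocess.run(
        build_command(skill_path, image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with image paths that contain spaces.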
{
  "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
  "confidence": 87.4,
  "words": [
    {
      "text": "Request",
      "confidence": 94.2,
      "bbox": { "x0": 142, "y0": 180, "x1": 268, "y1": 204 }
    },
    {
      "text": "work",
      "confidence": 96.1,
      "bbox": { "x0": 274, "y0": 180, "x1": 332, "y1": 204 }
    }
  ],
  "lines": [
    {
      "text": "Request work",
      "confidence": 95.1,
      "bbox": { "x0": 142, "y0": 180, "x1": 332, "y1": 204 }
    }
  ]
}
| Field | Type | Description |
|---|---|---|
| text | String | Full extracted text, newline-separated |
| confidence | Number | Overall confidence score (0-100) |
| words | Array | Each word with text, confidence, and bounding box |
| lines | Array | Each line with text, confidence, and bounding box |
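As an illustration of how these fields might be consumed, the sketch below parses a result and lists the words whose confidence falls under a cutoff. The sample is the truncated example output; `low_confidence_words` and the 95.0 threshold are assumptions chosen for the example, not values defined by the skill.

```python
import json

# Truncated sample output, copied from the example in this document.
SAMPLE = """
{
  "text": "Request work\\nSuggestions\\nPlumbing\\nHVAC\\nCleaning\\nElectrical",
  "confidence": 87.4,
  "words": [
    {"text": "Request", "confidence": 94.2,
     "bbox": {"x0": 142, "y0": 180, "x1": 268, "y1": 204}},
    {"text": "work", "confidence": 96.1,
     "bbox": {"x0": 274, "y0": 180, "x1": 332, "y1": 204}}
  ],
  "lines": [
    {"text": "Request work", "confidence": 95.1,
     "bbox": {"x0": 142, "y0": 180, "x1": 332, "y1": 204}}
  ]
}
"""


def low_confidence_words(result: dict, threshold: float = 95.0) -> list[str]:
    """Return the text of every word whose confidence is below the threshold."""
    return [w["text"] for w in result.get("words", []) if w["confidence"] < threshold]


result = json.loads(SAMPLE)
print(low_confidence_words(result))  # ['Request']
```

Words flagged this way are good candidates for a manual check against the image before the text is reused.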
After extracting text, present the content grouped by lines:
Extracted text (87.4% confidence):
Request work
Suggestions
Plumbing
HVAC
Cleaning
Electrical
Found 6 lines, 7 words.
Use the extracted text directly when implementing UI copy from a design.
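The presentation format above could be produced from a parsed result roughly as follows. This is a sketch under one assumption: line and word counts are derived from the `text` field by splitting on newlines and whitespace, which may differ from the lengths of the `lines` and `words` arrays. `format_summary` is a hypothetical helper name.

```python
def format_summary(result: dict) -> str:
    """Render the extracted text grouped by lines, with a header and a count footer."""
    text = result.get("text", "")
    # Skip blank lines so they are not counted or printed.
    lines = [line for line in text.split("\n") if line.strip()]
    word_count = sum(len(line.split()) for line in lines)
    header = f"Extracted text ({result.get('confidence', 0.0):.1f}% confidence):"
    footer = f"Found {len(lines)} lines, {word_count} words."
    return "\n".join([header, *lines, footer])


sample = {
    "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
    "confidence": 87.4,
}
print(format_summary(sample))
```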
Low confidence / garbled text — Tesseract works best with clean, high-contrast text. Screenshots of rendered UI work well. Photos of text at angles or with noise may produce poor results.
Wrong language — Pass the correct language code as the second argument. Tesseract needs the right language model to recognize characters.
First run is slow — Tesseract downloads language data (~4MB for English) on the first run. Subsequent runs are faster.
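The first two checks above can be turned into a simple guard on the overall confidence score. The sketch below is illustrative only: `troubleshooting_hints` is a hypothetical helper, and the 60.0 cutoff is an arbitrary assumption, not a threshold recommended by Tesseract or by this skill.

```python
def troubleshooting_hints(result: dict, language: str = "eng",
                          min_confidence: float = 60.0) -> list[str]:
    """Return suggestions when the overall confidence is below min_confidence."""
    hints = []
    if result.get("confidence", 0.0) < min_confidence:
        hints.append(
            "Low confidence: prefer a clean, high-contrast screenshot "
            "over an angled or noisy photo."
        )
        hints.append(
            f"Check that the language code '{language}' matches the text in the image."
        )
    return hints


for hint in troubleshooting_hints({"confidence": 41.0}, language="fra"):
    print(hint)
```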