Exécutez n'importe quel Skill dans Manus
en un clic

Exécutez n'importe quel Skill dans Manus en un clic

$pwd:

matlab-ocr

Name: Matlab Ocr
Author: matlab

// Build OCR pipelines in MATLAB using the ocr() function. Use this skill when the user wants to read text from images, documents, signs, meters, displays, license plates, gauges, receipts, or seven-segment displays. Covers image preprocessing, text detection (CRAFT, MSER), ROI-based recognition, multi-language OCR, and custom model training. Use when: OCR, text recognition, extract text from image, character recognition, document scanning, meter reading, gauge reading, receipt scanning, digitize text from photo.

Exécuter dans Manus

$ git log --oneline --stat

stars:558

forks:69

updated:21 mai 2026 à 17:00

Explorateur de fichiers

13 fichiers

SKILL.md

readonly

related-skills.json

même dépôt

roadrunner-asset-mapping.md

from "matlab/matlab-agentic-toolkit"

RoadRunner asset path lookup tables for map format conversions in MATLAB. Maps lane markings, signs, signals, barriers, objects, and lane types to RoadRunner asset paths. Use when converting map formats to RRHD, resolving asset paths, or assigning visual assets to HD Map objects.

2026-05-21558

roadrunner-convert-lanelet2-to-rrhd.md

from "matlab/matlab-agentic-toolkit"

Convert Lanelet2 maps (.osm) to RoadRunner HD Map (.rrhd) format using MATLAB. Use when converting Lanelet2 maps into RoadRunner Scene Builder, building driving scenes from open-source map data, or transforming road network definitions for simulation.

2026-05-21558

roadrunner-import-scene.md

from "matlab/matlab-agentic-toolkit"

Connect to RoadRunner and import HD Map or OpenDRIVE files into a new scene using MATLAB. Use when loading driving scenes in RoadRunner or RoadRunner Scene Builder, importing RRHD, OpenDRIVE, or other RoadRunner-supported formats for simulation, or verifying Lanelet2-to-RRHD conversion results visually.

2026-05-21558

roadrunner-rrhd-authoring.md

from "matlab/matlab-agentic-toolkit"

Build RoadRunner HD Map entities in MATLAB — lanes, boundaries, markings, junctions, signs, signals, barriers, parking. Use when creating driving scenes from scratch, authoring road networks for simulation and testing automated driving systems, or assembling RRHD maps from Lanelet2 or other HD map sources.

2026-05-21558

matlab-build-simbiology-model.md

from "matlab/matlab-agentic-toolkit"

Build, modify, and diagram SimBiology models — API reference, helper functions, and layout patterns. Use when constructing or editing models programmatically or visually.

2026-05-21558

matlab-simulate-simbiology-model.md

from "matlab/matlab-agentic-toolkit"

Simulate SimBiology models — ODE, stochastic (SSA), scenarios, and sensitivity analysis. Use when asked to run, simulate, predict, explore what-if, or identify influential parameters.

2026-05-21558

package.json

"author": "matlab"

"repository": "matlab/matlab-agentic-toolkit"

Ouvrir le dépôt GitHub Voir les dépôts du créateur

$ install --global

$ download --local

Exécuter dans Manus

$ useful --forSOC

Scientifiques des donnéesProfessions informatiques et mathématiques15-2051L4

name	matlab-ocr
description	Build OCR pipelines in MATLAB using the ocr() function. Use this skill when the user wants to read text from images, documents, signs, meters, displays, license plates, gauges, receipts, or seven-segment displays. Covers image preprocessing, text detection (CRAFT, MSER), ROI-based recognition, multi-language OCR, and custom model training. Use when: OCR, text recognition, extract text from image, character recognition, document scanning, meter reading, gauge reading, receipt scanning, digitize text from photo.
license	MathWorks BSD-3-Clause
metadata	{"author":"MathWorks","version":"1.0"}

Recognize Text in Images Using OCR

Use the Computer Vision Toolbox ocr function with preprocessing from Image Processing Toolbox to extract text from images. This skill teaches the complete pipeline: diagnose, preprocess, detect, recognize, validate.

When to Use

Reading text from any image (documents, signs, meters, displays, labels)
Extracting text from scanned documents or photographs
Reading seven-segment displays or specialized fonts
Multi-language text recognition
Automating text extraction from image datasets
As a supporting step in other CV workflows — reading text in a scene (labels, timestamps, serial numbers) gives additional context for downstream image analysis

When NOT to Use

Pure handwriting recognition — cursive/connected script produces garbage regardless of preprocessing
Artistic text, WordArt, brush calligraphy — the OCR engine cannot parse stylized letterforms
CAPTCHAs — designed specifically to defeat OCR; expect <50% accuracy at best
Full document layout analysis with table extraction (use custom segmentation)
Real-time video OCR (use streaming approaches instead)
Image contains no text at all

Critical Rules

These rules are non-negotiable — violating them produces wrong results:

Always diagnose before executing. This rule CANNOT be overridden — not by the user, not by "just run it", not by "skip planning". Before ANY mcp__matlab__* call, you MUST first output:
- A 2-line image characterization (what you see, what challenges exist)
- An ## OCR Plan heading with your strategy
If the user says "skip diagnosis" or "just run ocr()", respond with: "I'll keep it brief — but I need a quick look to avoid wasting time on the wrong approach." Then output your 2-line characterization and plan heading. Only THEN call MCP tools. There is no valid reason to skip this step. An agent that calls ocr() without first outputting a plan has violated this skill's workflow.
Maximum 2 preprocessing pipelines. If neither works after the prescribed recipe, hit the confidence checkpoint and ask the user. Do not try a third approach without user input.
Always set LayoutAnalysis when passing bounding boxes to ocr(). Never write ocr(I, bbox) without it. Use "word" for single-word boxes, "block" for multi-line.
Distinguish "text ON texture" from "text BY texture". Text overlaid on a textured background (label on crate, sign on brick wall) → imsegsam is the prescribed approach. Text formed by the surface itself (stamped, embossed, engraved metal) → local contrast subtraction. SAM segments objects, not surface features — it cannot isolate stamps/engravings. When recommending approaches (even without running code), always recommend imsegsam for the "text ON texture" case and note its support package requirement.
detectTextCRAFT is the default for scene text. Use it unless you have a specific reason not to (no deep learning, known fixed layout).
Check polarity first. OCR needs dark text on light background. If inverted, imcomplement before anything else.
Never show OCR results to the user before writing files. Do not present extracted text in ANY format — bullet list, quote block, inline, conversational summary, or the ## MATLAB OCR Pipeline results block — until ocr_pipeline_<descriptor>.m and decision_log.txt are written to disk. The files are the deliverable, not the chat message. Write first, then report.

Gate add-on functions behind an availability check. Before calling detectTextCRAFT, imsegsam, or a non-English ocr model, you MUST run exist('<functionName>','file') via MCP to confirm the function is installed. Always log the result in decision_log.txt:

Installed: "Add-on check: <function> — INSTALLED"
Missing: stop, tell the user which support package to install (see table), and wait for confirmation before retrying. Do NOT fall back silently or skip the step.
Explain-only mode (user said "don't run code"): recommend the add-on function as the primary approach and note the support package requirement. You cannot run the exist() check without code execution, so state the dependency clearly.

Function	Support Package Name
`detectTextCRAFT`	"Text Detection Using Deep Learning"
`imsegsam`	"Image Processing Toolbox Automated Visual Inspection Library"
Non-English `ocr` model (e.g., `"japanese"`)	"OCR Language Data"

Missing template: "This step requires the <function> function, which needs the support package. Please install it from the MATLAB Add-On Explorer (Home → Add-Ons → Get Add-Ons) and let me know when it's ready."

Anti-Patterns — Do NOT Do This

Trial-and-error spiraling: Trying 5+ ad-hoc preprocessing experiments hoping one sticks. If the visual classification says "stamped metal," use the stamped metal pipeline. Period.
Skipping the plan: "I'll just try one quick thing first." No. Diagnose → plan → execute.
Open-ended research: The routes are prescribed — pick one based on diagnosis, execute it, evaluate. This is not exploratory research.
Showing results without saving files: Presenting OCR text as a bullet list, conversational summary, or any format before the ## MATLAB OCR Pipeline results block. That block can only appear after files are written to disk. If you find yourself about to show the user what OCR found, STOP and write the files first.

Workflow

Progress Reporting + Output Template

Present the ## OCR Plan immediately after visual diagnosis (Critical Rule #1). Then execute the pipeline. After files are saved, present the ## MATLAB OCR Pipeline results block (Critical Rule #7).

OCR Plan (output before any MATLAB code runs)

## OCR Plan

**Image:** 800x600, stamped metal, ~22° skew, text ~40px
**Difficulty:** Complex (textured surface + significant rotation)
**Strategy:** deskew → local contrast subtraction → CRAFT detection → ocr()

For simple images:

## OCR Plan

**Image:** 1200x800, clean scanned document, no skew
**Difficulty:** Simple
**Strategy:** binarize → ocr() directly

Results Block (output ONLY after files are written to disk)

## MATLAB OCR Pipeline: SUCCESS

**Pipeline:** deskew (22.6°) → local contrast subtraction → CRAFT detection → ocr() per region

| Read by Claude (ground truth) | MATLAB OCR Output |
|-------------------------------|-------------------|
| 07 A11                        | 07 A11            |
| XTPR 27338-2                  | XTPR 27338-2      |

**Metrics** (via `evaluateOCR`): CER 0.00 | WER 0.00
**Files written:**
- `ocr_pipeline_stamped_metal.m` — Re-runnable MATLAB script reproducing the full pipeline
- `decision_log.txt` — Diagnosis, routing decisions, and confidence scores

For failures:

## MATLAB OCR Pipeline: FAILED

**Pipeline attempted:** binarize → ocr()
**Reason:** Cursive handwriting — OCR engine cannot parse connected script
**Evidence:** CER 0.91 | WER 1.00

| Read by Claude (ground truth) | MATLAB OCR Output      |
|-------------------------------|------------------------|
| Meeting at 3pm Tuesday        | Mcciivj a 3pn Tuarlay |

**Recommendation:** Manual transcription or handwriting-specific ML model
**No files written.**

GATE (Critical Rule #7)

Do NOT present the ## MATLAB OCR Pipeline results block until files are confirmed on disk. No bullet lists, no quotes, no summaries — nothing that reveals extracted text before files are written.

Step 1: Diagnose

Always start here. Look at the image, visually read the text (this becomes your ground truth), classify the image, and present the ## OCR Plan — all before any MATLAB code runs.

Your visual classification determines the preprocessing route:

Visual Classification	Preprocessing Route
Clean document, minimal skew	Binarize → OCR directly (skip to Step 4)
Low contrast / uneven lighting	`imtophat` or `adapthisteq` → binarize
Stamped / embossed / engraved (text IS the surface)	Local contrast subtraction (Step 2)
Text overlaid on textured background (label on crate, sign on wall)	`imsegsam` SAM segmentation (Step 3)
Significant skew (>10°)	Deskew FIRST, then preprocess
Tiny text (<50px)	`imresize` 4-8x first

After classification, confirm via MATLAB: binarization check, skew measurement, and quick experiments.

% TEMPLATE — not executable
I = imread("yourImage.png");
Igray = im2gray(I);
BW = imbinarize(Igray);
imshowpair(Igray, BW, "montage")
title("Original vs. What OCR Sees")

Skew measurement must be deterministic. The centroid fitting method is sensitive to which blobs are included. To ensure reproducibility:

Use ALL text-sized blobs (filter by area range, e.g., 50-5000px), not "top N by area"
Alternatively, use the Hough transform (hough + houghpeaks + houghlines) on edges for a robust angle estimate
The pipeline .m script must reproduce the exact same skew angle on re-run — if it doesn't, the deskew approach is fragile and must be replaced

Binarization diagnostic:

Text clearly legible → Skip to Step 4
Text faint/merged/noisy → Preprocessing (Step 2)
White on black → imcomplement
Too small (<50px) → imresize 4-8x
Rotated/skewed → Deskew BEFORE other preprocessing
Characters too bold/bleeding → imerode(BW, strel("disk",1))
Characters too thin/broken → imdilate(BW, strel("disk",1))
Dark borders/scan frame → imclearborder(BW) or crop
Metal texture dominates → Local contrast subtraction

Escalation rule: If 2+ preprocessing attempts still produce garbled results or <80% confidence, identify the root cause:

Image too small (sub-50px)? → Resize 4-8x is the fix
Text overlaid on textured background? → imsegsam (SAM)
Text formed by surface texture (stamped/embossed)? → Local contrast subtraction
Font unrecognizable (handwriting, calligraphy)? → OCR engine limitation; consider trainOCR

MANDATORY Confidence warning: High confidence does NOT guarantee correctness. OCR can report >0.9 confidence on completely wrong text, especially on small or unusual images. Always validate results against expected content.

Step 2: Preprocess the Image

Apply preprocessing based on your diagnosis. See reference/preprocessing-guide.md for full code examples.

Pipeline routing:

Classification	Method	Key function
Uneven lighting	Top-hat illumination correction	`imtophat` + `imreconstruct`
Stamped/embossed/engraved	Local contrast subtraction	`imgaussfilt(Igray,30) - Igray`
Skewed >10°	Deskew FIRST via centroid fitting	`imrotate`
Small text <50px	Scale up 4-8x	`imresize`
Inverted polarity	Complement	`imcomplement`

Key rules:

Deskew BEFORE other preprocessing — rotation invalidates morphological operations
OCR expects dark text on light background — use imcomplement if inverted
For small clean images, try OCR on resized grayscale before binarizing — anti-aliasing helps
Add 10px white border with padarray(BW, [10 10], 1) if text touches edges

Step 3: Detect Text Regions (Critical)

For natural scenes (street signs, product labels, meter faces), isolating text regions before calling ocr is critical. These images have complex backgrounds that confuse the OCR engine. For scanned documents with uniform backgrounds, this step is less important — ocr has built-in page layout analysis that handles well-formatted documents automatically.

Default choice: detectTextCRAFT. Unless you have a specific reason to use another method (no deep learning available, known fixed layout, etc.), always prefer CRAFT for scene text. It outperforms MSER on complex backgrounds.

MANDATORY before using CRAFT or SAM: Run exist('detectTextCRAFT','file') or exist('imsegsam','file') via MCP. If the result is 0, stop and ask the user to install the required support package (Critical Rule #8). Do NOT proceed without confirmation.

Choose a detection method based on your image type:

Image Type	Method	When to Use
Natural scene (signs, products)	`detectTextCRAFT`	Preferred for complex backgrounds
Document with sparse text	`detectMSERFeatures`	Classic approach, no deep learning needed
Known fixed location	Manual ROI `[x y w h]`	Consistent image layout (e.g., meter face always in same spot)
Text on uniform background	`regionprops` on binary	Find connected components by area/aspect ratio
Color-distinct text	Color segmentation	Isolate text by color channel thresholding
Text overlaid on textured background	`imsegsam` (SAM)	Label on crate, sign on brick wall (text is a separate object)
Simple clean document	None (use `ocr` directly)	Clean scans with uniform background

SAM warning: Do NOT use SAM for stamped, embossed, engraved, or etched text — these are surface features, not objects. Use local contrast subtraction (Step 2) instead. SAM takes ~60-120s and needs full resolution (never downscale).

See reference/detection-methods.md for code examples of each method (CRAFT, MSER, SAM, manual ROI, connected components, color segmentation).

Step 4: Recognize Text

RULE: Always set LayoutAnalysis when passing bounding boxes to ocr(). Never call ocr(I, bbox) without it — the default "auto" fails on small regions. Use "word" for single-word boxes, "block" for multi-line regions.

Common patterns: ocr(I), ocr(I, bbox, LayoutAnalysis="word"), ocr(BW, CharacterSet="0123456789"), ocr(I, roi, Model="seven-segment", LayoutAnalysis="word"). See reference/function-reference.md for full examples.

If detection succeeded but recognition returns garbage: The problem is preprocessing or skew, not detection. Return to Step 2:

Skew >10° causes total failure even on clean binary images
Try local contrast subtraction if the image is textured
Try OCR on preprocessed grayscale rather than binary — anti-aliasing helps

Step 5: Validate Results

Filter by results.WordConfidences > 0.7. Check results.CharacterConfidences for suspect characters. See reference/function-reference.md for validation code patterns.

MANDATORY ACTION: Run evaluateOCR with ground truth.

You MUST compute OCR metrics using evaluateOCR. The ground truth is either:

Text you (Claude) read visually from the image — use this by default
Text the user provided — use this if the user supplies expected text or corrects your read

Run this via MCP after recognition:

% evaluateOCR signature:
%   metrics = evaluateOCR(ocrTextObj, groundTruthCellArray)
%   - ocrTextObj: the ocrText object returned directly by ocr()
%   - groundTruthCellArray: cell array of strings, one per expected text region
metrics = evaluateOCR(results, {"07 A11"; "XTPR 27338-2"});
fprintf("Character Error Rate (CER): %.2f\n", metrics.CharacterErrorRate);
fprintf("Word Error Rate (WER): %.2f\n", metrics.WordErrorRate);

Always run evaluateOCR via MCP and report the actual CER and WER values.

MANDATORY ACTION: Confidence checkpoint — when to ask the user:

If ANY of these are true, show the user what you got and ask them to confirm or correct:

Mean word confidence < 0.6
CER > 0.10 or WER > 0.20
You attempted 2+ preprocessing approaches and results still look wrong
The recognized text doesn't make semantic sense AND you can't read it yourself in the image
OCR returns fewer words than you can visually count in the image

Ask clearly: "I'm not confident in this result. OCR returned [text] (CER: 0.12, WER: 0.20). Can you tell me what the text actually says? I'll use that to tune the pipeline."

If the user provides ground truth, re-run evaluateOCR with the corrected ground truth and adjust preprocessing to minimize CER/WER.

After validation, proceed immediately to Step 6 — do not present results until files are on disk.

Step 6: Save Pipeline + Log

Save the pipeline as a .m script that is re-runnable on different images. Never hardcode values computed interactively — include all detection code (skew angle measurement, threshold selection, ROI detection).

Naming: ocr_pipeline_<descriptor>.m (e.g., ocr_pipeline_stamped_metal.m)

MANDATORY Requirements:

Include comments explaining each DECISION (why this approach, not just what)
Always show diagnostic figures (imshowpair, insertShape for bounding boxes, insertObjectAnnotation for boxes and text)
Use detectTextCRAFT + per-region ocr() with CharacterSet and LayoutAnalysis="word" for textured/complex images (not full-image LayoutAnalysis="block" which gives lower confidence)
Always include timestamp of when file is created. And include comment that says "Created using matlab-ocr Skill". Also produce a decision_log.txt — one line per decision point, format: [step] TYPE: description. Types: VISUAL, EXPERIMENT, DECISION, RESULT, GAP. Example:

[1] VISUAL: Stamped metal, ~20° skew, embossed characters
[2] DECISION: Local contrast subtraction (stamped text confirmed)
[3] EXPERIMENT: Centroid fitting → skew angle = 22.6°
[4] RESULT: "07 AN / XTPR 27338-2" — confidence 0.74 mean
[5] GAP: "A11" misread as "AN" — thin stamps merge at this resolution

This file is what the user sends back if the skill didn't fully work.

Quick Reference

LayoutAnalysis rule: Always set LayoutAnalysis when passing ROI boxes ("word", "block", or "none"). Default "auto" fails on small regions.

Key functions: ocr, detectTextCRAFT, imsegsam, imbinarize, imtophat, imcomplement, imresize, bwareaopen, imrotate. See reference/function-reference.md and reference/batch-pipeline.md.

Custom Model Training

MANDATORY: Present the gating checklist BEFORE recommending training. Train only if ALL are true: (1) preprocessing + parameter tuning still fails, (2) font/charset not in built-in models, (3) you have labeled ground truth, (4) use case is high-volume/repeated. Try first: Model="seven-segment", Model="french", CharacterSet, LayoutAnalysis.

Programmatic API (always use these, not the ocrTrainer app): trainOCR, ocrTrainingOptions, ocrTrainingData. See reference/training-guide.md for full workflow.

Multi-Language Support

MANDATORY before using a non-English model: Run exist('ocrLanguageData','dir') via MCP. If the result is 0, stop and ask the user to install the "OCR Language Data" support package (Critical Rule #8). Do NOT proceed without confirmation.

Use Model="japanese" (or "french-fast" for quantized). Requires OCR Language Data support package. See reference/supported-languages.md.

Troubleshooting

Symptom	Cause	Fix
Empty results	Light text on dark background	`imcomplement(imbinarize(Igray))`
Garbage characters	No preprocessing	Binarize, crop to ROI, use `CharacterSet`
"O"→"0" or "l"→"1"	Ambiguous glyphs	`CharacterSet` to exclude impossible chars
Seven-segment wrong	Layout splits lines	`LayoutAnalysis="word"`
Text too small	<20px tall	`imresize(I, 3)` before OCR
Words split oddly	Skewed text	Deskew via centroid fitting + `imrotate`

When OCR fails, return to Step 1 and re-inspect the binarized output. Do not loop on unsolvable cases (cursive handwriting, CAPTCHAs, <30px text, curved/arc text) — tell the user early. See reference/limitations.md.

matlab-ocr

Plus depuis ce dépôt

Plus depuis ce dépôt

Recognize Text in Images Using OCR

When to Use

When NOT to Use

Critical Rules

Anti-Patterns — Do NOT Do This

Workflow

Progress Reporting + Output Template

OCR Plan (output before any MATLAB code runs)

Results Block (output ONLY after files are written to disk)

GATE (Critical Rule #7)

Step 1: Diagnose

Step 2: Preprocess the Image

Step 3: Detect Text Regions (Critical)

Step 4: Recognize Text

Step 5: Validate Results

Step 6: Save Pipeline + Log

Quick Reference

Custom Model Training

Multi-Language Support

Troubleshooting

Recognize Text in Images Using OCR

When to Use

When NOT to Use

Critical Rules

Anti-Patterns — Do NOT Do This

Workflow

Progress Reporting + Output Template

OCR Plan (output before any MATLAB code runs)

Results Block (output ONLY after files are written to disk)

GATE (Critical Rule #7)

Step 1: Diagnose

Step 2: Preprocess the Image

Step 3: Detect Text Regions (Critical)

Step 4: Recognize Text

Step 5: Validate Results

Step 6: Save Pipeline + Log

Quick Reference

Custom Model Training

Multi-Language Support

Troubleshooting