| name | mmskills |
| description | Use MMSkills for GUI, computer-use, visual-agent, OSWorld, Ubuntu desktop, macOS, Minecraft, or Mario tasks where reusable multimodal skill packages can guide planning, state recognition, or verification. Also use when installing, downloading, searching, or adapting MMSkills packages for Codex, OpenClaw, Claude Code, or other agent products. |
MMSkills Agent Adapter
MMSkills Agent Adapter is a unified adapter for locating and using multimodal skill packages from the public MMSkills dataset. The same skill package schema should be used across Codex, OpenClaw, Claude Code, and other agent products; product-specific integration should stay in thin adapters, not in duplicated skill content.
Asset Source
The full public skill library is hosted on Hugging Face:
The dataset currently exposes 515 packages across Ubuntu, macOS, VAB-Minecraft, and Mario. Do not assume all packages are installed locally. Search and download only the relevant packages when needed.
Quick Commands
Run these commands from this skill directory:
python scripts/search_skills.py "chrome bookmark" --package ubuntu
python scripts/download_skill.py ubuntu/chrome/CHROME_Manage_Bookmarks_Reading_List_And_Shortcuts
python scripts/inspect_skill.py ~/.cache/mmskills/skills/ubuntu/chrome/CHROME_Manage_Bookmarks_Reading_List_And_Shortcuts
Use MMSKILLS_HOME=/path/to/cache to store downloaded packages outside the default ~/.cache/mmskills.
Use Workflow
- Infer the task surface: package, platform, app, and operation family.
- Search the dataset index with
scripts/search_skills.py.
- Download the best-matching package with
scripts/download_skill.py if it is not already local.
- Read the package in this order:
SKILL.md for procedure, applicability, preconditions, and failure modes.
runtime_state_cards.json for compact state cues and verification signals.
Images/ only when visual grounding, state comparison, or UI verification is needed.
- Use the package as guidance, not as a hard coordinate script. Transfer the operational pattern, visible cues, and verification logic to the live task state.
Cross-Agent Contract
MMSkills packages are product-neutral. Each package may contain:
<package>/<domain>/<skill>/
├── SKILL.md
├── runtime_state_cards.json
├── plan.json
└── Images/
Codex, OpenClaw, and Claude Code should share this package format. The only product-specific pieces should be:
- where packages are cached;
- which tool opens images or screenshots;
- how the agent routes retrieved guidance back into its planner;
- whether the product can use visual references directly or should fall back to text-only cues.
Read references/integration_targets.md before implementing a new product adapter. Read references/asset_sources.md when changing dataset URLs, cache locations, or download behavior.
When Visual Evidence Is Useful
Load visual references when:
- the task depends on recognizing a specific UI state, dialog, control, or result surface;
- text-only procedure is ambiguous;
- the package has a verification state that matches the current screenshot;
- the agent needs a recovery cue after a failed or uncertain GUI action.
Prefer focus crops for local control recognition and full frames for global state recognition.