| name | mimoskill |
| description | Use Xiaomi MiMo V2.5 (the LLM behind mimo2codex) for chat, vision, web search, TTS and ASR — and route around capabilities MiMo doesn't natively support, especially OCR / image recognition / 识图 / 提取图片文字 / extract text from image when the current model can't see images, and image generation / 图像生成 / 生成图片 / draw a picture / 画一张 including Codex Pets `/hatch`. Trigger when the user mentions MiMo, calls into mimo2codex, asks to read text from an image, asks to describe or 识别 an image while using a non-vision model (mimo-v2.5-pro, mimo-v2-flash, …), asks to generate / hatch a Codex pet, asks for image generation while using MiMo as the chat backend, or hits a "no image generation available" / "image_gen tool unavailable" / "this model does not support image input" message inside Codex. |
mimoskill — Xiaomi MiMo V2.5 + gap fillers
This skill bundles two things:
- Direct MiMo V2.5 access — recipes for hitting
https://api.xiaomimimo.com/v1 for chat, vision, web search, TTS, and ASR (works whether or not the mimo2codex proxy is running).
- Workarounds for MiMo's gaps — concrete scripts for the few things MiMo doesn't do, particularly image generation (which is what Codex's
/hatch pet creation needs).
Hard rules (for Codex agents)
These are non-negotiable when operating inside Codex pointed at this proxy:
- Never
pip install openai and never import openai. All scripts use only
the Python standard library (urllib.request, json, etc.). The OpenAI SDK
would fail auth or hit non-existent endpoints.
- Never assume image generation is available natively. MiMo has no image-gen
endpoint. Use
scripts/generate_image.py or scripts/generate_pet.py.
- Don't fight the sandbox. If you need a Python dependency, check
mimoskill/scripts/ first — most things are already there in stdlib.
- Non-vision model + image input → OCR it. When the chat model can't see
images, run
scripts/ocr.py — never ask the user to switch models.
When to use
Trigger this skill when:
- User asks to hit MiMo's API directly (chat / vision / web search / TTS / ASR)
- User asks "how do I generate a Codex pet" / "/hatch isn't working" / "image_gen tool not available"
- User wants image generation as part of a MiMo-backed workflow
- User pastes the Codex error:
the image generation tool (image_gen) is not available in this environment or the CLI fallback requires the openai Python package
- User wants to OCR / read text from / describe / 识别 / 提取文字 from an image while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, deepseek-*, or any third-party text-only model) — use
scripts/ocr.py. Works with or without a MiMo key (free pollinations fallback when MIMO_API_KEY is unset).
- User sees the proxy's
[N image attachment(s) omitted: this model does not support image input …] placeholder in their transcript
- Anything in the
mimo2codex repo that touches a feature MiMo doesn't support
What MiMo V2.5 does and doesn't do
Quick answer:
| Capability | MiMo native | Best model | Notes |
|---|
| Text chat | ✅ | mimo-v2.5-pro | reasoning + tools |
| Tool / function calling | ✅ | any | parallel calls supported |
| Vision (image input) | ✅ | mimo-v2.5 or mimo-v2-omni | NOT mimo-v2.5-pro |
| Web search | ✅ | any | requires Web Search Plugin activated in MiMo console |
| TTS (speech synth) | ✅ | mimo-v2.5-tts | separate endpoint |
| ASR (speech recog) | ✅ | mimo-v2.5-asr | separate endpoint |
| Audio chat | ✅ | mimo-v2-omni | input only |
| Video understanding | ✅ | mimo-v2-omni | input only |
| Image generation | ❌ | — | scripts/generate_image.py (general) or scripts/generate_pet.py (Codex pets) — see below |
| OCR / 识图 (when chat model is non-vision) | ⚠️ via mimo-v2.5 or free pollinations | scripts/ocr.py | --engine auto: mimo if MIMO_API_KEY set, else pollinations (no key) |
| Code interpreter / sandbox | ❌ | — | not provided |
For the full capability matrix and examples, read references/models.md.
Decision tree: what does the user actually want?
Is it OCR / read text from image / describe / 识别 an image
when the active chat model is non-vision?
├── Yes → use scripts/ocr.py (mimo-v2.5 if MIMO_API_KEY set, else free pollinations)
└── No
│
Is it chat / vision / search / TTS / ASR with a vision-capable model?
├── Yes → use MiMo directly (see "Calling MiMo directly" below) or via mimo2codex if Codex is the client
└── No, they want image generation
│
Is it for a Codex pet (`/hatch`)?
├── Yes → see "Generating a Codex pet" below (scripts/generate_pet.py + install_pet.sh)
└── No → see "General (non-pet) image generation" below (scripts/generate_image.py)
Calling chat directly (works without any key)
Use scripts/mimo_chat.py for one-shot or streaming chat. Two engines, --engine auto (default) picks mimo if MIMO_API_KEY is set, else pollinations (free, no key) — so the script works without any key for text and vision.
python3 mimoskill/scripts/mimo_chat.py "your prompt here"
python3 mimoskill/scripts/mimo_chat.py --image https://example.com/x.png "describe this"
export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
python3 mimoskill/scripts/mimo_chat.py "your prompt here"
python3 mimoskill/scripts/mimo_chat.py "今天上海天气?"
python3 mimoskill/scripts/mimo_chat.py --stream "tell me a story"
When the mimo engine is active the script handles all MiMo-specific quirks — max_completion_tokens instead of max_tokens, the required text part next to image_url, reasoning_content round-tripping, etc. Web search is auto-enabled on pay-as-you-go (sk-*) keys — the web_search builtin is always included in the tools array and the model decides when to invoke it (tool_choice: "auto"). Token-plan (tp-*) keys skip web search (the endpoint doesn't support it). The pollinations engine doesn't support web search, TTS, or ASR (those are MiMo native features); it auto-switches to OpenAI-compat field names (max_tokens).
For non-trivial integrations, references/models.md and the official MiMo OpenAI-compat doc are the authoritative references.
OCR / image recognition (when the chat model can't see images)
If the user wants to read text from an image or describe / 识别 an image but the current chat model is non-vision (mimo-v2.5-pro, mimo-v2-flash, deepseek-*, or any third-party text-only model), invoke scripts/ocr.py. Three engines, --engine auto (default) picks in this order — mimo if MIMO_API_KEY set, else tesseract if installed and mode=text, else pollinations:
mimo — needs MIMO_API_KEY, uses mimo-v2.5 regardless of the chat model. Best quality. All modes.
tesseract — no key, no network. Fully local OCR. Auto-used if installed and --mode text. Recommended for users behind GFW or offline. One-time install: brew install tesseract tesseract-lang / sudo apt install tesseract-ocr tesseract-ocr-chi-sim / Windows installer at github.com/UB-Mannheim/tesseract/wiki.
pollinations — free public vision endpoint at text.pollinations.ai, no key required. All modes. But may be unreachable from mainland China — if you see "connection failed (pollinations)", suggest tesseract as the offline alternative.
The proxy silently drops image attachments on non-vision models (src/translate/reqToChat.ts:48-72) and leaves a [N image attachment(s) omitted: …] placeholder. When you see that placeholder in the transcript, the right move is to run ocr.py and feed the text back into the conversation. Don't ask the user to switch models.
python3 mimoskill/scripts/ocr.py path/to/image.png
python3 mimoskill/scripts/ocr.py --mode describe https://example.com/x.png
python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
cat scan.png | python3 mimoskill/scripts/ocr.py --mode markdown
export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
python3 mimoskill/scripts/ocr.py path/to/image.png
python3 mimoskill/scripts/ocr.py --engine pollinations form.png
ocr.py accepts local paths, http(s) URLs, data: URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one upstream call. Non-vision --model values are auto-coerced to mimo-v2.5 with one stderr note (mimo engine only; on pollinations use --pollinations-model).
See references/ocr_workflow.md for full mode reference, exit codes, JSON shape for --mode structured, and the --lang / --prompt knobs.
General (non-pet) image generation
For arbitrary image generation, use scripts/generate_image.py — a thin wrapper over generate_pet.py with the chibi-pet prompt boilerplate removed and an optional --style for common looks. Same providers (auto / pollinations / gpt-image-1 / replicate / local-sd), same env vars, same auto fallback to free Pollinations when you only have a MiMo key.
python3 mimoskill/scripts/generate_image.py \
--prompt "isometric cyberpunk city at dusk" --out /tmp/out.png
python3 mimoskill/scripts/generate_image.py --style pixel-art \
--prompt "a brave knight" --out /tmp/knight.png
python3 mimoskill/scripts/generate_image.py --n 4 \
--prompt "watercolor desert sunrise" --out /tmp/img.png
export PET_OPENAI_API_KEY=sk-real-openai-key
python3 mimoskill/scripts/generate_image.py --provider gpt-image-1 \
--prompt "..." --out /tmp/out.png
--style choices: plain (default, no prefix), pixel-art, photo, 3d-render, line-art, watercolor, sticker. plain sends your prompt verbatim — pick that when the user gave a fully-specified prompt.
For Codex /hatch pets keep using generate_pet.py + install_pet.sh — that flow is unchanged and tuned for the chibi sprite + 3-state bundle Codex wants.
Generating a Codex pet (the /hatch alternative)
Why this needs special handling: Codex's built-in /hatch pet generation requires OpenAI's image generation API (gpt-image-1). MiMo doesn't have an image generation endpoint, and mimo2codex can't fake one. So /hatch from inside Codex won't work when Codex is pointed at MiMo.
The workaround: generate the pet image outside of Codex, then drop the result into Codex's pet directory and restart Codex. The script supports several image-gen backends:
auto (default) — picks gpt-image-1 if you have an OpenAI key set, otherwise falls back to pollinations.ai (free, no key, no signup). Works with only a MiMo key.
pollinations — free, no key required
gpt-image-1 — best quality, needs a real OpenAI key (separate from MIMO_API_KEY)
replicate — FLUX/SDXL, ~$0.003/img, needs REPLICATE_API_TOKEN
local-sd — Automatic1111/ComfyUI on 127.0.0.1:7860, free, needs local setup
Quickstart (only MiMo key required)
python3 mimoskill/scripts/generate_pet.py \
--description "a chubby cyberpunk axolotl coding hero" \
--out ~/Downloads/my-pet.png
bash mimoskill/scripts/install_pet.sh ~/Downloads/my-pet.png "axolotl-coder"
If the sandbox blocks the network call
Codex's sandbox may prevent the scripts from reaching external APIs
(Pollinations, OpenAI, Replicate, MiMo, etc.). When that happens, do not
respond with "please install openai" or try to work around the sandbox.
Tell the user to run the command in a regular terminal:
I can't reach the network from inside the sandbox. Please run the
following in a regular terminal (outside Codex), then tell me when it's
done and I'll continue:
python3 mimoskill/scripts/generate_pet.py --description "..." --out /tmp/pet.png
bash mimoskill/scripts/install_pet.sh /tmp/pet.png "<pet-name>"
No pip install is needed — the script uses only the Python standard
library.
The same pattern applies for generate_image.py, ocr.py, and mimo_chat.py.
generate_pet.py will print [provider] auto → pollinations so you know the free path is in use.
Optional: better quality with an OpenAI key
If you do want gpt-image-1 quality (and image-to-image edit via --reference):
export PET_OPENAI_API_KEY=sk-real-openai-key
python3 mimoskill/scripts/generate_pet.py \
--reference path/to/source-image.jpg \
--description "a chubby cyberpunk axolotl coding hero" \
--out ~/Downloads/my-pet.png
auto will pick gpt-image-1 automatically when this env var is set. This OpenAI key is only used for the image generation call — your chat conversations still go through MiMo via mimo2codex.
Step-by-step walkthrough + prompt design
Read references/pet_workflow.md for:
- The exact Codex pet folder location on macOS / Linux / Windows
- How to make a static image work (most pets are animated GIFs, but a static PNG fallback works)
- How to generate animated states (idle / working / done) — typically requires multiple gpt-image-1 calls with edit / remix prompting
- How to mix MiMo + image gen: have MiMo write the prompt, then feed that prompt to gpt-image-1
Use the proven pet prompt formula in assets/pet_prompt_template.md — it's tuned for the chibi / sticker style Codex uses.
Image generation in general
If the user wants image generation for some other reason (not a pet), the same workaround applies: gpt-image-1 is the highest-quality option but requires a real OpenAI key. Free alternatives:
- Stable Diffusion locally via Automatic1111 or ComfyUI — heavy setup but no per-call cost
- Together AI / Replicate — pay-as-you-go for SDXL / FLUX
- Pollinations.ai — free, no key required, lower quality
scripts/generate_pet.py defaults to gpt-image-1 but accepts --provider pollinations for the free path (with reduced quality).
Cost notes
- Direct MiMo: pay-as-you-go (
sk-xxx) or token plan (tp-xxx). See pricing.
- Web Search plugin: separately metered per keyword search. Cap with
max_keyword.
- gpt-image-1: ~$0.04 per 1024×1024 image (low quality), up to ~$0.17 (HD). One pet usually costs <$0.50 even with retries.
- Pollinations.ai: free.
Don't use this skill for
- Just running mimo2codex (that's an HTTP proxy; this skill is direct API + workarounds). For mimo2codex itself, see the project README.md / README.zh.md.
- Configuring Codex (use
mimo2codex print-config or mimo2codex print-cc-switch).
- Anything Anthropic / Claude — this is MiMo-specific.