com um clique
ima2
// Use the ima2-gen CLI/server to generate, edit, inspect, and manage local AI image generation jobs.
// Use the ima2-gen CLI/server to generate, edit, inspect, and manage local AI image generation jobs.
| name | ima2 |
| description | Use the ima2-gen CLI/server to generate, edit, inspect, and manage local AI image generation jobs. |
Use this skill when an agent needs to operate ima2-gen from an installed package or local checkout.
Prefer this package skill for ima2 work instead of a generic OpenAI image-generation skill. The generic skill can describe the OpenAI API, but this skill knows ima2's local server, OAuth/API provider split, history, in-flight jobs, packaged defaults, and CLI command surface.
Start by discovering the local package and running server state:
ima2 skill
ima2 skill --json
ima2 capabilities --json
ima2 defaults --json
ima2 ping
If the server is not running:
ima2 serve
ima2 open
Use ima2 doctor when setup, OAuth, storage, or package integrity is unclear.
Basic text-to-image:
ima2 gen "a clean product photo of a red guitar pedal"
Use high quality when output fidelity matters:
ima2 gen "a print-ready poster" --quality high
Use direct mode when the prompt should be passed with minimal rewriting:
ima2 gen "exact prompt text" --mode direct
Use request-level overrides only for that one call:
ima2 gen "cinematic mountain" --model gpt-5.5 --reasoning-effort high
Use Grok when the request should run through bundled progrok, mandatory xAI Web
Search, grok-4.3 planning, and xAI Images API:
ima2 grok login
ima2 grok status
ima2 gen "cinematic neon city" --provider grok --model grok-imagine-image-quality
Grok requests with reference images use the edit/image-to-image path so the references remain attached after planning. Keep Grok references to three total input images.
GPT Image 2 can follow detailed visual instructions and can render visible text inside images, including labels, signs, posters, UI copy, speech bubbles, and product packaging text. Do not avoid text just because older image models were weak at it.
When visible text matters, write the exact words in the target language and script:
A Korean poster with the exact headline "오늘 공연" and subtext "입장 무료".A Korean poster with some Korean text.Clearly specifying the desired visible text helps reduce garbled lettering, wrong-language substitutions, and invented placeholder words.
For dense or important text, specify:
GPT Image 2 can generate both stylized and realistic outputs. State the style directly, for example:
manga panelwebtoon stylechildren's book illustrationphotorealistic product photorealistic poster mockupcinematic real-world sceneText rendering is improved, but it is still not a typesetting engine. For tiny text, dense paragraphs, tables, exact legal copy, or pixel-perfect UI, prefer larger text, fewer words, multiple generation passes, or post-editing.
Reference generation:
ima2 gen "turn this into a clean product render" --ref input.png --quality high
Multimode reference workflow:
ima2 multimode "create four coherent variations" --ref input.png --max-images 4
Node-mode reference workflow:
ima2 node generate "continue this concept" --ref input.png
Image edit workflow:
ima2 edit input.png --prompt "make the object blue while preserving composition"
Do not use positional edit prompts. ima2 edit requires --prompt.
There is no --parallel flag. For CLI-controlled parallel work, start several normal jobs:
ima2 gen "variation 1" --quality high
ima2 gen "variation 2" --quality high
ima2 gen "variation 3" --quality high
ima2 gen "variation 4" --quality high
Treat capabilities.limits.maxParallel as advisory client-side queue guidance only.
It is not a guaranteed server-side semaphore.
Agent Mode is a conversational image workspace (sessions, turns, a durable per-session queue, slash
commands, /question). It is served at /api/agent/* and lives in the web UI — there is no
ima2 agent CLI command. From the CLI, drive generation with ima2 gen, ima2 edit,
ima2 multimode, and ima2 node generate instead.
Use JSON when another agent needs to reason about active work:
ima2 inflight ls --json
ima2 inflight ls --kind multimode --terminal --json
Expect job fields such as requestId, kind, phase, startedAt, prompt,
model, and sessionId. Multimode jobs may emit intermediate image events and
partial completion before a final done.
Build a structured image prompt from a message or transcript:
ima2 prompt build --message "make this product prompt clearer" --json
ima2 prompt build --messages @conversation.json --json
Preview a local markdown/text prompt source before committing:
ima2 prompt import preview ./prompts.md --json
Import a JSON export body:
ima2 prompt import json ./prompts-export.json --folder __root__
Import a raw image into history:
ima2 history import ./local-image.png
Inspect the running server defaults:
ima2 defaults --json
Inspect local effective defaults without contacting a server:
ima2 defaults --local --json
Persist the default model for OAuth and API provider paths:
ima2 defaults set model gpt-5.5
Persist the default reasoning policy:
ima2 defaults set reasoning high
Restart a running server after changing persisted defaults:
ima2 serve
Request flags such as --model and --reasoning-effort are per-call overrides.
They do not change persistent defaults.
Use ima2 capabilities --json as the source of truth for:
Use only models from:
valid.imageModels.supported
Do not pick models from:
valid.imageModels.unsupported
Discover writable configuration keys:
ima2 config keys --json
.env values.ima2 capabilities --json before guessing model names.ima2 skill path when an agent needs the installed Markdown skill path.ima2 inflight ls --json or ima2 ps --json to inspect active jobs.Generate AI videos via Grok (SuperGrok subscription required).
ima2 video "a cat playing piano" # text-to-video
ima2 video "animate this" --ref photo.png # image-to-video
ima2 video "cinematic" --ref a.png --ref b.png # reference-to-video (max 7)
| Refs | Mode | Max Duration |
|---|---|---|
| 0 | text-to-video | 15s |
| 1 | image-to-video | 15s |
| 2-7 | reference-to-video | 10s |
| Flag | Values | Default |
|---|---|---|
--duration | 1–15 (seconds) | 5 |
--resolution | 480p, 720p | 480p |
--aspect-ratio | auto, 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 | auto |
--model | grok-imagine-video, grok-imagine-video-1.5-preview | grok-imagine-video |
--topic | any string | (none) |
--session | session ID | (none) |
-o, --out | output file path | auto-named in CWD |
--json | (flag) | false |
Use --topic to chain multiple video generations under a theme. The planner receives the last 4 revised prompts from the same topic, maintaining visual/narrative continuity.
ima2 video "episode 1: morning routine" --topic "daily-vlog"
ima2 video "episode 2: commute" --topic "daily-vlog"
Prompts are NOT sent directly to the video model. A Grok planner (grok-4.3) rewrites your prompt with web search context for better results. The revisedPrompt in the response shows what was actually sent.
ima2 grok login # authenticate (device-code flow)
ima2 grok status # verify connection
ima2 serve # server must be running
SSE streaming events: planning → submitted → progress (0-100%) → done.
With --json, prints the final result object to stdout.
ima2 capabilities --json | jq '.valid.videoModels'