| name | ai-image-creator |
| description | Generate PNG images using AI (multiple models via OpenRouter including Gemini, FLUX.2, Riverflow, SeedDream, GPT-5 Image, GPT-5.4 Image 2, proxied through Cloudflare AI Gateway BYOK). Also analyze/describe existing images using multimodal AI vision. Use when user asks to "generate an image", "create a PNG", "make an icon", "make it transparent", "describe this image", "analyze this image", "what's in this image", "explain this image", or needs AI-generated visual assets for the project. Supports model selection via keywords (gemini, riverflow, flux2, seedream, gpt5, gpt5.4), configurable aspect ratios/resolutions, transparent backgrounds (-t), reference image editing (-r), image analysis (--analyze), and per-project cost tracking (--costs). |
| allowed-tools | Bash, Read, Write |
| compatibility | Requires uv (Python runner) and network access. Environment variables for CF AI Gateway or direct API keys must be configured in shell profile (~/.zshrc on macOS, ~/.bashrc on Linux, or System Environment Variables on Windows). |
| metadata | {"tags":"image-generation, ai, openrouter, cloudflare, gemini, flux2, riverflow, seedream, gpt5, gpt54"} |
AI Image Creator
Generate PNG images via multiple AI models, routed through Cloudflare AI Gateway BYOK or directly via OpenRouter/Google AI Studio.
Model Selection
When the user mentions a model keyword in their image request, use the corresponding --model flag:
Instructions
Routing check: If the user asks to describe, analyze, or explain an existing image (not generate a new one), skip directly to the Image Analysis (--analyze) section below. No prompt enhancement or output path needed.
Step 1: Write Prompt
For long or complex prompts (recommended), write to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt using the Write tool:
Write prompt text to ${CLAUDE_SKILL_DIR}/tmp/prompt.txt
For short prompts (under 200 chars, no special characters), pass inline via --prompt.
CRITICAL — Prompt Quality Tips:
- Be detailed and descriptive. Include style, colors, composition, background, and intended use.
- Good: "A flat-design globe icon with vertical timezone band lines in blue and teal, white background, clean vector style, suitable for a web app at 512x512 pixels"
- Bad: "globe icon"
- Specify "transparent background" or "white background" explicitly.
- For icons, mention the target size (e.g., "512x512", "favicon at 32x32").
- For photos, describe lighting, camera angle, and mood.
Step 1.5: Prompt Enhancement (Optional — Progressive Disclosure)
Professional prompt patterns are available in 3 reference files. These are not loaded by default — only read them when the user's request matches a category or they explicitly ask for enhancement.
Category Detection — Match the user's request to a category:
| If request mentions... | Category | Also read |
|---|
| "product shot", "product photo", "hero image" | product_hero | prompt-core.md + prompt-categories.md § product_hero |
| "lifestyle", "in-use", "in context" | lifestyle | prompt-core.md + prompt-categories.md § lifestyle |
| "instagram", "social media", "tiktok", "pinterest" | social_media | prompt-core.md + prompt-platforms.md + prompt-categories.md § social_media |
| "banner", "ad", "email header" | marketing_banner | prompt-core.md + prompt-platforms.md + prompt-categories.md § marketing_banner. Routing hint: If user has an existing logo and wants multiple standard sizes → use composite mode instead (see ## Composite Banners). |
| "website", "app", "logo", "ad format", "leaderboard", "skyscraper" | web_app | prompt-core.md + prompt-platforms.md + prompt-categories.md § web_app. Routing hint: For "logo banners" or "OG images with my logo" where user has existing logo → use composite-banners.py. For "design me a new logo" → use generate-image.py. |
| "brand kit", "logo banners", "banner sizes", "IAB sizes", "consistent banners" + user has existing logo | composite | Read references/composite-reference.md, use composite-banners.py |
| "icon", "favicon", "app icon" | icon_logo | prompt-core.md + prompt-categories.md § icon_logo |
| "mascot", "character", "illustration", "artwork" | illustration | prompt-core.md + prompt-categories.md § illustration |
| "food", "drink", "recipe", "restaurant" | food_drink | prompt-core.md + prompt-categories.md § food_drink |
| "building", "interior", "room", "architecture" | architecture | prompt-core.md + prompt-categories.md § architecture |
| "chart", "infographic", "data", "diagram" | infographic | prompt-core.md + prompt-categories.md § infographic |
| "t-shirt", "mug design", "poster", "POD", "print-on-demand" | pod_design | prompt-core.md + prompt-platforms.md + prompt-categories.md § pod_design |
| "describe", "analyze", "what's in this image", "explain image" | analyze | Skip prompt enhancement — use --analyze mode directly. Read references/analyze-reference.md for advanced analysis patterns |
| No match / simple request | — | Skip patterns, generate directly |
When to skip enhancement:
- User's prompt is already detailed (150+ words with camera/lighting/composition specifics)
- Simple/direct requests ("generate a blue circle on white background")
- User says "no pattern" or provides a fully formed prompt
When to apply:
- User says "use product_hero pattern" or "apply social_media pattern" (explicit)
- Request clearly matches a category above (auto-detect)
- User asks for "enhanced prompt" or "professional quality"
Reference files (in references/ directory):
prompt-core.md — Foundational rules: narrative prompting, camera/lens/lighting specs, text rendering rules, model recommendations
prompt-platforms.md — Social media ratios, IAB ad sizes, web dimensions, POD specs — all mapped to -a/-s flags
prompt-categories.md — 11 category formulas with templates and complete example prompts
Step 2: Run Generation Script
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "OUTPUT_PATH" \
[--provider openrouter|google] \
[-a "16:9"] \
[-s "2K"] \
[-m "model-id"] \
[-r "ref-image.png"] \
[-t]
With a specific model:
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "OUTPUT_PATH" \
-m riverflow \
-p "A serene mountain lake at sunset"
With transparent background (requires ffmpeg + imagemagick):
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "mascot.png" \
-t \
-p "A friendly robot mascot character"
With reference image for editing/style transfer (multimodal models only):
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "edited.png" \
-r "original.png" \
-p "Change the background to a sunset scene"
Or with inline prompt (default model):
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "OUTPUT_PATH" \
-p "A simple blue circle on white background"
Step 3: Clean Up (if temp file used)
rm -f ${CLAUDE_SKILL_DIR}/tmp/prompt.txt
Step 4: Verify Output
file OUTPUT_PATH
Confirm it shows "PNG image data" and report the file path and size to the user.
Step 5: Post-Processing (optional)
If the user needs resizing, format conversion, or other manipulation, first detect available image tools, then use them. See Image Tools section below.
Parameters
| Argument | Short | Required | Default | Description |
|---|
--output | -o | Yes | -- | Output file path (parent dirs auto-created) |
--prompt | -p | No | -- | Inline prompt text |
--prompt-file | -- | No | ../tmp/prompt.txt | Path to prompt file |
--provider | -- | No | openrouter | openrouter or google |
--aspect-ratio | -a | No | model default | OpenRouter only: 1:1, 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 4:5, 5:4, 21:9 |
--image-size | -s | No | model default | OpenRouter only: 0.5K, 1K, 2K, 4K |
--model | -m | No | gemini | Model keyword (gemini, riverflow, flux2, seedream, gpt5) or full model ID |
--ref | -r | No | -- | Reference image file (repeatable). For editing/style transfer. Multimodal models only (gemini, gpt5) |
--analyze | -- | No | -- | Analyze/describe a reference image (text-only output, no image generated). Requires -r. Multimodal models only |
--transparent | -t | No | -- | Generate with transparent background. Requires ffmpeg + imagemagick |
--costs | -- | No | -- | Display generation/cost history for this project and exit |
--list-models | -- | No | -- | List available model keywords and exit |
Environment Variables
| Variable | Required For | Description |
|---|
AI_IMG_CREATOR_CF_ACCOUNT_ID | Gateway mode | Cloudflare account ID |
AI_IMG_CREATOR_CF_GATEWAY_ID | Gateway mode | AI Gateway name |
AI_IMG_CREATOR_CF_TOKEN | Gateway mode | Gateway auth token |
AI_IMG_CREATOR_OPENROUTER_KEY | Direct OpenRouter | OpenRouter API key (sk-or-...) |
AI_IMG_CREATOR_GEMINI_KEY | Direct Google | Google AI Studio API key |
Gateway mode activates when all 3 CF_* vars are set. Falls back to direct mode if gateway fails.
For first-time setup, see references/setup-guide.md.
Transparent Mode (-t)
Generates images with transparent backgrounds using a 3-step pipeline:
- Green screen generation — Prompt is augmented to place subject on solid #00FF00 green
- FFmpeg chroma key — Removes green background + green fringe from edges
- ImageMagick auto-crop — Trims transparent padding
Requirements: brew install ffmpeg imagemagick
Use cases: Game sprites, icons, logos, mascots, marketing assets with transparency.
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "sprite.png" -t -p "A pixel art treasure chest"
Reference Images (-r)
Send existing images alongside text prompts for editing, style transfer, or guided generation. Supports multiple references. Multimodal models only (gemini, gpt5) — image-only models (riverflow, flux2, seedream) will error.
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "edited.png" -r "photo.png" -p "Make the background white"
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
-o "combined.png" -r "style1.png" -r "content.png" -p "Apply the style of the first image to the second"
Supported formats: PNG, JPEG, WebP, GIF.
Image Analysis (--analyze)
Describe, analyze, or explain existing images using multimodal AI vision. Returns text-only output (no image generated). Multimodal models only (gemini, gpt5).
No -o output path needed. No prompt enhancement needed. The script outputs JSON to stdout with the model's analysis in the analysis field.
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "photo.png"
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "photo.png" -p "Describe this image in plain text and also in JSON structured output"
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "photo.png" -m gpt5 -p "What text is visible in this image?"
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py \
--analyze -r "before.png" -r "after.png" -p "Compare these two images and describe the differences"
JSON output format:
{"ok": true, "analyze": true, "analysis": "<model text>", "provider": "openrouter", "model": "...", "mode": "gateway", "elapsed_seconds": 3.2, "ref_images": 1}
Incompatible flags: --analyze cannot be combined with -o, -t, -a, or -s.
For advanced analysis prompt patterns (structured output, comparison, targeted analysis), read references/analyze-reference.md.
Cost Tracking (--costs)
Every generation is logged to .ai-image-creator/costs.json in your project directory. View history:
uv run python ${CLAUDE_SKILL_DIR}/scripts/generate-image.py --costs
Shows per-model breakdown: generation count, total tokens, elapsed time, and recent entries. Security: Only non-sensitive data is logged (model, tokens, timing, file path). No API keys or credentials are ever stored.
Consider adding .ai-image-creator/ to your .gitignore.
Composite Banners
Generate consistent logo banners across multiple sizes from a JSON config. Uses ImageMagick for offline compositing — no API calls, no network required. Composites an existing logo/mark onto branded backgrounds with text at standard dimensions.
Composite vs. AI Generation — Decision Rule
Use composite-banners.py when ALL of these are true:
- User has an existing logo/mark they want to use as-is (provides or references a logo file)
- User wants consistent branding across multiple standard sizes (not one creative image)
- The output is logo + text on a solid/gradient background (not a photograph, illustration, or creative design)
Use generate-image.py (AI generation) when ANY of these are true:
- User wants a creative/artistic banner design (describes a scene, mood, concept, or style)
- User wants AI to design the visual content (product shots, illustrations, creative layouts)
- User wants a single banner with artistic content, not a multi-size brand kit
When composite mode applies, read references/composite-reference.md for full config schema, preset dimensions, and font handling details.
Quick Start
- Init config:
uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --init
- Edit
banner-config.json — set logo path, brand text, colors, banner sizes
- Validate:
uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py --validate
- Generate:
uv run python ${CLAUDE_SKILL_DIR}/scripts/composite-banners.py -c banner-config.json -o ./banners/
Composite Parameters
| Argument | Short | Default | Description |
|---|
--config | -c | banner-config.json | Config JSON path |
--output-dir | -o | . | Output directory |
--name | -n | all | Generate single banner by name |
--format | -f | png | png, webp, jpeg |
--list-presets | | | List IAB/social/web size presets |
--init | | | Generate starter config |
--validate | | | Check config, exit 0 or 2 |
--dry-run | | | Preview without rendering |
--json | | | Structured JSON to stdout |
--verbose | -v | | Verbose output |
Requirements: ImageMagick 7 (brew install imagemagick or apt install imagemagick).
Workflow Hints
Starting composite mode:
- Ask user for: logo file path, brand name, tagline text, brand colors (hex)
- If user doesn't have a logo yet → use generate-image.py to create one first
- Run
--init to scaffold config, then help user fill in their brand values
During generation:
- Always run
--validate before generating to catch font/logo issues early
- Use
--name to iterate on one banner before generating the full set
- Show user 3-4 representative sizes (hero, OG, square, leaderboard) for approval
After generation:
- If user wants creative/artistic redesign of banner visuals → switch to generate-image.py (composite only does logo + text on gradient/solid backgrounds)
- If banners look too plain → suggest AI-generating a textured or photographic background first, then compositing the logo onto it
Combined workflow (most powerful):
- Use generate-image.py to AI-create a hero background or textured pattern
- Use composite-banners.py to overlay the logo + text onto that background at all standard sizes
This gives both creative AI visuals AND pixel-perfect logo consistency.
Image Tools
On first invocation, detect available image manipulation tools:
which magick convert sips ffmpeg 2>/dev/null
Available Tools
| Tool | Check | Key Operations |
|---|
ImageMagick 7 (magick) | magick --version | Resize, crop, convert, composite |
ImageMagick 6 (convert) | convert --version | Same ops, legacy command name |
| sips (macOS) | sips --help | Resize, format conversion |
| ffmpeg | ffmpeg -version | Convert formats, resize |
Common Post-Processing
magick output.png -resize 512x512 icon-512.png
for s in 16 32 48 64 128 256 512; do magick output.png -resize ${s}x${s} icon-${s}.png; done
magick output.png output.webp
magick output.png -gravity center -extent 120%x120% maskable.png
sips --resampleWidth 512 --resampleHeight 512 output.png --out icon-512.png
CRITICAL: Check tool availability before using. Prefer magick (IM7) over convert (IM6). If no tools found, inform user: brew install imagemagick.
Common Issues
"No API credentials configured"
Cause: Environment variables not set or not exported.
Fix: Add exports to ~/.zshrc and run source ~/.zshrc. See references/setup-guide.md.
"HTTP 401: Unauthorized"
Cause: Invalid or expired API key/token.
Fix: Check AI_IMG_CREATOR_CF_TOKEN (gateway) or AI_IMG_CREATOR_OPENROUTER_KEY (direct). Regenerate if needed.
"No images in response"
Cause: Model returned text only (safety filter, unclear prompt, or unsupported request).
Fix: Make the prompt more specific and descriptive. Avoid prohibited content.
"Connection error" / timeout
Cause: Network issue or image generation taking too long (120s timeout).
Fix: Retry. If persistent, try --provider google as alternative. Check CF gateway status.
Detailed API Reference
For full API formats, response schemas, BYOK configuration, and curl examples:
see references/api-reference.md
For first-time setup instructions:
see references/setup-guide.md