gen-image

Name: Gen Image
Author: orakitine

Generate or edit images using AI image models. Currently supports Google Gemini (Nano Banana Pro for creation, Nano Banana for editing). Provider-agnostic design. Use for image generation, visual content creation, or image editing tasks.

Skill metadata

Stars: 0
Forks: 0
Updated: May 23, 2026 at 00:42

SKILL.md

name	gen-image
description	Generate or edit images using AI image models. Currently supports Google Gemini (Nano Banana Pro for creation, Nano Banana for editing). Provider-agnostic design. Use for image generation, visual content creation, or image editing tasks.
argument-hint	[create|edit] [prompt or options]
allowed-tools	["Bash","Read"]

Generate or edit images via AI image models. The skill is generic and carries no style or domain knowledge; higher-order skills build the prompts and call the CLI scripts bundled here.

Prerequisites

python3 must be available in PATH (stdlib only; no pip packages needed).
GEMINI_API_KEY must be set. Set it in one of these ways:
- Project .env (for project-level installs): echo 'GEMINI_API_KEY=your-key' >> .env
- Global ~/.claude/.env (for global installs): echo 'GEMINI_API_KEY=your-key' >> ~/.claude/.env
- Or export directly: export GEMINI_API_KEY=your-key

API calls consume Google AI credits. Imagen 4 requires billing. Gemini Flash has a free tier but is rate-limited. See Google AI pricing.
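The three key locations listed above can be checked in a fixed order. The following is a minimal sketch using only the standard library. The precedence (shell environment first, then project .env, then ~/.claude/.env) and the helper names `parse_env_text` and `resolve_api_key` are assumptions made for illustration; the bundled gemini-image.py may resolve the key differently.

```python
import os
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as GEMINI_API_KEY='abc'.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(environ=None, env_files=None):
    """Return the first GEMINI_API_KEY found, or None.

    Assumed precedence (hypothetical): process environment,
    then project .env, then ~/.claude/.env.
    """
    environ = os.environ if environ is None else environ
    if env_files is None:
        env_files = [Path(".env"), Path.home() / ".claude" / ".env"]
    if environ.get("GEMINI_API_KEY"):
        return environ["GEMINI_API_KEY"]
    for path in env_files:
        path = Path(path)
        if path.is_file():
            key = parse_env_text(path.read_text()).get("GEMINI_API_KEY")
            if key:
                return key
    return None
```

Passing `environ` and `env_files` explicitly keeps the lookup testable without touching the real environment.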

Variables

GI_CLI: python3 ./gemini-image.py # Path to the Gemini image CLI
DEFAULT_CREATE_MODEL: gemini-3-pro-image-preview # Nano Banana Pro. Best quality + character/text fidelity. Override with --model (e.g. imagen-4.0-generate-001 for cheaper batches)
DEFAULT_EDIT_MODEL: gemini-2.5-flash-image # Nano Banana. Default for edits
DEFAULT_SIZE: 1280x960 # 4:3 landscape. Override with --size
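As a quick check of the DEFAULT_SIZE comment, the sketch below splits a WxH string and reduces it to its aspect ratio. The helpers `parse_size` and `aspect_ratio` are hypothetical names used only for this illustration; they are not part of the bundled CLI.

```python
from math import gcd


def parse_size(size):
    """Split a 'WIDTHxHEIGHT' string into integer width and height."""
    width_text, sep, height_text = size.lower().partition("x")
    if not sep or not width_text.isdigit() or not height_text.isdigit():
        raise ValueError(f"expected WIDTHxHEIGHT, got {size!r}")
    width, height = int(width_text), int(height_text)
    if width == 0 or height == 0:
        raise ValueError("width and height must be positive")
    return width, height


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


print(aspect_ratio(*parse_size("1280x960")))  # prints (4, 3)
```

The greatest common divisor of 1280 and 960 is 320, which gives 4:3, matching the landscape default.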

Workflow

1. Check Prerequisites
- IF: which python3 fails → report "python3 not found" and stop
- IF: ./gemini-image.py not found → report "gemini-image CLI missing" and stop
- IF: unsure whether the API key is configured → run <GI_CLI> models as a lightweight auth check. IF it fails with "GEMINI_API_KEY not set" → stop and tell the user:
  GEMINI_API_KEY is not configured. Set it in one of:
  - Project-level: add GEMINI_API_KEY=your-key to ./.env
  - Global (recommended for personal use): add GEMINI_API_KEY=your-key to ~/.claude/.env
  - Shell: export GEMINI_API_KEY=your-key
  Get your API key at: https://aistudio.google.com/apikey
- Example: python3 found, gemini-image.py exists, API key valid → proceed
- Tool: Bash
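The first two checks in this step can be sketched as a small function. This is an illustration only: `check_prerequisites` is a hypothetical helper, and the API key check is left out because it requires actually running `<GI_CLI> models`.

```python
import shutil
from pathlib import Path


def check_prerequisites(cli_path="./gemini-image.py", which=shutil.which):
    """Return the message for the first failed check, or None if both pass."""
    # Same test as `which python3` in the shell.
    if which("python3") is None:
        return "python3 not found"
    # The CLI script is expected next to the skill, per GI_CLI.
    if not Path(cli_path).is_file():
        return "gemini-image CLI missing"
    return None
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.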
2. Route Request
- IF: user asks to create an image → go to step 3
- IF: user asks to edit an existing image → go to step 4
- IF: user asks about available models → run <GI_CLI> models and report results
- IF: user asks to check remote API models → run <GI_CLI> models --remote
- Example: "generate a sunset painting" → step 3 (create)
- Example: "make the sky orange in photo.png" → step 4 (edit)
- Example: "what models can I use?" → <GI_CLI> models
- Tool: Bash <GI_CLI> models [--remote] [--json]
3. Create Image (Text to Image)
- Generate a new image from a text prompt
- IF: no --model → uses DEFAULT_CREATE_MODEL
- IF: no --size → uses DEFAULT_SIZE
- IF: no --output → saves as output.png in current directory
- IF: safety filter triggers (empty response) → rephrase prompt to be more specific, less ambiguous
- IF: rate limit (429) → wait and retry. Imagen 4 has generous limits; Gemini Flash free tier is stricter
- Enrich vague prompts with specific details (style, lighting, composition, colors) before calling the API
- Example: <GI_CLI> create "a watercolor painting of a mountain lake at sunset" --output lake.png
- Example: <GI_CLI> create "pixel art treasure chest, 32x32 sprite" --size 1024x1024 --output chest.png
- Example: <GI_CLI> create "photo of a red sports car" --model gemini-2.0-flash-exp --output car.png
- Tool: Bash <GI_CLI> create <prompt> [--output <path>] [--size <WxH>] [--model <id>]
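When the create command is assembled programmatically, passing an argument list avoids shell-quoting problems with prompts that contain spaces or quotes. The sketch below uses only the flags documented in this step; `build_create_command` is a hypothetical helper, and omitted flags are left for the CLI to fill with its defaults.

```python
import shlex

# Mirrors the GI_CLI variable from this document.
GI_CLI = ["python3", "./gemini-image.py"]


def build_create_command(prompt, output=None, size=None, model=None):
    """Assemble argv for `create`, adding only the flags that were given."""
    argv = [*GI_CLI, "create", prompt]
    if output:
        argv += ["--output", output]
    if size:
        argv += ["--size", size]
    if model:
        argv += ["--model", model]
    return argv


command = build_create_command(
    "a watercolor painting of a mountain lake at sunset",
    output="lake.png",
)
# shlex.join renders the list as a copy-pasteable shell line.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` without `shell=True`.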
4. Edit Image (Image + Instruction)
- Modify an existing image based on a text instruction
- IF: no --model → uses DEFAULT_EDIT_MODEL
- IF: no --output → saves as edited.png
- IF: user wants to overwrite → set --output same as --input
- Describe what to CHANGE, not what to keep. Good: "Make the sky orange." Bad: "Keep everything but change the sky."
- Example: <GI_CLI> edit "change the background to a beach" --input photo.png --output beach.png
- Example: <GI_CLI> edit "remove the text overlay" --input screenshot.png --output clean.png
- Tool: Bash <GI_CLI> edit <instruction> --input <source> [--output <path>] [--model <id>]
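The overwrite rule in this step (set --output to the same path as --input) can be made explicit in code. A minimal sketch, assuming a hypothetical helper `build_edit_command` with an `overwrite` switch that is not a flag of the CLI itself:

```python
def build_edit_command(instruction, input_path, output=None, overwrite=False, model=None):
    """Assemble argv for `edit`; overwrite=True reuses the input path as output."""
    argv = ["python3", "./gemini-image.py", "edit", instruction, "--input", input_path]
    if overwrite:
        # Overwriting means --output points at the same file as --input.
        output = input_path
    if output:
        argv += ["--output", output]
    if model:
        argv += ["--model", model]
    return argv
```

With neither `output` nor `overwrite` set, no --output flag is passed and the CLI default (edited.png, per the step above) applies.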
5. Inspect and Iterate
- Read the generated image to visually verify the result
- Check for common AI artifacts: extra limbs, baked-in text, wrong colors, white borders, blur
- IF: result needs small fixes → use edit command on the output
- IF: result is fundamentally wrong → adjust prompt and create again
- Example: Read output.png → spots extra finger → <GI_CLI> edit "remove the extra finger on the left hand" --input output.png --output fixed.png
- Tool: Read (to view image), then Bash (to edit/recreate)

Reference

IF: need full CLI flags, supported sizes, or model details → read reference/commands.md
IF: need to add a new provider → see gemini-image.py as the reference implementation contract

Source

Repository: orakitine/toolbox (GitHub)
