| name | gemini-imagegen |
| description | Generate and edit images using the Gemini API (Nano Banana). Use this skill when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images. |
Generate and edit images using Google's Gemini API. Requires GEMINI_API_KEY environment variable.
When the user doesn't specify a location, save images to:
/Users/samarthgupta/Documents/generated images/
Every generated image gets a companion .md file with the prompt used (e.g., logo.png → logo.md).
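One way to implement this sidecar convention (the helper name is hypothetical, not one of the bundled scripts) is a small writer that derives the `.md` path from the image path:

```python
from pathlib import Path

def write_prompt_sidecar(image_path: str, prompt: str) -> Path:
    """Write the generation prompt next to the image (logo.png -> logo.md)."""
    sidecar = Path(image_path).with_suffix(".md")
    sidecar.write_text(f"# Prompt\n\n{prompt}\n", encoding="utf-8")
    return sidecar

print(write_prompt_sidecar("logo.png", "A minimalist fox logo"))  # prints: logo.md
```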
When gathering parameters (aspect ratio, resolution), offer the option to specify a custom output location.
Describe scenes narratively, don't list keywords. Gemini has deep language understanding—write prompts like prose, not tags.
❌ "cat, wizard hat, magical, fantasy, 4k, detailed"
✓ "A fluffy orange tabby sits regally on a velvet cushion, wearing an ornate
purple wizard hat embroidered with silver stars. Soft candlelight illuminates
the scene from the left. The mood is whimsical yet dignified."
[Subject + Adjectives] doing [Action] in [Location/Context].
[Composition/Camera]. [Lighting/Atmosphere]. [Style/Media]. [Constraint].
Not every prompt needs every element—match detail to intent.
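The formula above can be sketched as a tiny prompt builder (function name and argument set are illustrative only, not part of the scripts):

```python
def build_prompt(subject, action, location,
                 composition=None, lighting=None, style=None, constraint=None):
    """Assemble a narrative prompt from the formula; omit elements you don't need."""
    sentences = [f"{subject} {action} in {location}."]
    sentences += [f"{part}." for part in (composition, lighting, style, constraint) if part]
    return " ".join(sentences)

print(build_prompt("A fluffy orange tabby", "sits regally", "a candlelit study",
                   lighting="Soft candlelight from the left",
                   style="Whimsical storybook illustration"))
```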
Prescriptive (user has a specific vision): detailed descriptions, exact specifications.
Open (exploring / wants model creativity): general direction, let the model decide details.
Both are valid. Ask the user's intent if unclear.
Think like a photographer: describe lens, light, moment.
Pattern: Acknowledge → specify change → describe integration → preserve the rest
Names invoke aesthetics. The model learned associations for film stocks, cameras, studios, artists, and styles. Instead of describing characteristics, reference the name directly.
"Portrait at golden hour, shot on Kodak Portra 400"
→ Warm skin tones, pastel highlights, fine grain
"Studio Ghibli forest scene"
→ Lush nature, soft lighting, whimsical atmosphere
"Fashion editorial, Hasselblad medium format"
→ Exceptional detail, shallow DOF, that medium format look
This works for photography, animation, illustration, game art, graphic design, fine art—anything with a recognizable visual identity.
See STYLE_REFERENCE.md for comprehensive lexicon of film stocks, cameras, studios, artists, and styles.
| Model | Best For |
|---|---|
| gemini-2.5-flash-image | Speed, iteration, simple generation (1024px fixed) |
| gemini-3-pro-image-preview | Text rendering, complex instructions, high-res (up to 4K), multi-image composition, Google Search grounding |
Defaults: Pro model uses 1K resolution, 1:1 aspect. Confirm with user before changing.
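As a rough selection heuristic mirroring the table above (hypothetical helper, not part of the scripts):

```python
def pick_model(needs_text_rendering=False, needs_high_res=False,
               needs_grounding=False, multi_image=False):
    """Flash for speed and iteration; Pro for text, >1K output, grounding, or composition."""
    if needs_text_rendering or needs_high_res or needs_grounding or multi_image:
        return "gemini-3-pro-image-preview"
    return "gemini-2.5-flash-image"

print(pick_model())                      # prints: gemini-2.5-flash-image
print(pick_model(needs_grounding=True))  # prints: gemini-3-pro-image-preview
```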
Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Resolutions: 1K (~1024px), 2K (~2048px), 4K (~4096px)
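A quick validation sketch of these options (set and function names are illustrative):

```python
VALID_ASPECTS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
VALID_SIZES = {"1K", "2K", "4K"}

def check_image_config(aspect="1:1", size="1K"):
    """Reject unsupported aspect/size values before calling the API."""
    if aspect not in VALID_ASPECTS:
        raise ValueError(f"unsupported aspect ratio: {aspect}")
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported resolution: {size}")
    return aspect, size
```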
Enable Google Search grounding with the `--grounding` flag when real-time data helps, e.g. current events, live prices, or weather (Pro model only).
Use chat for iterative editing instead of perfecting prompts in one shot:
→ "Create a logo for Acme Corp"
→ "Make the text bolder"
→ "Add a blue gradient background"
No manual masking needed. Describe changes conversationally:
# Generate from prompt
python scripts/generate_image.py "prompt" output.png [--model MODEL] [--aspect RATIO] [--size SIZE] [--grounding]
# Edit existing image
python scripts/edit_image.py input.png "instruction" output.png [--model MODEL] [--aspect RATIO] [--size SIZE]
# Compose multiple images
python scripts/compose_images.py "instruction" output.png img1.png [img2.png ...] [--model MODEL] [--aspect RATIO] [--size SIZE]
# Interactive multi-turn chat
python scripts/multi_turn_chat.py [--model MODEL] [--output-dir DIR]
Models: gemini-2.5-flash-image (default), gemini-3-pro-image-preview
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=["Your narrative prompt here"],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
for part in response.parts:
    if part.inline_data:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)  # raw image bytes
For Pro model with configuration:
config=types.GenerateContentConfig(
    response_modalities=["TEXT", "IMAGE"],
    image_config=types.ImageConfig(aspect_ratio="16:9", image_size="2K"),
    tools=[{"google_search": {}}],  # Optional grounding
)
Before generating: