بنقرة واحدة
image-gen
// Generate and edit images using the Gemini API. Text-to-image, image editing, multi-turn iteration, 4K resolution, search grounding.
// Generate and edit images using the Gemini API. Text-to-image, image editing, multi-turn iteration, 4K resolution, search grounding.
Review GitHub pull requests with structured code analysis. Use when asked to review a PR, check a pull request, or audit code changes.
CLI to manage emails via IMAP/SMTP. Use `himalaya` to list, read, write, reply, forward, search, and organize emails from the terminal. Supports multiple accounts and message composition with MML (MIME Meta Language).
Personalize the agent — interview the user to build their profile (USER.md) and craft the agent's personality (SOUL.md). Triggered by 'onboard', 'personalize', 'set up my soul', etc.
Control macOS via AppleScript/osascript. Manage windows (move, resize, tile), apps (launch, quit, focus), system (volume, dark mode, notifications), Spotify, browsers, Calendar, Reminders, Finder, and clipboard. Use when the user asks to control their Mac, arrange windows, manage apps, or interact with native macOS features.
Master multi-agent orchestration using Claude Code's TeammateTool and Task system. Use when coordinating multiple agents, running parallel code reviews, creating pipeline workflows with dependencies, building self-organizing task queues, or any task benefiting from divide-and-conquer patterns.
Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries.
| name | image-gen |
| description | Generate and edit images using the Gemini API. Text-to-image, image editing, multi-turn iteration, 4K resolution, search grounding. |
| metadata | {"requires":{"bins":["curl","jq"],"env":["GEMINI_API_KEY"]}} |
Generate and edit images via Gemini's native image generation API using curl.
| Model | ID | Best for |
|---|---|---|
| Nano Banana | gemini-2.5-flash-image | Fast, high-volume, low-latency |
| Nano Banana Pro | gemini-3-pro-image-preview | Pro asset production, complex prompts, accurate text rendering, 4K |
Default to gemini-2.5-flash-image unless the user asks for high quality, 4K, search grounding, or text-heavy images.
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}]
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
Read the output image file to show it to the user.
Encode an existing image as base64 and send it alongside a text prompt:
BASE64_IMG=$(base64 -i input.png)
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"contents\": [{
\"parts\": [
{\"text\": \"YOUR EDIT PROMPT HERE\"},
{\"inline_data\": {\"mime_type\": \"image/png\", \"data\": \"$BASE64_IMG\"}}
]
}]
}" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
When using gemini-3-pro-image-preview, you can set aspect ratio and resolution:
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"],
"imageConfig": {
"aspectRatio": "16:9",
"imageSize": "2K"
}
}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
1K (default), 2K, 4K — must be uppercase K.
Generate images based on real-time info (weather, news, etc.):
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}],
"tools": [{"google_search": {}}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"]
}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
The API returns JSON with parts that can contain text and/or image data. Extract text and image separately:
# save full response
RESPONSE=$(curl -s -X POST ... )
# extract text (if any)
echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.text) | .text'
# extract and save image
echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png
inlineData, check for error messages in the JSON