| name | gpt-atelier |
| description | OpenAI GPT Image generation and editing (gpt-image-2, 1.5, 1, mini). Text-to-image, mask-based inpainting, multi-reference composition, multi-turn conversational editing via Responses API, streaming with partial images. This skill should be used when generating or editing images via OpenAI's image models, when near-perfect text rendering in images is needed, when mask-based region-aware editing is required, when multi-turn conversational image editing is desired, or when streaming progressive image delivery is needed. Complements nano-banana-pro (Gemini) as a parallel image generation backend.
|
GPT Atelier
OpenAI GPT Image generation and editing. Wraps both the Image API (one-shot) and Responses API (multi-turn conversational) with 6 scripts covering generate, edit, compose, converse, stream, and test workflows.
Prerequisite: OPENAI_API_KEY environment variable.
Models
| Flag | Model | Strengths |
|---|
| (default) | gpt-image-2 | Reasoning-based, ~99% text rendering, up to 8 consistent images, 4K, streaming |
--fast | gpt-image-1.5 | Region-aware editing, 4x faster, cheaper |
--mini | gpt-image-1-mini | Cheapest ($0.006/image low quality) |
Quick Start
python3 scripts/test_connection.py --check-models
python3 scripts/generate_image.py "A Minoan bull-leaper under golden light"
python3 scripts/compare_models.py "A bronze seal stamp in Minoan style" --all --open
python3 scripts/edit_image.py "Replace the sky with a dramatic sunset" photo.png --mask sky_mask.png
python3 scripts/compose_images.py "Create a gift basket containing these items" item1.png item2.png item3.png
python3 scripts/converse_image.py
python3 scripts/stream_image.py "An ancient fresco being restored" --partials 3
Image API scripts share: --output DIR, --filename NAME, --quality low|medium|high, --format png|jpeg|webp, --fast, --mini. converse_image.py uses --orchestrator instead of --fast/--mini.
Core Workflows
1. Text-to-Image Generation
python3 scripts/generate_image.py "prompt" [options]
| Option | Default | Description |
|---|
--size | auto | WxH or preset: square, landscape, portrait, wide, 2k, 4k, 4k-portrait |
--quality | high | low, medium, high, auto |
--n | 1 | Number of images (1-8) |
--format | png | png, jpeg (faster), webp |
--compression | — | 0-100 for jpeg/webp |
--background | — | opaque, transparent, auto |
--moderation | auto | auto, low |
python3 scripts/generate_image.py \
"High-end product photography of luxury watch on black marble, dramatic key light, f/5.6, commercial quality" \
--size square --quality high
python3 scripts/generate_image.py "Quick sketch of a coffee cup" --mini --quality low
python3 scripts/generate_image.py "Logo design for a tech startup" --n 4 --size square
2. Image Editing
python3 scripts/edit_image.py "instruction" input.png [options]
Additional options: --mask, --images (extra references).
Auto-converts B&W masks to RGBA (requires Pillow: pip install Pillow).
python3 scripts/edit_image.py "Add a flamingo to the pool" lounge.png --mask pool_mask.png
python3 scripts/edit_image.py "Convert to watercolor painting style" photo.jpg
python3 scripts/edit_image.py "Replace the car with this bicycle" street.png --images bicycle.png
3. Multi-Reference Composition
python3 scripts/compose_images.py "instruction" img1.png img2.png [img3.png ...] [options]
Requires 2+ reference images. The model creates a new image incorporating all references.
python3 scripts/compose_images.py \
"Create a mood board combining these design elements" \
texture.png palette.png sketch.png --size landscape
4. Multi-Turn Conversational Editing (Responses API)
python3 scripts/converse_image.py [prompt] [options]
Without a prompt, enters interactive REPL. Maintains conversation state via previous_response_id.
python3 scripts/converse_image.py --auto-save
> A cyberpunk street scene at night
> Now add neon signs with Japanese text
> Make it rain and add reflections
> /save final_scene
python3 scripts/converse_image.py "Design a coffee brand logo" --output ./logos
Interactive commands: /save, /action auto|generate|edit, /model, /clear, /history, /help, /quit.
Attach images with @path: @logo.png Add this logo to the top-right corner.
Additional options:
| Flag | Orchestrator | Tradeoff |
|---|
| (default) | gpt-5.4 | Standard quality, fastest |
--thinking | gpt-5.4-thinking | Better composition for complex scenes, slower |
--pro | gpt-5.4-pro | Highest quality, most expensive |
--orchestrator MODEL | any | Manual override |
5. Model Comparison (HTML Page)
python3 scripts/compare_models.py "A Minoan bull-leaper" --open
python3 scripts/compare_models.py "prompt" --all --open
python3 scripts/compare_models.py "" --list-models
Generates the same prompt across multiple models and outputs a dark-themed HTML comparison page. See references/compare-models-reference.md for full options.
6. Streaming with Partial Images
python3 scripts/stream_image.py "prompt" --partials N [options]
Outputs partial images as they generate, then the final image.
python3 scripts/stream_image.py "A detailed architectural drawing" --partials 3 --save-partials
Additional option: --save-partials to save intermediate images.
Prompting Quick-Hits
- Lead with scene/style, not subject. First words carry highest visual weight. Specify intended use (ad, UI mockup, editorial) so the model picks the right polish level.
- Always double-quote literal text.
"HELLO WORLD" engages the high-accuracy text rendering engine.
- Pixel dimensions in prompt. For custom aspect ratios, append
"Output in exactly WxH (R:R ratio) resolution" — the API size param alone is unreliable. Done automatically by inject_size_hint() for gpt-image-2.
- Use
--thinking for complex scenes. The orchestrator model matters — Thinking models produce significantly better multi-element compositions.
- Generate fresh, don't edit. Reference-image editing on gpt-image-2 produces yellow tint and poor prompt adherence. For design-final work, generate from scratch.
See references/prompting-guide.md for full details.
When to Use GPT Atelier vs Nano Banana Pro
| Task | GPT Atelier | Nano Banana Pro |
|---|
| Text rendering (complex, non-Latin) | Best | Good |
| Mask-based inpainting | Native, low drift | Higher drift (~40%) |
| Reference-image editing | Yellow tint risk — use --fast | Better fidelity |
| Multi-turn editing with state | Responses API | Multi-turn chat |
| Photorealism | Excellent | Excellent |
| Cinematic digital painting | Good | Best |
| UI mockups / screenshots | Best | Good |
| Multi-image consistency | Up to 8/prompt | Up to 14 references |
| Streaming partial delivery | Native | Not available |
| Dark/artistic themes | Stricter policy | More permissive |
| Budget/volume | $0.006/img (mini low) | Gemini pricing |
| Arbitrary aspect ratios | Any (16px multiples) | 10 presets |
Cost Control
Start with --quality low ($0.006/image) for ideation. Graduate to medium ($0.05) for review, high ($0.21) for final assets. Use --fast for cheaper generation, --mini for maximum cost savings. Use --format jpeg for faster response times.
Script Reference
| Script | Purpose | API |
|---|
generate_image.py | Text-to-image | Image API |
edit_image.py | Edit with mask/references | Image API |
compose_images.py | Multi-reference composition | Image API |
converse_image.py | Multi-turn editing | Responses API |
compare_models.py | Side-by-side model comparison (HTML) | Image API |
stream_image.py | Streaming with partials | Image API |
test_connection.py | Connectivity check | Models API |
Reference Documentation
| File | Contents |
|---|
references/api-reference.md | Full parameter reference, pricing, size constraints |
references/compare-models-reference.md | Model comparison script: options, examples, HTML output |
references/prompting-guide.md | Prompt engineering patterns, text rendering, style keywords |
references/troubleshooting.md | Error codes, content policy, rate limits |