Name: Image
Author: ildunari

Updated: May 6, 2026 at 02:48
Repository: ildunari/skillshare

name	image
description	Use whenever Kosta invokes /image or asks Hermes to generate, draw, create, mock up, render, or edit an image using the default Codex/GPT Image 2 path. This is the preferred default image-generation skill; use ComfyUI only when the user explicitly asks for local models, LoRAs, SDXL/Flux workflows, or advanced node/control workflows. Turns loose requests into strong GPT Image 2 prompts and calls Hermes image_generate.
version	0.3.0
author	Hermes Agent
license	MIT
targets	["hermes-default","hermes-gpt","claude-hermes"]
metadata	{"hermes":{"command_priority":500,"tags":["image-generation","gpt-image-2","codex","image","prompt"]}}

Image — Codex GPT Image 2

Use this as Kosta's default image command. /image <request> means: understand the desired artifact, write a good GPT Image 2 prompt, then call Hermes' image_generate tool.

Do not route ordinary /image requests to ComfyUI. ComfyUI is for explicit local/LoRA/workflow requests. The default path is Codex/OpenAI GPT Image 2 through Hermes image_generate.

First move

Generate immediately. Do not ask clarifying questions unless the user's intent is genuinely ambiguous (e.g., "image" with no other content, or an unresolvable conflict like "landscape portrait of a building in square format"). Each of the following is clear enough; proceed without asking:

Any subject described in ≥4 words
Any /image <noun/scene/concept> with obvious visual interpretation
Requests that name style, mood, format, brand, or use case
Edit requests with before/after described
UI/mockup/screenshot requests (even if only the category is named)

Never ask about: style, aspect ratio, level of detail, color palette, realism vs. illustration, number of variants, model, or background — unless the user explicitly raised it. Infer defaults from context.
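The "generate immediately" rule can be expressed as a minimal Python sketch. It assumes the raw message still carries the `/image` prefix. `needs_clarification` is a hypothetical helper and is not part of Hermes. It catches only the empty-request case. Conflict detection (such as "landscape portrait of a building in square format") is deliberately left to judgment.

```python
import re


def needs_clarification(message: str) -> bool:
    """Hypothetical check: ask only when the /image request has no content."""
    # Drop the command prefix so a bare "/image" is treated as empty.
    request = re.sub(r"^/image\b", "", message.strip(), flags=re.IGNORECASE).strip()
    # A bare command, or the lone word "image", leaves nothing to draw.
    if request.lower() in ("", "image"):
        return True
    # Everything else, including one-word subjects, goes straight to generation.
    return False


print(needs_clarification("/image"))                    # True
print(needs_clarification("/image a red fox in snow"))  # False
```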

Aspect ratio inference

Pick the ratio before prompting. Use the first matching rule:

Trigger words / context	Ratio
wide, banner, hero, desktop, cinematic, landscape, panorama, scene, wallpaper (horizontal)	`landscape`
phone, mobile, portrait, poster, tall, story (IG/social), wallpaper (vertical)	`portrait`
everything else (product shot, logo mark, avatar, icon, thumbnail, default)	`square`

When the request contains conflicting signals (e.g., "portrait of a landscape"), pick portrait for human subjects and landscape for scenes.
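The table can be read as an ordered keyword lookup. This Python sketch checks the rows from top to bottom and falls back to `square`. `infer_aspect_ratio` and `RATIO_RULES` are illustrative names, not Hermes APIs. The sketch has two known gaps. It approximates the wallpaper rows using the words "horizontal" and "vertical". It also omits the human-subject tie-break, which still needs judgment.

```python
import re

# Rows mirror the aspect-ratio table and are checked top to bottom;
# the first row with a matching trigger word wins.
RATIO_RULES = [
    ("landscape", {"wide", "banner", "hero", "desktop", "cinematic",
                   "landscape", "panorama", "scene", "horizontal"}),
    ("portrait", {"phone", "mobile", "portrait", "poster", "tall",
                  "story", "vertical"}),
]


def infer_aspect_ratio(request: str) -> str:
    """Return "landscape", "portrait", or the "square" default."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    for ratio, triggers in RATIO_RULES:
        if words & triggers:
            return ratio
    return "square"


print(infer_aspect_ratio("hero banner for a SaaS landing page"))   # landscape
print(infer_aspect_ratio("mobile onboarding screen"))              # portrait
print(infer_aspect_ratio("black ceramic mug for a coffee brand"))  # square
```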

Prompt structure

Rewrite the user request into a concise production prompt. GPT Image 2 responds best to structure, not keyword sludge.

Use this order when it helps:

Create/draw/render [artifact type and intended use].

Scene / background:
[where it exists, environment, mood]

Subject:
[main subject, pose/action, scale, expression, important relationships]

Visual details:
[materials, textures, era, palette, realism/style, visible props]

Composition:
[framing, viewpoint, placement, negative space, orientation]

Lighting:
[soft window light, golden hour, studio, high contrast, etc.]

Exact text, if any:
"TEXT TO RENDER" — [font style, color, placement]. No other text.

Constraints:
[no watermark, no logos/trademarks, no extra text, preserve X, avoid Y]

Short requests do not need the whole template. A crisp paragraph is fine when it carries the same information.
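Here is a sketch of assembling the template. It assumes each section has already been written as a short phrase. `build_prompt` and `SECTION_ORDER` are hypothetical names. Empty sections are dropped, which is how a short request collapses into a compact prompt.

```python
# Section labels in template order; empty sections are skipped so a short
# request collapses to a compact prompt instead of a mostly blank form.
SECTION_ORDER = [
    "Scene / background", "Subject", "Visual details",
    "Composition", "Lighting", "Exact text", "Constraints",
]


def build_prompt(opening: str, sections: dict[str, str]) -> str:
    """Assemble the opening action line plus any non-empty sections."""
    lines = [
        f"{label}: {sections[label].strip()}"
        for label in SECTION_ORDER
        if sections.get(label, "").strip()
    ]
    if not lines:
        return opening.strip()
    return opening.strip() + "\n\n" + "\n".join(lines)


print(build_prompt(
    "Create a photorealistic studio product photograph for a premium coffee brand.",
    {
        "Subject": "a matte black ceramic mug, empty, centered on a warm stone surface.",
        "Lighting": "soft warm key light from upper left, gentle fill.",
        "Constraints": "no text, no watermark, no fake brand logo.",
    },
))
```

The output has the same layout as the mug example in Examples: an action line, a blank line, then one labeled section per line.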

GPT Image 2 rules that matter

Use action words: create, draw, render, or edit. For edits, say "edit the image by changing X" rather than vague "combine/merge."
For photorealism, include "photorealistic" or a phrase such as "real photograph", "professional photography", or "iPhone photo". Do not rely on "8K, ultra detailed, masterpiece" boilerplate.
Be concrete about visual facts: materials, shape, texture, placement, lighting, camera angle, and mood.
For text in the image, put the exact copy in quotes, specify typography and placement, and add "verbatim, no extra text". Spell unusual words letter by letter if accuracy matters.
For edits or reference-based work, separate Change from Preserve. Example: "Change only the background to a rainy Tokyo street. Preserve the person's face, pose, clothing, camera angle, lighting direction, and color grade."
Iterate with small changes. If the first result is close, ask for one targeted edit rather than rewriting everything.
For dense charts/spreadsheets/tiny labels, warn that image generation is the wrong final-format tool; use real document/design tooling if precision matters.
GPT Image 2 does not support transparent backgrounds in the current API path. If transparency is requested, generate on a plain, high-contrast background or remove the background in post-processing. Revisit this if the configured provider adds transparency support.

Calling the tool

image_generate(prompt=<final prompt>, aspect_ratio="landscape" | "square" | "portrait")

aspect_ratio must be exactly one of "landscape", "square", or "portrait". No other values are accepted.
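This is a sketch of a defensive wrapper. It assumes the tool is reachable as a Python callable that takes `prompt` and `aspect_ratio` as keyword arguments. `call_image_generate` and `fake_tool` are hypothetical names. The goal is to reject values such as `"16:9"` before the call, not after.

```python
from typing import Any, Callable

ALLOWED_RATIOS = ("landscape", "square", "portrait")


def call_image_generate(image_generate: Callable[..., Any], prompt: str, aspect_ratio: str) -> Any:
    """Validate aspect_ratio, then forward both arguments by keyword."""
    if aspect_ratio not in ALLOWED_RATIOS:
        # Fail before the tool call instead of sending an unsupported value.
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_RATIOS}, got {aspect_ratio!r}")
    return image_generate(prompt=prompt, aspect_ratio=aspect_ratio)


# Stand-in for the Hermes tool: it just echoes its arguments.
def fake_tool(**kwargs: Any) -> dict:
    return kwargs


print(call_image_generate(fake_tool, "A matte black mug on stone.", "square"))
```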

After generation, return the image directly. Add only a short note if useful, e.g. "I kept it square and optimized for product-shot realism."

Failure recovery

If image_generate returns an error, an empty result, or an image that misses an explicit constraint:

Content policy rejection: rewrite the prompt to remove potentially flagged phrasing (e.g., real brand names, explicit depictions, celebrity likenesses). Retry once with the cleaned prompt before reporting the rejection.
Tool unavailable / timeout: report "image_generate failed: [error]. Try again or check Hermes tool status."
Result misses a key explicit constraint: after receiving the result, check it against the request's primary constraints (required text present and legible, correct orientation for the use case, correct subject). If a constraint is clearly violated (text missing or garbled, wrong orientation, wrong subject count), identify the broken constraint, rewrite only that section of the prompt, and retry once automatically without asking permission. Deliver the final result with a one-line note on what was fixed. If the second attempt also fails the constraint, report both attempts and the specific mismatch.
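The recovery rules can be sketched as a single Python function. Everything in it is an assumption made for illustration:

- `generate` stands in for `image_generate`, with the aspect ratio already bound.
- `ContentPolicyError` is a made-up exception type, since the real error shape is not specified.
- `clean` strips flagged phrasing from a prompt.
- `check` returns either `None` or a pair of (broken constraint, revised prompt).

Each rule gets at most one retry.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

POLICY = "content policy rejection"


class ContentPolicyError(Exception):
    """Hypothetical: raised when the provider rejects a prompt."""


@dataclass
class Outcome:
    images: list = field(default_factory=list)
    note: str = ""


def generate_with_recovery(
    generate: Callable[[str], Any],
    prompt: str,
    clean: Callable[[str], str],
    check: Callable[[Any, str], Optional[tuple[str, str]]],
) -> Outcome:
    """Apply the three recovery rules, each with at most one retry."""

    def attempt(p: str) -> tuple[Any, Optional[str]]:
        # Returns (image, None) on success or (None, failure message).
        try:
            image = generate(p)
        except ContentPolicyError:
            return None, POLICY
        except Exception as err:  # tool unavailable, timeout, ...
            return None, f"image_generate failed: {err}. Try again or check Hermes tool status."
        if not image:
            return None, "image_generate failed: empty result. Try again or check Hermes tool status."
        return image, None

    image, failure = attempt(prompt)
    if failure == POLICY:
        # Content policy: one retry with flagged phrasing removed.
        prompt = clean(prompt)
        image, failure = attempt(prompt)
        if failure == POLICY:
            return Outcome(note="Content policy rejected both the original and the cleaned prompt.")
    if failure:
        return Outcome(note=failure)

    # Constraint check: one automatic retry with only the broken section rewritten.
    problem = check(image, prompt)
    if problem is None:
        return Outcome(images=[image])
    constraint, revised = problem
    second, failure = attempt(revised)
    if failure:
        return Outcome(images=[image], note=f"Retry for {constraint} failed: {failure}")
    if check(second, revised) is not None:
        return Outcome(images=[image, second], note=f"Both attempts missed: {constraint}.")
    return Outcome(images=[second], note=f"Fixed: {constraint}.")
```

In this sketch, a timeout returns immediately with the "image_generate failed" message. A content-policy rejection is retried once with the cleaned prompt before any constraint check runs.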

When to use high-effort prompting

Use the full structured prompt for:

UI mockups, screenshots, app screens, product concepts
Posters, ads, packaging, logos/marks, thumbnails with text
Infographics and diagrams
Photorealistic product/people/scene images
Edits where identity, geometry, layout, or brand feel must be preserved

For UI/UX images, also follow the gpt-image-2-uiux-prompting skill if available.

Examples

User: /image a slick product shot of a black ceramic mug for a coffee brand

Tool prompt:

Create a photorealistic studio product photograph for a premium coffee brand.

Subject: a matte black ceramic mug with a subtle curved handle, empty, centered on a warm stone surface.
Visual details: soft ceramic texture, tiny rim highlights, faint coffee-bean shadows nearby, no visible logo.
Composition: square crop, mug centered with tasteful negative space, eye-level product photography.
Lighting: soft warm key light from upper left, gentle fill, natural contact shadow.
Constraints: no text, no watermark, no fake brand logo, no extra objects except subtle coffee beans/shadows.

User: /image mobile onboarding screen for a meditation app, calm but not generic

Tool prompt:

Generate one realistic iPhone portrait screenshot of a native iOS meditation app onboarding screen.

Product intent: help a new user choose a calming daily practice without feeling like a wellness cliché.
Screen state: first onboarding screen with a single hero illustration, short headline, one primary CTA, and small secondary sign-in link.
Layout: native iOS safe areas, SF Pro-like typography, generous spacing, bottom CTA above the home indicator.
Visual style: calm editorial UI, warm off-white background, muted sage and clay accents, subtle grain, no stock-photo look.
Exact visible text: headline "Find your quiet minute", CTA "Start", secondary link "Sign in". No other text.
Constraints: straight-on screenshot, normal iPhone proportions, no device mockup frame, no poster layout, no duplicate buttons, no fake logos, no watermark.

References

references/gpt-image-2-prompting.md — current GPT Image 2 prompt rules and constraints.
