baoyu-image-gen
AI SDK-based image generation using official OpenAI and Google APIs. Supports text-to-image, reference images, aspect ratios, and quality presets.
| name | baoyu-image-gen |
| description | AI SDK-based image generation using official OpenAI and Google APIs. Supports text-to-image, reference images, aspect ratios, and quality presets. |
| tools | [{"name":"generate_comic_image","script":"scripts/main.ts","description":"Generate a single comic image (requires a prompt and a path)","parameters":{"prompt":{"type":"string","description":"Image generation prompt","required":true},"path":{"type":"string","description":"Output file path","required":true},"ar":{"type":"string","description":"Aspect ratio","required":false},"quality":{"type":"string","description":"Quality preset","required":false}}}] |
Official API-based image generation via AI SDK. Supports OpenAI (DALL-E, GPT Image) and Google (Imagen, Gemini multimodal).
Important: All scripts are located in the scripts/ subdirectory of this skill.
Agent Execution Instructions:
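1. Determine the directory that contains this SKILL.md file and treat it as SKILL_DIR
2. Script path = ${SKILL_DIR}/scripts/<script-name>.ts
3. Replace every ${SKILL_DIR} in this document with the actual path

Script Reference: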
| Script | Purpose |
|---|---|
| scripts/main.ts | CLI entry point for image generation |
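The path substitution described in the agent execution instructions can be sketched as a plain string replacement. This is a minimal sketch, not the skill's own code: `resolveSkillDir` and the example directory `/opt/skills/baoyu-image-gen/` are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: replaces every ${SKILL_DIR} placeholder in a piece of
// text with the directory that contains this SKILL.md.
function resolveSkillDir(text: string, skillDir: string): string {
  // Drop trailing slashes so the result never contains "//scripts".
  const dir = skillDir.replace(/\/+$/, "");
  // split/join replaces all occurrences without any regex escaping.
  return text.split("${SKILL_DIR}").join(dir);
}

// The directory below is an assumption; use the real location of the skill.
const command = resolveSkillDir(
  'npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png',
  "/opt/skills/baoyu-image-gen/",
);
console.log(command);
```

Using split and join rather than a regular expression keeps the `$`, `{`, and `}` characters of the placeholder literal.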
# Basic generation (auto-detect provider)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9
# High quality (2k)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --provider openai
# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google multimodal only)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# Generate with prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image sunset.png
# Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "A cute robot" --image robot.png
# Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A portrait" --image portrait.png --ar 3:4
# Or specify exact size
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Banner" --image banner.png --size 1792x1024
# Image editing with reference
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image blue.png --ref original.png
# Multiple references
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Combine these styles" --image out.png --ref a.png b.png
# Normal quality (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality normal
# High quality (2k resolution)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
# Plain output (prints saved path)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --json
| Option | Description |
|---|---|
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | Read prompt from files (concatenated) |
| --image <path> | Output image path (required) |
| --provider google\|openai | Force provider (default: google) |
| --model <id>, -m | Model ID |
| --ar <ratio> | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| --size <WxH> | Size (e.g., 1024x1024) |
| --quality normal\|2k | Quality preset (default: normal) |
| --ref <files...> | Reference images (Google multimodal only) |
| --n <count> | Number of images |
| --json | JSON output |
| --help, -h | Show help |
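When the CLI is driven from another program, it can help to assemble the flags from the table above as an argument array instead of a shell string. The sketch below assumes nothing about the internals of scripts/main.ts: the `ImageGenOptions` interface and the `buildArgs` function are hypothetical conveniences, and only flags listed in the table are emitted.

```typescript
// Hypothetical options object mirroring the documented flags.
interface ImageGenOptions {
  prompt?: string;
  promptFiles?: string[];
  image: string; // output image path, the only required flag
  provider?: "google" | "openai";
  model?: string;
  ar?: string;
  size?: string;
  quality?: "normal" | "2k";
  ref?: string[];
  n?: number;
  json?: boolean;
}

// Turns the options into the argument list that follows scripts/main.ts.
function buildArgs(opts: ImageGenOptions): string[] {
  const args: string[] = [];
  if (opts.prompt !== undefined) args.push("--prompt", opts.prompt);
  if (opts.promptFiles && opts.promptFiles.length > 0) {
    args.push("--promptfiles", ...opts.promptFiles);
  }
  args.push("--image", opts.image);
  if (opts.provider) args.push("--provider", opts.provider);
  if (opts.model) args.push("--model", opts.model);
  if (opts.ar) args.push("--ar", opts.ar);
  if (opts.size) args.push("--size", opts.size);
  if (opts.quality) args.push("--quality", opts.quality);
  if (opts.ref && opts.ref.length > 0) args.push("--ref", ...opts.ref);
  if (opts.n !== undefined) args.push("--n", String(opts.n));
  if (opts.json) args.push("--json");
  return args;
}

const coverArgs = buildArgs({
  prompt: "A minimalist tech illustration with blue gradients",
  image: "cover.png",
  ar: "2.35:1",
  quality: "2k",
});
console.log(coverArgs);
```

Passing the result as an array to a process spawner avoids shell quoting problems with prompts that contain spaces or quotes.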
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | - |
| GOOGLE_API_KEY | Google API key | - |
| OPENAI_IMAGE_MODEL | OpenAI model | gpt-image-1.5 |
| GOOGLE_IMAGE_MODEL | Google model | gemini-3-pro-image-preview |
| OPENAI_BASE_URL | Custom OpenAI endpoint | - |
| GOOGLE_BASE_URL | Custom Google endpoint | - |
Load Priority: CLI args > process.env > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
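The load priority can be read as "the first source that defines a value wins". A minimal sketch of that lookup follows, under a few assumptions: every source has already been parsed into a key/value map keyed by the variable names in the table, an empty string counts as unset, and `resolveSetting` is a hypothetical name rather than a function of the skill.

```typescript
type Settings = Record<string, string | undefined>;

// Defaults taken from the variable table above.
const DEFAULTS: Settings = {
  OPENAI_IMAGE_MODEL: "gpt-image-1.5",
  GOOGLE_IMAGE_MODEL: "gemini-3-pro-image-preview",
};

// Sources must be passed highest priority first:
// CLI args, process.env, <cwd>/.baoyu-skills/.env, ~/.baoyu-skills/.env.
function resolveSetting(key: string, sources: Settings[]): string | undefined {
  for (const source of sources) {
    const value = source[key];
    // Assumption: an empty string is treated the same as a missing value.
    if (value !== undefined && value !== "") return value;
  }
  return DEFAULTS[key];
}

// Illustrative values only.
const cliArgs: Settings = {};
const processEnv: Settings = { OPENAI_IMAGE_MODEL: "gpt-image-1" };
const projectEnv: Settings = { OPENAI_IMAGE_MODEL: "dall-e-3", GOOGLE_API_KEY: "project-key" };
const userEnv: Settings = { GOOGLE_API_KEY: "user-key" };
const order = [cliArgs, processEnv, projectEnv, userEnv];

console.log(resolveSetting("OPENAI_IMAGE_MODEL", order)); // from process.env
console.log(resolveSetting("GOOGLE_API_KEY", order)); // from the project .env
console.log(resolveSetting("GOOGLE_IMAGE_MODEL", order)); // falls back to the default
```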
--provider specified → use it

| Model Category | API Function | Example Models |
|---|---|---|
| Google Multimodal | generateText | gemini-2.0-flash-exp-image-generation |
| Google Imagen | experimental_generateImage | imagen-3.0-generate-002 |
| OpenAI | experimental_generateImage | gpt-image-1, dall-e-3 |
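The table pairs each model category with the AI SDK function used to call it. How scripts/main.ts decides the category is not shown in this document; the sketch below assumes a simple match on the model ID prefix, and `routeModel` is a hypothetical name.

```typescript
type ApiFunction = "generateText" | "experimental_generateImage";

interface ModelRoute {
  category: "Google Multimodal" | "Google Imagen" | "OpenAI";
  apiFunction: ApiFunction;
}

// Assumption: the category can be inferred from the model ID prefix.
function routeModel(modelId: string): ModelRoute {
  if (modelId.startsWith("imagen-")) {
    return { category: "Google Imagen", apiFunction: "experimental_generateImage" };
  }
  if (modelId.startsWith("gemini-")) {
    return { category: "Google Multimodal", apiFunction: "generateText" };
  }
  if (modelId.startsWith("gpt-image-") || modelId.startsWith("dall-e-")) {
    return { category: "OpenAI", apiFunction: "experimental_generateImage" };
  }
  throw new Error(`Unrecognized model ID: ${modelId}`);
}

// Example models from the table.
for (const id of ["gemini-2.0-flash-exp-image-generation", "imagen-3.0-generate-002", "dall-e-3"]) {
  const route = routeModel(id);
  console.log(`${id}: ${route.category} via ${route.apiFunction}`);
}
```

Throwing on an unknown ID is a design choice of this sketch; a real implementation might fall back to a default provider instead.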
Google:
- gemini-3-pro-image-preview - Default, multimodal generation
- gemini-2.0-flash-exp-image-generation - Gemini 2.0 Flash
- imagen-3.0-generate-002 - Imagen 3

OpenAI:
- gpt-image-1.5 - Default, GPT Image 1.5
- gpt-image-1 - GPT Image 1
- dall-e-3 - DALL-E 3

| Preset | OpenAI | | Use Case |
|---|---|---|---|
| normal | 1024x1024 | Default | Covers, illustrations |
| 2k | 2048x2048 | "2048px" in prompt | Infographics, slides |

Aspect ratio: embedded in the prompt (e.g., "... aspect ratio 16:9") or passed through the aspectRatio or size parameter.

npx -y bun ${SKILL_DIR}/scripts/main.ts \
--prompt "A minimalist tech illustration with blue gradients" \
--image cover.png --ar 2.35:1 --quality 2k
npx -y bun ${SKILL_DIR}/scripts/main.ts \
--prompt "Instagram post about coffee" \
--image post.png --ar 1:1
npx -y bun ${SKILL_DIR}/scripts/main.ts \
--prompt "Change the background to sunset" \
--image edited.png --ref original.png --provider google
# Create prompt file with detailed instructions
npx -y bun ${SKILL_DIR}/scripts/main.ts \
--promptfiles style-guide.md scene-description.md \
--image scene.png
Custom configurations via EXTEND.md.
Check paths (priority order):
1. .baoyu-skills/baoyu-image-gen/EXTEND.md (project)
2. ~/.baoyu-skills/baoyu-image-gen/EXTEND.md (user)

If found, load before workflow. Extension content overrides defaults.
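The priority check for EXTEND.md can be sketched as "return the first candidate path that exists". This is only an illustration: `findExtendFile` is a hypothetical name, the project path is assumed to be relative to the current working directory, and the file-existence test is injected as a parameter so the sketch has no file system dependency.

```typescript
// Returns the first existing EXTEND.md in priority order, or undefined.
function findExtendFile(
  cwd: string,
  home: string,
  exists: (path: string) => boolean,
): string | undefined {
  const candidates = [
    `${cwd}/.baoyu-skills/baoyu-image-gen/EXTEND.md`, // project
    `${home}/.baoyu-skills/baoyu-image-gen/EXTEND.md`, // user
  ];
  return candidates.find((path) => exists(path));
}

// Illustrative check against an in-memory set of "existing" files.
const present = new Set(["/home/me/.baoyu-skills/baoyu-image-gen/EXTEND.md"]);
const found = findExtendFile("/work/project", "/home/me", (path) => present.has(path));
console.log(found);
```

In a real run the `exists` argument could be `existsSync` from `node:fs`, with `process.cwd()` and `os.homedir()` supplying the two directories.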