| name | batch-image-pdf |
| description | Use when the user wants to turn source content into a coherent batch of AI-generated visual pages, posters, route books/路书, illustrated roadbooks, visual decks/PPT-style pages, hand-drawn-style PPT, image-model-generated PPT, social carousel posters, concept poster series, or any multi-page text-image visual output. Trigger on requests like 批量生成图片/海报, 生成路书/录书, 生成图文PPT, 用生图模型生成PPT, 手绘版PPT, 批量用生图模型生成视觉页, 把文档做成一组海报/PDF/PPT. This skill covers interactive clarification, recommending page count, aspect ratio and style, semantic decomposition, information-density planning, integrated text-image prompt design, generating each finished page with ChatGPT Images 2.0/image_gen, saving final assets, writing prompts.md, and assembling images into an ordered PDF by default. |
# Batch Image PDF
## Overview
Create a coherent image set from source content, then export it as a PDF plus a Markdown prompt record. The skill is optimized for route books/路书, travel roadbooks, batch poster series, image-model-generated PPT-style pages, hand-drawn-style PPT, multi-page visual explainers, children's science sheets, illustrated stories, legal or business visual summaries, product concept boards, social media carousels, and any task where consistency across several generated images matters.
When the input is a word, phrase, short sentence, slogan, title, or abstract concept, default to the concept-poster method below: translate meaning into a high-end visual field with strong typography, precise metaphor, restrained composition, and integrated text-image relationships.
## Trigger Phrases
Use this skill for requests phrased like:
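- 生成路书 / 做一份旅行路书 / 把这个文档做成路书 (generate a route book / make a travel route book / turn this document into a route book)
- 生成录书 (a homophone spelling of 路书), when the context means a visual route/booklet-style output
- 批量生成海报 / 批量生成图片 / 做一组图文海报 (batch-generate posters / batch-generate images / make a set of illustrated posters)
- 用生图模型生成PPT / 批量用生图模型生成PPT / 图像模型PPT (generate a PPT with an image model / batch-generate a PPT with an image model / image-model PPT)
- 手绘版PPT / 插画版PPT / 海报风PPT / 图文一体PPT (hand-drawn PPT / illustrated PPT / poster-style PPT / integrated text-image PPT)
- 把文章、文档、课程、故事、方案、合同摘要、活动策划做成一组视觉页 (turn an article, document, course, story, proposal, contract summary, or event plan into a set of visual pages)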
If the user specifically requires an editable .pptx, coordinate with the presentation/PPTX skill after generating or planning the visual pages. If the user wants image-based slides or a PDF-style visual deck, this skill can complete the core workflow directly.
## Workflow
### 1. Clarify Before Generating
Start by reading the user's source content and proposing a concrete production plan. Do not skip the user-facing clarification step unless the user explicitly says to proceed without questions. The clarification can include recommendations, but the user should see and approve the key production choices before image generation.
Cover these decisions:
- Output purpose: presentation PDF, printable handout, social carousel, storyboard, children's worksheet, concept board, etc.
- Suggested image/page count: derive from content density. Typical defaults: 4-6 for a quick concept preview, 6-10 for a usable illustrated explainer, 8-12 for a dense briefing or route book. Do not compress a rich source into too few pages when the output needs to communicate information.
- Aspect ratio and final PDF size: recommend based on use, then ask the user to approve. Use 16:9 for presentations/video, 4:5 or 1:1 for social, 3:4 or 4:3 for illustrated pages, A4 portrait/landscape when print is primary.
- Information density: ask the user to choose or approve light, standard, or dense. Default to standard for normal documents and dense for route books, briefings, teaching materials, and operational guides.
- Visual style: propose 2-3 suitable styles tied to the topic and audience, then let the user choose or approve.
- Text-in-image policy: for this skill, default to integrated text-image generation. The image model should design the page with its title, short labels, concise information blocks, visual metaphor, and composition together in one generation. Do not default to generating a decorative background first and adding text later.
- Concept-poster policy: if the source is a word, phrase, title, slogan, or short conceptual sentence, recommend a visual translation strategy instead of ordinary illustration. Ask whether auxiliary text is allowed only when it would change the design.
- Consistency anchors: palette, lighting, character design, line quality, camera/framing, typography treatment, recurring symbols, and negative constraints.
- File destination and PDF name when the user cares; otherwise create a dated folder in the current workspace.
Use a concise approval prompt like:
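我建议做成 <N> 页,<ratio/page size>,信息密度 <light/standard/dense>,风格为 <style>。每页会保留 <3-7> 个来自原文的关键要点,同时用一个主视觉隐喻统摄。是否按这个方向生成?
(English: I suggest <N> pages at <ratio/page size>, with <light/standard/dense> information density, in <style> style. Each page will keep <3-7> key points from the source, unified by one main visual metaphor. Shall I generate in this direction?)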
If the user asks to proceed immediately and the source is clear, state the defaults you are applying before continuing.
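The defaults named in the decision list above can be kept in a small lookup so they are stated the same way every time. The sketch below is illustrative only: the table names, dictionary keys, and the `describe_defaults` helper are hypothetical, and the values are copied from the list above.

```python
# Hypothetical lookup tables that mirror the defaults listed in this step.
PAGE_COUNT_RANGES = {
    "quick concept preview": (4, 6),
    "illustrated explainer": (6, 10),
    "dense briefing or route book": (8, 12),
}

ASPECT_RATIOS = {
    "presentation": ["16:9"],
    "video": ["16:9"],
    "social": ["4:5", "1:1"],
    "illustrated pages": ["3:4", "4:3"],
    "print": ["A4 portrait", "A4 landscape"],
}

# Document types that default to dense; everything else defaults to standard.
DENSE_BY_DEFAULT = {"route book", "briefing", "teaching material", "operational guide"}


def default_density(document_type: str) -> str:
    """Return 'dense' for the document types listed above, else 'standard'."""
    return "dense" if document_type in DENSE_BY_DEFAULT else "standard"


def describe_defaults(output_kind: str, use: str, document_type: str) -> str:
    """Build the one-line statement of defaults to show before generating."""
    low, high = PAGE_COUNT_RANGES[output_kind]
    ratios = " or ".join(ASPECT_RATIOS[use])
    return (
        f"Applying defaults: {low}-{high} pages, {ratios}, "
        f"{default_density(document_type)} density."
    )
```

For example, `describe_defaults("dense briefing or route book", "print", "route book")` yields a sentence that can be shown to the user before any image is generated.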
### 2. Semantic And Visual Translation
Before splitting pages or writing prompts, analyze the source content and let the visual language grow from meaning. For Chinese words, phrases, or short sentences, also decompose important characters instead of treating the phrase as one flat label.
Analyze these dimensions when relevant:
- Surface meaning and deep implication
- Core theme, emotional temperament, hidden conflict, and psychological tension
- Cultural associations, historical or social atmosphere, and time period cues
- Character fate, narrative pressure, symbolic weight, and metaphor potential
- For Chinese text: each key character's literal meaning, cultural symbolism, emotional weight, visual potential, and relationship to the whole
- For longer sentences: the true core word or character with visual authority; do not distribute attention evenly across all text
For long documents, first make an information extraction sheet. Do not jump directly from source text to image prompt.
Use this compact structure:
Source structure: <major sections / timeline / argument flow>
Must-keep facts: <specific facts, dates, places, names, constraints, warnings>
Page candidates:
1. <page title> — <3-7 source-derived key points that must remain visible or visually encoded>
2. <page title> — <3-7 source-derived key points that must remain visible or visually encoded>
...
Compression risk: <what would be lost if page count is too low>
Recommended page count: <N and reason>
Then choose:
- One primary visual metaphor, preferably single and strong
- A whole-image visual field rather than a divided information board
- A composition in which title, image, space, reading path, and atmosphere depend on one another
- A restrained palette of usually 2-4 main colors derived from meaning and mood
- 1-3 key subjects or symbolic objects that perform the concept through action, distance, scale, pressure, concealment, waiting, falling, offering, crossing, confrontation, absence, or attachment
Do not make a generic illustration, a simple enlarged word poster, a magazine-like information layout, or a page split into separate title/image/explanation blocks. Also do not reduce document-based tasks into pure mood posters that lose key content.
### 3. Produce A Generation Brief
Before calling image generation, produce a brief for user approval unless they explicitly asked to skip confirmation.
Use this structure:
Project: <short title>
Output: <N> images -> ordered PDF + prompts.md
Audience/use: <audience and use case>
Aspect ratio: <ratio and reason>
Information density: <light / standard / dense>
Style lock: <one paragraph describing global visual style>
Semantic core: <surface meaning, deeper implication, emotional temperament>
Core visual metaphor: <single strongest metaphor or scene relationship>
Typography-image relationship: <how main text becomes structure, field, wall, stage, path, terrain, or symbolic object>
Global negative constraints: <what to avoid in every image>
Page plan:
1. <title> — <visual role> — must-keep points: <3-7 source-derived points>
2. <title> — <visual role> — must-keep points: <3-7 source-derived points>
...
Keep the plan concise but specific enough that each page has a distinct visual job.
### Integrated Text-Image Default
The default deliverable is a finished page generated by the image model, where text, image, metaphor, spatial structure, and information hierarchy are designed together. Do not split the work into "background generation + local text overlay" unless the user explicitly asks for deterministic local text, or a generated page repeatedly fails text accuracy after prompt simplification.
For information-heavy pages, reduce each page's text to model-friendly units:
- One large exact main title
- 3-5 short section labels
- 3-7 source-derived key points expressed as concise phrases, route markers, or short callouts
- Short warning or execution keywords instead of long paragraphs
If the source has more content than this, increase page count or move extra detail into another page. Do not pack long paragraphs into a generated image.
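The rule above (raise the page count instead of overloading a page) can be sketched as a simple split. This is a minimal illustration, assuming the key points are already extracted and ordered; `split_points_into_pages` is a hypothetical helper, and grouping points by meaning still needs editorial judgment.

```python
from typing import List

MAX_POINTS_PER_PAGE = 7  # upper bound of the 3-7 range stated above


def split_points_into_pages(
    points: List[str], max_per_page: int = MAX_POINTS_PER_PAGE
) -> List[List[str]]:
    """Spread ordered key points over the fewest pages, as evenly as possible."""
    if not points:
        return []
    page_count = -(-len(points) // max_per_page)  # ceiling division
    base, extra = divmod(len(points), page_count)
    pages, start = [], 0
    for page_index in range(page_count):
        # The first `extra` pages take one additional point each.
        size = base + (1 if page_index < extra else 0)
        pages.append(points[start:start + size])
        start += size
    return pages
```

Splitting evenly, rather than filling each page to the maximum, avoids a last page that carries only one or two leftover points.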
### 4. Write Prompts
Create one prompt per image. Each prompt must stand alone because image generation calls are independent, but every prompt should reuse the same Style lock paragraph.
Use this prompt template:
Create image <index> of <total> for a coherent series.
Purpose: <PDF/social/story/etc.>
Aspect ratio: <ratio>
Style lock: <repeat exact global style lock>
Continuity anchors: <recurring characters, palette, setting, props, icon language>
Scene/page role: <what this page communicates>
Must-keep source points: <3-7 concrete points from the source text>
Exact page text: <main title and exact short labels/callouts to render; keep concise>
Composition: <framing, perspective, layout, focal point>
Required visual elements: <concrete subjects and actions>
Information design: <how the exact text and must-keep points appear as part of the generated image: integrated title, short labels, embedded callouts, route markers, icon-caption pairs, or small information clusters>
Text policy: Render all listed page text inside the image as designed typography. Keep text short, large enough to read, and spatially integrated with the visual field. No later local text overlay by default.
Avoid: <global and page-specific negatives>
For recurring characters, objects, or brand-like elements, describe them identically in every prompt. If reference images exist, label their role before using them: style reference, character reference, edit target, or object reference.
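One way to guarantee that every prompt repeats the style lock and continuity anchors verbatim is to build prompts from shared fields plus per-page fields. The sketch below fills the standard prompt template above; the function name and dictionary keys are hypothetical, chosen only for illustration.

```python
TEXT_POLICY = (
    "Render all listed page text inside the image as designed typography. "
    "Keep text short, large enough to read, and spatially integrated with the "
    "visual field. No later local text overlay by default."
)

# (label in the prompt, key in the hypothetical per-page dict)
PAGE_FIELDS = [
    ("Scene/page role", "role"),
    ("Must-keep source points", "must_keep"),
    ("Exact page text", "exact_text"),
    ("Composition", "composition"),
    ("Required visual elements", "visual_elements"),
    ("Information design", "information_design"),
]


def build_page_prompt(index: int, total: int, shared: dict, page: dict) -> str:
    """Fill the prompt template; shared fields are copied unchanged into every page."""
    lines = [
        f"Create image {index} of {total} for a coherent series.",
        f"Purpose: {shared['purpose']}",
        f"Aspect ratio: {shared['aspect_ratio']}",
        f"Style lock: {shared['style_lock']}",
        f"Continuity anchors: {shared['continuity_anchors']}",
    ]
    lines += [f"{label}: {page[key]}" for label, key in PAGE_FIELDS]
    lines.append(f"Text policy: {TEXT_POLICY}")
    # Global negatives first, then any page-specific negatives.
    negatives = [shared["avoid"], page.get("avoid", "")]
    lines.append("Avoid: " + "; ".join(n for n in negatives if n))
    return "\n".join(lines)
```

Because `shared` is passed to every call, the style lock cannot drift between pages through retyping or paraphrase.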
For concept-poster pages, use this richer prompt template instead:
Create image <index> of <total> as a high-end conceptual poster based on deep semantic visual translation, not a generic illustration.
Input text/main title: <exact title text>
Language: <Chinese/English/etc.>
Aspect ratio: <ratio>
Purpose: <ordered PDF / poster series / social carousel / exhibition visual>
Semantic analysis to express visually:
- Surface meaning: <literal meaning>
- Deep implication: <deeper idea, fate, conflict, social or cultural resonance>
- Emotional temperament: <mood and pressure>
- Key characters/words: <important character or word meanings and why they matter>
- Core visual metaphor: <single strongest symbolic scene or relationship>
- Must-keep source points: <3-7 concrete points if this page is based on a document, route, lesson, briefing, or factual source>
- Exact page text: <main title and exact short labels/callouts to render; keep concise>
Style lock: <repeat exact global style lock>
Typography system: The main title "<exact title>" must be huge, clear, complete, readable, and the absolute visual subject. It must function as the image's skeleton, spatial structure, rhythm source, and meaning carrier, not as a pasted label.
Text-image integration: Make the text grow inside the visual field. The subject or objects should stand before, hide within, pass through, attach to, lean on, be blocked by, or move along the title forms. Title, image, space, and any small reading text must form one unified field.
Composition: Use a giant-title structure plus a simple grounding surface such as a stage, landform, platform, horizon, base, section plane, or symbolic field. Use 1-3 restrained subjects or objects to perform the meaning through posture, scale, contact, distance, pressure, concealment, waiting, falling, offering, crossing, or confrontation.
Reading path: First the giant title, then the core visual metaphor, then any tiny auxiliary reading text. Organize by scale, distance, density, white space, and direction, not by columns, frames, or separated information blocks.
Information design: For document-derived pages, the generated image itself must include the concise must-keep points as integrated route markers, object-attached labels, small caption bands, or embedded callouts that follow the visual field. Keep the page informative enough to present, not merely atmospheric. Do not design an empty background for later text.
Auxiliary text policy: <none / exact short phrase only>. If allowed, embed it lightly along title edges, near symbolic objects, or as a quiet exhibition-label-like trace that serves the mood. No random decorative text.
Color logic: <2-4 colors derived from meaning; restrained, modern, exhibition-grade; avoid cheap gradients and noisy high saturation>
Material/finish: <graphic art poster, restrained collage, lithograph/silkscreen/print grain, paper texture, subtle printing noise if useful>
Avoid: generic illustration, pure typography poster, title-image separation, empty decorative background, separate background-and-overlay design, multi-column information board, magazine directory layout, fake labels, fake coordinates, random English fragments, decorative filler text, crowded elements, cheap commercial illustration style, misspelled or broken title characters.
### 5. Generate Images
Default to the built-in image_gen tool and issue one call per image. Do not use one prompt for multiple distinct pages. If the user explicitly requests API/CLI/GPT-Image-2 batch generation, use the system imagegen skill's CLI fallback instructions.
After each generated image:
- Inspect whether it satisfies the page role, style lock, integrated text-image design, information density, and text legibility.
- Check the exact main title and key labels. If text is misspelled, broken, too small, fake, or materially different, regenerate with fewer words, larger type, clearer text hierarchy, and stronger instruction to render only the listed exact text.
- If a page repeatedly fails text accuracy after prompt simplification, ask the user whether to accept a local text-repair pass. Do not silently switch to background-plus-overlay as the default.
- Save the selected image into the project output folder with stable ordering: `01-title.png`, `02-title.png`, etc.
- Maintain `prompts.md` in the output folder as a required final deliverable. It must record page order, page title/role, final image filename/path, final generation prompt, and any regeneration notes.
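The stable `01-title.png` naming above can be produced by a small helper. This is a sketch under assumptions: `page_filename` is a hypothetical name, and the slug rule (lowercase, runs of non-word characters collapsed to a hyphen) is one reasonable choice, not a rule stated by this skill.

```python
import re


def page_filename(index: int, title: str, ext: str = "png") -> str:
    """Return a zero-padded, ordered filename such as '01-cover-page.png'."""
    # \w is Unicode-aware for str patterns, so Chinese titles are preserved.
    slug = re.sub(r"[^\w]+", "-", title.strip().lower()).strip("-")
    return f"{index:02d}-{slug or 'page'}.{ext}"
```

Zero-padding to two digits keeps the files in page order under both plain alphabetical and natural sorting for series of up to 99 pages.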
### 6. Write The Prompt Markdown
Create prompts.md as a required output file before final delivery. Use this structure:
# <Project Title> - Image Generation Prompts
## Generation Brief
- Source: <user source summary>
- Output: <N> images -> PDF
- Aspect ratio: <ratio>
- Information density: <light / standard / dense>
- Style lock: <global style lock>
- Core visual metaphor: <if applicable>
- Global negative constraints: <shared avoid list>
## Page Prompts
### 01 - <Page Title>
- Role: <page role>
- Must-keep points: <3-7 source-derived points>
- Exact page text: <main title and short labels/callouts requested in the generated image>
- Image: `<01-title.png>`
- Status: <generated / regenerated once / accepted with caveat>
```text
<final prompt used for this page>
```

### 02 - <Page Title>
...
If a page was regenerated, include only the final accepted prompt in the main prompt block and add a short `Regeneration note:` line explaining what was corrected. Do not omit any page prompt.
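Writing `prompts.md` from structured data makes it harder to omit a page. The following is a minimal sketch of a renderer for the structure described in this step; `render_prompts_md` and the dictionary keys are hypothetical names used for illustration.

```python
from typing import Dict, List

FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting literal fences


def render_prompts_md(
    project_title: str, brief: Dict[str, str], pages: List[Dict[str, str]]
) -> str:
    """Render the prompt record: one brief section, then one block per page."""
    out = [f"# {project_title} - Image Generation Prompts", "", "## Generation Brief"]
    out += [f"- {label}: {value}" for label, value in brief.items()]
    out += ["", "## Page Prompts"]
    for number, page in enumerate(pages, start=1):
        out += [
            "",
            f"### {number:02d} - {page['title']}",
            f"- Role: {page['role']}",
            f"- Must-keep points: {page['must_keep']}",
            f"- Exact page text: {page['exact_text']}",
            f"- Image: `{page['image']}`",
            f"- Status: {page['status']}",
        ]
        # Only regenerated pages carry a note; the prompt block holds the final prompt.
        if page.get("regeneration_note"):
            out.append(f"- Regeneration note: {page['regeneration_note']}")
        out += ["", f"{FENCE}text", page["prompt"], FENCE]
    return "\n".join(out) + "\n"
```

The returned string can be written to the output folder with `Path("prompts.md").write_text(..., encoding="utf-8")` once all pages are accepted.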
### 7. Assemble PDF
Use `scripts/images_to_pdf.py` to combine final images into an ordered PDF. The script sorts paths naturally, so numeric prefixes control page order.
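Natural ordering means that numeric chunks in a filename compare as numbers, so `2-intro.png` comes before `10-end.png`. The sketch below shows one common way to build such a sort key; it illustrates the idea only and is not taken from `scripts/images_to_pdf.py`, whose implementation is not shown here.

```python
import re
from typing import List, Union


def natural_key(path: str) -> List[Union[int, str]]:
    """Split a name into text and integer chunks so digits compare numerically."""
    # re.split with a capture group alternates text, digits, text, ...
    # so the same list position always holds the same type across names.
    return [
        int(chunk) if chunk.isdecimal() else chunk.lower()
        for chunk in re.split(r"(\d+)", path)
    ]


names = ["10-end.png", "2-intro.png", "1-cover.png"]
ordered = sorted(names, key=natural_key)
```

With zero-padded prefixes such as `01-` and `02-`, plain alphabetical order gives the same result, which is why the naming rule in step 5 is a safe default.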
Default PDF assembly rule: make each image occupy the full PDF page. For generated image sets, prefer `--page-size auto`, which uses each image's native dimensions and avoids extra PDF margins. Use A4/letter only when the user requested a printable standard size; in that case, generate images in the same aspect ratio as the target page and use `--fit cover` only when cropping is acceptable.
Examples:
```bash
# Combine every image in a folder; numeric prefixes control the page order.
python /Users/CS/.codex/skills/batch-image-pdf/scripts/images_to_pdf.py \
  --input-dir ./output/my-image-series \
  --output ./output/my-image-series/final.pdf

# Combine an explicit list of images at native page size, without cropping.
python /Users/CS/.codex/skills/batch-image-pdf/scripts/images_to_pdf.py \
  --images ./output/my-image-series/01-cover.png ./output/my-image-series/02-context.png \
  --output ./output/my-image-series/final.pdf \
  --page-size auto \
  --fit contain \
  --background "#ffffff"
```
Use `--fit cover` only when cropping is acceptable. Use `--fit contain` for artwork, diagrams, and any image where edges or text must remain visible. If the generated artwork has large internal margins or low information density, fix the prompt and regenerate; do not rely on PDF scaling to repair weak page design.
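The difference between the two fit modes comes down to which scale factor is chosen. The sketch below uses the conventional meaning of "contain" and "cover"; it is assumed, not confirmed, that the script computes them this way, and `fit_scale` and `placed_size` are hypothetical helper names.

```python
from typing import Tuple

Size = Tuple[float, float]  # (width, height)


def fit_scale(image: Size, page: Size, fit: str) -> float:
    """Return the uniform scale that fits an image onto a page."""
    scale_x = page[0] / image[0]
    scale_y = page[1] / image[1]
    if fit == "contain":
        return min(scale_x, scale_y)  # whole image visible, margins possible
    if fit == "cover":
        return max(scale_x, scale_y)  # page fully covered, cropping possible
    raise ValueError(f"unknown fit mode: {fit}")


def placed_size(image: Size, page: Size, fit: str) -> Size:
    """Return the image size after scaling, before any cropping."""
    scale = fit_scale(image, page, fit)
    return (image[0] * scale, image[1] * scale)
```

For a 200 x 100 image on a 100 x 100 page, contain gives 100 x 50 (margins above and below), while cover gives 200 x 100 (the sides are cropped). When the image and page share an aspect ratio the two modes coincide, which is the reason to generate images in the target ratio.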
### 8. Final Response
Report:
- PDF path
- prompt Markdown path
- image folder path
- number of images generated
- whether any pages were regenerated
- approved aspect ratio/page size and information density
Do not paste every full prompt unless the user asks; the required `prompts.md` file already contains all final per-page prompts.
## Quality Rules
- Preserve factual meaning from the source content; do not invent claims for educational, legal, medical, or business materials.
- Preserve meaningful information density. For document-based outputs, every page should carry 3-7 source-derived key points unless the user explicitly requests a pure mood/concept series.
- Default to synchronous page design: the generated image should include the final typography, short labels, information clusters, visual metaphor, and composition in one integrated output.
- Prefer short, exact, readable text inside the generated image over long paragraphs. Use fewer words, stronger hierarchy, and more pages when needed; do not solve density by making tiny unreadable type.
- Use local text overlay only as an explicit fallback for text repair or deterministic production, not as the normal creative path.
- Keep the series coherent but avoid making every page compositionally identical.
- When the user provides a brand, style guide, or audience, make those constraints part of the style lock.
- If exact readable text is critical, first try integrated image generation with shorter text and stronger hierarchy; only consider local repair after generated-text failures are visible.
- Always deliver two final files by default: the ordered image PDF and `prompts.md` containing every final per-page generation prompt.
- Default PDF pages should be full-image pages with no unintended margins. Use native image page size (`--page-size auto`) unless a standard print size was approved.
- Never overwrite existing output folders unless the user explicitly asks.
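The no-overwrite rule, combined with the dated-folder default from step 1, can be sketched as follows. The folder naming scheme (`<date>-<slug>`, then `-2`, `-3`, ...) and the `create_output_dir` helper are assumptions for illustration, not something this skill prescribes.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def create_output_dir(
    workspace: Path, project_slug: str, today: Optional[date] = None
) -> Path:
    """Create a new dated folder; never reuse or overwrite an existing one."""
    stamp = (today or date.today()).isoformat()
    base = f"{stamp}-{project_slug}"
    candidate = workspace / base
    suffix = 2
    while candidate.exists():
        # Pick the next free name instead of touching the existing folder.
        candidate = workspace / f"{base}-{suffix}"
        suffix += 1
    # mkdir without exist_ok raises if the folder appears in the meantime.
    candidate.mkdir(parents=True)
    return candidate
```

Calling it twice on the same day for the same project yields two sibling folders, so an earlier run's images and PDF stay intact.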
## Concept Poster Defaults
Apply these defaults whenever the user supplies a word, phrase, title, slogan, short sentence, or abstract concept and wants images, posters, covers, or a visual PDF:
- The output is not ordinary illustration. It is a visual translation of meaning, mood, cultural association, hidden tension, and symbolic pressure.
- The main title must be huge, clear, complete, readable, and visually dominant. It should become the structure of the image: wall, stage, terrain, screen, barrier, path, background architecture, or symbolic object.
- The whole page must be one unified visual field. Do not split title, image, and explanation into separate blocks, columns, cards, or an information-board layout.
- Text and image must be interdependent. Subjects and symbolic objects should interact with title forms through scale, overlap, concealment, attachment, passage, shadow, obstruction, distance, or contact.
- Use one precise core metaphor rather than a pile of concepts.
- Keep subjects restrained, usually 1-3 key figures, objects, or symbolic forms.
- Add auxiliary text only when it deepens the concept. It must be short, exact, and embedded naturally in the image atmosphere.
- Do not generate meaningless decorative microtext, fake numbering, fake signatures, fake coordinates, or unrelated English fragments.
- Use restrained exhibition-grade color: usually 2-4 colors, with one strong main color, one neutral/paper-like base, one structural dark, and at most one accent when meaningful.
- Acceptable finish directions include graphic art poster, high-end collage, lithograph, silkscreen, woodcut/print feel, paper grain, and subtle print noise. Avoid dirty, cheap, template-like, crowded, or generic commercial illustration aesthetics.