
research-figure-edit

Generate or edit research figures with text-to-image and text+image-to-image workflows. Use when the user wants scientific figure drawing, paper illustration generation, edits based on an existing image, a unified figure style, or a polished scientific schematic.


Install command

npx skills add https://github.com/kepengxu/research-skills --skill research-figure-edit

Copy and paste this command into Claude Code to install the skill

Source

kepengxu/research-skills

Stars: 4

Forks: 0

Updated: March 30, 2026 at 05:24


Research Figure Edit

Use this skill for two capabilities:

text2image: generate a research/scientific figure from a text prompt.
image2image: generate a new figure from a text prompt plus an input image.

Required environment variables

NANOBANANA_HOST — API host. Defaults to api.vectorengine.ai if unset.
NANOBANANA_MODEL — model name. Defaults to gemini-3-pro-image-preview if unset.
NANOBANANA_API_KEY — optional query-string API key.
NANOBANANA_BEARER_TOKEN — optional bearer token for Authorization header.

At least one of NANOBANANA_API_KEY or NANOBANANA_BEARER_TOKEN must be set.
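The rules above can be sketched in Python as follows. This is a minimal illustration, not the backend script's actual code: the `load_config` helper and the dictionary keys are hypothetical names, and only the variable names and default values come from the list above.

```python
import os

# Defaults stated in the list above.
DEFAULT_HOST = "api.vectorengine.ai"
DEFAULT_MODEL = "gemini-3-pro-image-preview"


def load_config(env=None):
    """Resolve settings from an environment mapping (hypothetical helper).

    Falls back to os.environ when no mapping is passed.
    """
    env = os.environ if env is None else env
    config = {
        "host": env.get("NANOBANANA_HOST") or DEFAULT_HOST,
        "model": env.get("NANOBANANA_MODEL") or DEFAULT_MODEL,
        "api_key": env.get("NANOBANANA_API_KEY") or None,
        "bearer_token": env.get("NANOBANANA_BEARER_TOKEN") or None,
    }
    # At least one credential must be present.
    if not (config["api_key"] or config["bearer_token"]):
        raise RuntimeError(
            "Set NANOBANANA_API_KEY or NANOBANANA_BEARER_TOKEN before running the skill."
        )
    return config
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise without touching the real environment.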

Backend script

The backend script is:

~/.claude/skills/research-figure-edit/research_figure_edit.py

Execution workflow

When this skill is invoked, do the following:

1. Parse the user's request from $ARGUMENTS.
2. Determine whether the user supplied an input image path.
3. If there is a valid local image path, use --mode image2image and pass that path with --input.
4. Otherwise use --mode text2image.
5. Run a preprocessing stage before generation.
6. In preprocessing, analyze the request context and infer:
- the figure category
- the paper-style intent
- the likely venue-style preset
- the desired layout structure
- the color semantics
- the connection structure
- the arrow density and arrow hierarchy
- whether the request is a hero figure, overview figure, mechanism figure, architecture figure, or workflow figure
7. Automatically select the most suitable venue-style preset.
8. Optimize the layout and connection grammar before writing prompts.
9. From the preprocessing result, write two explicit prompts:
- a Positive prompt that is very detailed, publication-oriented, and suitable for scientific figure generation
- a Negative prompt that suppresses low-quality, non-academic, decorative, illegible, or off-style outputs
10. Rewrite the generation input so the backend script receives a strong, paper-style final prompt derived from the Positive prompt.
11. Choose an output directory automatically:
- default to outputs/research-figure-edit/
- if that is unsuitable for the current workspace, use a nearby outputs/ directory under the working directory
12. Run the backend script with Bash and let it complete the full workflow in one step: send the API request, extract the image bytes from the response, and save the final image file.
13. Return the saved output paths to the user together with:
- detected figure category
- selected venue-style preset
- Positive prompt
- Negative prompt
- any major rewrite decisions (briefly)

Mode selection rules

If the user provides only a prompt, run --mode text2image.
If the user provides a prompt and a local input image path, run --mode image2image.
Only treat a path as an image input if it points to an existing local file and has a supported image extension.
If the request is ambiguous, ask the user to specify the prompt text and image path separately.
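The path check in these rules could look like the sketch below. The `select_mode` function is a hypothetical helper, and the extension set is an assumption taken from the supported input formats listed in the Notes section (PNG, JPG, JPEG).

```python
from pathlib import Path

# Assumed from the supported input formats in the Notes section.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def select_mode(image_path=None):
    """Return ("image2image", path) for a usable image, else ("text2image", None)."""
    if image_path:
        path = Path(image_path).expanduser()
        # Both conditions must hold: the file exists and the extension is supported.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            return "image2image", path
    return "text2image", None
```

A path that fails either check falls back to text2image here. Whether to fall back silently or ask the user, as the ambiguity rule suggests, is left to the caller.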

Figure category detection

Before generating, classify the request into the closest figure category.

Detection heuristics

Use lightweight intent matching from the user's wording.

If the request mentions 流程图 (flowchart), pipeline, workflow, step, loop, training pipeline, prefer flowchart.
If it mentions 架构图 (architecture diagram), architecture, framework, module, encoder, decoder, prefer architecture.
If it mentions 机制 (mechanism), principle, how it works, reward mechanism, attention mechanism, prefer mechanism.
If it mentions 对比 (comparison), compare, baseline, before/after, ablation, prefer comparison.
If it mentions overview, motivation, teaser, main idea, concept, prefer concept.
If it mentions data flow, input/output, preprocess, postprocess, multimodal stream, prefer dataflow.
If it mentions experiment setup, evaluation, benchmark, protocol, prefer experiment.

If multiple categories match, prefer the most structurally specific one in this order: flowchart > architecture > mechanism > comparison > concept > dataflow > experiment
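One way to express the heuristics and the tie-break order is a keyword table checked in priority order. This is a sketch, not the skill's implementation: `detect_category` is a hypothetical name, case-insensitive substring matching is an assumption, and returning `None` when nothing matches is also an assumption, since the text above does not define a fallback.

```python
# Keywords from the heuristics above. Dict order encodes the tie-break priority:
# flowchart > architecture > mechanism > comparison > concept > dataflow > experiment.
CATEGORY_KEYWORDS = {
    "flowchart": ["流程图", "pipeline", "workflow", "step", "loop", "training pipeline"],
    "architecture": ["架构图", "architecture", "framework", "module", "encoder", "decoder"],
    "mechanism": ["机制", "principle", "how it works", "reward mechanism", "attention mechanism"],
    "comparison": ["对比", "compare", "baseline", "before/after", "ablation"],
    "concept": ["overview", "motivation", "teaser", "main idea", "concept"],
    "dataflow": ["data flow", "input/output", "preprocess", "postprocess", "multimodal stream"],
    "experiment": ["experiment setup", "evaluation", "benchmark", "protocol"],
}


def detect_category(request):
    """Return the highest-priority category whose keywords appear in the request."""
    text = request.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None  # assumption: no fallback category is defined in the source


print(detect_category("Draw the encoder-decoder architecture of our training pipeline"))
```

The example request matches both architecture and flowchart keywords, so the priority order resolves it to flowchart.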

Venue-style preset system

After detecting the category, automatically select a venue-style preset.

Presets

`paper-clean`

Use for generic publication-quality figures when no venue-specific visual cue is clearly better.

Prompt guidance:

clean publication-quality scientific figure
white background
minimal modern academic style
balanced layout
crisp labels
subtle professional color palette
polished and highly readable

`nature-like`

Use for elegant overview figures, concept figures, and high-level mechanism illustrations.

Prompt guidance:

Nature-like scientific figure
elegant and refined
clean white background
subtle professional color palette
balanced composition
minimal clutter
polished scientific schematic

`science-like`

Use for explanatory mechanism figures and concept+experiment hybrid figures.

Prompt guidance:

Science-like publication figure
explanatory scientific illustration
refined but accessible design
strong visual storytelling
concise labels
polished multi-part composition

`cvpr-like`

Use for computer vision style architecture diagrams, technical pipelines, and comparison figures.

Prompt guidance:

CVPR-like technical figure
structured modular layout
clear blocks and arrows
compact and professional design
white background
strong readability
conference-ready diagram

`iccv-like`

Use for visually layered architecture diagrams and complex visual systems.

Prompt guidance:

ICCV-like vision paper diagram
professional modular architecture figure
clearly separated components
strong visual hierarchy
balanced layout
publication-quality technical illustration

`neurips-like`

Use for machine learning method figures, algorithm logic diagrams, and concise training schematics.

Prompt guidance:

NeurIPS-like machine learning figure
clean and rigorous academic style
precise structure
compact scientific layout
clear logical flow
minimal but highly legible

`iclr-like`

Use for reinforcement learning, LLM, alignment, agent, and training workflow figures.

Prompt guidance:

ICLR-like modern ML figure
clean research schematic
clear process flow
concise modular design
elegant but rigorous
highly readable labels
suitable for RL or LLM training diagrams

`aaai-like`

Use for formal AI system diagrams and logic-oriented technical figures.

Prompt guidance:

AAAI-like AI research diagram
clear and formal academic style
structured blocks
straightforward technical presentation
neat and concise conference-style figure

Category to preset mapping

Use these defaults unless the user explicitly asks for a different visual style:

flowchart -> iclr-like
architecture -> cvpr-like
mechanism -> nature-like
comparison -> cvpr-like
concept -> nature-like
dataflow -> paper-clean
experiment -> neurips-like

Override rules

If the user explicitly asks for Nature, Science, CVPR, ICCV, NeurIPS, ICLR, or AAAI style, honor that request.
If the user asks for a paper main figure, overview figure, or teaser figure, prefer nature-like unless they specify another venue.
If the user asks for a training pipeline, RL flowchart, agent workflow, or LLM training diagram, prefer iclr-like.
If the user asks for a computer vision architecture, vision pipeline, or multi-branch model diagram, prefer cvpr-like or iccv-like.
If the user gives no clear style signal, use the mapping table above.
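The default mapping and the explicit-venue override can be sketched as a lookup. The `select_preset` helper is hypothetical and covers only the first and last override rules; the content-based rules in between (main figure, training pipeline, vision architecture) are left out for brevity. Falling back to `paper-clean` for an unknown category is an assumption based on that preset's description as the generic choice.

```python
# Defaults from the category-to-preset mapping above.
CATEGORY_TO_PRESET = {
    "flowchart": "iclr-like",
    "architecture": "cvpr-like",
    "mechanism": "nature-like",
    "comparison": "cvpr-like",
    "concept": "nature-like",
    "dataflow": "paper-clean",
    "experiment": "neurips-like",
}

# Venues a user may request explicitly, mapped to preset names.
VENUE_TO_PRESET = {
    "nature": "nature-like",
    "science": "science-like",
    "cvpr": "cvpr-like",
    "iccv": "iccv-like",
    "neurips": "neurips-like",
    "iclr": "iclr-like",
    "aaai": "aaai-like",
}


def select_preset(category, requested_venue=None):
    """Honor an explicit venue request first, then fall back to the mapping."""
    if requested_venue:
        preset = VENUE_TO_PRESET.get(requested_venue.strip().lower())
        if preset:
            return preset
    # Assumption: unknown categories use the generic preset.
    return CATEGORY_TO_PRESET.get(category, "paper-clean")
```

For example, `select_preset("flowchart")` yields `iclr-like`, while `select_preset("flowchart", "Nature")` yields `nature-like`.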

Layout and connection optimization rules

Before writing prompts, optimize the visual organization of the figure itself.

Narrative hierarchy

Prefer figures with explicit narrative hierarchy rather than flat equal-weight modules.

Use layered organization when appropriate, such as:

top: central claim / problem framing
middle: method response / mechanism / architecture
bottom or side: validation mapping / experiment support

If one part of the figure is conceptually dominant, give it more visual weight. Do not force all panels to have equal size when the content is not equally important.

Arrow and connector discipline

Reduce unnecessary arrows. Only use arrows when they truly improve scientific readability.

Prefer three levels of connector strength:

primary arrows for the main process or argument flow
secondary arrows for local dependencies within one panel
minimal connectors or simple alignment for weak logical mapping

Avoid spider-web connection structures. Avoid dense cross-panel arrows when color semantics, grouping, or alignment can communicate the same relationship more clearly.

Preferred connection structures

Prefer clean structures such as:

linear chains for process explanations
tree-like branching and merging for supervision or aggregation paths
parallel columns feeding one fusion block for multimodal reasoning
aligned mapping grids or tag panels for experiment-to-operator correspondence

Avoid complex global loops unless the loop itself is the scientific point. If a loop is needed, keep it local to one panel instead of the whole figure.

Panel consistency

Within a multi-panel figure, keep panels internally consistent. A strong default is:

title on top
main visual structure in the center
one short takeaway note below

Do not mix radically different alignment systems across neighboring panels unless there is a clear reason.

Cross-panel structure

Minimize direct cross-panel connectors. When possible, communicate shared semantics by:

repeated color meaning
repeated visual tokens
aligned panel ordering
matching labels
mirrored or parallel local structures

For validation panels, prefer compact mapping layouts over long return arrows to earlier panels. For example, use aligned tags, mini-box mappings, or table-like correspondences rather than many long crossing connectors.

Layout optimization objectives

During preprocessing, explicitly optimize for:

balanced visual weight
clean reading order
minimal connector clutter
high signal-to-noise ratio
strong alignment
clear grouping
scientific readability at paper scale
immediate recognition of the main claim

Prompt preprocessing and construction rules

Before any image generation, always perform prompt preprocessing.

Preprocessing outputs

The preprocessing stage must explicitly produce two artifacts:

Positive prompt
Negative prompt

Positive prompt requirements

The Positive prompt must be:

tailored to the user's scientific context
aligned with the detected figure category
aligned with the selected venue-style preset
detailed enough for high-quality academic figure generation
written in a publication-oriented style rather than a casual image-generation style
structured, specific, and visually directive

The Positive prompt should usually contain:

Figure identity
- what kind of figure this is
- whether it is a hero figure, overview figure, mechanism figure, architecture figure, workflow figure, comparison figure, or experiment figure
Scientific content layer
- the exact entities, modules, stages, or claims to show
- the intended scientific message
- the causal or logical relationships between components
Layout layer
- horizontal or vertical layout
- number of panels or bands
- relative grouping of information
- alignment and reading order
- narrative hierarchy across panels or bands
- which panel or band should receive the most visual weight
Connection layer
- where arrows are necessary
- where alignment or grouping should replace arrows
- primary vs secondary connector hierarchy
- whether the structure should be linear, branching, merging, parallel, or mapping-based
- how to avoid clutter and spider-web connectors
Visual semantics layer
- color meaning per pathway or concept
- which elements should be emphasized
- which elements should stay neutral
Typography and labeling layer
- short labels only
- clean scientific sans-serif style
- bold panel titles when needed
- no decorative text treatment
Venue-style layer
- include the chosen preset guidance explicitly
Quality layer
- white background
- publication-ready
- vector-graphics look
- flat design when appropriate
- strict alignment
- high readability
- minimal clutter
- disciplined connector usage
- balanced layout and panel spacing

Negative prompt requirements

The Negative prompt must explicitly suppress outputs that are visually incompatible with scientific paper figures.

The Negative prompt should usually include terms such as:

low quality
blurry text
unreadable labels
poster design
marketing infographic
cartoon
comic style
childish icons
3D rendering
glossy UI
neon colors
sci-fi interface
cluttered layout
dark background
heavy gradients
dramatic shadows
photographic realism
messy arrows
inconsistent alignment
decorative background
exaggerated icons
slide deck style
business infographic style
flashy presentation
hand-drawn style
sketch style
crowded composition
illegible academic labels

Final generation prompt rule

Construct the final generation prompt by combining these layers in order:

User intent layer — what the user actually wants drawn
Category layer — structural instructions for the detected category
Venue-style layer — one of the preset guidance blocks above
Quality layer — aspect ratio, clarity, white background, readability, publication quality, minimal clutter

Do not merely forward the raw user request. Rewrite it into a stronger research-figure generation prompt. The backend script should receive the rewritten Positive prompt as its --prompt content.
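The layer ordering can be illustrated with a small composition function. This is only a sketch: `build_final_prompt` and its parameters are hypothetical, and joining layers as sentences is one possible formatting choice, not something the rule above prescribes.

```python
def build_final_prompt(user_intent, category_instructions, preset_guidance, quality_terms):
    """Join the four layers in the documented order: intent, category, venue style, quality."""
    layers = [
        user_intent.strip(),
        category_instructions.strip(),
        ", ".join(preset_guidance),  # one preset guidance block, as a phrase list
        ", ".join(quality_terms),    # aspect ratio, clarity, background, and so on
    ]
    # Drop empty layers and normalize trailing periods before joining.
    return ". ".join(layer.rstrip(".") for layer in layers if layer) + "."


prompt = build_final_prompt(
    "An RL training workflow with three stages",
    "Left-to-right flowchart with primary arrows only",
    ["ICLR-like modern ML figure", "clean research schematic", "clear process flow"],
    ["16:9 aspect ratio", "white background", "minimal clutter"],
)
print(prompt)
```

The resulting string is what would be passed as the --prompt content in the command templates.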

Bash command templates

Text to image

python ~/.claude/skills/research-figure-edit/research_figure_edit.py \
  --mode text2image \
  --prompt "<rewritten Positive prompt>" \
  --aspect-ratio "<aspect ratio, e.g. 16:9>" \
  --clarity "<standard|high|ultra>" \
  --output-dir outputs/research-figure-edit \
  --save-raw-response

Text + image to image

python ~/.claude/skills/research-figure-edit/research_figure_edit.py \
  --mode image2image \
  --prompt "<rewritten Positive prompt>" \
  --input "<local image path>" \
  --output-dir outputs/research-figure-edit \
  --save-raw-response
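When the command is assembled programmatically, passing the arguments as a list avoids shell-quoting problems with long prompts. The sketch below uses only the flags shown in the two templates. `build_command` is a hypothetical helper, and dropping --clarity in image2image mode is an assumption based on the Notes, which state that flag is supported for text2image.

```python
import sys
from pathlib import Path

SCRIPT = Path("~/.claude/skills/research-figure-edit/research_figure_edit.py").expanduser()


def build_command(prompt, input_image=None, aspect_ratio=None, clarity=None,
                  output_dir="outputs/research-figure-edit"):
    """Build the argument list for the backend script (hypothetical helper)."""
    mode = "image2image" if input_image else "text2image"
    cmd = [sys.executable, str(SCRIPT), "--mode", mode, "--prompt", prompt]
    if input_image:
        cmd += ["--input", str(input_image)]
    if aspect_ratio:
        cmd += ["--aspect-ratio", aspect_ratio]
    if clarity and mode == "text2image":
        cmd += ["--clarity", clarity]
    cmd += ["--output-dir", output_dir, "--save-raw-response"]
    return cmd

# Usage (not executed here): subprocess.run(build_command("..."), check=True)
```

The list can be handed to `subprocess.run` without `shell=True`, so quotes and newlines in the prompt are passed through unchanged.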

Output behavior

The script completes the whole pipeline in one run:

sends the generation request
extracts generated image bytes from the JSON response
saves the final image file directly
optionally saves raw API response JSON

It prints the saved paths after completion.

When responding to the user after generation, also report:

detected figure category
selected venue-style preset
Positive prompt
Negative prompt
any major prompt rewrite decisions (briefly)

Notes

For image2image, the input image is read locally and base64-encoded automatically.
Supported input image formats: PNG, JPG, JPEG (MIME types image/png and image/jpeg).
--aspect-ratio is supported for both modes.
--clarity is supported for text2image and can be standard, high, or ultra.
Use clear prompts such as:
- preserve the structure and improve visual consistency
- convert into a clean Nature-style scientific figure
- white background, paper-ready, minimal annotations
Before running the script, verify that at least one of NANOBANANA_API_KEY or NANOBANANA_BEARER_TOKEN is available in the environment. If neither is set, ask the user to configure one of them.
Even if the user only provides a short request, you must still run the preprocessing stage internally and expand it into detailed Positive and Negative prompts before generation.
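The local read and base64 step mentioned in the first note can be sketched as follows. This illustrates the general technique only: `encode_image` is a hypothetical helper, and how the backend script embeds the result in its request payload is not described in this document.

```python
import base64
from pathlib import Path

# Extension-to-MIME mapping for the supported input formats.
MIME_BY_EXTENSION = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def encode_image(path):
    """Return (mime_type, base64_text) for a supported local image file."""
    path = Path(path)
    mime = MIME_BY_EXTENSION.get(path.suffix.lower())
    if mime is None:
        raise ValueError(f"Unsupported image extension: {path.suffix!r}")
    # b64encode returns bytes; decode to ASCII text for use in a JSON body.
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return mime, data
```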

name	research-figure-edit
description	Generate or edit research figures with text-to-image and text+image-to-image workflows. Use when the user wants scientific figure drawing, paper illustration generation, edits based on an existing image, a unified figure style, or a polished scientific schematic.
argument-hint	["prompt and optional input image path"]
allowed-tools	Bash(*), Read, Write, Edit

