ワンクリックでManusで任意のスキルを実行

$pwd:

sn-da-image-caption

Name: Sn Da Image Caption
Author: OpenSenseNova

// 图片理解与数据提取 skill。当图片文件（.png/.jpg/.jpeg/.gif/.webp/.bmp）是主要输入且用户需要理解、提取数据或分析图片内容时使用。提供预配置的 caption 脚本（scripts/caption.py），通过 vision 模型将图片转为文本描述，无需额外配置 API Key。覆盖：(1) 通过 scripts/caption.py 对图表/表格/截图/流程图进行 caption，(2) 将 caption 文本解析为结构化 DataFrame，(3) 基于提取数据重新生成可视化图表，(4) 导出为 Excel/CSV。**遇到以下任一情况就主动使用本 skill，不要自行猜测图片内容**：①用户出现触发词：图片分析 / 图表提取 / 表格识别 / OCR / 图片描述 / 截图分析 / 图表数据 / 提取图片中的数据 / 图片转表格 / 识别图片 / image caption / extract data from image / chart analysis / table OCR；②用户上传或指定了图片文件（.png / .jpg / .jpeg / .gif / .webp / .bmp）并要求理解、提取数据或分析内容；③任务需要从图表截图、表格截图、UI 截图、流程图中提取结构化信息；④用户要求将图片中的数据转为 Excel/CSV 或重新生成可视化图表。仅不用于：图片编辑（裁剪、滤镜、缩放）、图片生成、不含数据的风景/人物照片描述。

Manusで実行

$ git log --oneline --stat

stars:2,899

forks:215

updated:2026年5月7日 14:00

ファイルエクスプローラー

2 ファイル

SKILL.md

readonly

related-skills.json

同じリポジトリ

sn-image-base.md

from "OpenSenseNova/SenseNova-Skills"

Base-layer skill for the SenseNova-Skills project, providing low-level APIs for image generation, recognition (VLM), and text optimization (LLM). This skill does not preprocess inputs; it only calls backend services and returns results. This skill is not user-facing and is intended for upper-layer skills only.

2026-05-272.9k

sn-infographic.md

from "OpenSenseNova/SenseNova-Skills"

Generates professional infographics with various layout types and visual styles. Analyzes content, recommends layout and style, and generates publication-ready infographics. Use when user asks to create "infographic", "信息图", "visual summary", or "可视化".

2026-05-272.9k

sn-ppt-creative.md

from "OpenSenseNova/SenseNova-Skills"

Creative-mode PPT pipeline. One full-page 16:9 PNG per slide. LLM / VLM calls go through sn-ppt-standard/lib/model_client.py (shared thin client). Text-to-image (the actual png rendering) goes through sn-image-base/scripts/sn_agent_runner.py. Expects task_pack.json + info_pack.json already written by sn-ppt-entry.

2026-05-252.9k

sn-ppt-doctor.md

from "OpenSenseNova/SenseNova-Skills"

Environment diagnostic for the PPT family. Validates sn-image-base, API keys, Node runtime, and optional deps; interactively writes .env for required vars. Runs before sn-ppt-entry; does not modify sn-image-* skills.

2026-05-252.9k

sn-ppt-standard.md

from "OpenSenseNova/SenseNova-Skills"

Standard-mode PPT pipeline. All LLM / VLM / T2I calls are wrapped in a single CLI entry (scripts/run_stage.py). The main agent's job is simple: emit ONE shell command per stage, never write loops, never write prompts.

2026-05-252.9k

sn-update.md

from "OpenSenseNova/SenseNova-Skills"

Update SenseNova Skills (the sn-* bundle) inside an OpenClaw or hermes-agent install. ALWAYS use this skill when the user says any of: "update SenseNova skills", "update SN skills", "更新 sensenova skills", "更新 sn skills", "刷新 sn-*", "升级 sn-* skills", or names a specific sn-* skill to update (e.g. "更新 sn-ppt-standard", "refresh sn-image-base"). Default scope is the whole sn-* bundle; if the user names specific skills, update ONLY those.

2026-05-072.9k

package.json

"author": "OpenSenseNova"

"repository": "OpenSenseNova/SenseNova-Skills"

GitHub リポジトリを開く Creator のリポジトリを見る

$ install --global

$ download --local

Manusで実行

$ useful --forSOC

ソフトウェア開発者コンピュータ・数学職15-1252L4

name

sn-da-image-caption

description

图片理解与数据提取 skill。当图片文件（.png/.jpg/.jpeg/.gif/.webp/.bmp）是主要输入且用户需要理解、提取数据或分析图片内容时使用。提供预配置的 caption 脚本（scripts/caption.py），通过 vision 模型将图片转为文本描述，无需额外配置 API Key。覆盖：(1) 通过 scripts/caption.py 对图表/表格/截图/流程图进行 caption，(2) 将 caption 文本解析为结构化 DataFrame，(3) 基于提取数据重新生成可视化图表，(4) 导出为 Excel/CSV。**遇到以下任一情况就主动使用本 skill，不要自行猜测图片内容**：①用户出现触发词：图片分析 / 图表提取 / 表格识别 / OCR / 图片描述 / 截图分析 / 图表数据 / 提取图片中的数据 / 图片转表格 / 识别图片 / image caption / extract data from image / chart analysis / table OCR；②用户上传或指定了图片文件（.png / .jpg / .jpeg / .gif / .webp / .bmp）并要求理解、提取数据或分析内容；③任务需要从图表截图、表格截图、UI 截图、流程图中提取结构化信息；④用户要求将图片中的数据转为 Excel/CSV 或重新生成可视化图表。仅不用于：图片编辑（裁剪、滤镜、缩放）、图片生成、不含数据的风景/人物照片描述。

Image Caption Analysis — 图片描述与数据提取

Overview

Analyze, extract data from, or understand image files (.png, .jpg, .jpeg, .gif, .webp, .bmp). The core workflow:

Run scripts/caption.py to get a text description of the image
Parse the description into structured data (DataFrame, etc.)
Analyze, visualize, or export

scripts/caption.py — Image Caption

The script converts images to text descriptions via a vision model. Configure via SN_API_KEY (minimum required), or use SN_VISION_API_KEY / SN_VISION_BASE_URL / SN_VISION_MODEL for fine-grained control. See the project environment variable spec for the full fallback chain.

Usage

# Basic — get text description
python3 scripts/caption.py /mnt/data/image.png

# Custom prompt — guide what to extract
python3 scripts/caption.py /mnt/data/chart.png --prompt "提取所有数值，Markdown 表格格式"

# JSON output — includes detected type, usage stats, cache info
python3 scripts/caption.py /mnt/data/image.png --json

# Batch — process all images in a directory
python3 scripts/caption.py /mnt/data/images/ --batch --output /mnt/data/captions.json

# Override model (optional)
python3 scripts/caption.py /mnt/data/image.png --model gemini-3.1-flash-lite-preview

Options

Option	Description
`--prompt, -p`	Custom prompt (overrides auto-detection)
`--model, -m`	Vision model (default: sensenova-6.7-flash-lite)
`--json`	Output structured JSON instead of plain text
`--batch`	Process all images in a directory
`--output, -o`	Output file for batch results
`--no-cache`	Skip MD5 cache

What it does automatically

Type detection: Detects image type from filename (chart/table/UI/diagram/general) and picks the best prompt
Compression: Images >5MB or >2048px are compressed before sending
Caching: Same image + same prompt → instant cached result, no API cost
Error handling: Retries on failure, returns error message on permanent failure

JSON output format

{
  "file": "/mnt/data/image.png",
  "type": "chart",
  "description": "这是一张柱状图...",
  "usage": {"prompt_tokens": 1100, "completion_tokens": 400},
  "cached": false
}

Calling from Python

import subprocess, json

CAPTION = "/path/to/skills/sn-da-image-caption/scripts/caption.py"

# Single image
result = subprocess.run(
    ["python3", CAPTION, "/mnt/data/chart.png", "--json",
     "--prompt", "提取图表数据，Markdown 表格输出"],
    capture_output=True, text=True, timeout=60
)
data = json.loads(result.stdout)
description = data["description"]

# Batch
result = subprocess.run(
    ["python3", CAPTION, "/mnt/data/images/", "--batch",
     "--output", "/mnt/data/captions.json"],
    capture_output=True, text=True, timeout=300
)
with open("/mnt/data/captions.json") as f:
    all_captions = json.load(f)

Prompt Strategy

Different image types need different prompts. The script auto-detects, but specifying --prompt gives better results.

Image Type	When	Recommended --prompt
Data chart	柱状图/折线图/饼图	`"提取图表标题、坐标轴、每个数据点数值、图例。Markdown 表格输出。"`
Table screenshot	表格截图	`"提取表格所有内容，Markdown 表格格式，保持行列结构，数值不四舍五入。"`
UI screenshot	界面截图	`"以前端开发者视角描述：布局、组件、文字、颜色。"`
Diagram	流程图/架构图	`"描述所有节点、连接关系（A→B）、分支条件。"`
General	照片、其他	不传 --prompt，用默认

Parsing Caption Results

Caption 通常返回 Markdown 表格，解析为 DataFrame：

import pandas as pd

def parse_markdown_table(text):
    lines = text.strip().split('\n')
    table_lines = []
    in_table = False
    for line in lines:
        stripped = line.strip()
        if '|' in stripped:
            in_table = True
            table_lines.append(stripped)
        elif in_table:
            break

    data_lines = []
    for l in table_lines:
        cells = [c.strip() for c in l.split('|') if c.strip()]
        if cells and not all(set(c) <= set('-: ') for c in cells):
            data_lines.append(cells)

    if len(data_lines) < 2:
        return None

    header = data_lines[0]
    rows = [r for r in data_lines[1:] if len(r) == len(header)]
    df = pd.DataFrame(rows, columns=header)

    # Auto numeric conversion
    for col in df.columns:
        try:
            cleaned = df[col].str.replace(',', '').str.strip()
            if cleaned.str.endswith('%').any():
                df[col] = pd.to_numeric(cleaned.str.rstrip('%'), errors='coerce')
            else:
                converted = pd.to_numeric(cleaned, errors='coerce')
                if converted.notna().sum() > len(df) * 0.5:
                    df[col] = converted
        except Exception:
            pass
    return df

Visualization

Chinese Font Setup (MANDATORY)

import matplotlib.pyplot as plt
import matplotlib
import os

font_path = '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc'
if os.path.exists(font_path):
    matplotlib.rcParams['font.family'] = 'WenQuanYi Zen Hei'
matplotlib.rcParams['axes.unicode_minus'] = False

Color Palette

COLORS = ['#4C72B0', '#55A868', '#C44E52', '#8172B2', '#CCB974', '#64B5CD']

Save & Display

plt.savefig('/mnt/data/chart.png', dpi=150, bbox_inches='tight')
plt.show()
print("![图表](sandbox:/mnt/data/chart.png)")

Export to Excel

from openpyxl.styles import Font, PatternFill, Alignment

output_path = "/mnt/data/result.xlsx"
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
    df.to_excel(writer, index=False, sheet_name='提取数据')
    ws = writer.sheets['提取数据']
    fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')
    for cell in ws[1]:
        cell.font = Font(bold=True, color='FFFFFF')
        cell.fill = fill
        cell.alignment = Alignment(horizontal='center')
    for i, col in enumerate(df.columns, 1):
        w = max(df[col].astype(str).str.len().max(), len(str(col))) + 2
        ws.column_dimensions[chr(64 + i)].width = min(w * 1.2, 40)

print(f"[下载](sandbox:{output_path})")

Multi-Image Processing

import glob

image_files = sorted(glob.glob("/mnt/data/*.png"))
all_dfs = []

for img in image_files:
    r = subprocess.run(
        ["python3", CAPTION, img, "--json", "--prompt", "提取数据，Markdown 表格"],
        capture_output=True, text=True, timeout=60
    )
    desc = json.loads(r.stdout)["description"]
    df = parse_markdown_table(desc)
    if df is not None:
        all_dfs.append(df)

combined = pd.concat(all_dfs, ignore_index=True) if all_dfs else None

Or batch mode:

python3 scripts/caption.py /mnt/data/images/ --batch --output /mnt/data/captions.json

Common Pitfalls

Always caption first — don't guess image content from filenames
Use --prompt for precision — auto-detect is OK, explicit prompt is better
Verify extracted data — check sums, percentages, row counts after parsing
Large tables truncate — caption in two passes: "提取前半部分" + "提取后半部分"
Chinese font — must set before any matplotlib call, or output is garbled
Timeout — single image ~10-30s, batch set timeout accordingly

sn-da-image-caption

このリポジトリの他の Skills

このリポジトリの他の Skills

Image Caption Analysis — 图片描述与数据提取

Overview

scripts/caption.py — Image Caption

Usage

Options

What it does automatically

JSON output format

Calling from Python

Prompt Strategy

Parsing Caption Results

Visualization

Chinese Font Setup (MANDATORY)

Color Palette

Save & Display

Export to Excel

Multi-Image Processing

Common Pitfalls

Image Caption Analysis — 图片描述与数据提取

Overview

scripts/caption.py — Image Caption

Usage

Options

What it does automatically

JSON output format

Calling from Python

Prompt Strategy

Parsing Caption Results

Visualization

Chinese Font Setup (MANDATORY)

Color Palette

Save & Display

Export to Excel

Multi-Image Processing

Common Pitfalls