원클릭으로 Manus에서 모든 스킬 실행

$pwd:

sn-da-excel-workflow

Name: Sn Da Excel Workflow
Author: OpenSenseNova

// Excel 数据分析多步编排器。覆盖：(1) 读取多 Sheet Excel 文件并统计行数，(2) 大文件检测（≥10k 行自动 Parquet 优化），(3) 数据清洗（缺失值、文本标准化、无效字符），(4) 条件筛选与分类提取，(5) 跨 Sheet 统计聚合，(6) 导出 Excel/CSV 并提供下载链接。覆盖从数据读取到报告生成全流程，按步骤编排 capability 子 skill。**遇到以下任一情况就主动使用本 skill，不要自行写几行 pandas 就回答**：①用户出现触发词：Excel 分析 / 表格分析 / 数据分析 / 数据清洗 / 数据统计 / 数据筛选 / 数据可视化 / 数据导出 / 汇总统计 / 透视表 / 分组统计 / 交叉分析 / 趋势分析 / 对比分析 / 异常值检测 / 去重 / 缺失值处理 / Excel 报告 / 生成报表 / analyze Excel / data analysis / data cleaning / pivot table；②用户上传或指定了 .xlsx / .xls / .csv 文件并要求分析、清洗、统计或可视化；③任务涉及多 Sheet 读取、条件筛选、分类汇总、图表生成中的任意一项；④用户要求导出带格式的 Excel 报告或下载链接。仅不用于：不涉及表格数据的纯文本处理、图片分析（使用 sn-da-image-caption）、单个公式计算的简单问答。

Manus에서 실행

$ git log --oneline --stat

stars:2,899

forks:215

updated:2026년 5월 7일 14:00

파일 탐색기

45 개 파일

SKILL.md

readonly

related-skills.json

같은 저장소

sn-image-base.md

from "OpenSenseNova/SenseNova-Skills"

Base-layer skill for the SenseNova-Skills project, providing low-level APIs for image generation, recognition (VLM), and text optimization (LLM). This skill does not preprocess inputs; it only calls backend services and returns results. This skill is not user-facing and is intended for upper-layer skills only.

2026-05-272.9k

sn-infographic.md

from "OpenSenseNova/SenseNova-Skills"

Generates professional infographics with various layout types and visual styles. Analyzes content, recommends layout and style, and generates publication-ready infographics. Use when user asks to create "infographic", "信息图", "visual summary", or "可视化".

2026-05-272.9k

sn-ppt-creative.md

from "OpenSenseNova/SenseNova-Skills"

Creative-mode PPT pipeline. One full-page 16:9 PNG per slide. LLM / VLM calls go through sn-ppt-standard/lib/model_client.py (shared thin client). Text-to-image (the actual png rendering) goes through sn-image-base/scripts/sn_agent_runner.py. Expects task_pack.json + info_pack.json already written by sn-ppt-entry.

2026-05-252.9k

sn-ppt-doctor.md

from "OpenSenseNova/SenseNova-Skills"

Environment diagnostic for the PPT family. Validates sn-image-base, API keys, Node runtime, and optional deps; interactively writes .env for required vars. Runs before sn-ppt-entry; does not modify sn-image-* skills.

2026-05-252.9k

sn-ppt-standard.md

from "OpenSenseNova/SenseNova-Skills"

Standard-mode PPT pipeline. All LLM / VLM / T2I calls are wrapped in a single CLI entry (scripts/run_stage.py). The main agent's job is simple: emit ONE shell command per stage, never write loops, never write prompts.

2026-05-252.9k

sn-update.md

from "OpenSenseNova/SenseNova-Skills"

Update SenseNova Skills (the sn-* bundle) inside an OpenClaw or hermes-agent install. ALWAYS use this skill when the user says any of: "update SenseNova skills", "update SN skills", "更新 sensenova skills", "更新 sn skills", "刷新 sn-*", "升级 sn-* skills", or names a specific sn-* skill to update (e.g. "更新 sn-ppt-standard", "refresh sn-image-base"). Default scope is the whole sn-* bundle; if the user names specific skills, update ONLY those.

2026-05-072.9k

package.json

"author": "OpenSenseNova"

"repository": "OpenSenseNova/SenseNova-Skills"

GitHub 저장소 열기 Creator 저장소 보기

$ install --global

$ download --local

Manus에서 실행

$ useful --forSOC

데이터 과학자컴퓨터 및 수학직15-2051L4

name

sn-da-excel-workflow

description

Excel 数据分析多步编排器。覆盖：(1) 读取多 Sheet Excel 文件并统计行数，(2) 大文件检测（≥10k 行自动 Parquet 优化），(3) 数据清洗（缺失值、文本标准化、无效字符），(4) 条件筛选与分类提取，(5) 跨 Sheet 统计聚合，(6) 导出 Excel/CSV 并提供下载链接。覆盖从数据读取到报告生成全流程，按步骤编排 capability 子 skill。**遇到以下任一情况就主动使用本 skill，不要自行写几行 pandas 就回答**：①用户出现触发词：Excel 分析 / 表格分析 / 数据分析 / 数据清洗 / 数据统计 / 数据筛选 / 数据可视化 / 数据导出 / 汇总统计 / 透视表 / 分组统计 / 交叉分析 / 趋势分析 / 对比分析 / 异常值检测 / 去重 / 缺失值处理 / Excel 报告 / 生成报表 / analyze Excel / data analysis / data cleaning / pivot table；②用户上传或指定了 .xlsx / .xls / .csv 文件并要求分析、清洗、统计或可视化；③任务涉及多 Sheet 读取、条件筛选、分类汇总、图表生成中的任意一项；④用户要求导出带格式的 Excel 报告或下载链接。仅不用于：不涉及表格数据的纯文本处理、图片分析（使用 sn-da-image-caption）、单个公式计算的简单问答。

Excel Data Analysis Workflow

End-to-end workflow for structured Excel analysis. Each step maps to a capability sub-skill that can be loaded for detailed patterns.

Workflow

Step 1 — Count rows across all sheets (lightweight, no full load)

Count rows per sheet without loading data into memory. Use openpyxl read_only mode — this works for any file size.

import openpyxl, gc

wb = openpyxl.load_workbook(file_path, read_only=True, data_only=True)
total_rows = 0
sheet_info = {}
for name in wb.sheetnames:
    ws = wb[name]
    row_count = sum(1 for _ in ws.iter_rows(min_row=2, values_only=True))
    total_rows += row_count
    sheet_info[name] = row_count
    print(f"Sheet '{name}': {row_count} rows")
wb.close()
print(f"总行数={total_rows}")

⚠️ Do NOT use pd.read_excel() to count rows — it loads all data into memory, which will OOM on large files.

→ capability: excel-reading/multi-sheet-reading

Step 2 — Large file gate (CRITICAL — choose strategy by row count)

total_rows	Strategy	What to do
< 10k	Direct read	`df = pd.read_excel(file_path, sheet_name=target_sheet)`
10k – 100k	Parquet cache	`pd.read_excel()` once → `df.to_parquet()` → all later reads from Parquet
>= 100k	STOP. Load `sn-da-large-file-analysis` skill	Read its SKILL.md, then follow its streaming read + Parquet pattern. Do NOT use `pd.read_excel()` at all — it will OOM or timeout on 100k+ rows.

For >= 100k rows:

read_file(path="<skills_base>/sn-da-large-file-analysis/SKILL.md")

Then use stream_excel_to_parquet() from that skill — it reads via openpyxl iter_rows in 50k-row chunks with constant memory.

For 10k – 100k rows (only):

import pandas as pd
parquet_path = "/tmp/_auto_parquet.parquet"
df = pd.read_excel(file_path, sheet_name=target_sheet)
df.to_parquet(parquet_path, engine="pyarrow")
del df; gc.collect()
df = pd.read_parquet(parquet_path)

→ capability: excel-reading/large-excel-reading

Step 3 — Inspect schema & data types

Preview target sheet structure. For large files (>= 10k rows), only read a small sample — never full load just to inspect.

# For any file size — read only first N rows for inspection
df_head = pd.read_excel(file_path, sheet_name=target_sheet, nrows=20)
print(f"Columns: {df_head.columns.tolist()}")
print(f"Dtypes:\n{df_head.dtypes}")
print(df_head.head(10))

→ capability: excel-reading/range-reading

Step 4 — Data cleaning

Handle missing values, normalize text, clean invalid characters.

# Missing values
null_count = df[col].isna().sum()

# Text cleaning: keep only Chinese characters
import re
def clean_text(val):
    if pd.isna(val): return val
    return "".join(re.findall(r"[\u4e00-\u9fff]", str(val))) or ""

df[col] = df[col].apply(clean_text)

⚠️ Large file rule: When total_rows >= 100k, do NOT use df.apply(lambda...). Use vectorized operations or np.where() instead. See sn-da-large-file-analysis skill for the vectorized cheat sheet.

→ capabilities:

excel-data-cleaning/missing-value-handling
excel-data-cleaning/invalid-data-cleaning
excel-data-cleaning/text-normalization

Step 5 — Filter & extract

Apply condition or category filters, aggregate results.

# Condition filter
mask = df[col].astype(str).str.strip() == target_value
filtered = df[mask]

# Category extraction (for headerless layouts)
df_raw = pd.read_excel(file_path, sheet_name=sheet, header=None)
# Walk rows to find category markers, collect items until next marker

→ capabilities:

excel-data-filtering/condition-filtering
excel-data-filtering/category-filtering
excel-data-filtering/threshold-filtering

Step 6 — Export results

Save filtered/cleaned data as Excel or CSV. Provide download link.

output_path = "/mnt/data/result.xlsx"
result_df.to_excel(output_path, index=False)
print(f"[Download](sandbox:{output_path})")

→ capabilities:

excel-result-export/single-sheet-export
excel-result-export/formatted-export

Key rules

Always count rows first — gate large-file logic on the 10k threshold.
>= 100k rows → MUST load sn-da-large-file-analysis skill — do not attempt to handle with pd.read_excel().
Column names may contain spaces (e.g. '是否通过') — use exact string indexing.
Headerless sheets — use header=None and positional indexing.
Prohibited on large files (>= 100k rows):
- pd.read_excel() for full load (use streaming read → Parquet)
- df.apply(lambda...) or df.iterrows() (use vectorized ops or itertuples())
- fc-list, find ... fonts, subprocess to search fonts, or pip install (use fixed font paths below)
- Printing all unique values or full DataFrames (use .head(), .value_counts().head())

CJK Font Setup (mandatory for charts)

When generating charts with matplotlib, copy this block as-is. Do NOT search for fonts.

import os
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

_FONT_PATHS = [
    '/mnt/afs_agents/SimHei.ttf',
    '/mnt/afs_agents/mnt/data/SimHei.ttf',
    os.path.expanduser('~/.fonts/SimHei.ttf'),
    '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc',
    '/usr/share/fonts/SimHei.ttf',
]
for _p in _FONT_PATHS:
    if os.path.exists(_p):
        fm.fontManager.addfont(_p)
        matplotlib.rcParams['font.family'] = fm.FontProperties(fname=_p).get_name()
        break
matplotlib.rcParams['axes.unicode_minus'] = False

How to load sub-skills

Each workflow step references one or more capability sub-skills. When you need the detailed code pattern for a step, load the sub-skill on demand:

read_file(path="<base_path>/<category>/<sub-skill-name>/SKILL.md")

Rules:

Only load the sub-skill(s) needed for your current step.
Do NOT load all sub-skills at once — it wastes context.
The top-level workflow (this file) is your guide; sub-skills provide detailed implementation patterns.

Available capability sub-skills

Base path: <skills_root>/sn-da-excel-workflow/capability/{category}/{sub-skill}/SKILL.md

excel-reading — 读取与解析

Sub-skill	功能
single-sheet-reading	读取单个工作表，支持合并单元格处理、交叉分析及多维度可视化
multi-sheet-reading	读取多工作表，动态评估数据量启用Parquet优化，支持正则清洗、分类汇总与线性拟合
range-reading	特定区域数据提取，根据数据量动态选择处理策略
large-excel-reading	大型Excel文件处理，支持Parquet转换提速，生成带条件高亮的格式化报告
multi-file-reading	多文件读取与统计，支持大文件Parquet转换与可视化报告
specific-sheet-reading	跨Sheet特定字段统计、数据清洗与交叉分析，生成带下载链接的汇总报告
structured-header-reading	动态识别目标列进行统计，正则清洗文本字段提取中文字符

excel-data-cleaning — 数据清洗

Sub-skill	功能
missing-value-handling	多Sheet智能清洗、跨表核对与可视化分析
duplicate-removal	多Sheet去重统计，生成摘要与明细报表
invalid-data-cleaning	正则清洗指定文本列（如保留中文字符），大文件自动Parquet加速
text-normalization	文本标准化清洗（去除异常前缀、提取纯中文字符等）
numeric-format-normalization	数值格式标准化，支持关键指标合计核对与结果文件导出
outlier-detection	IQR异常值检测，结合偏度/峰度分析数据分布，适用于非正态数据预处理

excel-data-filtering — 数据筛选

Sub-skill	功能
condition-filtering	根据数据规模动态选择处理策略进行条件筛选
category-filtering	自定义分类统计、交叉分析，支持文本长度/术语密度/正则匹配等综合评分与分级
range-filtering	根据多维数值条件筛选并导出，支持大规模数据自动性能优化
threshold-filtering	数值列清洗、条件过滤，使用openpyxl对符合条件的单元格进行样式标记

excel-data-analysis — 数据分析

Sub-skill	功能
comparison-analysis	两类分类数据对比分析，统计数量差异与比例关系并生成可视化
group-by-analysis	多Sheet数据清洗及分组聚合分析，生成带样式标记的统计表与图表
kpi-metric-analysis	提取关键指标进行单位一致性验证与排序分析
pivot-table-analysis	交叉表与热力图进行多维度占比分析，适用于奖项分布/绩效评估/市场占有率
time-series-analysis	时间序列趋势分析、百分比清洗、绩效分级建模与预测，生成高分辨率可视化报告
trend-analysis	多维度分级评估与趋势预测，差异化增长率计算，适用于绩效评估/目标设定

excel-data-statistics — 统计计算

Sub-skill	功能
basic-statistics	基础统计，支持按条件筛选计算均值，指定行区间提取数据去重求和
category-statistics	各类别数量与占比统计，生成柱状图/饼图等组合可视化报告
grouped-statistics	多Sheet数据合并与前向填充，分组统计
percentage-calculation	逐行扫描或列匹配提取关键指标并计算占比/均值，输出结构化报告及图表

excel-data-visualization — 数据可视化

Sub-skill	功能
bar-chart-visualization	处理合并单元格，交叉分组统计，生成支持中英文字体的美化柱状图
histogram-visualization	数值型分布分析与异常值检测，支持正则提取误差项，生成箱线图与直方图
line-chart-visualization	特征清洗与聚类分析，生成趋势对比/分布特征/参数敏感性多维度图表
pie-chart-visualization	分类汇总统计，自动识别关键字段生成包含占比/数值的美化饼图
scatter-plot-visualization	多维度统计分析与散点图可视化
stacked-chart-visualization	百分比字符串数据处理，补全缺失维度，生成堆叠柱状图展示构成变化趋势

excel-cell-coloring — 单元格着色

Sub-skill	功能
category-coloring	提取目标指标计算最大值，对特定行进行高亮标注
duplicate-value-coloring	对比多表中的特定系数并对异常值进行颜色标记
outlier-coloring	识别超限数值与错误单元格并进行高亮标注
threshold-cell-coloring	计算时间序列平均值，使用openpyxl输出带条件格式（如低于均值标绿）的报告
top-value-coloring	根据数据规模动态选择策略，多表合并、统计筛选，关键指标自动化样式高亮

excel-conditional-formatting — 条件格式

Sub-skill	功能
data-bar-formatting	从带单位字符串列提取数值并清洗，生成直方图/饼图/条形图/累积分布图

excel-result-export — 结果导出

Sub-skill	功能
single-sheet-export	多Sheet数据探查与条件过滤导出，重命名字段后生成带下载链接的Excel
formatted-export	条件筛选记录并以整行标红格式导出Excel
chart-embedded-export	分类分布清洗与统计，生成多维度交叉分析与高分辨率嵌入式图表报告
report-generation-export	从Excel提取多类型数据，生成包含可视化图表与下载链接的综合分析报告

excel-table-styling — 表格样式

Sub-skill	功能
table-theme-styling	大文件Parquet加速读取，条件筛选/分类汇总与结果导出

sn-da-excel-workflow

이 저장소의 다른 Skills

이 저장소의 다른 Skills

Excel Data Analysis Workflow

Workflow

Step 1 — Count rows across all sheets (lightweight, no full load)

Step 2 — Large file gate (CRITICAL — choose strategy by row count)

Step 3 — Inspect schema & data types

Step 4 — Data cleaning

Step 5 — Filter & extract

Step 6 — Export results

Key rules

CJK Font Setup (mandatory for charts)

How to load sub-skills

Available capability sub-skills

excel-reading — 读取与解析

excel-data-cleaning — 数据清洗

excel-data-filtering — 数据筛选

excel-data-analysis — 数据分析

excel-data-statistics — 统计计算

excel-data-visualization — 数据可视化

excel-cell-coloring — 单元格着色

excel-conditional-formatting — 条件格式

excel-result-export — 结果导出

excel-table-styling — 表格样式

Excel Data Analysis Workflow

Workflow

Step 1 — Count rows across all sheets (lightweight, no full load)

Step 2 — Large file gate (CRITICAL — choose strategy by row count)

Step 3 — Inspect schema & data types

Step 4 — Data cleaning

Step 5 — Filter & extract

Step 6 — Export results

Key rules

CJK Font Setup (mandatory for charts)

How to load sub-skills

Available capability sub-skills

excel-reading — 读取与解析

excel-data-cleaning — 数据清洗

excel-data-filtering — 数据筛选

excel-data-analysis — 数据分析

excel-data-statistics — 统计计算

excel-data-visualization — 数据可视化

excel-cell-coloring — 单元格着色

excel-conditional-formatting — 条件格式

excel-result-export — 结果导出

excel-table-styling — 表格样式