with one click
对 Excel 数据进行多维度统计分析与可视化。
npx skills add https://github.com/OpenSenseNova/SenseNova-Skills --skill excel-statistical-viz-large-fileCopy and paste this command into Claude Code to install the skill
对 Excel 数据进行多维度统计分析与可视化。
npx skills add https://github.com/OpenSenseNova/SenseNova-Skills --skill excel-statistical-viz-large-fileCopy and paste this command into Claude Code to install the skill
Creative-mode PPT pipeline. One full-page 16:9 PNG per slide. LLM / VLM calls go through sn-ppt-standard/lib/model_client.py (shared thin client). Text-to-image (the actual png rendering) goes through sn-image-base/scripts/sn_agent_runner.py. Expects task_pack.json + info_pack.json already written by sn-ppt-entry.
Standard-mode PPT pipeline. All LLM / VLM / T2I calls are wrapped in a single CLI entry (scripts/run_stage.py). The main agent's job is simple: emit ONE shell command per stage, never write loops, never write prompts.
USE FOR Google-backed image discovery via Serper.dev. Returns image URLs, page URLs, titles, and source domains.
Base-layer skill for the SenseNova-Skills project, providing low-level APIs for image generation, recognition (VLM), and text optimization (LLM). This skill does not preprocess inputs; it only calls backend services and returns results. This skill is not user-facing and is intended for upper-layer skills only.
Generates professional infographics with various layout types and visual styles. Analyzes content, recommends layout and style, and generates publication-ready infographics. Use when user asks to create "infographic", "信息图", "visual summary", or "可视化".
Environment diagnostic for the PPT family. Validates sn-image-base, API keys, Node runtime, and optional deps; interactively writes .env for required vars. Runs before sn-ppt-entry; does not modify sn-image-* skills.
| name | excel-statistical-viz-large-file |
| description | 对 Excel 数据进行多维度统计分析与可视化。 |
This sub-skill covers one capability of the Excel workflow. For reading/counting/Parquet optimization, see the parent workflow SKILL.md.
Step1 数据清洗与标准化。提取目标分析列,处理合并单元格(ffill),并利用正则表达式或类型转换清理数值字段,确保分析数据集的准确性。
import re
# 假设 target_col_x 和 target_col_y 是分析目标
# 处理合并单元格导致的缺失值
df['group_col'] = df['group_col'].fillna(method='ffill')
def clean_numeric_string(value):
if pd.isna(value): return None
# 保留数字、小数点和负号,移除空格及非法字符
cleaned = re.sub(r'[^\d\.\-]', '', str(value))
try:
return float(cleaned)
except ValueError:
return None
df['x_val'] = df['target_col_x'].apply(clean_numeric_string)
df['y_val'] = df['target_col_y'].apply(clean_numeric_string)
# 过滤无效数据
df_clean = df.dropna(subset=['x_val', 'y_val']).copy()
Step2 执行多维度统计分析。计算分类占比、均值、标准差,并构建交叉分析表(crosstab/pivot),为可视化提供数据支撑。
# 分类统计与占比
stats_summary = df_clean.groupby('group_col')['y_val'].agg(['count', 'mean', 'std', 'min', 'max'])
stats_summary['percentage'] = (stats_summary['count'] / stats_summary['count'].sum()) * 100
# 添加总计行
total_row = pd.DataFrame(df_clean[['y_val']].agg(['count', 'mean']).T)
total_row.index = ['Total']
# 交叉分析示例
pivot_table = pd.pivot_table(df_clean, values='y_val', index='group_col', columns='category_col', aggfunc='count', fill_value=0)
Step3 生成高分辨率可视化图表。包含散点图、线性趋势线(R²、p值)、箱线图或柱状图组合,并配置中文字体与美化参数。
import matplotlib.pyplot as plt
import matplotlib
from scipy import stats
import numpy as np
# 字体配置:优先使用 SimHei 或 DejaVu Sans 确保中文显示
matplotlib.rcParams['font.sans-serif'] = ['SimHei', 'WenQuanYi Zen Hei', 'DejaVu Sans']
matplotlib.rcParams['axes.unicode_minus'] = False
x = df_clean['x_val'].values
y = df_clean['y_val'].values
# 线性回归计算
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
line = slope * x + intercept
plt.figure(figsize=(12, 8), dpi=300)
# 散点图:添加随机抖动 (jitter) 避免点重叠
jitter_x = x + np.random.normal(0, 0.01, size=len(x))
plt.scatter(jitter_x, y, alpha=0.6, edgecolors='w', label='Data Points')
# 趋势线
plt.plot(x, line, color='red', linestyle='--', linewidth=2,
label=f'Trend: y={slope:.4f}x+{intercept:.4f}\n$R^2$={r_value**2:.4f}, p={p_value:.4e}')
# 数据点标注 (实战技巧:仅标注极值或特定点)
for i, (xi, yi) in enumerate(zip(x, y)):
if i % (len(x)//5 or 1) == 0: # 抽样标注避免拥挤
plt.annotate(f'({xi:.2f}, {yi:.2f})', (xi, yi), textcoords="offset points", xytext=(5,5), fontsize=8)
plt.xlabel('Dimension X')
plt.ylabel('Dimension Y')
plt.title('Statistical Distribution & Trend Analysis')
plt.grid(True, linestyle=':', alpha=0.6)
plt.legend()
output_img = 'analysis_plot.png'
plt.savefig(output_img, bbox_inches='tight')
plt.show()
Step4 导出分析结果并生成下载链接。将清洗后的数据及统计摘要保存为 CSV 或 Excel 文件。
output_csv = 'cleaned_analysis_data.csv'
# 使用 utf-8-sig 确保 Excel 打开中文不乱码
df_clean.to_csv(output_csv, index=False, encoding='utf-8-sig')
print(f"Visualization saved to: {output_img}")
print(f"Data exported to: {output_csv}")
# 打印回归关键指标供快速参考
print(f"R-squared: {r_value**2:.6f}, P-value: {p_value:.6f}")