Run any Skill in Manus with one click

Get Started

excel-statistical-viz-large-file

对 Excel 数据进行多维度统计分析与可视化。

Run Skill in Manus

Overview

对 Excel 数据进行多维度统计分析与可视化。

Install command

npx skills add https://github.com/OpenSenseNova/SenseNova-Skills --skill excel-statistical-viz-large-file

Copy and paste this command into Claude Code to install the skill

Source

OpenSenseNova/SenseNova-Skills

Stars3,284

Forks238

UpdatedApril 26, 2026 at 11:07

SKILL.md

readonly

name	excel-statistical-viz-large-file
description	对 Excel 数据进行多维度统计分析与可视化。

excel_statistical_visualization

This sub-skill covers one capability of the Excel workflow. For reading/counting/Parquet optimization, see the parent workflow SKILL.md.

Step1 数据清洗与标准化。提取目标分析列，处理合并单元格（ffill），并利用正则表达式或类型转换清理数值字段，确保分析数据集的准确性。

import re

# 假设 target_col_x 和 target_col_y 是分析目标
# 处理合并单元格导致的缺失值
df['group_col'] = df['group_col'].fillna(method='ffill')

def clean_numeric_string(value):
    if pd.isna(value): return None
    # 保留数字、小数点和负号，移除空格及非法字符
    cleaned = re.sub(r'[^\d\.\-]', '', str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None

df['x_val'] = df['target_col_x'].apply(clean_numeric_string)
df['y_val'] = df['target_col_y'].apply(clean_numeric_string)

# 过滤无效数据
df_clean = df.dropna(subset=['x_val', 'y_val']).copy()

Step2 执行多维度统计分析。计算分类占比、均值、标准差，并构建交叉分析表（crosstab/pivot），为可视化提供数据支撑。

# 分类统计与占比
stats_summary = df_clean.groupby('group_col')['y_val'].agg(['count', 'mean', 'std', 'min', 'max'])
stats_summary['percentage'] = (stats_summary['count'] / stats_summary['count'].sum()) * 100

# 添加总计行
total_row = pd.DataFrame(df_clean[['y_val']].agg(['count', 'mean']).T)
total_row.index = ['Total']

# 交叉分析示例
pivot_table = pd.pivot_table(df_clean, values='y_val', index='group_col', columns='category_col', aggfunc='count', fill_value=0)

Step3 生成高分辨率可视化图表。包含散点图、线性趋势线（R²、p值）、箱线图或柱状图组合，并配置中文字体与美化参数。

import matplotlib.pyplot as plt
import matplotlib
from scipy import stats
import numpy as np

# 字体配置：优先使用 SimHei 或 DejaVu Sans 确保中文显示
matplotlib.rcParams['font.sans-serif'] = ['SimHei', 'WenQuanYi Zen Hei', 'DejaVu Sans']
matplotlib.rcParams['axes.unicode_minus'] = False

x = df_clean['x_val'].values
y = df_clean['y_val'].values

# 线性回归计算
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
line = slope * x + intercept

plt.figure(figsize=(12, 8), dpi=300)

# 散点图：添加随机抖动 (jitter) 避免点重叠
jitter_x = x + np.random.normal(0, 0.01, size=len(x))
plt.scatter(jitter_x, y, alpha=0.6, edgecolors='w', label='Data Points')

# 趋势线
plt.plot(x, line, color='red', linestyle='--', linewidth=2, 
         label=f'Trend: y={slope:.4f}x+{intercept:.4f}\n$R^2$={r_value**2:.4f}, p={p_value:.4e}')

# 数据点标注 (实战技巧：仅标注极值或特定点)
for i, (xi, yi) in enumerate(zip(x, y)):
    if i % (len(x)//5 or 1) == 0: # 抽样标注避免拥挤
        plt.annotate(f'({xi:.2f}, {yi:.2f})', (xi, yi), textcoords="offset points", xytext=(5,5), fontsize=8)

plt.xlabel('Dimension X')
plt.ylabel('Dimension Y')
plt.title('Statistical Distribution & Trend Analysis')
plt.grid(True, linestyle=':', alpha=0.6)
plt.legend()

output_img = 'analysis_plot.png'
plt.savefig(output_img, bbox_inches='tight')
plt.show()

Step4 导出分析结果并生成下载链接。将清洗后的数据及统计摘要保存为 CSV 或 Excel 文件。

output_csv = 'cleaned_analysis_data.csv'
# 使用 utf-8-sig 确保 Excel 打开中文不乱码
df_clean.to_csv(output_csv, index=False, encoding='utf-8-sig')

print(f"Visualization saved to: {output_img}")
print(f"Data exported to: {output_csv}")
# 打印回归关键指标供快速参考
print(f"R-squared: {r_value**2:.6f}, P-value: {p_value:.6f}")

More from this repository

same repository

sn-ppt-creative

OpenSenseNova/SenseNova-Skills

Creative-mode PPT pipeline. One full-page 16:9 PNG per slide. LLM / VLM calls go through sn-ppt-standard/lib/model_client.py (shared thin client). Text-to-image (the actual png rendering) goes through sn-image-base/scripts/sn_agent_runner.py. Expects task_pack.json + info_pack.json already written by sn-ppt-entry.

2026-06-013.3k

sn-ppt-standard

OpenSenseNova/SenseNova-Skills

Standard-mode PPT pipeline. All LLM / VLM / T2I calls are wrapped in a single CLI entry (scripts/run_stage.py). The main agent's job is simple: emit ONE shell command per stage, never write loops, never write prompts.

2026-06-013.3k

sn-search-image

OpenSenseNova/SenseNova-Skills

USE FOR Google-backed image discovery via Serper.dev. Returns image URLs, page URLs, titles, and source domains.

2026-06-013.3k

sn-image-base

OpenSenseNova/SenseNova-Skills

Base-layer skill for the SenseNova-Skills project, providing low-level APIs for image generation, recognition (VLM), and text optimization (LLM). This skill does not preprocess inputs; it only calls backend services and returns results. This skill is not user-facing and is intended for upper-layer skills only.

2026-05-273.3k

sn-infographic

OpenSenseNova/SenseNova-Skills

Generates professional infographics with various layout types and visual styles. Analyzes content, recommends layout and style, and generates publication-ready infographics. Use when user asks to create "infographic", "信息图", "visual summary", or "可视化".

2026-05-273.3k

sn-ppt-doctor

OpenSenseNova/SenseNova-Skills

Environment diagnostic for the PPT family. Validates sn-image-base, API keys, Node runtime, and optional deps; interactively writes .env for required vars. Runs before sn-ppt-entry; does not modify sn-image-* skills.

2026-05-253.3k

Source

OpenSenseNova

OpenSenseNova/SenseNova-Skills

View GitHub Repository View Creator Repositories

Install command

Download

Run Skill in Manus

Useful forSOC

Data ScientistsComputer and Mathematical Occupations15-2051L4

name	excel-statistical-viz-large-file
description	对 Excel 数据进行多维度统计分析与可视化。

excel_statistical_visualization

This sub-skill covers one capability of the Excel workflow. For reading/counting/Parquet optimization, see the parent workflow SKILL.md.

Step1 数据清洗与标准化。提取目标分析列，处理合并单元格（ffill），并利用正则表达式或类型转换清理数值字段，确保分析数据集的准确性。

import re

# 假设 target_col_x 和 target_col_y 是分析目标
# 处理合并单元格导致的缺失值
df['group_col'] = df['group_col'].fillna(method='ffill')

def clean_numeric_string(value):
    if pd.isna(value): return None
    # 保留数字、小数点和负号，移除空格及非法字符
    cleaned = re.sub(r'[^\d\.\-]', '', str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None

df['x_val'] = df['target_col_x'].apply(clean_numeric_string)
df['y_val'] = df['target_col_y'].apply(clean_numeric_string)

# 过滤无效数据
df_clean = df.dropna(subset=['x_val', 'y_val']).copy()

Step2 执行多维度统计分析。计算分类占比、均值、标准差，并构建交叉分析表（crosstab/pivot），为可视化提供数据支撑。

# 分类统计与占比
stats_summary = df_clean.groupby('group_col')['y_val'].agg(['count', 'mean', 'std', 'min', 'max'])
stats_summary['percentage'] = (stats_summary['count'] / stats_summary['count'].sum()) * 100

# 添加总计行
total_row = pd.DataFrame(df_clean[['y_val']].agg(['count', 'mean']).T)
total_row.index = ['Total']

# 交叉分析示例
pivot_table = pd.pivot_table(df_clean, values='y_val', index='group_col', columns='category_col', aggfunc='count', fill_value=0)

Step3 生成高分辨率可视化图表。包含散点图、线性趋势线（R²、p值）、箱线图或柱状图组合，并配置中文字体与美化参数。

import matplotlib.pyplot as plt
import matplotlib
from scipy import stats
import numpy as np

# 字体配置：优先使用 SimHei 或 DejaVu Sans 确保中文显示
matplotlib.rcParams['font.sans-serif'] = ['SimHei', 'WenQuanYi Zen Hei', 'DejaVu Sans']
matplotlib.rcParams['axes.unicode_minus'] = False

x = df_clean['x_val'].values
y = df_clean['y_val'].values

# 线性回归计算
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
line = slope * x + intercept

plt.figure(figsize=(12, 8), dpi=300)

# 散点图：添加随机抖动 (jitter) 避免点重叠
jitter_x = x + np.random.normal(0, 0.01, size=len(x))
plt.scatter(jitter_x, y, alpha=0.6, edgecolors='w', label='Data Points')

# 趋势线
plt.plot(x, line, color='red', linestyle='--', linewidth=2, 
         label=f'Trend: y={slope:.4f}x+{intercept:.4f}\n$R^2$={r_value**2:.4f}, p={p_value:.4e}')

# 数据点标注 (实战技巧：仅标注极值或特定点)
for i, (xi, yi) in enumerate(zip(x, y)):
    if i % (len(x)//5 or 1) == 0: # 抽样标注避免拥挤
        plt.annotate(f'({xi:.2f}, {yi:.2f})', (xi, yi), textcoords="offset points", xytext=(5,5), fontsize=8)

plt.xlabel('Dimension X')
plt.ylabel('Dimension Y')
plt.title('Statistical Distribution & Trend Analysis')
plt.grid(True, linestyle=':', alpha=0.6)
plt.legend()

output_img = 'analysis_plot.png'
plt.savefig(output_img, bbox_inches='tight')
plt.show()

Step4 导出分析结果并生成下载链接。将清洗后的数据及统计摘要保存为 CSV 或 Excel 文件。

output_csv = 'cleaned_analysis_data.csv'
# 使用 utf-8-sig 确保 Excel 打开中文不乱码
df_clean.to_csv(output_csv, index=False, encoding='utf-8-sig')

print(f"Visualization saved to: {output_img}")
print(f"Data exported to: {output_csv}")
# 打印回归关键指标供快速参考
print(f"R-squared: {r_value**2:.6f}, P-value: {p_value:.6f}")