data-explorer
// Performs exploratory data analysis, statistical analysis, and pattern discovery. Invoke when user wants to analyze data, find patterns, statistical testing, or get deep insights.
| name | data-explorer |
| description | Performs exploratory data analysis, statistical analysis, and pattern discovery. Invoke when user wants to analyze data, find patterns, statistical testing, or get deep insights. |
Expert data scientist specializing in exploratory data analysis (EDA) and statistical analysis. Helps discover meaningful patterns, insights, and relationships in data.
Invoke this skill when the user:
When to use Pandas:
When to use pure Python:
Auto-detect and use Pandas:
# Detect whether pandas is available
try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    HAS_PANDAS = False

if HAS_PANDAS:
    # Use pandas (recommended; performs better on large datasets)
    df = pd.read_csv('data.csv')
    result = df.groupby('category')['value'].sum()
else:
    # Fall back to pure Python
    from collections import defaultdict
    result = defaultdict(float)
    # ... manual implementation
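The fallback branch above stops at a placeholder comment. One way the manual aggregation might look is sketched below. The helper name `sum_by_group` and the inline sample data are hypothetical; the column names `category` and `value` are taken from the pandas branch.

```python
import csv
import io
from collections import defaultdict


def sum_by_group(rows, key, value):
    """Sum `value` per distinct `key`, mirroring df.groupby(key)[value].sum()."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical sample standing in for data.csv
sample_csv = "category,value\na,1.5\nb,2.0\na,3.0\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

result = sum_by_group(rows, "category", "value")
print(result)  # {'a': 4.5, 'b': 2.0}
```

Returning a plain `dict` rather than the `defaultdict` avoids silently creating zero entries when a missing category is looked up later.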
For e-commerce datasets, order amounts MUST be aggregated per order before any statistics are computed:
# Correct order-amount calculation
from collections import defaultdict

# Roll line items up to one total per order (order_items -> order)
order_total = defaultdict(float)
for item in order_items:
    order_total[item['order_id']] += float(item['price']) + float(item.get('freight_value', 0))

# Then compute statistics on the per-order totals
all_order_amounts = list(order_total.values())
mean_amount = sum(all_order_amounts) / len(all_order_amounts)
| Data type | Aggregation |
|---|---|
| Order amount | Sum per order_id (price + freight_value) |
| Review score | Mean or most recent value per order_id |
| Payment amount | Sum per order_id |
| Delivery time | Computed per order_id (delivered - purchase) |
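Two of the rules in the table, delivery time and mean review score per order, might be implemented in pure Python roughly as follows. This is a sketch under assumptions: the column names `order_delivered_customer_date` and `review_score`, the timestamp format, and both helper names are not given in this document and are illustrative only.

```python
from collections import defaultdict
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def delivery_days(order):
    """Days from purchase to delivery for one order, or None if not delivered."""
    delivered = order.get("order_delivered_customer_date")  # assumed column name
    if not delivered:
        return None
    purchase = datetime.strptime(order["order_purchase_timestamp"], TS_FORMAT)
    elapsed = datetime.strptime(delivered, TS_FORMAT) - purchase
    return elapsed.total_seconds() / 86400


def mean_score_by_order(reviews):
    """Mean review score per order_id (an order may have several reviews)."""
    scores = defaultdict(list)
    for review in reviews:
        scores[review["order_id"]].append(float(review["review_score"]))  # assumed column name
    return {order_id: sum(s) / len(s) for order_id, s in scores.items()}


# Hypothetical sample rows
order = {
    "order_id": "A",
    "order_purchase_timestamp": "2018-01-01 12:00:00",
    "order_delivered_customer_date": "2018-01-04 00:00:00",
}
reviews = [
    {"order_id": "A", "review_score": "4"},
    {"order_id": "A", "review_score": "5"},
]

days = delivery_days(order)
means = mean_score_by_order(reviews)
print(days, means)  # 2.5 {'A': 4.5}
```

Returning `None` for undelivered orders keeps them out of averages instead of counting them as zero-day deliveries.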
# Detect whether pandas is available
try:
    import pandas as pd
    import numpy as np
    USE_PANDAS = True
except ImportError:
    USE_PANDAS = False

if USE_PANDAS:
    # ============ Pandas version (recommended for large datasets) ============
    # Load the data
    orders = pd.read_csv('./data_storage/olist_orders_dataset.csv')
    order_items = pd.read_csv('./data_storage/olist_order_items_dataset.csv')

    # Correct order-amount statistics (aggregate by order_id first)
    order_amounts = order_items.groupby('order_id').agg({
        'price': 'sum',
        'freight_value': 'sum'
    }).sum(axis=1)
    amounts = order_amounts.values

    # Descriptive statistics
    mean_amount = amounts.mean()
    median_amount = np.median(amounts)
    std_amount = amounts.std()  # population std (NumPy default ddof=0); pass ddof=1 for the sample std
    q1, q2, q3 = np.percentile(amounts, [25, 50, 75])

    # Skewness and kurtosis
    skewness = pd.Series(amounts).skew()
    kurtosis = pd.Series(amounts).kurtosis()

    # Outlier detection (IQR), reusing q1 and q3 from above
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = amounts[(amounts < lower) | (amounts > upper)]

    # RFM analysis
    # 'revenue' is not a column of the orders file read above, so derive it
    # from the per-order amounts before aggregating
    orders['order_purchase_timestamp'] = pd.to_datetime(orders['order_purchase_timestamp'])
    orders['revenue'] = orders['order_id'].map(order_amounts)
    latest_date = orders['order_purchase_timestamp'].max()
    rfm = orders.groupby('customer_id').agg({
        'order_purchase_timestamp': lambda x: (latest_date - x.max()).days,  # recency
        'order_id': 'count',                                                 # frequency
        'revenue': 'sum'                                                     # monetary
    })
else:
    # ============ Pure Python version (fallback) ============
    import csv
    from collections import defaultdict

    # Load the order items (same file as the pandas branch)
    with open('./data_storage/olist_order_items_dataset.csv', newline='', encoding='utf-8') as f:
        order_items = list(csv.DictReader(f))

    # Aggregate by order (important!)
    order_amounts = defaultdict(float)
    for item in order_items:
        # 'or 0' also covers empty strings, which float() would reject
        order_amounts[item['order_id']] += float(item['price']) + float(item.get('freight_value') or 0)
    amounts = list(order_amounts.values())

    # Descriptive statistics
    mean_amount = sum(amounts) / len(amounts)
    sorted_amounts = sorted(amounts)
    n = len(sorted_amounts)
    median_amount = (sorted_amounts[n//2-1] + sorted_amounts[n//2]) / 2 if n % 2 == 0 else sorted_amounts[n//2]

    # Outlier detection (IQR); index-based quartiles are approximate
    q1 = sorted_amounts[n//4]
    q3 = sorted_amounts[3*n//4]
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in amounts if x < lower or x > upper]
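The pure-Python branch above picks quartiles by index, while `np.percentile` interpolates linearly by default, so the two branches can disagree on small samples. If matching results matter, a fallback along these lines might help. It is a minimal sketch: the helper names `percentile` and `iqr_outliers` and the sample amounts are hypothetical.

```python
def percentile(sorted_values, p):
    """p-th percentile (0-100) with linear interpolation between closest ranks."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def iqr_outliers(values):
    """Return the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    ordered = sorted(values)
    q1, q3 = percentile(ordered, 25), percentile(ordered, 75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lower or x > upper]


# Hypothetical per-order amounts
amounts = [10.0, 12.0, 14.0, 16.0, 100.0]
print(percentile(sorted(amounts), 50))  # 14.0
print(iqr_outliers(amounts))            # [100.0]
```

For the sample above, Q1 is 12.0 and Q3 is 16.0, so the fences are 6.0 and 22.0 and only 100.0 is flagged.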
Work with other skills:
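All outputs should be in Chinese unless the user specifies otherwise. Use Chinese for:
./data_storage/
./analysis_reports/
./generated_code/
A comprehensive A/B test analysis tool that supports experiment design, statistical testing, user segmentation analysis, and visual report generation. Use it to analyze A/B test results for product redesigns, marketing campaigns, and feature optimizations, with statistical significance testing and in-depth insights.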
Perform multi-touch attribution analysis using Markov chains, Shapley values, and custom attribution models. Use when you need to analyze marketing channel effectiveness, calculate conversion attribution, optimize marketing budgets, or understand customer journey paths. Supports channel transition analysis, ROI calculation, and marketing optimization insights with Chinese language support.
Generates production-ready analysis code in Python, R, SQL. Invoke when user wants reusable code for data analysis, ML, or visualization.
Analyze text content using both traditional NLP and LLM-enhanced methods. Extract sentiment, topics, keywords, and insights from various content types including social media posts, articles, reviews, and video content. Use when working with text analysis, sentiment detection, topic modeling, or content optimization.
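A general-purpose six-stage data analysis assistant: data quality → exploratory analysis → hypothesis generation → visualization → code generation → comprehensive report. Provides a complete methodology and templates.
An automated data exploration and visualization tool that provides a complete EDA solution, from data loading to professional report generation. Supports multiple chart types, intelligent data diagnostics, model evaluation, and HTML report generation. Suited to data analysis projects in healthcare, finance, e-commerce, and other domains.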