Run any Skill in Manus with one click

famou-data-analysis

A data analysis skill for understanding datasets, analyzing data, building data processing pipelines, and summarizing analytical results. Use this skill when the user mentions "analyze data", "data processing", "data exploration", "statistical analysis", "data cleaning", "data summarization", "create a data report", "understand this dataset", or "take a look at this CSV/Excel/dataset". Even if the user simply says "help me look at this data" or "analyze this", trigger this skill whenever the context involves a data file or dataset. Also invoke this skill if data analysis is required during Famou problem definition.

Run Skill in Manus

Overview

Install command

npx skills add https://github.com/baidubce/skills --skill famou-data-analysis

Copy and paste this command into Claude Code to install the skill

Source

baidubce/skills

Stars21

Forks7

UpdatedMay 12, 2026 at 10:43

SKILL.md

readonly

name	famou-data-analysis
description	A data analysis skill for understanding datasets, analyzing data, building data processing pipelines, and summarizing analytical results. Use this skill when the user mentions "analyze data", "data processing", "data exploration", "statistical analysis", "data cleaning", "data summarization", "create a data report", "understand this dataset", or "take a look at this CSV/Excel/dataset". Even if the user simply says "help me look at this data" or "analyze this", trigger this skill whenever the context involves a data file or dataset. Also invoke this skill if data analysis is required during Famou problem definition.
metadata	{"author":"famou-group","version":"2.0"}

Data Analysis Skill

Analysis Goals

The core objectives of a data analysis task are:

Understand the data: Clarify what the data is, where it comes from, and what business meaning it carries — not just read the file
Assess data quality: Identify missing values, duplicates, anomalies, and formatting inconsistencies; evaluate data trustworthiness
Discover patterns and insights: Use statistics and exploration to surface meaningful patterns, trends, anomalies, and correlations
Design a processing pipeline: Based on the actual data issues, design a reproducible and well-reasoned cleaning and transformation plan
Summarize findings: Deliver conclusions in clear, business-relevant language — not just a pile of statistics

Constraints

Always follow these constraints throughout the analysis:

Do not modify raw data without permission: Always inform the user and explain the reason before performing any cleaning or transformation
Do not over-assume data meaning: When column names are unclear, ask the user for clarification rather than guessing
Do not ignore data quality issues: Anomalies, missing values, and inconsistencies must be explicitly flagged — never silently worked around
Do not rigidly apply a fixed pipeline: Data formats vary widely; adapt the analysis approach to the actual situation
Do not stray from the user's goal: Always keep the analysis depth and direction anchored to "what problem is the user trying to solve"

Best Practices

Understand first, then act

After receiving data, prioritize understanding the context: What is the business scenario? What is the user's analytical goal? This determines which columns matter, what counts as "anomalous", and how to handle missing values.

Explain results in business language

Don't just output statistics — explain what they mean. "The mean is 3200" is less useful than "The average order value is approximately $3,200, but the median is only $1,800, suggesting a small number of high-value orders are inflating the mean."

Triage data quality issues by severity

Blocking issues (e.g., critical columns entirely empty, widespread primary key duplication): Stop and inform the user; wait for confirmation before proceeding
Decision-required issues (e.g., high missing rates, ambiguous outliers): Present the tradeoffs of each handling option and let the user choose
Minor issues (e.g., scattered format inconsistencies, leading/trailing whitespace): Handle directly, but document them in the report

Make the processing pipeline explainable

Every operation should be answerable with "why did we do this?", for example:

"Filled with the median because this column is right-skewed and the mean is heavily influenced by outliers"
"Dropped this column because the missing rate is 78%, making it unusable"

Back every conclusion with data

Each insight should be accompanied by specific numbers as evidence; avoid subjective judgments unsupported by data.

Environment & Tools

File paths: Uploaded files are at /mnt/user-data/uploads/; save output files to /mnt/user-data/outputs/
Encoding issues with non-ASCII data: Try utf-8 first; fall back to latin-1 or cp1252 if needed
Chart font rendering: Set matplotlib.rcParams['font.sans-serif'] = ['DejaVu Sans'] for reliable rendering
CSV output for Excel: Use encoding='utf-8-sig' to prevent encoding issues when opening in Excel

Analysis Report Format

Regardless of the analysis path taken, the final summary report should include:

## Data Analysis Report

### Data Overview
(Dataset size, source, time range, description of key fields)

### Data Quality
(Issues found, severity level, actions taken or items pending confirmation)

### Key Findings
(3–5 most important insights, each supported by data)

### Processing Pipeline
(What transformations were applied to the data and why)

### Recommendations
(Suggestions for data improvement, directions for deeper analysis)

Examples

The following examples show how to apply the goals and constraints above when facing different types of data.

Example A: E-commerce Order Data Analysis

Scenario: The user uploads an order CSV with columns for Order ID, Order Time, Product Category, Amount, User City, and Payment Status, and wants to understand "overall sales performance."

Analysis approach:

Confirm the time range first — this is the baseline context for any sales analysis
Check the Amount column for negative values (refunds? data entry errors?) and zero values (require business explanation)
The distribution of Payment Status determines the definition of "valid orders"
Aggregate by time to spot trends; break down by category and city for composition analysis

Quality issue handling example:

Found 12 rows with negative values in the "Amount" column. In sales data, negatives typically represent refund records. Recommendation: filter them out when analyzing gross revenue; keep them when analyzing net income. Please confirm your analysis goal before we proceed.

Key findings example:

March revenue grew 34% month-over-month, primarily driven by the "Electronics" category (contributing 61% of the incremental growth)

New York and Los Angeles together account for 47% of all orders, but average order value is 15% below the average for mid-tier cities

Payment failure rate is 8.3%, above the typical industry benchmark of 2–5%; recommend investigating the payment flow

Example B: Messy Survey Data

Scenario: The user uploads an Excel file with multiple sheets, inconsistent column names (some say "Q1", others say "Question 1"), merged cells, and scattered blank rows and columns.

First step: Don't rush into analysis — first surface the structural issues to the user:

This file contains 3 sheets, and there are several structural issues to clarify before we proceed:

Sheet1 and Sheet2 have different column names — are these different batches of the same survey, or entirely different surveys?

Rows 5–8 are blank — is it safe to remove them?

The "Q3_Other" column is 92% empty — is this a low open-response rate, or a data export issue?

Once confirmed, I'll design a cleaning plan.

Constraint in action: Column meanings are not assumed, sheets are not merged arbitrarily — issues are surfaced first and the user is asked to confirm.

Example C: User Requests "Clean the Data and Output a Processed File"

Processing log example (provided alongside the output file):

Data Processing Log
Source file:  sales_2024.csv  (8,412 rows × 15 columns)
Output file:  sales_2024_cleaned.csv  (8,203 rows × 13 columns)

Changes:
1. Removed duplicate rows: deleted 89 fully duplicate records
2. Dropped "Notes2" column: 96% missing, no useful information
3. Dropped "Internal Code" column: user confirmed it is not needed for analysis
4. Standardized Amount column: converted "¥1,200.00" format to numeric 1200.0 (203 rows affected)
5. Standardized Date column: unified to YYYY-MM-DD format (source had mixed MM/DD/YYYY and written-out date formats)
6. Missing value handling:
   - "City" column: 34 missing values → filled with "Unknown" (user confirmed)
   - "Amount" column: 86 missing values → left blank (user confirmed these are anomalous records that should not be imputed)

More from this repository

same repository

famou-experiment-manager

baidubce/skills

Workflow skill for managing famou evolutionary experiment tasks, including public normal mode and public pro hybrid mode. Use this skill when the user mentions "submit experiment", "check experiment status", "delete experiment", "get experiment results", "account info", "quota", "credits", "famou experiment", "upload experiment", "config.yaml experiment", "hybrid mode", or needs to use famou-ctl to manage experiment tasks. Even if the user just says "submit" or "run experiment", trigger this skill whenever the context involves the famou platform.

2026-05-1321

famou-artifact-generator

baidubce/skills

Interactive end-to-end Famou workflow for defining, implementing, and solving optimization tasks. The workflow typically proceeds in three stages: (1) understand the data and define the task, producing `problem.md`; (2) implement and validate `evaluator.py`, `init.py`, and `prompt.md` from the task definition; (3) run deep solving through Famou. Trigger this skill whenever the user wants to define, clarify, create, or fix a Famou task; prepare Famou experiment artifacts; write or update `problem.md`, `evaluator.py`, `init.py`, or `prompt.md`; run Famou; do deep solving; or solve an optimization, ML, or search problem with evolutionary methods. Even if the user simply says "help me make a Famou task", "help me solve this", or "run Famou", trigger this skill whenever the surrounding context indicates an optimization or search task. Also trigger when the user describes a combinatorial optimization, scheduling, routing, or ML problem without mentioning Famou — treat it as a potential Famou task.

2026-05-1221

famou-result-visualization

baidubce/skills

Generate interactive visualization pages for feasible solutions produced by Famou evolutionary algorithms. Use this skill when the user mentions "Famou visualization", "visualize this solution", "show feasible solution results", "evolution results", "evolve visualization", or provides a Python-code solution (path planning, scheduling, knapsack, TSP, job scheduling, machine learning, etc.) that needs to be displayed visually. Even if the user just says "help me visualize this solution", "draw it out", or "show me the results", trigger this skill immediately whenever the context involves evolutionary algorithms or optimization problem solutions.

2026-05-1221

baidu-search

baidubce/skills

Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.

2026-03-2421

medical-bill-organizer

baidubce/skills

医疗票据整理助手，用于自动分类、OCR识别和信息抽取。当用户需要处理医疗票据时使用，支持文件夹或压缩包输入，自动分类存放到不同文件夹，抽取病历信息（住院日期、医院名称），生成发票汇总CSV表格。

2026-03-2321

qianfanocr-document-intelligence

baidubce/skills

Analyze image files, image URLs, PDF files, and PDF URLs when the task requires recognizing, understanding, extracting, answering questions about, or locating content from the visual input. Typical uses include document parsing, layout analysis, element recognition, OCR, key information extraction, chart understanding, and document VQA. Do not use for plain text, structured data, code files, image-processing tasks, or cases where the needed information is already available in text form.

2026-03-1821

Source

baidubce

baidubce/skills

View GitHub Repository View Creator Repositories

Install command

Download

Run Skill in Manus

Useful forSOC

Data ScientistsComputer and Mathematical Occupations15-2051L4

name	famou-data-analysis
description	A data analysis skill for understanding datasets, analyzing data, building data processing pipelines, and summarizing analytical results. Use this skill when the user mentions "analyze data", "data processing", "data exploration", "statistical analysis", "data cleaning", "data summarization", "create a data report", "understand this dataset", or "take a look at this CSV/Excel/dataset". Even if the user simply says "help me look at this data" or "analyze this", trigger this skill whenever the context involves a data file or dataset. Also invoke this skill if data analysis is required during Famou problem definition.
metadata	{"author":"famou-group","version":"2.0"}

Data Analysis Skill

Analysis Goals

The core objectives of a data analysis task are:

Understand the data: Clarify what the data is, where it comes from, and what business meaning it carries — not just read the file
Assess data quality: Identify missing values, duplicates, anomalies, and formatting inconsistencies; evaluate data trustworthiness
Discover patterns and insights: Use statistics and exploration to surface meaningful patterns, trends, anomalies, and correlations
Design a processing pipeline: Based on the actual data issues, design a reproducible and well-reasoned cleaning and transformation plan
Summarize findings: Deliver conclusions in clear, business-relevant language — not just a pile of statistics

Constraints

Always follow these constraints throughout the analysis:

Do not modify raw data without permission: Always inform the user and explain the reason before performing any cleaning or transformation
Do not over-assume data meaning: When column names are unclear, ask the user for clarification rather than guessing
Do not ignore data quality issues: Anomalies, missing values, and inconsistencies must be explicitly flagged — never silently worked around
Do not rigidly apply a fixed pipeline: Data formats vary widely; adapt the analysis approach to the actual situation
Do not stray from the user's goal: Always keep the analysis depth and direction anchored to "what problem is the user trying to solve"

Best Practices

Understand first, then act

Explain results in business language

Triage data quality issues by severity

Blocking issues (e.g., critical columns entirely empty, widespread primary key duplication): Stop and inform the user; wait for confirmation before proceeding
Decision-required issues (e.g., high missing rates, ambiguous outliers): Present the tradeoffs of each handling option and let the user choose
Minor issues (e.g., scattered format inconsistencies, leading/trailing whitespace): Handle directly, but document them in the report

Make the processing pipeline explainable

Every operation should be answerable with "why did we do this?", for example:

"Filled with the median because this column is right-skewed and the mean is heavily influenced by outliers"
"Dropped this column because the missing rate is 78%, making it unusable"

Back every conclusion with data

Each insight should be accompanied by specific numbers as evidence; avoid subjective judgments unsupported by data.

Environment & Tools

File paths: Uploaded files are at /mnt/user-data/uploads/; save output files to /mnt/user-data/outputs/
Encoding issues with non-ASCII data: Try utf-8 first; fall back to latin-1 or cp1252 if needed
Chart font rendering: Set matplotlib.rcParams['font.sans-serif'] = ['DejaVu Sans'] for reliable rendering
CSV output for Excel: Use encoding='utf-8-sig' to prevent encoding issues when opening in Excel

Analysis Report Format

Regardless of the analysis path taken, the final summary report should include:

## Data Analysis Report

### Data Overview
(Dataset size, source, time range, description of key fields)

### Data Quality
(Issues found, severity level, actions taken or items pending confirmation)

### Key Findings
(3–5 most important insights, each supported by data)

### Processing Pipeline
(What transformations were applied to the data and why)

### Recommendations
(Suggestions for data improvement, directions for deeper analysis)

Examples

The following examples show how to apply the goals and constraints above when facing different types of data.

Example A: E-commerce Order Data Analysis

Scenario: The user uploads an order CSV with columns for Order ID, Order Time, Product Category, Amount, User City, and Payment Status, and wants to understand "overall sales performance."

Analysis approach:

Confirm the time range first — this is the baseline context for any sales analysis
Check the Amount column for negative values (refunds? data entry errors?) and zero values (require business explanation)
The distribution of Payment Status determines the definition of "valid orders"
Aggregate by time to spot trends; break down by category and city for composition analysis

Quality issue handling example:

Found 12 rows with negative values in the "Amount" column. In sales data, negatives typically represent refund records. Recommendation: filter them out when analyzing gross revenue; keep them when analyzing net income. Please confirm your analysis goal before we proceed.

Key findings example:

March revenue grew 34% month-over-month, primarily driven by the "Electronics" category (contributing 61% of the incremental growth)

New York and Los Angeles together account for 47% of all orders, but average order value is 15% below the average for mid-tier cities

Payment failure rate is 8.3%, above the typical industry benchmark of 2–5%; recommend investigating the payment flow

Example B: Messy Survey Data

Scenario: The user uploads an Excel file with multiple sheets, inconsistent column names (some say "Q1", others say "Question 1"), merged cells, and scattered blank rows and columns.

First step: Don't rush into analysis — first surface the structural issues to the user:

This file contains 3 sheets, and there are several structural issues to clarify before we proceed:

Sheet1 and Sheet2 have different column names — are these different batches of the same survey, or entirely different surveys?

Rows 5–8 are blank — is it safe to remove them?

The "Q3_Other" column is 92% empty — is this a low open-response rate, or a data export issue?

Once confirmed, I'll design a cleaning plan.

Constraint in action: Column meanings are not assumed, sheets are not merged arbitrarily — issues are surfaced first and the user is asked to confirm.

Example C: User Requests "Clean the Data and Output a Processed File"

Processing log example (provided alongside the output file):

Data Processing Log
Source file:  sales_2024.csv  (8,412 rows × 15 columns)
Output file:  sales_2024_cleaned.csv  (8,203 rows × 13 columns)

Changes:
1. Removed duplicate rows: deleted 89 fully duplicate records
2. Dropped "Notes2" column: 96% missing, no useful information
3. Dropped "Internal Code" column: user confirmed it is not needed for analysis
4. Standardized Amount column: converted "¥1,200.00" format to numeric 1200.0 (203 rows affected)
5. Standardized Date column: unified to YYYY-MM-DD format (source had mixed MM/DD/YYYY and written-out date formats)
6. Missing value handling:
   - "City" column: 34 missing values → filled with "Unknown" (user confirmed)
   - "Amount" column: 86 missing values → left blank (user confirmed these are anomalous records that should not be imputed)