data-science-pro

Expert Data Science development covering statistical analysis, Exploratory Data Analysis (EDA), machine learning (Scikit-Learn), and data visualization.

Install command

npx skills add https://github.com/truongnat/skills --skill data-science-pro

Copy and paste this command into Claude Code to install the skill

Source

truongnat/skills

Stars: 2

Forks: 0

Updated: May 9, 2026 at 03:10

SKILL.md

name	data-science-pro
description	Expert Data Science development covering statistical analysis, Exploratory Data Analysis (EDA), machine learning (Scikit-Learn), and data visualization.
metadata	{"short-description":"Data Science — EDA, Pandas, Stats, Scikit-Learn, Visualization","content-language":"en","domain":"data-ai","level":"professional"}

Data Science Pro

Expert-level orchestration of analytical workflows and statistical modeling. Focuses on extracting actionable insights from data through rigorous analysis and scientific methods.

Boundary

data-science-pro covers Data Wrangling (Pandas, NumPy), Exploratory Data Analysis (EDA), Statistical Testing (A/B testing, Hypothesis testing), traditional Machine Learning (Scikit-Learn, XGBoost), and Visualization (Matplotlib, Seaborn). It does NOT cover deep learning/LLM training (use machine-learning-pro) or building production data pipelines (use data-engineering-pro).

When to use

Performing Exploratory Data Analysis (EDA) on a new dataset.
Designing and analyzing A/B tests to validate product changes.
Building predictive models (Classification, Regression, Clustering) using traditional ML.
Creating comprehensive data visualizations to communicate findings to stakeholders.
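
As an illustration of the A/B testing use case above, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The function name `two_proportion_ztest` and the conversion counts are hypothetical, invented for this example; a real analysis would also fix the sample size and significance level before the experiment starts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for H0: both conversion rates are equal."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, invented for illustration only
z, p_value = two_proportion_ztest(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The normal approximation used here is reasonable for large samples; for small counts an exact test may be a better fit.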

Workflow

Problem Definition: Define the business question or hypothesis.
Data Acquisition & Cleaning: Gather data and handle missing values, outliers, and formats.
Exploratory Data Analysis (EDA): Understand distributions, correlations, and basic patterns.
Feature Engineering: Create new meaningful features from raw data.
Modeling & Evaluation: Train statistical or ML models and evaluate them with metrics that match the task (e.g., F1-score for classification, RMSE for regression).
Communication: Present findings visually and document actionable business recommendations.
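
The cleaning and EDA steps above might look like the following sketch in Pandas. The dataset is synthetic and the column names (`age`, `income`, `segment`) are assumptions made up for the example, not part of the skill.

```python
import numpy as np
import pandas as pd

# Small synthetic dataset, invented for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=200).astype(float),
    "income": rng.normal(50_000, 12_000, size=200),
    "segment": rng.choice(["a", "b", "c"], size=200),
})
# Inject missing values into 5% of the income column
df.loc[df.sample(frac=0.05, random_state=0).index, "income"] = np.nan

# Cleaning check: share of missing values per column
missing_share = df.isna().mean()

# Distributions: summary statistics for the numeric columns
summary = df.describe()

# Correlations between numeric features
corr = df.corr(numeric_only=True)

# Basic pattern: median income per segment (NaN values are skipped)
income_by_segment = df.groupby("segment")["income"].median()

print(missing_share, summary, corr, income_by_segment, sep="\n\n")
```

On a real dataset, the same four views (missingness, summary statistics, correlations, grouped aggregates) are usually a reasonable first pass before any plotting.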

Operating principles

Garbage In, Garbage Out: The quality of your analysis depends entirely on the quality of your data cleaning.
Start Simple: Always start with a simple baseline model (e.g., Logistic Regression) before trying complex algorithms.
Explainability: In business contexts, an interpretable model is often more valuable than a slightly more accurate "black box".
Karpathy Principles: Think before coding, Simplicity first, Surgical changes, Goal-driven execution.
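
One way to apply the "Start Simple" principle is to score a trivial baseline and a simple model before reaching for anything more complex. The sketch below assumes a synthetic dataset from `make_classification`; the model selection and the 5-fold setup are illustrative choices, not requirements of the skill.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)

models = {
    "majority-class": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.3f}")
```

If the complex model beats the interpretable baseline by only a small margin, the Explainability principle suggests the simpler model may still be the better choice.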

Suggested response format (STRICT)

Your response MUST follow this structure:

<Role>
Senior Data Scientist.
</Role>

<Methodology>
[Description of statistical approach or ML methodology]
</Methodology>

<Implementation>
[Data Science Artifact: Python/Pandas script, Jupyter Notebook snippet, or Model logic]
</Implementation>

<Verification>
[Validation plan: Cross-validation, P-value checks, or Visual checks]
</Verification>

Resources in this skill

Topic	Reference
Data Scientist Roadmap	roadmap.sh/data-scientist
Pandas Documentation	pandas.pydata.org/docs
Scikit-Learn	scikit-learn.org/stable
Statistical Learning (ISLR)	statlearning.com

Quick example

Methodology: Fill missing values and train a Random Forest Classifier.

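import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data prep: load the dataset and separate features from the label
# (assumes every feature column is numeric)
df = pd.read_csv('data.csv')
X = df.drop(columns='target')
y = df['target']

# Split first so the imputation statistics come from the training rows only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fill missing values with the training medians (avoids data leakage)
medians = X_train.median(numeric_only=True)
X_train = X_train.fillna(medians)
X_test = X_test.fillna(medians)

# Modeling (random_state fixed for reproducibility)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluation on the held-out test set
preds = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.2f}")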

Checklist before calling the skill done

Think Before Coding: Statistical assumptions and bias in the dataset analyzed.
Simplicity First: Simple models and clear visualizations prioritized.
Surgical Changes: Only applied necessary transformations or model tuning.
Goal-Driven Execution: Verified model performance against a holdout test set.
Data leakage prevented (e.g., scalers and imputers fitted on the training split only, then applied to the test split).
Visualizations are accessible, labeled, and tell a clear story.
Statistical significance tested where appropriate.
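
For the data leakage item in the checklist, one common approach is to wrap preprocessing and the model in a Scikit-Learn `Pipeline`, so that imputation and scaling are refit on the training folds only during cross-validation. This is a minimal sketch on synthetic data; the choice of median imputation and Logistic Regression is an assumption for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 5% of values knocked out, for illustration only
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Each step is fitted inside every cross-validation fold on the training part only,
# so no statistics from the validation rows reach the imputer or the scaler
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

cv_scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"Accuracy per fold: {np.round(cv_scores, 3)}")
print(f"Mean accuracy: {cv_scores.mean():.3f}")
```

Fitting the imputer or scaler on the full dataset before splitting would let test-set statistics influence training, which tends to make the reported score look better than it should.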

More from this repository

content-analysis-pro

truongnat/skills

Production-grade multimodal content analysis: explicit analysis pipeline (modality → decode → segment → extract → verify → report), evidence and provenance rules (page, time, region anchors), grounded vs inferred claims, failure modes (OCR error, sampling gaps, chart number invention, deepfakes, token limits, locked files), decision trade-offs (summary vs extract vs compare, full read vs stratified sample, human-in-the-loop for high-stakes), quality and anti-hallucination guardrails, structured reports with limitations and confidence — for documents, images, video, and audio. Not a replacement for legal, medical, or forensic experts. Use when the user supplies or points to content to summarize, extract, compare, or audit with traceable evidence. Combine with business-analysis-pro for BRD-style outputs, security-pro for PII/secrets, data-analysis-pro for tabular math on extracted data, web-research-pro for external fact-check, image-processing-pro for raster prep, testing-pro for extraction-regression tests.

2026-05-092

router-pro

truongnat/skills

System skill for automatic request analysis, prompt optimization, and intelligent routing to skills, workflows, or templates. Instead of calling skills individually, this skill analyzes user input, researches and improves the prompt for clarity and accuracy, identifies relevant skills, workflows, or templates, and coordinates execution. Use this skill when the user provides a general request that needs automatic decomposition, when the prompt needs optimization for better AI understanding, when appropriate skills or workflows need to be identified and executed, or when a template is needed for reports, issues, or other structured outputs. This is a **system skill** - it does not perform domain-specific work but routes to and coordinates **working skills** (chosen using **stack context** — see Stack context resolution; e.g. flutter-pro vs react-pro), **workflows** (/ticket, /debug, /release, etc.), and **templates** (reports, issues, prompts, etc.). Triggers: "route", "analyze", "plan", "break down", "how s

2026-05-092

data-engineering-pro

truongnat/skills

Expert Data Engineering development covering ETL/ELT pipelines, distributed processing (Spark, Flink), message queues (Kafka), and data warehouse architecture (Snowflake, BigQuery).

2026-05-092

machine-learning-pro

truongnat/skills

Expert Machine Learning development covering Deep Learning, PyTorch/TensorFlow, Model Fine-tuning, NLP, and Computer Vision.

2026-05-092

spring-boot-pro

truongnat/skills

Expert Spring Boot development covering REST APIs, Spring Data JPA, Dependency Injection, Security, and Microservices architecture.

2026-05-092

android-pro

truongnat/skills

Expert Android development covering Kotlin, Jetpack Compose, Coroutines, Flow, and modern architecture patterns (MVVM, MVI).

2026-05-092


Useful forSOC

Data ScientistsComputer and Mathematical Occupations15-2051L4
