ml-analysis

Name: Ml Analysis
Author: alonegg

End-to-end machine learning and deep learning analysis skill. Automatically detects data types, selects appropriate models, executes analysis, and delivers results. Use this skill whenever the user wants to: (1) Analyze data with ML/DL methods — classification, regression, clustering, anomaly detection, NLP, computer vision, time series, (2) Build or train models on their data, (3) Perform exploratory data analysis (EDA) with statistical rigor, (4) Use AutoML to find the best model, (5) Set up ML pipelines or MLOps workflows, (6) Evaluate, explain, or audit model performance, (7) Process or engineer features from raw data. Trigger on keywords like: predict, classify, cluster, train, model, regression, neural network, deep learning, feature engineering, hyperparameter, cross-validation, SHAP, EDA, anomaly detection, time series forecast, image classification, text classification, NLP, AutoML, deploy model, ML pipeline.


Updated: 15 April 2026, 05:27

Repository: alonegg/skills


ML Analysis Skill

An end-to-end machine learning and deep learning analysis engine. This skill guides you through a five-stage pipeline — from raw data to deployed model — adapting its depth and complexity to the user's expertise and task scale.

Core Principles

Always start with data understanding — never jump to modeling before EDA
Baseline first — run the simplest viable model before anything complex
Prevent data leakage — all preprocessing must be fit on training data only
Right-size execution — run small tasks directly, generate code for large ones
Explain decisions — tell the user why you chose a specific approach
Adapt to user level — detect expertise from their language and adjust detail accordingly

User Level Detection

Pay attention to how the user describes their task:

Beginner signals: "I have some data", "can you predict", vague about metrics, no mention of validation
Intermediate signals: mentions train/test split, specific algorithms, knows their metric
Expert signals: discusses regularization, architecture choices, distribution assumptions, asks about specific hyperparameters

Adjust your responses:

Beginners: explain each step, use analogies, show visualizations, recommend safe defaults
Intermediate: explain key decisions, offer options, show trade-offs
Expert: be concise, offer advanced options, skip basics, discuss nuances

Execution Mode Decision

Before any computation, assess scale:

| Factor           | Direct Execution       | Generate Code                           |
|------------------|------------------------|-----------------------------------------|
| Data size        | <100MB, <100K rows     | >100MB or >100K rows                    |
| Model complexity | sklearn, small NNs     | Large transformers, distributed training |
| Training time    | <5 minutes estimated   | >5 minutes estimated                    |
| GPU required     | No                     | Yes                                     |
| User environment | Local Python available | Needs cloud/cluster                     |

For borderline cases, prefer direct execution with sampling strategies.
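The scale table can be read as a simple rule: if any factor crosses its threshold, generate code instead of running directly. A minimal sketch, assuming a hypothetical `TaskScale` record and `choose_execution_mode` helper with thresholds taken straight from the table. The model-complexity row is only approximated through the GPU and training-time checks, and the borderline "sample and run directly" preference is left to judgment.

```python
from dataclasses import dataclass


@dataclass
class TaskScale:
    data_mb: float             # dataset size in megabytes
    n_rows: int
    est_train_minutes: float   # rough estimate of training time
    needs_gpu: bool
    local_python: bool = True  # is a local Python environment available?


def choose_execution_mode(scale: TaskScale) -> str:
    """Return 'direct' or 'generate_code' using the thresholds from the table."""
    too_big = scale.data_mb > 100 or scale.n_rows > 100_000
    too_slow = scale.est_train_minutes > 5
    if too_big or too_slow or scale.needs_gpu or not scale.local_python:
        return "generate_code"
    return "direct"


print(choose_execution_mode(TaskScale(12, 8_000, 1.5, False)))      # direct
print(choose_execution_mode(TaskScale(450, 2_000_000, 30, False)))  # generate_code
```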

The Five-Stage Pipeline

Stage 1: Perceive — Understand the Data and Task

Before anything else, understand what you're working with.

Step 1.1: Data Profiling

Run scripts/data_profiler.py on the user's data to get:

Data type detection (tabular / text / image / audio / mixed)
Shape, memory footprint, column types
Missing value rates per column
Cardinality of categorical features
Target variable distribution (if supervised task)
Basic statistics (mean, median, std, skew, kurtosis)

```bash
python <skill-path>/scripts/data_profiler.py <data-path> [--target <column>]
```
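The contents of scripts/data_profiler.py are not shown here. As a rough illustration of the tabular checks listed above, here is a pandas sketch. The `profile_frame` function, its `max_classes` cutoff, and the toy frame are hypothetical and are not the skill's script.

```python
from typing import Optional

import pandas as pd


def profile_frame(df: pd.DataFrame, target: Optional[str] = None, max_classes: int = 20) -> dict:
    """Collect basic tabular profile information for one DataFrame."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    profile = {
        "shape": df.shape,
        "memory_mb": df.memory_usage(deep=True).sum() / 1e6,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_rate": df.isna().mean().to_dict(),       # fraction missing per column
        "cardinality": categorical.nunique().to_dict(),   # distinct values per categorical column
        "stats": numeric.agg(["mean", "median", "std", "skew", "kurt"]).to_dict(),
    }
    if target is not None:
        y = df[target]
        # Assumption: a target with few distinct values is treated as categorical.
        if not pd.api.types.is_numeric_dtype(y) or y.nunique() <= max_classes:
            profile["target"] = y.value_counts(normalize=True).to_dict()
        else:
            profile["target"] = y.describe().to_dict()
    return profile


df = pd.DataFrame({
    "age": [23, 35, 41, None],
    "city": ["A", "B", "A", "C"],
    "churn": [0, 1, 0, 1],
})
p = profile_frame(df, target="churn")
```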

Step 1.2: Task Inference

From the user's description and data profile, determine:

Task type: classification, regression, clustering, anomaly detection, time series, NLP, CV, recommendation
Target variable (if supervised)
Evaluation metric (infer from task type if user doesn't specify)
Constraints: latency, interpretability, fairness requirements
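When the user names no task or metric, a first guess can come from the target column alone. This sketch is a heuristic under stated assumptions: the `infer_task` helper, the cutoff of 20 distinct values, and the default metric per task (drawn from the Stage 4 metric lists) are all hypothetical. Tasks such as time series, NLP, or anomaly detection still have to be inferred from the user's description.

```python
import numbers
from typing import Optional, Sequence, Tuple

# Assumed defaults; a metric the user names always takes precedence.
DEFAULT_METRIC = {
    "clustering": "silhouette",
    "binary_classification": "roc_auc",
    "multiclass_classification": "f1_macro",
    "regression": "rmse",
}


def infer_task(target: Optional[Sequence], max_classes: int = 20) -> Tuple[str, str]:
    """Guess (task_type, default_metric) from the target values alone."""
    if target is None:
        task = "clustering"  # no target column: unsupervised
    else:
        values = [v for v in target if v is not None]
        n_unique = len(set(values))
        numeric = all(isinstance(v, numbers.Real) and not isinstance(v, bool) for v in values)
        if numeric and n_unique > max_classes:
            task = "regression"
        elif n_unique == 2:
            task = "binary_classification"
        else:
            task = "multiclass_classification"
    return task, DEFAULT_METRIC[task]
```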

Step 1.3: Route to Reference

Based on data type, read the appropriate reference:

Tabular data → references/structured-data.md
Text / Image / Audio → references/unstructured-data.md
Mixed → read both, design a multi-modal pipeline

Stage 2: Prepare — Data Engineering and Feature Engineering

Step 2.1: Data Cleaning

Handle missing values; choose the strategy by missingness mechanism: MCAR (missing completely at random), MAR (missing at random), or MNAR (missing not at random)
Remove or flag duplicates
Fix data type issues (strings that should be numbers, etc.)
Handle outliers (detect with IQR/Z-score, decide to clip/remove/keep based on domain)
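Here is a minimal pandas sketch of two of the cleaning steps above: exact-duplicate removal and IQR-based clipping. The helper names `iqr_bounds` and `basic_clean` are hypothetical. The 1.5 multiplier is the conventional Tukey fence, and clipping is only one of the clip/remove/keep options, so the domain decides which applies. Inside a real pipeline, compute the fences on the training split only, to stay consistent with Step 2.2.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def basic_clean(df: pd.DataFrame, clip_cols: list[str]) -> pd.DataFrame:
    """Drop exact duplicate rows, then clip the given numeric columns to their IQR fences."""
    out = df.drop_duplicates().copy()
    for col in clip_cols:
        lo, hi = iqr_bounds(out[col])
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


df = pd.DataFrame({"x": [1, 2, 2, 3, 4, 100], "y": [0, 1, 1, 0, 1, 0]})
cleaned = basic_clean(df, ["x"])  # one duplicate row dropped, 100 clipped to the upper fence
```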

Step 2.2: Train/Test Split — DO THIS BEFORE ANY FITTING

This is critical to prevent data leakage:

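```python
from sklearn.model_selection import train_test_split

# X (features) and y (target) are assumed to be loaded already.
# stratify=y preserves class proportions; drop it for regression targets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# All subsequent preprocessing is fit ONLY on X_train
```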

For time series: use a temporal split (train on earlier observations, test on later ones), never a random split.
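scikit-learn's `TimeSeriesSplit` is one way to get temporal folds for cross-validation. The minimal sketch below uses toy data (20 time-ordered rows, an assumption for illustration). It checks that each training window ends before its test window begins, then makes a single chronological hold-out split.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Rows must already be sorted by time, oldest first.
X = np.arange(20).reshape(-1, 1)
y = np.arange(20)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Every training index precedes every test index: no peeking into the future.
    assert train_idx.max() < test_idx.min()
    print(f"fold {fold}: train 0..{train_idx.max()}, test {test_idx.min()}..{test_idx.max()}")

# A single hold-out split: train on the first 80% of the timeline, test on the last 20%.
cutoff = int(len(X) * 0.8)
X_train, X_test = X[:cutoff], X[cutoff:]
y_train, y_test = y[:cutoff], y[cutoff:]
```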

Step 2.3: Feature Engineering

Read references/structured-data.md or references/unstructured-data.md for domain-specific guidance.

General principles:

Encode categoricals: ordinal encoding for tree models, one-hot for linear models
Scale numerics: StandardScaler for linear models, distance-based models, and neural networks; not needed for tree models
Create interaction features only when domain knowledge suggests them
For text: TF-IDF for baselines, embeddings for deep learning
For images: use pretrained model features (transfer learning)
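The encoding and scaling principles above can be expressed as one `ColumnTransformer` per model family. This is a sketch under assumptions: the `make_preprocessor` helper, the `"tree"`/`"linear"` labels, and the toy columns are hypothetical. Unseen categories map to -1 for trees and to an all-zero one-hot row for linear models.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler


def make_preprocessor(model_family: str, num_cols: list[str], cat_cols: list[str]) -> ColumnTransformer:
    """Encode and scale according to the model family."""
    if model_family == "tree":
        # Trees split on thresholds, so integer category codes work and scaling is unnecessary.
        return ColumnTransformer([
            ("num", "passthrough", num_cols),
            ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), cat_cols),
        ])
    if model_family == "linear":
        # Linear models need standardized inputs and must not read an order into categories.
        return ColumnTransformer([
            ("num", StandardScaler(), num_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ])
    raise ValueError(f"unknown model family: {model_family!r}")


df = pd.DataFrame({"income": [30.0, 45.0, 60.0], "plan": ["basic", "pro", "basic"]})
Xt_tree = make_preprocessor("tree", ["income"], ["plan"]).fit_transform(df)
Xt_lin = make_preprocessor("linear", ["income"], ["plan"]).fit_transform(df)
```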

Step 2.4: Pipeline Construction

Always wrap preprocessing in sklearn Pipelines to prevent leakage:

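```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

# num_cols and cat_cols are placeholder lists of your numeric and categorical column names
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), num_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
])
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('model', LogisticRegression(max_iter=1000))  # swap in any estimator
])
```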

Stage 3: Model — Selection, Training, and Tuning

Step 3.1: Baseline Model

Always start with a simple baseline. This is non-negotiable:

Classification → LogisticRegression or DummyClassifier
Regression → Ridge or mean predictor
Clustering → K-Means
NLP → TF-IDF + LogisticRegression
CV → Pretrained ResNet + linear head

The baseline establishes a performance floor and often reveals data issues.
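A minimal sketch of the classification baseline, using scikit-learn's `DummyClassifier` and `LogisticRegression` on synthetic data from `make_classification` (an assumption standing in for the user's data). Balanced accuracy is used so that the dummy floor sits at exactly 0.5 regardless of class balance. A model that cannot beat that floor has learned nothing beyond the class prior.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=42)

baselines = {
    "dummy (most frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
}
results = {}
for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
    results[name] = scores.mean()
    print(f"{name:>22}: balanced accuracy {scores.mean():.3f} ± {scores.std():.3f}")
```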

Step 3.2: Model Selection

Consult references/model-catalog.md for the decision tree. Key heuristics:

Tabular data → gradient boosting (LightGBM/XGBoost) usually performs best
Small tabular (<1K rows) → regularized linear models or small ensembles
Text classification → fine-tuned transformer if data >5K samples, else TF-IDF + classical ML
Image → transfer learning from pretrained CNN/ViT
Time series → start with statistical (ARIMA/Prophet), add ML if needed

Step 3.3: AutoML Option

When the user wants automatic model selection, or says "find the best model":

Read references/automl-guide.md
Recommend AutoGluon (most general), H2O (enterprise), or Optuna (flexible tuning)
Set appropriate time budget based on data size

Step 3.4: Hyperparameter Tuning

Use Optuna or sklearn's RandomizedSearchCV rather than GridSearchCV, whose cost multiplies with every added hyperparameter
For deep learning: the learning rate is the most important hyperparameter
- Use a learning-rate finder (fast.ai style) when training neural networks
- Apply a one-cycle learning-rate policy
Always tune with cross-validation, not a single validation split
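A minimal `RandomizedSearchCV` sketch with a fixed sampling budget and cross-validated scoring. Assumptions: synthetic data, a logistic-regression model, and ROC AUC as the score. The regularization strength `C` is drawn from SciPy's `loguniform` because useful values span orders of magnitude. Optuna is not shown, to keep the sketch dependent only on scikit-learn and SciPy, and in practice the search runs on the training split only.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=2000),
    # Sample C on a log scale between 0.001 and 100.
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=20,  # fixed budget of 20 candidates, unlike an exhaustive grid
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```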

Step 3.5: Deep Learning Specifics

When using PyTorch/TensorFlow:

Start with a pretrained model when possible (transfer learning)
Use early stopping to prevent overfitting
Monitor both training and validation loss
For small datasets: heavy augmentation + dropout + weight decay
Architecture choice: consult references/model-catalog.md
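Keras ships an `EarlyStopping` callback. In a hand-written PyTorch loop, a small helper like the hypothetical `EarlyStopping` class below is a common pattern. It is framework-agnostic and only watches the validation loss. The loss values are made up to show a run that improves and then starts to overfit.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, val_loss: float, epoch: int) -> bool:
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # in practice, checkpoint the model weights here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation loss that improves, then rises as the model overfits.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.61, 0.65, 0.70]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss, epoch):
        print(f"stopping at epoch {epoch}; best epoch was {stopper.best_epoch}")
        break
```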

Stage 4: Evaluate — Rigorous Assessment

Read references/evaluation-guide.md for full details. Summary:

Step 4.1: Metrics

Classification: accuracy (only if classes are balanced), F1, precision, recall, AUC-ROC, AUC-PR
Regression: RMSE, MAE, R², MAPE
Clustering: silhouette score, Calinski-Harabasz index, domain-specific metrics
Always report variability across cross-validation folds (e.g. mean ± std), not a single score
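Why accuracy needs balanced classes: on a 95/5 split, a "model" that always predicts the majority class looks excellent by accuracy and useless by F1 or balanced accuracy. The labels below are a toy example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# 95 negatives and 5 positives; the predictions are always the majority class.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)  # no predicted positives, so F1 is 0
bal = balanced_accuracy_score(y_true, y_pred)
print(f"accuracy={acc:.2f}  F1={f1:.2f}  balanced accuracy={bal:.2f}")
# accuracy=0.95  F1=0.00  balanced accuracy=0.50
```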

Step 4.2: Diagnostic Plots

Learning curves (training size vs. score) — detect underfitting/overfitting
Validation curves (hyperparameter vs. score)
Confusion matrix heatmap (classification)
Residual plots (regression)
Calibration curves (probability calibration)
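The numbers behind a learning curve come from scikit-learn's `learning_curve`. This sketch prints them instead of plotting them; pass the same arrays to matplotlib for the figure. Assumptions: synthetic data and an unpruned decision tree, chosen because it reliably overfits and so shows a clear train/validation gap.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=15, n_informative=5, random_state=1)

train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=1),  # unpruned tree: prone to overfitting
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
)
for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # A large, persistent train/validation gap suggests overfitting;
    # two low curves that have converged suggest underfitting.
    print(f"n={int(n):4d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")
```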

Step 4.3: Model Comparison

Build a comparison table:

| Model          | CV Mean ± Std | Train Score | Test Score | Fit Time |
|----------------|---------------|-------------|------------|----------|
| Baseline       | ...           | ...         | ...        | ...      |
| Random Forest  | ...           | ...         | ...        | ...      |
| LightGBM       | ...           | ...         | ...        | ...      |

Step 4.4: Explainability

SHAP values for feature importance (works for any model)
Partial Dependence Plots for key features
For deep learning: Grad-CAM (images), attention visualization (text)
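SHAP values come from the third-party `shap` package. For a dependency-free first look, the sketch below swaps in a different model-agnostic technique: scikit-learn's `permutation_importance`, which shuffles one feature at a time on held-out data and measures the drop in score (R² here). These are not SHAP values. Assumptions: synthetic regression data in which, with `shuffle=False`, only the first three of eight features carry signal.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, n_informative=3, shuffle=False,
                       noise=5.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

model = RandomForestRegressor(n_estimators=100, random_state=3).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=3)

ranking = result.importances_mean.argsort()[::-1]  # most important first
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f} ± {result.importances_std[idx]:.3f}")
```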

Step 4.5: Fairness Audit

If sensitive attributes exist (gender, race, age), evaluate metrics per group
Check for disparate impact
Report and flag any significant performance gaps
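A minimal per-group audit for binary predictions in pandas. The helpers `group_report` and `disparate_impact` are hypothetical, and the labels and groups are toy values. The disparate-impact ratio here is the lowest group selection rate divided by the highest. The common "four-fifths" rule of thumb flags ratios below 0.8, but the right threshold depends on context.

```python
import numpy as np
import pandas as pd


def group_report(y_true, y_pred, groups) -> pd.DataFrame:
    """Per-group sample count, selection rate, true-positive rate, and accuracy."""
    df = pd.DataFrame({"y": y_true, "pred": y_pred, "group": groups})
    rows = {}
    for g, part in df.groupby("group"):
        positives = part[part["y"] == 1]
        rows[g] = {
            "n": len(part),
            "selection_rate": part["pred"].mean(),
            "tpr": positives["pred"].mean() if len(positives) else np.nan,
            "accuracy": (part["y"] == part["pred"]).mean(),
        }
    return pd.DataFrame(rows).T


def disparate_impact(report: pd.DataFrame) -> float:
    """Lowest group selection rate divided by the highest."""
    rates = report["selection_rate"]
    return rates.min() / rates.max()


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
report = group_report(y_true, y_pred, groups)
print(report)
print(f"disparate impact ratio: {disparate_impact(report):.2f}")
```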

Step 4.6: Error Analysis

Examine worst predictions — what do they have in common?
Check for systematic patterns in errors
Feed insights back to feature engineering if needed
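For regression, pulling out the rows with the largest absolute error is a simple way to start looking for shared patterns. For classification, a similar approach would rank rows by the predicted probability of the true class. The `worst_predictions` helper and the toy housing-style data are hypothetical.

```python
import numpy as np
import pandas as pd


def worst_predictions(X: pd.DataFrame, y_true, y_pred, k: int = 10) -> pd.DataFrame:
    """Return the k rows with the largest absolute error, for manual inspection."""
    out = X.copy()
    out["y_true"] = np.asarray(y_true)
    out["y_pred"] = np.asarray(y_pred)
    out["abs_error"] = (out["y_true"] - out["y_pred"]).abs()
    return out.nlargest(k, "abs_error")


X = pd.DataFrame({"region": ["N", "S", "N", "E", "S"], "size": [10, 200, 12, 15, 180]})
y_true = [100, 950, 110, 130, 900]
y_pred = [105, 700, 108, 128, 650]

worst = worst_predictions(X, y_true, y_pred, k=2)
# Do the worst rows share something? Here both are large properties in region "S".
print(worst)
print(worst["region"].value_counts())
```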

Stage 5: Deliver — Output Results

Step 5.1: Determine Output Format

Task Complexity	Output
Quick question ("is this data normally distributed?")	Inline text + plot
Single analysis ("classify these customers")	Results + key visualizations + brief report
Full project ("build me a churn prediction system")	Complete project directory

Step 5.2: For Inline Results

Print key metrics clearly
Save plots as PNG files, display inline
Provide actionable interpretation

Step 5.3: For Notebook Output

Run scripts/report_generator.py to create a Jupyter notebook with:

Markdown explanations between code cells (tutorial style, inspired by Hands-On ML)
All visualizations embedded
Reproducible: random seeds set, data paths relative
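A small seeding helper of the kind a reproducible notebook might call in its first cell. `set_seeds` is a hypothetical name. It covers Python's `random` and NumPy's legacy global generator. scikit-learn estimators should still get an explicit `random_state`, and PyTorch code would additionally call `torch.manual_seed(seed)`.

```python
import os
import random

import numpy as np


def set_seeds(seed: int = 42) -> None:
    """Seed the Python and NumPy global random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # Only affects Python subprocesses started after this point, not the running interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)


set_seeds(7)
first = (random.random(), float(np.random.rand()))
set_seeds(7)
second = (random.random(), float(np.random.rand()))  # identical to `first`
```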

Step 5.4: For Full Project

Generate the standard ML project structure:

project-name/
├── data/raw/              # Original data (never modified)
├── data/processed/        # Cleaned, feature-engineered data
├── notebooks/
│   ├── 01_eda.ipynb
│   ├── 02_modeling.ipynb
│   └── 03_evaluation.ipynb
├── src/
│   ├── data.py            # Data loading and preprocessing
│   ├── features.py        # Feature engineering
│   ├── model.py           # Model definition and training
│   └── evaluate.py        # Evaluation utilities
├── models/                # Saved model artifacts
├── reports/
│   ├── figures/           # Saved plots
│   └── metrics.json       # Final metrics
├── configs/
│   └── config.yaml        # Hyperparameters and settings
├── requirements.txt
├── Dockerfile             # (if deployment requested)
└── README.md

Step 5.5: For Deployment

When the user wants to deploy, read references/mlops-deploy.md for:

Model serialization (joblib/pickle for sklearn, torch.save for PyTorch, ONNX for cross-framework)
REST API wrapping (FastAPI/Flask)
Docker containerization
Monitoring and drift detection
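A minimal serialization round trip with `joblib`, assuming a scikit-learn pipeline trained on synthetic data. Saving the whole pipeline keeps preprocessing and model together. Two caveats: joblib/pickle files can execute arbitrary code when loaded, so only load files from trusted sources, and load with the same scikit-learn version used to save.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(pipeline, path)    # persist preprocessing + model as one artifact
    restored = joblib.load(path)   # what a serving process would do at startup
```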

Data Engineering Tasks

When the user needs ETL pipelines, data quality management, or large-scale processing:

Read references/data-engineering.md
Design pipelines with clear extraction → transformation → loading stages
For large data: recommend Spark/Dask with code generation
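For data that fits in memory, the extraction → transformation → loading stages can be kept as three separate functions. This is a small pandas + SQLite sketch under assumptions: the function names, the `amount`/`date` schema, and the inline CSV are hypothetical, and at scale the same stage boundaries would be implemented in Spark or Dask instead.

```python
import io
import sqlite3

import pandas as pd


def extract(source) -> pd.DataFrame:
    """Extract: read raw records from a CSV path or file-like object."""
    return pd.read_csv(source)


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform: normalize names, drop duplicates, enforce types, drop unparseable rows."""
    df = raw.rename(columns=str.lower).drop_duplicates().copy()
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    return df.dropna(subset=["amount", "date"])


def load(df: pd.DataFrame, conn: sqlite3.Connection, table: str) -> int:
    """Load: write to the target table and return the number of rows written."""
    df.to_sql(table, conn, if_exists="replace", index=False)
    return len(df)


raw_csv = io.StringIO("Date,Amount\n2024-01-01,10.5\n2024-01-01,10.5\n2024-01-02,oops\n2024-01-03,7\n")
with sqlite3.connect(":memory:") as conn:
    n_loaded = load(transform(extract(raw_csv)), conn, "transactions")
    stored = pd.read_sql("SELECT * FROM transactions", conn)
```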

Key Reminders

Never skip EDA — even if the user asks to "just train a model", at least run basic profiling
Always set random seeds — reproducibility is non-negotiable
Document assumptions — in comments or markdown cells
Version control data and models — suggest DVC for large artifacts
Test the pipeline end-to-end — before declaring victory, verify the full flow works
