Name: Elastic Net Feature Selection
Author: aipoch

Use when selecting predictive genes or other molecular features from bulk expression matrices for binary case-vs-control classification with elastic net logistic regression, including coefficient path and cross-validation plots. Trigger keywords: elastic net, glmnet, feature selection, binary classification, lambda.min, lambda.1se. NOT for: survival/Cox modeling, multiclass outcomes, single-cell data, or non-expression tables.


stars: 909

forks: 57

updated: 2026-05-28 06:32


package.json

"author": "aipoch"

"repository": "aipoch/medical-research-skills"


name: elastic-net-feature-selection

Elastic Net Feature Selection

When to Use

Use this skill for binary case-vs-control classification on bulk expression matrices.
Use it when you need elastic net logistic regression feature selection, coefficient paths, and cv.glmnet-based lambda selection.
Use custom labels such as Tumor and Normal only when the group file still contains exactly two outcome levels.

Out of Scope

Survival or Cox modeling
Multiclass outcomes
Single-cell data
Non-expression tables

Out-of-scope enforcement:

If the group file contains any label outside the requested case_group and control_group, the command stops with SKILL_INVALID_DATA instead of silently dropping samples.
If either requested class is missing after validation, the command stops with SKILL_INVALID_DATA.
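
The two enforcement rules above can be sketched in Python. This is an illustration only, not the skill's R implementation in `scripts/main.R`; the function name `validate_labels` and the exception class `SkillInvalidData` are hypothetical.

```python
class SkillInvalidData(ValueError):
    """Hypothetical stand-in for the SKILL_INVALID_DATA error code."""


def validate_labels(labels, case_group="case", control_group="control"):
    """Stop on unexpected labels or a missing class instead of dropping samples."""
    allowed = {case_group, control_group}
    seen = set(labels)
    unexpected = sorted(seen - allowed)
    if unexpected:
        raise SkillInvalidData(f"SKILL_INVALID_DATA: unexpected labels {unexpected}")
    missing = sorted(allowed - seen)
    if missing:
        raise SkillInvalidData(f"SKILL_INVALID_DATA: missing class {missing}")
    return True
```

With custom labels, a call would look like `validate_labels(groups, "Tumor", "Normal")`; a third label such as `Adjacent` would raise rather than be filtered out.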

When to Read External Files

Situation	File to Read	Purpose
Need to understand alpha, lambda choice, or feature-selection behavior	`references/algorithm.md`	Elastic net logistic regression, penalty mixing, cross-validation, and coefficient selection assumptions
Need the authoritative executable entrypoint	`scripts/main.R`	Run: `Rscript scripts/main.R --input_file ... --group_file ... --output_dir ...`
Need parameter examples, smoke-test commands, or recorded local runs	`references/cli-guide.md`	Verified CLI examples for normal runs, conservative runs, and test-data runs
Need bundled sample inputs for a first run or regression test	`tests/data/`	Sample expression matrix, group file, and feature list
Encounter errors, warnings, or timeout issues	`references/troubleshooting.md`	Common failures, console warning interpretation, and recovery steps

Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./groups.csv \
  --feature_file ./genes.csv \
  --case_group case \
  --control_group control \
  --alpha auto \
  --alpha_grid 0,0.25,0.5,0.75,1 \
  --nfolds 5 \
  --lambda_choice lambda.min \
  --standardize TRUE \
  --timeout_seconds 600 \
  --output_dir ./output/ \
  --seed 42

Arguments

Short	Long	Type	Default	Description
`-i`	`--input_file`	character	required	Expression matrix file (genes as rows, samples as columns)
`-g`	`--group_file`	character	required	Group information file with sample and group columns
`-f`	`--feature_file`	character	`NULL`	Optional feature list file; if omitted, all matrix rows are used
`-c`	`--case_group`	character	`case`	Positive class label in the group file
`-d`	`--control_group`	character	`control`	Negative class label in the group file
`-a`	`--alpha`	character	`0.5`	Elastic net mixing parameter: numeric `0`-`1`, or `auto` for CV-based selection
	`--alpha_grid`	character	`0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1`	Comma-separated alpha candidates evaluated when `alpha=auto`
`-n`	`--nfolds`	integer	`5`	Cross-validation fold count; automatically reduced if a class has fewer samples than the requested fold count
`-l`	`--lambda_choice`	character	`lambda.min`	Coefficient extraction rule: `lambda.min` or `lambda.1se`
`-z`	`--standardize`	logical	`TRUE`	Standardize features inside `glmnet`
`-t`	`--timeout_seconds`	integer	`600`	Elapsed timeout limit in seconds
`-o`	`--output_dir`	character	`./output/`	Output directory
`-s`	`--seed`	integer	`42`	Random seed for reproducibility
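
As a rough illustration of how `--alpha` and `--alpha_grid` interact, the Python sketch below turns the two string arguments into a list of alpha candidates. The function name `parse_alpha` is hypothetical, and the actual parsing in `scripts/main.R` may differ; the only behavior taken from the table is that the grid applies when `alpha=auto` and that alpha must lie in `0`-`1`.

```python
def parse_alpha(alpha, alpha_grid):
    """Return the alpha candidates implied by the --alpha and --alpha_grid strings."""
    if alpha == "auto":
        # The grid is only consulted when alpha is "auto".
        grid = [float(token) for token in alpha_grid.split(",")]
    else:
        grid = [float(alpha)]
    for value in grid:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"SKILL_INVALID_PARAMETER: alpha {value} outside 0-1")
    return grid
```

For example, `parse_alpha("auto", "0,0.25,0.5,0.75,1")` yields five candidates, while `parse_alpha("0.5", "0,0.25,0.5,0.75,1")` ignores the grid and yields one.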

Input Format

Expression Matrix (input_file)

Genes as rows, samples as columns, CSV format with gene IDs in the first column.

,Sample01,Sample02,Sample03
TNMD,0.0349,0.0533,1.3889
DPM1,4.8627,5.4208,5.6370

Group File (group_file)

CSV with sample IDs and binary group labels.

sample,group
Sample01,case
Sample02,control
Sample03,case

Feature File (feature_file)

Optional plain text or single-column CSV file with one feature per line.

TNMD
DPM1
SCYL3

Output Files

File	Description
`alpha_tuning.csv`	Cross-validated performance summary for each alpha candidate
`model_coefficients.csv`	Coefficients at the selected lambda, including intercept
`selected_features.csv`	Sparse selected features sorted by absolute effect size; written empty when the chosen `alpha` is `0` (ridge)
`feature_matrix.csv`	Sample-by-feature analysis matrix used for model fitting
`coefficient_path.pdf`	Coefficient trajectory plot across lambda values
`cv_curve.pdf`	Cross-validation error curve with `lambda.min` and `lambda.1se`
`session_info.txt`	R session and package version info

Workflow

Step 1: Validate Input

WHEN preparing input files for a first run or regression test, READ: tests/data/
Check file existence
Reject empty input files
Detect sample and group columns in the group file
Reject group files that contain labels outside the requested binary comparison
Validate sample matching between expression matrix and group file
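
The sample-matching check can be pictured with a short Python sketch. It assumes that a complete lack of overlap is what triggers `SKILL_SAMPLE_MISMATCH`; how the skill treats partial overlap is not stated here, so this sketch simply keeps the shared IDs. The function name `match_samples` is hypothetical.

```python
def match_samples(matrix_samples, group_samples):
    """Return shared sample IDs in matrix column order; fail if none overlap."""
    group_set = set(group_samples)
    shared = [sample for sample in matrix_samples if sample in group_set]
    if not shared:
        raise ValueError("SKILL_SAMPLE_MISMATCH: no sample IDs shared between files")
    return shared
```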

Step 2: Prepare Modeling Matrix

Restrict samples to the requested case and control groups
Intersect the optional feature list with matrix row names
Build a sample-by-feature numeric matrix for glmnet
Drop zero-variance features before modeling
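
The matrix preparation steps above can be sketched in Python using only the standard library. This is not the skill's R code; `build_feature_matrix` is a hypothetical name, the input is passed as CSV text for brevity, and `keep_samples` is assumed to be non-empty and already validated.

```python
import csv
import io
from statistics import pvariance


def build_feature_matrix(matrix_csv, keep_samples, feature_list=None):
    """Transpose a genes-by-samples CSV into {sample: {gene: value}}."""
    rows = list(csv.reader(io.StringIO(matrix_csv)))
    header, body = rows[0], rows[1:]
    samples = header[1:]  # the first header cell sits above the gene-ID column
    wanted = set(feature_list) if feature_list is not None else None
    matrix = {s: {} for s in samples if s in keep_samples}
    for row in body:
        gene, values = row[0], [float(v) for v in row[1:]]
        if wanted is not None and gene not in wanted:
            continue  # intersect with the optional feature list
        kept = [v for s, v in zip(samples, values) if s in keep_samples]
        if pvariance(kept) == 0:
            continue  # drop zero-variance features
        for s, v in zip(samples, values):
            if s in matrix:
                matrix[s][gene] = v
    return matrix
```

Features named in the list but absent from the matrix (such as `SCYL3` in the sample inputs) are silently skipped by the intersection.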

Step 3: Run Elastic Net

WHEN deciding between alpha, lambda.min, and lambda.1se, READ: references/algorithm.md
If alpha=auto, evaluate the candidate alpha_grid with the same cross-validation folds
Fit the regularization path with glmnet
Run cv.glmnet to estimate the optimal lambda
Extract coefficients at lambda.min or lambda.1se
Apply runtime timeout and capture non-fatal warnings
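
Reusing the same folds for every alpha candidate means assigning each sample a fold ID once and passing that assignment to every cross-validation run (in R, `cv.glmnet` accepts such a vector through its `foldid` argument). The Python sketch below shows one way to build a class-balanced assignment. The function name, the stratification, and the rule of capping the fold count at the smallest class size are assumptions for illustration, not a description of `scripts/main.R`.

```python
import random


def stratified_fold_ids(labels, nfolds=5, seed=42):
    """Assign each sample a fold ID in 1..k, balancing folds within each class."""
    counts = {label: labels.count(label) for label in set(labels)}
    k = min(nfolds, min(counts.values()))  # assumed fold-reduction rule
    rng = random.Random(seed)
    fold_ids = [0] * len(labels)
    for label in sorted(counts):
        indices = [i for i, lab in enumerate(labels) if lab == label]
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            fold_ids[i] = position % k + 1
    return fold_ids
```

Because the assignment depends only on the labels and the seed, every alpha in the grid is evaluated on identical splits, which keeps the cross-validated errors comparable.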

Step 4: Export Results

WHEN you need exact invocation patterns or output inspection commands, READ: references/cli-guide.md
Save tuning tables and selected features
Generate coefficient path and cross-validation plots
Record session information for reproducibility

Methods

Elastic Net Logistic Regression

Elastic net combines lasso (L1) and ridge (L2) penalties through alpha, enabling sparse feature selection while stabilizing correlated predictors.
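
To make the role of alpha concrete, the Python sketch below evaluates the penalty term in the form glmnet uses: lambda times (alpha times the L1 norm plus (1 - alpha)/2 times the squared L2 norm). The function name is hypothetical, and `coefs` is assumed to exclude the intercept, which is not penalized.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * squared L2)."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)
```

Setting `alpha=1` leaves only the lasso term and `alpha=0` leaves only the ridge term; values in between blend the two.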

Cross-Validation

cv.glmnet evaluates the lambda path and reports both lambda.min and the more conservative lambda.1se.
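
The two lambda choices can be illustrated in Python. `lambda.min` is the lambda with the lowest mean cross-validated error, and `lambda.1se` is the largest lambda whose error stays within one standard error of that minimum. The function name `pick_lambdas` is hypothetical; the inputs mirror the `lambda`, `cvm`, and `cvsd` components of a `cv.glmnet` result.

```python
def pick_lambdas(lambdas, cvm, cvsd):
    """Return (lambda_min, lambda_1se) from a cross-validation error curve."""
    i_min = min(range(len(cvm)), key=cvm.__getitem__)
    threshold = cvm[i_min] + cvsd[i_min]
    # Largest lambda (strongest penalty) still within one SE of the minimum.
    lambda_1se = max(l for l, m in zip(lambdas, cvm) if m <= threshold)
    return lambdas[i_min], lambda_1se
```

Since a larger lambda shrinks more coefficients to zero, `lambda.1se` typically yields a shorter feature list than `lambda.min`.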

Automatic Alpha Selection

When alpha=auto, the skill reuses the same cross-validation folds across all values in alpha_grid, compares the minimum cross-validated error for each candidate, and selects the best alpha before reporting coefficients and lambda-based outputs.

If the chosen alpha is 0, the model is ridge rather than sparse elastic net. In that case, selected_features.csv is written empty to avoid mislabeling dense ridge coefficients as selected features; use model_coefficients.csv for coefficient ranking instead.
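
The comparison step can be sketched in Python as picking the alpha with the smallest minimum cross-validated error and flagging the ridge case. The function name `select_alpha` is hypothetical, and the tie-breaking behavior (first candidate in grid order wins) is an assumption rather than something documented for the skill.

```python
def select_alpha(min_cv_error_by_alpha):
    """Return (best_alpha, is_ridge) given {alpha: minimum CV error}."""
    best_alpha = min(min_cv_error_by_alpha, key=min_cv_error_by_alpha.get)
    # alpha == 0 means pure ridge, so no sparse feature list should be reported.
    return best_alpha, best_alpha == 0
```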

Feature Selection Rule

Selected features are the coefficients whose absolute value exceeds a small numerical tolerance at the chosen lambda, excluding the intercept term.

If the chosen alpha is 0, the workflow writes an empty selected_features.csv because ridge coefficients are dense by design and should not be mislabeled as sparse selected features.
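
A minimal Python sketch of this rule follows. The tolerance value `1e-8` and the function name `select_features` are assumptions; `(Intercept)` is the row name glmnet gives the intercept in its coefficient output.

```python
def select_features(coefs, alpha, tol=1e-8):
    """Return (feature, coefficient) pairs sorted by absolute effect size."""
    if alpha == 0:
        return []  # ridge is dense by design, so nothing is reported as selected
    kept = [
        (name, value)
        for name, value in coefs.items()
        if name != "(Intercept)" and abs(value) > tol
    ]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)
```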

Examples

Recommended First Run

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g groups.csv \
  -f genes.csv \
  -a auto \
  --alpha_grid 0,0.25,0.5,0.75,1 \
  -o output/first_run

Fixed-Alpha Baseline

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g groups.csv \
  -f genes.csv \
  -a 0.5 \
  -o output/fixed_alpha

More Conservative Selection

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g groups.csv \
  -l lambda.1se \
  -o output/lambda_1se

Error Handling

Common Errors

Error	Cause	Solution	Read More
`SKILL_FILE_NOT_FOUND`	Input file does not exist	Check file path and permissions	`references/troubleshooting.md#skill_file_not_found`
`SKILL_EMPTY_DATA`	Input file exists but is empty	Re-export the input file with data rows	`references/troubleshooting.md#skill_empty_data`
`SKILL_MISSING_COLUMNS`	Group file lacks sample/group columns	Verify the group file structure	`references/troubleshooting.md#skill_missing_columns`
`SKILL_SAMPLE_MISMATCH`	Sample IDs do not overlap between files	Ensure matrix column names match the group file	`references/troubleshooting.md#skill_sample_mismatch`
`SKILL_INVALID_PARAMETER`	CLI parameter is invalid	Check allowed values and ranges	`references/troubleshooting.md#skill_invalid_parameter`
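`SKILL_INVALID_DATA`	Too few samples or usable features remain, the group file contains a label outside the requested groups, or a requested class is missing	Review filtering choices, group labels, and input data	`references/troubleshooting.md#skill_invalid_data`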
`SKILL_DEPENDENCY_MISSING`	Required R package is not installed	Install missing packages before rerunning	`references/troubleshooting.md#skill_dependency_missing`
`SKILL_PKG_VERSION`	Installed package is too old	Upgrade the required package	`references/troubleshooting.md#skill_pkg_version`
`SKILL_TIMEOUT`	Run exceeded the configured time limit	Increase `timeout_seconds` or reduce data size	`references/troubleshooting.md#skill_timeout`
`SKILL_RUNTIME_ERROR`	An unexpected runtime or output-write failure occurred	Check output path permissions, free space, and the last console message	`references/troubleshooting.md#skill_runtime_error`

IF error persists, READ: references/troubleshooting.md

Testing

Test with Sample Data

# Check help
Rscript scripts/main.R --help

# Run with bundled test data
Rscript scripts/main.R \
  -i tests/data/expression_matrix.csv \
  -g tests/data/groups.csv \
  -f tests/data/genes.csv \
  -a auto \
  --alpha_grid 0,0.5,1 \
  -o tests/output \
  -n 5 \
  -t 600

Validation Commands

# Inspect selected features (may be header-only if auto-alpha selects ridge)
cat tests/output/selected_features.csv

# Check plots exist
ls -la tests/output
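
For a scripted version of the check above, the Python sketch below lists which of the documented output files are absent from a run directory. The function name `missing_outputs` is hypothetical, and it assumes every file in the Output Files table is written on every successful run, which may not hold (for example, `alpha_tuning.csv` might only be produced when `alpha=auto`).

```python
from pathlib import Path

# File names taken from the Output Files table.
EXPECTED_OUTPUTS = [
    "alpha_tuning.csv",
    "model_coefficients.csv",
    "selected_features.csv",
    "feature_matrix.csv",
    "coefficient_path.pdf",
    "cv_curve.pdf",
    "session_info.txt",
]


def missing_outputs(output_dir):
    """Return the expected output file names not found in output_dir."""
    out = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (out / name).is_file()]
```

Calling `missing_outputs("tests/output")` after the test run should return an empty list if all outputs were written.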

Implementation Checklist

Last updated: 2026-04-20 | Version: 1.0.0
