Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

$pwd:

batch-effect-correction

Name: Batch Effect Correction
Author: aipoch

// Use when correcting batch effects in merged bulk expression matrices with sample-level batch metadata while preserving biological group structure and generating before-and-after QC plots. NOT for: single-cell integration, raw FASTQ processing, differential expression without batch labels, or datasets without biological groups.

Ejecutar en Manus

$ git log --oneline --stat

stars:909

forks:57

updated:28 de mayo de 2026, 06:32

Explorador de archivos

18 archivos

SKILL.md

readonly

related-skills.json

mismo repositorio

bianque.md

from "aipoch/medical-research-skills"

Evidence-based medical knowledge and research mentor grounded in the Bian Que tradition. Covers clinical reasoning, diagnostic thinking (望闻问切), pharmacology, pathology, differential diagnosis, medical literature appraisal, and the philosophy of early intervention. Trigger whenever users ask about medicine, clinical science, drugs, disease mechanisms, diagnosis, lab interpretation, treatment comparison, or health sciences. Even without explicit research framing, trigger on any topic touching disease, therapeutics, or clinical decision-making. Part of the AIPOCH Medical Research Skill Hub.

2026-05-28909

cerna-analysis.md

from "aipoch/medical-research-skills"

Use when building a ceRNA regulatory network from a key gene list by combining bundled miRNA-mRNA and miRNA-lncRNA database files, with flat-file CSV exports and PDF visualization in a single output directory. NOT for: differential expression, single-cell analysis, enrichment analysis, or workflows without a key gene list.

2026-05-28909

decision-curve-analysis.md

from "aipoch/medical-research-skills"

Use when evaluating the clinical utility of a binary prediction model from a single clinical CSV file by fitting a logistic decision-curve model, plotting decision and clinical-impact curves, and exporting summary outputs. NOT for: survival calibration, ROC-only discrimination analysis, nomogram construction, or time-to-event outcomes.

2026-05-28909

elastic-net-feature-selection.md

from "aipoch/medical-research-skills"

Use when selecting predictive genes or other molecular features from bulk expression matrices for binary case-vs-control classification with elastic net logistic regression, including coefficient path and cross-validation plots. Trigger keywords: elastic net, glmnet, feature selection, binary classification, lambda.min, lambda.1se. NOT for: survival/Cox modeling, multiclass outcomes, single-cell data, or non-expression tables.

2026-05-28909

estimate-immune-score-analysis.md

from "aipoch/medical-research-skills"

Use this skill to compute ESTIMATE immune-related microenvironment scores from a bulk expression matrix, generate an ESTIMATE score heatmap, and optionally generate group-wise ESTIMATE score boxplots plus significance tables when a sample group file is supplied. Trigger keywords: ESTIMATE, immune score, stromal score, tumor microenvironment score. NOT for: immune cell deconvolution, single-cell analysis, differential expression, clinical diagnosis.

2026-05-28909

external-model-validation.md

from "aipoch/medical-research-skills"

Use when validating an existing prognostic risk signature on an external bulk expression cohort with survival outcomes, producing risk scores, Kaplan-Meier curves, risk distribution plots, heatmap, and time-dependent ROC curves. NOT for: model training, feature selection, nomogram construction, calibration analysis, or single-cell data.

2026-05-28909

package.json

"author": "aipoch"

"repository": "aipoch/medical-research-skills"

Abrir repositorio de GitHub Ver repositorios del creador

$ install --global

$ download --local

Ejecutar en Manus

$ useful --forSOC

Científicos de datosOcupaciones informáticas y matemáticas15-2051L4

name	batch-effect-correction
description	Use when correcting batch effects in merged bulk expression matrices with sample-level batch metadata while preserving biological group structure and generating before-and-after QC plots. NOT for: single-cell integration, raw FASTQ processing, differential expression without batch labels, or datasets without biological groups.
license	MIT
skill-author	AIPOCH

Batch Effect Correction

Prerequisites

Run the following before the first analysis to install all required R packages:

Rscript -e "if (!require('BiocManager', quietly=TRUE)) install.packages('BiocManager'); BiocManager::install(c('sva', 'limma')); install.packages('ggplot2', repos='https://cloud.r-project.org')"

Note: sva and limma are Bioconductor packages and require BiocManager for installation. ggplot2 is a standard CRAN package.

The skill cannot run until these packages are installed. In new or bare R environments, always run the prerequisite step first.

When to Read External Files

Situation	File to Read	Purpose
Need algorithm details	`references/algorithm.md`	ComBat workflow, assumptions, and QC logic
Need to run analysis	`scripts/main.R`	Execute: `Rscript scripts/main.R --input_file ... --group_file ...`
Encounter errors	`references/troubleshooting.md`	Common errors and solutions
Need CLI examples	`references/cli-guide.md`	Detailed CLI usage examples and baseline run record
Need test data	`tests/data/`	Sample input files for testing

Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./sample_info.csv \
  --output_dir ./output/ \
  --batch_column batch \
  --group_column group \
  --sample_column sample \
  --log_transform auto \
  --timeout_seconds 600 \
  --seed 42

Arguments

Short	Long	Type	Default	Description
`-i`	`--input_file`	character	required	Expression matrix file (genes as rows, samples as columns)
`-g`	`--group_file`	character	required	Sample metadata file (sample ID, group, and batch columns)
`-o`	`--output_dir`	character	`./output/`	Output directory
`-b`	`--batch_column`	character	`batch`	Batch column name in metadata
`-c`	`--group_column`	character	`group`	Biological group column name in metadata
`-n`	`--sample_column`	character	`sample`	Sample ID column name in metadata
`-l`	`--log_transform`	character	`auto`	Log transform mode: `auto`, `yes`, `no`
`-t`	`--timeout_seconds`	integer	`600`	Elapsed time limit in seconds; use `0` to disable
`-s`	`--seed`	integer	`42`	Random seed for reproducibility

Input Format

Expression Matrix (input_file)

Genes as rows, samples as columns, CSV format with gene ID in the first column.

"","Sample01","Sample02","Sample03"
"GeneA",5.12,4.87,6.03
"GeneB",8.44,8.11,7.95

Requirements:

Gene IDs must be unique and non-empty
Sample column names must be unique and non-empty
Expression values must be numeric and finite
Extra expression-matrix sample columns not present in metadata are allowed and will be ignored with a warning

Sample Metadata (group_file)

CSV with sample ID, biological group, and batch columns.

"sample","group","batch"
"Sample01","Control","Batch1"
"Sample02","Case","Batch1"
"Sample03","Case","Batch2"

Requirements:

Sample IDs must be unique and non-empty
At least 2 biological groups are required
At least 2 batches are required
Each group and each batch must contain at least 2 samples
Metadata may describe a subset of expression-matrix samples; the analysis will keep only metadata-matched samples and warn about ignored expression columns

Output Files

File	Description
`corrected_expression_matrix.csv`	Batch-corrected expression matrix
`matched_sample_info.csv`	Standardized metadata used in the analysis
`batch_before_boxplot.pdf`	Sample distribution boxplot before correction
`batch_after_boxplot.pdf`	Sample distribution boxplot after correction
`batch_before_pca.pdf`	PCA scatter plot before correction with batch-colored points
`batch_after_pca.pdf`	PCA scatter plot after correction with batch-colored points
`batch_before_clustering.pdf`	Hierarchical clustering before correction
`batch_after_clustering.pdf`	Hierarchical clustering after correction
`session_info.txt`	R session and package version info

Workflow

Step 1: Validate Input

Check file existence and non-empty input files
Validate metadata column presence
Verify expression values are numeric and finite
Confirm at least 2 groups, 2 batches, and at least 2 samples per group/batch

Step 2: Align and Prepare Matrix

Reorder expression columns to match metadata sample order
Keep only metadata-matched samples; warn if the expression matrix contains extra samples absent from metadata
Decide whether log transformation is needed (auto, yes, or no)
Apply log2(x + 1) only when required

Step 3: Run Batch Correction

Build the design matrix with biological group information
Run sva::ComBat() to remove batch-driven variation
Preserve modeled biological group structure during correction

Step 4: Normalize and Export Results

Apply limma::normalizeBetweenArrays() after ComBat
Write the corrected matrix and matched metadata
Save before/after QC plots and session information

Methods

ComBat

Empirical Bayes batch-effect correction using sva::ComBat(). Recommended when merged bulk expression datasets contain known batch labels and at least two biological groups.

Log Transformation

Supports auto, yes, and no. The auto mode applies log2(x + 1) only when the matrix appears to be on a raw-like scale.

normalizeBetweenArrays

Post-correction normalization with limma::normalizeBetweenArrays() to reduce remaining cross-sample distribution differences.

QC Visualization

Generates paired boxplots, PCA scatter plots with conditional batch ellipses, and hierarchical clustering plots before and after correction to assess whether batch-driven structure is reduced.

Agent Response Contract

After a successful run, report:

Sample count retained after metadata matching and any subset filtering
Batch count and group count used in the ComBat design matrix
Log transformation applied (auto-detected, forced yes, or skipped)
QC assessment: describe whether before/after PCA plots show reduced batch clustering
Artifact paths: corrected_expression_matrix.csv, batch_after_pca.pdf, batch_after_clustering.pdf

Examples

Basic Usage

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g sample_info.csv \
  -o ./output

With Custom Metadata Columns

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g metadata.csv \
  -o ./output \
  -n sample_id \
  -c condition \
  -b platform_batch

Disable Log Transform and Timeout

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g sample_info.csv \
  -o ./output \
  -l no \
  -t 0 \
  -s 42

Error Handling

Common Errors

Error	Cause	Solution
`SKILL_FILE_NOT_FOUND`	Input file does not exist	Check file path
`SKILL_EMPTY_FILE`	Input file exists but contains no data	Recreate or re-export the file
`SKILL_MISSING_COLUMNS`	Metadata file is missing sample, group, or batch columns	Check header names or pass custom column names
`SKILL_SAMPLE_MISMATCH`	Metadata sample IDs do not match expression matrix columns	Verify sample names between files
`SKILL_INVALID_DATA`	Dataset fails minimum design checks (< 2 batches, < 2 groups, < 2 samples per batch/group)	Review group counts, batch counts, and ID validity
`SKILL_INVALID_TYPE`	Expression values are non-numeric or non-finite	Clean matrix values before running
`SKILL_TIMEOUT`	Run exceeded the configured time limit	Increase `--timeout_seconds` or set it to `0`
`SKILL_DEPENDENCY_MISSING`	Required R package is not installed	Install with: `Rscript -e "BiocManager::install(c('sva','limma')); install.packages('ggplot2')"`
`SKILL_RUNTIME_ERROR`	Runtime I/O or filesystem error occurred	Check read/write permissions and environment

IF error persists, READ: references/troubleshooting.md

Troubleshooting note: In environments where packages are not yet installed, SKILL_DEPENDENCY_MISSING will fire before file-validation or --help. Install dependencies first, then re-run to expose file-related errors or access --help.

Input Validation

This skill accepts:

A bulk RNA-seq or microarray expression matrix (CSV, genes as rows, samples as columns)
A sample metadata file (CSV) with sample ID, biological group, and batch columns; at least 2 batches and 2 biological groups are required

If the user's request does not involve batch effect correction on merged bulk expression matrices — for example, asking to integrate single-cell RNA-seq data, process raw FASTQ files, run differential expression without batch labels, or analyze datasets with only one batch — do not proceed with the workflow. Instead respond:

"Batch Effect Correction is designed to remove batch-driven variation from merged bulk expression matrices using ComBat, while preserving biological group structure. Your request appears to be outside this scope. Please provide a multi-batch expression matrix with sample-level batch metadata, or use a more appropriate tool for single-cell integration, differential expression, or raw sequencing processing."

Testing

Test with Sample Data

# Check help (requires packages installed)
Rscript scripts/main.R --help

# Run with bundled test data
Rscript scripts/main.R \
  -i tests/data/expression_matrix_merged.csv \
  -g tests/data/sample_info.csv \
  -o tests/output/

Validation Commands

# Check corrected matrix exists
ls -la tests/output/corrected_expression_matrix.csv

# Check matched metadata exists
ls -la tests/output/matched_sample_info.csv

# Check PCA output exists
ls -la tests/output/batch_after_pca.pdf

Implementation Checklist

Last updated: 2026-04-27 | Version: 1.1.0

batch-effect-correction

Más de este repositorio

Más de este repositorio

Batch Effect Correction

Prerequisites

When to Read External Files

Usage

Arguments

Input Format

Expression Matrix (input_file)

Sample Metadata (group_file)

Output Files

Workflow

Step 1: Validate Input

Step 2: Align and Prepare Matrix

Step 3: Run Batch Correction

Step 4: Normalize and Export Results

Methods

ComBat

Log Transformation

normalizeBetweenArrays

QC Visualization

Agent Response Contract

Examples

Basic Usage

With Custom Metadata Columns

Disable Log Transform and Timeout

Error Handling

Common Errors

Input Validation

Testing

Test with Sample Data

Validation Commands

Implementation Checklist

Batch Effect Correction

Prerequisites

When to Read External Files

Usage

Arguments

Input Format

Expression Matrix (input_file)

Sample Metadata (group_file)

Output Files

Workflow

Step 1: Validate Input

Step 2: Align and Prepare Matrix

Step 3: Run Batch Correction

Step 4: Normalize and Export Results

Methods

ComBat

Log Transformation

normalizeBetweenArrays

QC Visualization

Agent Response Contract

Examples

Basic Usage

With Custom Metadata Columns

Disable Log Transform and Timeout

Error Handling

Common Errors

Input Validation

Testing

Test with Sample Data

Validation Commands

Implementation Checklist