Run any Skill in Manus with one click

$pwd:

bulk-rna-seq-batch-correction-with-combat

Name: Bulk Rna Seq Batch Correction With Combat
Author: omicverse

// Bulk RNA-seq batch correction with pyComBat: remove batch effects from merged cohorts, export corrected matrices, and benchmark visualizations.

Run Skill in Manus

$ git log --oneline --stat

stars:29

forks:5

updated:March 12, 2026 at 11:02

File Explorer

2 files

SKILL.md

readonly

name	bulk-rna-seq-batch-correction-with-combat
title	Bulk RNA-seq batch correction with ComBat
description	Bulk RNA-seq batch correction with pyComBat: remove batch effects from merged cohorts, export corrected matrices, and benchmark visualizations.

Bulk RNA-seq batch correction with ComBat

Overview

Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows t_bulk_combat.ipynb, w hich demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.

Instructions

Import core libraries
- Load omicverse as ov, anndata, pandas as pd, and matplotlib.pyplot as plt.
- Call ov.ov_plot_set() (aliased ov.plot_set() in some releases) to align figures with omicverse styling.
Load each batch separately
- Read the prepared pickled matrices (or user-provided expression tables) with pd.read_pickle(...)/pd.read_csv(...).
- Transpose to gene × sample before wrapping them in anndata.AnnData objects so adata.obs stores sample metadata.
- Assign a batch column for every cohort (adata.obs['batch'] = '1', '2', ...). Encourage descriptive labels when availa ble.
Concatenate on shared genes
- Use anndata.concat([adata1, adata2, adata3], merge='same') to retain the intersection of genes across batches.
- Confirm the combined adata reports balanced sample counts per batch; if not, prompt users to re-check inputs.
Run ComBat batch correction
- Execute ov.bulk.batch_correction(adata, batch_key='batch').
- Explain that corrected values are stored in adata.layers['batch_correction'] while the original counts remain in adata.X.
Export corrected and raw matrices
- Obtain DataFrames via adata.to_df().T (raw) and adata.to_df(layer='batch_correction').T (corrected).
- Encourage saving both tables (.to_csv(...)) plus the harmonised AnnData (adata.write_h5ad('adata_batch.h5ad', compressio n='gzip')).
Benchmark the correction
- For per-sample variance checks, draw before/after boxplots and recolour boxes using ov.pl.red_color, blue_color, gree n_color palettes to match batches.
- Copy raw counts to a named layer with adata.layers['raw'] = adata.X.copy() before PCA.
- Run ov.pp.pca(adata, layer='raw', n_pcs=50) and ov.pp.pca(adata, layer='batch_correction', n_pcs=50).
- Visualise embeddings with ov.pl.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small') and repeat fo r the corrected layer to verify mixing.

Defensive validation

# Before ComBat: verify batch column exists and has >1 batch
assert 'batch' in adata.obs.columns, "adata.obs must contain a 'batch' column"
n_batches = adata.obs['batch'].nunique()
assert n_batches > 1, f"Only {n_batches} batch — need >1 for batch correction"
# Verify gene overlap after concatenation
if adata.n_vars < 100:
    print(f"WARNING: Only {adata.n_vars} shared genes after concat — check gene ID harmonization")

Troubleshooting tips
- Mismatched gene identifiers cause dropped features—remind users to harmonise feature names (e.g., gene symbols) before conca tenation.
- pyComBat expects log-scale intensities or similarly distributed counts; recommend log-transforming strongly skewed matrices.
- If batch_correction layer is missing, ensure the batch_key matches the column name in adata.obs.

Examples

"Combine three GEO ovarian cohorts, run ComBat, and export both the raw and corrected CSV matrices."
"Plot PCA embeddings before and after batch correction to confirm that batches 1–3 overlap."
"Save the harmonised AnnData file so I can reload it later for downstream DEG analysis."

References

Tutorial notebook: t_bulk_combat.ipynb
Example inputs: see the tutorial page for example batch-correction inputs: t_bulk_combat
Quick copy/paste commands: reference.md

related-skills.json

same repository

bulk-rna-seq-differential-expression-with-omicverse.md

from "omicverse/omicclaw"

Bulk RNA-seq DEG pipeline: gene ID mapping, DESeq2 normalization, statistical testing, volcano plots, and pathway enrichment in OmicVerse.

2026-03-1229

bulk-rna-seq-deseq2-analysis-with-omicverse.md

from "omicverse/omicclaw"

PyDESeq2 differential expression: ID mapping, DE testing, fold-change thresholding, and GSEA enrichment visualization in OmicVerse.

2026-03-1229

bulk-rna-seq-deconvolution-with-bulk2single.md

from "omicverse/omicclaw"

Turn bulk RNA-seq cohorts into synthetic single-cell datasets using omicverse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq.

2026-03-1229

bulktrajblend-trajectory-interpolation.md

from "omicverse/omicclaw"

Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.

2026-03-1229

bulk-wgcna-analysis-with-omicverse.md

from "omicverse/omicclaw"

WGCNA co-expression network: soft-threshold, module detection, eigengenes, hub genes, and trait correlation in OmicVerse.

2026-03-1229

gsea-enrichment-analysis.md

from "omicverse/omicclaw"

Gene set enrichment analysis with correct geneset format handling. Critical guidance for loading pathway databases and running enrichment in OmicVerse.

2026-03-1229

package.json

"author": "omicverse"

"repository": "omicverse/omicclaw"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Biological Scientists, All OtherLife, Physical, and Social Science Occupations19-1029L4

name	bulk-rna-seq-batch-correction-with-combat
title	Bulk RNA-seq batch correction with ComBat
description	Bulk RNA-seq batch correction with pyComBat: remove batch effects from merged cohorts, export corrected matrices, and benchmark visualizations.

Bulk RNA-seq batch correction with ComBat

Overview

Instructions

Import core libraries
- Load omicverse as ov, anndata, pandas as pd, and matplotlib.pyplot as plt.
- Call ov.ov_plot_set() (aliased ov.plot_set() in some releases) to align figures with omicverse styling.
Load each batch separately
- Read the prepared pickled matrices (or user-provided expression tables) with pd.read_pickle(...)/pd.read_csv(...).
- Transpose to gene × sample before wrapping them in anndata.AnnData objects so adata.obs stores sample metadata.
- Assign a batch column for every cohort (adata.obs['batch'] = '1', '2', ...). Encourage descriptive labels when availa ble.
Concatenate on shared genes
- Use anndata.concat([adata1, adata2, adata3], merge='same') to retain the intersection of genes across batches.
- Confirm the combined adata reports balanced sample counts per batch; if not, prompt users to re-check inputs.
Run ComBat batch correction
- Execute ov.bulk.batch_correction(adata, batch_key='batch').
- Explain that corrected values are stored in adata.layers['batch_correction'] while the original counts remain in adata.X.
Export corrected and raw matrices
- Obtain DataFrames via adata.to_df().T (raw) and adata.to_df(layer='batch_correction').T (corrected).
- Encourage saving both tables (.to_csv(...)) plus the harmonised AnnData (adata.write_h5ad('adata_batch.h5ad', compressio n='gzip')).
Benchmark the correction
- For per-sample variance checks, draw before/after boxplots and recolour boxes using ov.pl.red_color, blue_color, gree n_color palettes to match batches.
- Copy raw counts to a named layer with adata.layers['raw'] = adata.X.copy() before PCA.
- Run ov.pp.pca(adata, layer='raw', n_pcs=50) and ov.pp.pca(adata, layer='batch_correction', n_pcs=50).
- Visualise embeddings with ov.pl.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small') and repeat fo r the corrected layer to verify mixing.

Defensive validation

# Before ComBat: verify batch column exists and has >1 batch
assert 'batch' in adata.obs.columns, "adata.obs must contain a 'batch' column"
n_batches = adata.obs['batch'].nunique()
assert n_batches > 1, f"Only {n_batches} batch — need >1 for batch correction"
# Verify gene overlap after concatenation
if adata.n_vars < 100:
    print(f"WARNING: Only {adata.n_vars} shared genes after concat — check gene ID harmonization")

Troubleshooting tips
- Mismatched gene identifiers cause dropped features—remind users to harmonise feature names (e.g., gene symbols) before conca tenation.
- pyComBat expects log-scale intensities or similarly distributed counts; recommend log-transforming strongly skewed matrices.
- If batch_correction layer is missing, ensure the batch_key matches the column name in adata.obs.

Examples

"Combine three GEO ovarian cohorts, run ComBat, and export both the raw and corrected CSV matrices."
"Plot PCA embeddings before and after batch correction to confirm that batches 1–3 overlap."
"Save the harmonised AnnData file so I can reload it later for downstream DEG analysis."

References

Tutorial notebook: t_bulk_combat.ipynb
Example inputs: see the tutorial page for example batch-correction inputs: t_bulk_combat
Quick copy/paste commands: reference.md

bulk-rna-seq-batch-correction-with-combat

Bulk RNA-seq batch correction with ComBat

Overview

Instructions

Examples

References

More from this repository

More from this repository

Bulk RNA-seq batch correction with ComBat

Overview

Instructions

Examples

References