| name | bulk-rna-seq-batch-correction-with-combat |
| title | Bulk RNA-seq batch correction with ComBat |
| description | Bulk RNA-seq batch correction with pyComBat: remove batch effects from merged cohorts, export corrected matrices, and benchmark visualizations. |
Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows `t_bulk_combat.ipynb`, which demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.
- Import `omicverse as ov`, `anndata`, `pandas as pd`, and `matplotlib.pyplot as plt`.
- Call `ov.ov_plot_set()` (aliased `ov.plot_set()` in some releases) to align figures with omicverse styling.
- Load each cohort with `pd.read_pickle(...)` or `pd.read_csv(...)`.
- Wrap each matrix in an `anndata.AnnData` object so that `adata.obs` stores sample metadata.
- Add a `batch` column for every cohort (`adata.obs['batch'] = '1'`, `'2'`, ...). Encourage descriptive labels when available.
- Concatenate with `anndata.concat([adata1, adata2, adata3], merge='same')`. The default `join='inner'` retains the intersection of genes across batches, and `merge='same'` keeps only the `var` annotations that are identical in every cohort.
- Confirm that the merged `adata` reports balanced sample counts per batch; if not, prompt users to re-check inputs.
- Run `ov.bulk.batch_correction(adata, batch_key='batch')`.
- Corrected values are written to `adata.layers['batch_correction']` while the original counts remain in `adata.X`.
- Extract the matrices with `adata.to_df().T` (raw) and `adata.to_df(layer='batch_correction').T` (corrected).
- Save both matrices (`.to_csv(...)`) plus the harmonised AnnData (`adata.write_h5ad('adata_batch.h5ad', compression='gzip')`).
- Use the `ov.pl.red_color`, `blue_color`, `green_color` palettes to match batches.
- Store `adata.layers['raw'] = adata.X.copy()` before PCA.
- Run `ov.pp.pca(adata, layer='raw', n_pcs=50)` and `ov.pp.pca(adata, layer='batch_correction', n_pcs=50)`.
- Plot with `ov.pl.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small')` and repeat for the corrected layer to verify mixing.

```python
# Before ComBat: verify batch column exists and has >1 batch
assert 'batch' in adata.obs.columns, "adata.obs must contain a 'batch' column"
n_batches = adata.obs['batch'].nunique()
assert n_batches > 1, f"Only {n_batches} batch; need >1 for batch correction"

# Verify gene overlap after concatenation
if adata.n_vars < 100:
    print(f"WARNING: Only {adata.n_vars} shared genes after concat; check gene ID harmonization")
```
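Before concatenating, it can help to tabulate each cohort's size and the genes the cohorts share, which covers the sample-balance and gene-overlap checks in one pass. The sketch below is a minimal illustration using only pandas. It assumes each cohort is a genes x samples DataFrame; `summarize_cohorts` and the toy gene and sample names are hypothetical and not part of omicverse.

```python
import pandas as pd


def summarize_cohorts(cohorts):
    """Tabulate per-batch sizes and the genes shared by every cohort.

    `cohorts` maps a batch label to a genes x samples DataFrame
    (hypothetical helper, not an omicverse function).
    """
    shared = None
    rows = []
    for label, df in cohorts.items():
        genes = set(df.index)
        # Running intersection of gene identifiers across cohorts
        shared = genes if shared is None else shared & genes
        rows.append({"batch": label, "n_samples": df.shape[1], "n_genes": df.shape[0]})
    summary = pd.DataFrame(rows).set_index("batch")
    return summary, sorted(shared or [])


# Toy cohorts with partially overlapping gene sets (made-up values)
cohort_a = pd.DataFrame(
    {"s1": [1.0, 2.0, 3.0], "s2": [1.5, 2.5, 3.5]},
    index=["TP53", "BRCA1", "MYC"],
)
cohort_b = pd.DataFrame(
    {"s3": [0.5, 2.0], "s4": [0.7, 2.2], "s5": [0.9, 2.4]},
    index=["BRCA1", "MYC"],
)

summary, shared_genes = summarize_cohorts({"cohort_a": cohort_a, "cohort_b": cohort_b})
print(summary)
print(f"{len(shared_genes)} shared genes: {shared_genes}")
```

A large gap between a cohort's `n_genes` and the number of shared genes is a hint that gene identifiers may need harmonising before the merge.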
If the `batch_correction` layer is missing, ensure the `batch_key` matches the column name in `adata.obs`.

- `t_bulk_combat.ipynb`
- `reference.md`
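For the missing-layer case, one way to diagnose a `batch_key` mismatch is to compare the requested key against the metadata columns and suggest near matches. This is a rough sketch that works on any sample-metadata DataFrame such as `adata.obs`; `check_batch_key` is a hypothetical helper, not an omicverse function, and the 0.6 similarity cutoff is an arbitrary assumption.

```python
import difflib

import pandas as pd


def check_batch_key(obs, batch_key="batch"):
    """Return (ok, message) describing whether `batch_key` is usable.

    `obs` is a sample-metadata DataFrame such as `adata.obs`
    (hypothetical helper, not an omicverse function).
    """
    columns = [str(c) for c in obs.columns]
    if batch_key not in columns:
        # Suggest similarly spelled columns, e.g. a capitalisation slip
        close = difflib.get_close_matches(batch_key, columns, n=3, cutoff=0.6)
        hint = f" Did you mean {close}?" if close else ""
        return False, f"'{batch_key}' is not a column in obs.{hint}"
    n_batches = obs[batch_key].nunique()
    if n_batches < 2:
        return False, f"'{batch_key}' has {n_batches} unique value(s); at least 2 are needed."
    return True, f"'{batch_key}' looks usable with {n_batches} batches."


# Toy metadata where the column was capitalised by mistake
obs = pd.DataFrame({"Batch": ["1", "1", "2"]}, index=["s1", "s2", "s3"])
ok, message = check_batch_key(obs, batch_key="batch")
print(ok, message)
```

Running the check before the correction step gives the user a concrete column name to pass as `batch_key`.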