bulk-rna-seq-differential-expression-with-omicverse
Bulk RNA-seq DEG pipeline: gene ID mapping, DESeq2 normalization, statistical testing, volcano plots, and pathway enrichment in OmicVerse.
| name | bulk-rna-seq-differential-expression-with-omicverse |
| title | Bulk RNA-seq differential expression with omicverse |
| description | Bulk RNA-seq DEG pipeline: gene ID mapping, DESeq2 normalization, statistical testing, volcano plots, and pathway enrichment in OmicVerse. |
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
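featureCounts names each count column after the BAM file it was given, so sample IDs often arrive as file paths ending in `.bam`. The following is a minimal sketch of tidying such names with a list comprehension. The helper name `clean_sample_names` and the example paths are hypothetical and are not part of omicverse.

```python
from pathlib import PurePosixPath


def clean_sample_names(columns):
    """Drop directory prefixes and a trailing '.bam' from column names.

    Hypothetical helper for illustration; names without '.bam' are kept as-is.
    """
    cleaned = [
        name[:-len(".bam")] if name.endswith(".bam") else name
        for name in (PurePosixPath(col).name for col in columns)
    ]
    # Two different paths can collapse to the same sample ID; fail loudly.
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("Sample names collide after clean-up")
    return cleaned


# Example with made-up paths:
print(clean_sample_names(["aln/ctrl_1.bam", "aln/ctrl_2.bam", "aln/treat_1.bam"]))
# ['ctrl_1', 'ctrl_2', 'treat_1']
```

With a pandas count table, the result could be assigned back via `counts_df.columns = clean_sample_names(counts_df.columns)`, assuming `counts_df` is the loaded DataFrame.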
- Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
- Call `ov.plot_set()` so downstream plots adopt omicverse styling.
- Download gene ID mapping pairs with `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
- Read the count table with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`.
- Strip trailing `.bam` segments from column names using list comprehension so sample IDs are clean.
- Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
- Create `dds = ov.bulk.pyDEG(mapped_counts)`.
- Resolve duplicate gene symbols with `dds.drop_duplicates_index()`, which keeps the highest expressed version.
- Use `dds.normalize()` to calculate DESeq2 size factors, correcting for differences in library size (sequencing depth). Size-factor normalisation does not by itself remove batch effects.
- Run `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test.
- Use `method='edgepy'` for edgeR-like tests and `method='limma'` for limma-style modelling.
- Filter out low-expression genes with `dds.result.loc[dds.result['log2(BaseMean)'] > 1]` when needed.
- Set thresholds with `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)` (`fc_threshold=-1` auto-selects based on log2FC distribution).
- Draw the volcano plot with `dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)` to highlight key genes.
- Plot per-gene expression with `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust y-axis tick labels if required.
- Download pathway databases with `ov.utils.download_pathway_database()`.
- Load a gene set with `ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)`.
- Select the DEG list with `dds.result.loc[dds.result['sig'] != 'normal'].index`.
- Run `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a background gene list.
- Visualise results with `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
- Export `dds.result` and enrichment tables to CSV for downstream reporting.
- Save figures (`plt.savefig(...)`) when running outside notebooks.

```python
# Before DEG: verify treatment/control groups exist as column names.
# Sample names are columns of the count matrix; dds.result holds per-gene
# statistics (e.g. 'log2(BaseMean)', 'sig'), so it is not checked here.
all_cols = set(counts_df.columns)
for g in treatment_groups + control_groups:
    assert g in all_cols, f"Sample '{g}' not found in count matrix columns"
# Verify groups don't overlap
assert not set(treatment_groups) & set(control_groups), "Treatment and control groups must not overlap"
```
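To make the `dds.normalize()` step more concrete, here is a minimal sketch of the standard DESeq2 median-of-ratios size-factor estimator in plain Python. It illustrates the general technique only. The function name `size_factors` and the toy counts are assumptions for this example, and omicverse's internal implementation may differ in detail.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    Illustrative sketch only; not omicverse's code.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene, skipping genes with a zero in any sample.
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)
    if not log_geo_means:
        raise ValueError("No gene is non-zero in every sample")

    # Size factor = exp(median over genes of log(count / geometric mean)).
    return {
        s: math.exp(median(math.log(counts[s][g]) - lg
                           for g, lg in log_geo_means.items()))
        for s in samples
    }


# Toy data: sample B was sequenced twice as deeply as sample A.
toy = {"A": [10, 20, 30], "B": [20, 40, 60]}
factors = size_factors(toy)
normalised = {s: [c / factors[s] for c in toy[s]] for s in toy}
```

Dividing each sample by its size factor puts `A` and `B` on a common scale, which is why this step addresses sequencing depth rather than batch structure.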
- Ensure `treatment_groups`/`control_groups` exactly match column names post-cleanup.
- Confirm that the required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.

Reference files: `t_deg.ipynb`, `reference.md`
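The dependency check from the troubleshooting notes can be scripted before starting a long analysis. This is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the import names in `REQUIRED` (in particular `PyComplexHeatmap`) are assumptions about how the packages are imported, so adjust them to your environment.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names for the packages listed in the troubleshooting notes.
REQUIRED = ["omicverse", "PyComplexHeatmap", "gseapy"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All enrichment dependencies found.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.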