| Field | Value |
| --- | --- |
| name | bulk-rna-seq-differential-expression-with-omicverse |
| title | Bulk RNA-seq differential expression with omicverse |
| description | Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse. |
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
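A featureCounts table starts with a `#` comment line (which is why it is read with `header=1` in pandas), followed by six annotation columns (`Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`) and one count column named after each input BAM path. The stdlib-only sketch below parses such a table and cleans the sample names. The toy table and the helpers `read_featurecounts` and `clean_sample_name` are hypothetical illustrations, not omicverse functions.

```python
import csv

# Number of leading annotation columns in featureCounts output.
ANNOTATION_COLUMNS = 6  # Geneid, Chr, Start, End, Strand, Length


def clean_sample_name(column: str) -> str:
    """Turn 'bam/ctrl_1.bam' into 'ctrl_1' (drop directories and the .bam suffix)."""
    name = column.rsplit("/", 1)[-1]
    return name[: -len(".bam")] if name.endswith(".bam") else name


def read_featurecounts(text: str) -> tuple[list[str], dict[str, list[int]]]:
    """Return (sample_names, {gene_id: counts}) from featureCounts-style text."""
    # Drop the '#' comment line(s) before handing the rest to the CSV reader.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    rows = list(csv.reader(lines, delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = [clean_sample_name(col) for col in header[ANNOTATION_COLUMNS:]]
    counts = {row[0]: [int(v) for v in row[ANNOTATION_COLUMNS:]] for row in body}
    return samples, counts


# Hypothetical toy table in featureCounts layout.
RAW = (
    "# Program:featureCounts (toy comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tbam/ctrl_1.bam\tbam/ctrl_2.bam\tbam/trt_1.bam\n"
    "gene_a\tchr1\t100\t900\t+\t801\t120\t98\t310\n"
    "gene_b\tchr2\t50\t400\t-\t351\t0\t2\t1\n"
)

samples, counts = read_featurecounts(RAW)
print(samples)           # ['ctrl_1', 'ctrl_2', 'trt_1']
print(counts["gene_a"])  # [120, 98, 310]
```

In the omicverse workflow the same cleanup is a list comprehension over the DataFrame's column names; the sketch only makes the file layout explicit.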
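1. Set up the environment
   - Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
   - Call `ov.plot_set()` so downstream plots adopt omicverse styling.
2. Load counts and map gene IDs
   - Download the gene ID annotation pairs with `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
   - Read the count matrix with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`.
   - Strip the `.bam` segments from column names with a list comprehension so sample IDs are clean.
   - Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
3. Build and normalise the DEG object
   - Create `dds = ov.bulk.pyDEG(mapped_counts)`.
   - Resolve duplicated gene symbols with `dds.drop_duplicates_index()`, which keeps the most highly expressed version.
   - Run `dds.normalize()` to calculate DESeq2 size factors, which correct for differences in library size (sequencing depth) between samples. Size factors do not remove batch effects.
4. Test for differential expression
   - Run `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test; both arguments are lists of sample column names.
   - Alternatively, use `method='edgepy'` for edgeR-like tests or `method='limma'` for limma-style modelling.
   - When needed, filter out lowly expressed genes with `dds.result.loc[dds.result['log2(BaseMean)'] > 1]`.
5. Set significance thresholds
   - Call `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)`; `fc_threshold=-1` selects the fold-change cutoff automatically from the log2FC distribution.
6. Visualise results
   - Draw a volcano plot with `dds.plot_volcano(title=..., figsize=..., plot_genes=...)` (or `plot_genes_num=...` instead of `plot_genes`) to highlight key genes.
   - Plot per-gene boxplots with `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust the y-axis tick labels if required.
7. Run pathway enrichment
   - Download the pathway libraries with `ov.utils.download_pathway_database()`.
   - Load a library with `ov.utils.geneset_prepare(<path>, organism='Mouse')` (or `'Human'`, etc.).
   - Collect the significant genes: `deg_genes = dds.result.loc[dds.result['sig'] != 'normal'].index`.
   - Run `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a background gene list.
   - Plot a single result with `ov.bulk.geneset_plot(...)` and combine multiple ontologies with `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
8. Save outputs
   - Export `dds.result` and the enrichment tables to CSV for downstream reporting.
   - Save figures explicitly (`plt.savefig(...)`) when running outside notebooks.
9. Troubleshoot
   - Ensure `treatment_groups`/`control_groups` exactly match the column names after cleanup.
   - Confirm the required dependencies (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.

Related files: `t_deg.ipynb`, `sample/counts.txt`, `reference.md`.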
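After `ov.bulk.Matrix_ID_mapping` converts gene IDs to symbols, several IDs can map to the same symbol, which is what `dds.drop_duplicates_index()` resolves. The plain-Python sketch below keeps, for each symbol, the row with the highest mean count. Using the mean as the "highest expressed" criterion is an assumption (omicverse's own rule may differ), and `collapse_duplicates` is a hypothetical helper.

```python
from statistics import fmean


def collapse_duplicates(rows: list[tuple[str, list[float]]]) -> dict[str, list[float]]:
    """Keep one row per gene symbol: the one with the highest mean count.

    Assumption: 'highest expressed' is measured as the mean across samples;
    dds.drop_duplicates_index() may use a different criterion.
    """
    best: dict[str, list[float]] = {}
    for symbol, values in rows:
        # Replace the stored row only if this one is more highly expressed.
        if symbol not in best or fmean(values) > fmean(best[symbol]):
            best[symbol] = values
    return best


# Two hypothetical gene IDs that both mapped to the symbol 'GeneX'.
mapped_rows = [
    ("GeneX", [10, 12, 9]),
    ("GeneY", [50, 40, 60]),
    ("GeneX", [300, 280, 310]),
]
collapsed = collapse_duplicates(mapped_rows)
print(collapsed["GeneX"])  # [300, 280, 310]
```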
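The normalisation step relies on DESeq2 size factors. DESeq2's standard estimator is the median-of-ratios method: each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes. The pure-Python sketch below implements that textbook estimator on toy data; whether `pyDEG` follows exactly these steps (for example, how it treats zero counts) is an assumption, and `size_factors`/`normalize` here are hypothetical helpers.

```python
import math
import statistics


def size_factors(counts: list[list[float]]) -> list[float]:
    """Median-of-ratios size factors, one per sample (column of `counts`).

    counts[i][j] is the raw count of gene i in sample j. Genes with a zero in
    any sample are skipped because their geometric mean would be zero.
    """
    n_samples = len(counts[0])
    ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(v) for v in row) / n_samples
        for j, v in enumerate(row):
            # Ratio of this sample's count to the gene's geometric mean.
            ratios[j].append(math.exp(math.log(v) - log_geo_mean))
    if not ratios[0]:
        raise ValueError("every gene has a zero count; size factors are undefined")
    return [statistics.median(r) for r in ratios]


def normalize(counts: list[list[float]], factors: list[float]) -> list[list[float]]:
    """Divide each sample's counts by its size factor."""
    return [[v / s for v, s in zip(row, factors)] for row in counts]


# Toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts, factors)
print([round(f, 3) for f in factors])  # [0.707, 1.414]
```

Because the factors rescale whole samples, the 2x depth difference disappears after normalisation while within-sample gene ratios are untouched.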
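For the default `method='ttest'`, the workflow describes a Welch t-test, which compares two group means without assuming equal variances. The sketch below computes the Welch statistic and its Welch-Satterthwaite degrees of freedom for one gene using only the standard library. Applying it to log-scale normalised values is an assumption about what `deg_analysis` tests, and `welch_t` is a hypothetical helper; for p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for a vs b."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2-normalised expression of one gene in each group.
treatment = [6.0, 6.3, 5.7]
control = [5.0, 5.2, 4.8]
t_stat, dof = welch_t(treatment, control)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")  # t = 4.80, df = 3.48
```

With equal group sizes and equal variances the degrees of freedom reduce to the pooled value `n1 + n2 - 2`; unequal variances pull them lower, which makes the test more conservative.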
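The thresholds set by `dds.foldchange_set(...)` and the resulting `sig` column drive both the volcano plot and the DEG list used for enrichment. The sketch below mimics that logic on plain tuples: genes failing the `log2(BaseMean) > 1` expression filter are dropped, and the rest are labelled from their log2 fold change and p-value. The fixed thresholds, the strict/non-strict comparisons, and the `'up'`/`'down'` labels are assumptions (the workflow only confirms that non-significant genes are labelled `'normal'`), and the automatic `fc_threshold=-1` rule is not reproduced; `classify` and `call_degs` are hypothetical.

```python
def classify(log2fc: float, pvalue: float,
             fc_threshold: float = 1.0, pval_threshold: float = 0.05) -> str:
    """Label a gene 'up', 'down' or 'normal' (label names partly assumed)."""
    if pvalue < pval_threshold and log2fc >= fc_threshold:
        return "up"
    if pvalue < pval_threshold and log2fc <= -fc_threshold:
        return "down"
    return "normal"


def call_degs(results: dict[str, tuple[float, float, float]],
              fc_threshold: float = 1.0, pval_threshold: float = 0.05,
              min_log2_basemean: float = 1.0) -> dict[str, str]:
    """results maps gene -> (log2(BaseMean), log2FC, p-value)."""
    return {
        gene: classify(log2fc, pvalue, fc_threshold, pval_threshold)
        for gene, (log2_basemean, log2fc, pvalue) in results.items()
        if log2_basemean > min_log2_basemean  # low-expression filter
    }


# Hypothetical per-gene statistics.
toy_results = {
    "g1": (3.0, 2.5, 0.001),   # expressed, strongly up
    "g2": (1.5, -2.0, 0.01),   # expressed, strongly down
    "g3": (0.5, 4.0, 0.0001),  # filtered out: log2(BaseMean) <= 1
    "g4": (2.0, 0.2, 0.5),     # expressed, not significant
}
labels = call_degs(toy_results)
# Mirrors selecting rows where sig != 'normal'.
deg_genes = [gene for gene, label in labels.items() if label != "normal"]
print(deg_genes)  # ['g1', 'g2']
```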
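Over-representation analysis asks whether a pathway contains more of the DEGs than expected by chance, given a background gene universe; the classic test is the one-sided hypergeometric (equivalently, one-sided Fisher's exact) test. The sketch below computes it exactly with `math.comb` and shows why the background matters: its size `N` enters the p-value directly. That `ov.bulk.geneset_enrichment` uses precisely this test is an assumption, and `overrepresentation_p` is a hypothetical helper.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: DEGs (a subset of the background), k: DEGs that fall in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent gene counts")
    # Sum the probabilities of observing k or more pathway genes among the DEGs.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Toy numbers: 4 DEGs out of 20 background genes, 3 of them in a 5-gene pathway.
p_small = overrepresentation_p(k=3, n=4, K=5, N=20)
# Same overlap against a larger background: the overlap is far less likely by chance.
p_large = overrepresentation_p(k=3, n=4, K=5, N=200)
print(round(p_small, 4))  # 0.032
```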