bulk-rna-seq-differential-expression-with-omicverse
// Bulk RNA-seq DEG pipeline: gene ID mapping, DESeq2 normalization, statistical testing, volcano plots, and pathway enrichment in OmicVerse.
Two adjacent LC-MS workflows on AnnData — (1) untargeted metabolomics with m/z-based peak annotation, mummichog pathway inference and adduct-ppm matching, and (2) lipidomics with LIPID MAPS shorthand parsing, lipid-class aggregation, and LION term enrichment. Use when converting `t_metabol_04_untargeted` or `t_metabol_05_lipidomics` into a reusable skill, when the input feature IDs encode `m/z`/`RT`, or when the var_names look like `PC 34:1` / `Cer d18:1/24:0` / `TAG 54:3`.
End-to-end bulk RNA-seq quantification with omicverse's alignment module — SRA download, fastp QC, two interchangeable quantification paths (STAR + featureCount, OR alignment-free kb-python with technology='BULK'), and wiring into `ov.bulk.pyDEG` DESeq2. Single-cell kb-python (10XV2/10XV3) is out of scope — use the `single-cell-kb-alignment` skill instead.
CellRank fate maps from RNA velocity. Combine VelocityKernel + ConnectivityKernel into a transition matrix, fit a GPCCA estimator, predict terminal states, and produce per-cell fate probabilities. Visualise with `ov.pl.branch_streamplot` and feed branch-resolved gene-trends into `ov.single.dynamic_features` / `ov.pl.dynamic_trends` / `ov.pl.dynamic_heatmap`. Use after RNA velocity is computed (scvelo / dynamo / latentvelo / graphvelo) and before reporting fate probabilities or marker dynamics.
Run OmicVerse single-cell NMF program discovery as a reusable, triggerable skill — both the classical Python `ov.single.cNMF` (consensus NMF with CPU/GPU factorization, K-selection, RFC labelling) and the Rust-backed `ov.single.NMF` (fast `nmf-rs` backend: dnmf default, Brunet-style K-selection with stability-drop auto-K, cNMF-style consensus heatmap, RFC labels). Use when fitting consensus NMF gene programs on single-cell AnnData, choosing K, building consensus, or converting normalized usage programs into hard cluster labels.
Monocle2-style single-cell trajectory analysis on AnnData via the `ov.single.Monocle` class - DDRTree pseudotime + branch detection + per-gene differential test + BEAM branch-dependent gene discovery, plus the unified `ov.pl.trajectory` / `ov.pl.trajectory_overlay` / `ov.pl.trajectory_tree` plotters and the shared pseudotime visualisations (`branch_streamplot`, `dynamic_heatmap`, `dynamic_trends`). Use when fitting a Monocle2 trajectory on an annotated AnnData, when deriving branch-aware gene trends with `dynamic_features`, or when reproducing `t_traj_monocle2`.
Run the OmicVerse sctour trajectory branch on raw-count single-cell AnnData. Use when adapting the scTour part of an OmicVerse trajectory notebook, or when you need sctour pseudotime, latent space, or vector-field outputs instead of the diffusion_map, slingshot, or palantir branches.
| name | bulk-rna-seq-differential-expression-with-omicverse |
|---|---|
| title | Bulk RNA-seq differential expression with omicverse |
| description | Bulk RNA-seq DEG pipeline: gene ID mapping, DESeq2 normalization, statistical testing, volcano plots, and pathway enrichment in OmicVerse. |
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
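For orientation, raw featureCounts output starts with a `# Program:featureCounts ...` comment line (which is why the matrix is read with `header=1`), followed by the annotation columns `Chr`, `Start`, `End`, `Strand` and `Length` and then one count column per BAM path. The sketch below loads a small synthetic table of that shape with plain pandas (`ov.pd` is assumed to be pandas re-exported by omicverse), drops the annotation columns if they are present, and strips directory prefixes and the `.bam` suffix from the sample names. The file content and sample names are made up for illustration.

```python
import io

import pandas as pd

# Synthetic featureCounts-style text (illustrative only): line 1 is the program
# comment, line 2 is the real header, hence header=1 when reading.
raw = (
    "# Program:featureCounts v2.0.1\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tbam/ctrl_1.bam\tbam/ctrl_2.bam\tbam/kd_1.bam\n"
    "ENSMUSG00000000001\t3\t1\t100\t-\t100\t10\t12\t30\n"
    "ENSMUSG00000000003\tX\t1\t100\t-\t100\t0\t1\t0\n"
)
counts_df = pd.read_csv(io.StringIO(raw), sep="\t", header=1, index_col=0)

# Keep only per-sample count columns (assumes the standard featureCounts
# annotation columns; skip this if your file was already trimmed).
annot_cols = ["Chr", "Start", "End", "Strand", "Length"]
counts_df = counts_df.drop(columns=[c for c in annot_cols if c in counts_df.columns])

# Strip directory prefixes and the .bam suffix so sample IDs are clean.
counts_df.columns = [c.split("/")[-1].replace(".bam", "") for c in counts_df.columns]
print(counts_df.columns.tolist())  # ['ctrl_1', 'ctrl_2', 'kd_1']
```

The cleaned column names are what `treatment_groups` and `control_groups` must match later in the workflow.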
1. Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`, then call `ov.style()` so downstream plots adopt omicverse styling.
2. Download the gene ID annotation pairs with `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
3. Load the raw counts with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`.
4. Strip trailing `.bam` segments from the column names with a list comprehension so sample IDs are clean.
5. Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
6. Create the DEG object: `dds = ov.bulk.pyDEG(mapped_counts)`.
7. Collapse duplicated gene symbols with `dds.drop_duplicates_index()`, which keeps the highest-expressed version.
8. Call `dds.normalize()` to calculate DESeq2 size factors, which correct for differences in sequencing depth and library composition (they do not remove batch effects).
9. Run `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test; use `method='edgepy'` for edgeR-like tests or `method='limma'` for limma-style modelling.
10. Optionally drop low-expression genes, e.g. `dds.result = dds.result.loc[dds.result['log2(BaseMean)'] > 1]`.
11. Set significance thresholds with `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)`; `fc_threshold=-1` auto-selects the cutoff from the log2FC distribution.
12. Draw a volcano plot with `dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)` to highlight key genes.
13. Plot per-gene expression with `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust the y-axis tick labels if required.
14. Download the pathway databases with `ov.utils.download_pathway_database()` and load one with `ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)`.
15. Collect the significant genes: `deg_genes = dds.result.loc[dds.result['sig'] != 'normal'].index`.
16. Run `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a background gene list.
17. Visualise the enrichment with `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
18. Export `dds.result` and the enrichment tables to CSV for downstream reporting, and save figures (e.g. with `plt.savefig(...)`) when running outside notebooks.

Before calling `dds.deg_analysis`, confirm that every treatment/control sample exists as a column name and that the two groups do not overlap:

```python
# Before DEG: verify treatment/control groups exist as column names.
# Check the cleaned count matrix itself: dds.result (if present) holds
# per-gene statistics, not the sample columns.
all_cols = set(counts_df.columns)
for g in treatment_groups + control_groups:
    assert g in all_cols, f"Sample '{g}' not found in count matrix columns"
# Verify groups don't overlap
assert not set(treatment_groups) & set(control_groups), "Treatment and control groups must not overlap"
```
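After `Matrix_ID_mapping`, several gene IDs can map to the same symbol, which is why the workflow collapses duplicates with `dds.drop_duplicates_index()`. The sketch below shows the idea in plain pandas on toy data. Ranking rows by their total count is an assumption about what "highest expressed" means, and `keep_highest_expressed` is a hypothetical helper, not an omicverse function.

```python
import pandas as pd

# Toy counts after ID mapping: "GeneA" appears twice (two IDs -> one symbol).
counts = pd.DataFrame(
    {"ctrl_1": [10, 50, 7], "ctrl_2": [12, 40, 9], "kd_1": [30, 60, 8]},
    index=["GeneA", "GeneA", "GeneB"],
)


def keep_highest_expressed(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each duplicated label keep the row with the largest total count."""
    # Rank rows by total counts (descending), then keep the first row per label.
    ranked = df.assign(_total=df.sum(axis=1)).sort_values("_total", ascending=False)
    deduped = ranked[~ranked.index.duplicated(keep="first")]
    return deduped.drop(columns="_total")


deduped = keep_highest_expressed(counts)
print(deduped)  # GeneA keeps the row with counts 50/40/60
```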
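`dds.normalize()` is described as computing DESeq2 size factors. DESeq2 uses the median-of-ratios method: a sample's size factor is the median, over genes with no zero counts, of that sample's count divided by the gene's geometric mean across samples. Here is a minimal numpy/pandas sketch of that calculation (not omicverse's code) on a toy matrix in which sample `B` was sequenced twice as deeply as sample `A`.

```python
import numpy as np
import pandas as pd


def median_of_ratios_size_factors(counts: pd.DataFrame) -> pd.Series:
    """DESeq2-style size factors for a genes x samples count matrix."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts.astype(float))  # log(0) -> -inf
    # Per-gene log geometric mean; any zero count makes it -inf.
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    # log(count / geometric mean) for usable genes, then the median per sample.
    log_ratios = log_counts.loc[usable].sub(log_geo_means[usable], axis=0)
    return np.exp(log_ratios.median(axis=0))


counts = pd.DataFrame(
    {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]},
    index=["g1", "g2", "g3", "g4"],
)
sf = median_of_ratios_size_factors(counts)
normalized = counts.div(sf, axis=1)  # divide each sample by its size factor
print(sf)  # A ~ 0.707, B ~ 1.414
```

After dividing by the size factors, `g1` to `g3` are on the same scale in both samples; `g4` is excluded from the median because it has a zero count in `A`.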
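The default `method='ttest'` is described as a Welch t-test. To show what that means gene by gene, here is a hedged sketch using `scipy.stats.ttest_ind(..., equal_var=False)` on log2(normalized + 1) values, followed by Benjamini-Hochberg q-values. The log transform, the pseudocount of 1 and defining log2FC as the difference of mean log2 values are assumptions for illustration; omicverse's own statistics may be computed differently, and `welch_deg`/`bh_adjust` are hypothetical helpers.

```python
import numpy as np
import pandas as pd
from scipy import stats


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of scaled_(j).
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


def welch_deg(norm, treatment, control):
    """Per-gene Welch t-test on log2(norm + 1) (assumed transform)."""
    logx = np.log2(norm + 1.0)
    t_vals = logx[treatment].to_numpy()
    c_vals = logx[control].to_numpy()
    _, p = stats.ttest_ind(t_vals, c_vals, axis=1, equal_var=False)
    res = pd.DataFrame(
        {"log2FC": t_vals.mean(axis=1) - c_vals.mean(axis=1), "pvalue": p},
        index=norm.index,
    )
    res["qvalue"] = bh_adjust(res["pvalue"])
    return res


# Toy normalized counts: one clearly induced gene, one unchanged gene.
norm = pd.DataFrame(
    {"c1": [100.0, 50.0], "c2": [110.0, 52.0], "c3": [90.0, 48.0],
     "t1": [400.0, 51.0], "t2": [420.0, 49.0], "t3": [380.0, 50.0]},
    index=["up_gene", "flat_gene"],
)
res = welch_deg(norm, ["t1", "t2", "t3"], ["c1", "c2", "c3"])
print(res)
```

`equal_var=False` is what makes this Welch's test rather than Student's: it does not assume the two groups share a variance.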
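`dds.foldchange_set` sets the thresholds that decide which genes leave the `'normal'` class in `dds.result['sig']`, and `deg_genes` is then everything not labelled `'normal'`. The sketch below shows that labelling with a fixed fold-change cutoff; it does not reproduce the `fc_threshold=-1` auto-selection. The `'up'`/`'down'` labels, the hypothetical `classify_sig` helper and the use of `qvalue` as the p-value column are assumptions.

```python
import numpy as np
import pandas as pd


def classify_sig(res, fc_threshold=1.0, pval_threshold=0.05, pcol="qvalue"):
    """Hypothetical helper: label genes 'up', 'down' or 'normal'."""
    passes_p = res[pcol] < pval_threshold
    up = passes_p & (res["log2FC"] >= fc_threshold)
    down = passes_p & (res["log2FC"] <= -fc_threshold)
    out = res.copy()
    out["sig"] = np.select([up, down], ["up", "down"], default="normal")
    return out


# Toy result table: b is significant but below the cutoff in one case, d fails the p-value.
res = pd.DataFrame(
    {"log2FC": [2.0, -1.5, 0.3, 2.0], "qvalue": [0.01, 0.001, 0.001, 0.2]},
    index=["a", "b", "c", "d"],
)
out = classify_sig(res)
deg_genes = out.loc[out["sig"] != "normal"].index
print(out["sig"].tolist())  # ['up', 'down', 'normal', 'normal']
```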
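Over-representation tests such as the one behind `ov.bulk.geneset_enrichment` ask whether a pathway contains more DEGs than expected by chance within a background gene universe, which is why the background list matters. Below is a minimal hypergeometric version with `scipy.stats.hypergeom` on a synthetic 100-gene universe. It illustrates the statistic only and is not presented as how omicverse or gseapy compute their p-values; `ora_pvalue` is a hypothetical helper.

```python
from scipy.stats import hypergeom


def ora_pvalue(deg_genes, pathway_genes, background):
    """One-sided over-representation p-value for a single pathway."""
    universe = set(background)
    deg = set(deg_genes) & universe
    path = set(pathway_genes) & universe
    overlap = len(deg & path)
    # P(X >= overlap), X ~ Hypergeom(M=|universe|, n=|pathway|, N=|DEGs|)
    return float(hypergeom.sf(overlap - 1, len(universe), len(path), len(deg)))


background = [f"g{i}" for i in range(100)]
pathway = [f"g{i}" for i in range(10)]
# 8 of 10 DEGs fall in the pathway, versus ~1 expected by chance.
enriched = [f"g{i}" for i in range(8)] + ["g50", "g51"]
p_enriched = ora_pvalue(enriched, pathway, background)
# No DEG falls in the pathway.
p_none = ora_pvalue([f"g{i}" for i in range(50, 60)], pathway, background)
print(p_enriched, p_none)
```

Changing `background` changes `M` and therefore the p-value, so the same DEG list can look more or less enriched depending on the universe supplied.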
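- Ensure `treatment_groups`/`control_groups` exactly match the column names after the `.bam` cleanup.
- Make sure the required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.
- See `t_deg.ipynb` (the tutorial notebook) and `reference.md`.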