bulk-rna-seq-differential-expression-with-omicverse

Name: Bulk RNA-seq differential expression with omicverse
Author: omicverse

Bulk RNA-seq DEG pipeline: gene ID mapping, DESeq2 normalization, statistical testing, volcano plots, and pathway enrichment in OmicVerse.

stars: 4

forks: 1

updated: May 11, 2026, 10:55


SKILL.md


name	bulk-rna-seq-differential-expression-with-omicverse
title	Bulk RNA-seq differential expression with omicverse
description	Bulk RNA-seq DEG pipeline: gene ID mapping, DESeq2 normalization, statistical testing, volcano plots, and pathway enrichment in OmicVerse.

Bulk RNA-seq differential expression with omicverse

Overview

Follow this skill to run the end-to-end differentially expressed gene (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts in omicverse.

Instructions

Set up the session
- Import omicverse as ov, scanpy as sc, and matplotlib.pyplot as plt.
- Call ov.style() so downstream plots adopt omicverse styling.
Prepare ID mapping assets
- When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via ov.utils.download_geneid_annotation_pair() and store them under genesets/.
- Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
Load the raw counts
- Read tab-delimited featureCounts output with ov.pd.read_csv(..., sep='\t', header=1, index_col=0).
- Strip the trailing .bam suffix (and any leading path) from column names with a list comprehension so sample IDs are clean.
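The column cleanup above can be sketched in plain Python, without pandas or omicverse. The paths and sample names below are hypothetical, and clean_sample_names is an illustrative helper rather than an omicverse function; in practice the same comprehension would be assigned back to the DataFrame's columns.

```python
from pathlib import PurePosixPath


def clean_sample_names(columns):
    """Drop any directory prefix and one trailing '.bam' from each column name."""
    # Keep only the file-name part of each column header
    names = [PurePosixPath(col).name for col in columns]
    # Remove the extension only when it is the final suffix
    return [n[: -len(".bam")] if n.endswith(".bam") else n for n in names]


# Hypothetical featureCounts headers (made up for illustration)
raw_columns = ["align/4-3.bam", "align/4-4.bam", "1--1.bam", "Length"]
cleaned = clean_sample_names(raw_columns)
print(cleaned)
```

Non-sample columns such as Length pass through unchanged, so they still need to be dropped separately if they are present in the matrix.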
Map gene identifiers (optional)
- Run ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv') to replace gene_id entries with gene symbols.
Initialise the DEG object
- Create dds = ov.bulk.pyDEG(mapped_counts).
- Handle duplicate gene symbols with dds.drop_duplicates_index(), which keeps the most highly expressed entry for each symbol.
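A toy illustration of what keeping the most highly expressed duplicate means is given below. It assumes expression is ranked by the total count across samples, which is an assumption of this sketch and not a statement about how dds.drop_duplicates_index() is implemented. The counts are invented.

```python
def keep_highest_expressed(rows):
    """rows: list of (gene_symbol, counts_per_sample) pairs.

    Returns a dict that keeps, for each symbol, the row with the largest
    total count across samples (first row wins on ties).
    """
    best = {}
    for symbol, counts in rows:
        if symbol not in best or sum(counts) > sum(best[symbol]):
            best[symbol] = counts
    return best


# Invented counts; two gene IDs mapped to the same symbol
rows = [
    ("Lef1", [10, 12, 9]),
    ("Lef1", [200, 180, 220]),
    ("Krtap9-5", [5, 0, 3]),
]
deduplicated = keep_highest_expressed(rows)
print(deduplicated)
```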
Normalise and estimate size factors
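- Execute dds.normalize() to calculate DESeq2 size factors, which correct for differences in library size (sequencing depth) between samples. Batch effects beyond sequencing depth are not addressed by size factors alone.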
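For readers who want to see what a DESeq2-style size factor is, here is a minimal sketch of the median-of-ratios calculation in plain Python. It illustrates the published method on invented counts and is not omicverse's code; size_factors and the sample names are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict mapping sample -> list of raw counts (same gene order).

    Returns dict mapping sample -> median-of-ratios size factor.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean per gene; genes with a zero in any sample are skipped
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        # Log ratio of each usable gene to its geometric mean
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Invented counts: "treated" was sequenced about twice as deeply as "ctrl"
toy = {"ctrl": [100, 200, 50, 0], "treated": [200, 400, 100, 8]}
sf = size_factors(toy)
normalized = {s: [c / sf[s] for c in toy[s]] for s in toy}
print(sf)
```

Dividing each sample by its size factor puts the first three genes on the same scale in both samples, which is the depth correction the step above relies on. The function raises an error if no gene is non-zero in every sample.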
Run differential testing
- Collect treatment and control replicate labels into lists.
- Call dds.deg_analysis(treatment_groups, control_groups, method='ttest') for the default Welch t-test.
- Offer optional alternatives: method='edgepy' for edgeR-like tests and method='limma' for limma-style modelling.
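The text above describes method='ttest' as a Welch t-test. As a reminder of what that statistic is, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for one gene using only the standard library. The values are invented, welch_t is a hypothetical helper, and this is not a copy of what dds.deg_analysis() does internally.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for a two-sample Welch test."""
    na, nb = len(group_a), len(group_b)
    # Squared standard errors of each group mean (sample variance / n)
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Invented normalized expression values for one gene
treated = [30.0, 34.0, 32.0]
control = [10.0, 12.0, 14.0]
t_stat, dof = welch_t(treated, control)
print(t_stat, dof)
```

Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; scipy.stats.ttest_ind(a, b, equal_var=False) computes the full Welch test.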
Filter and threshold results
- Note that lowly expressed genes are retained by default; filter using dds.result.loc[dds.result['log2(BaseMean)'] > 1] when needed.
- Set dynamic fold-change and significance cutoffs via dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6) (fc_threshold=-1 auto-selects based on log2FC distribution).
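The thresholding step can be pictured with the small sketch below. It assumes that genes are labelled by comparing log2 fold change and a p-value against fixed cutoffs, that the labels are 'up', 'down' and 'normal' (only 'normal' appears in this document), and that logp_max caps -log10(p) for plotting. These are assumptions of the sketch, and classify and capped_neg_log10 are hypothetical helpers, not omicverse functions. Whether omicverse applies the cutoff to raw or adjusted p-values should be checked against its documentation.

```python
import math


def classify(log2fc, pvalue, fc_threshold=1.0, pval_threshold=0.05):
    """Label a gene 'up', 'down' or 'normal' from log2 fold change and p-value."""
    if pvalue < pval_threshold and log2fc > fc_threshold:
        return "up"
    if pvalue < pval_threshold and log2fc < -fc_threshold:
        return "down"
    return "normal"


def capped_neg_log10(pvalue, logp_max=6):
    """-log10(p), capped so extreme p-values do not stretch the y-axis."""
    if pvalue <= 0:
        return float(logp_max)
    return min(-math.log10(pvalue), logp_max)


# Invented (log2FC, p-value) pairs
examples = [(2.3, 0.001), (-1.5, 0.01), (3.0, 0.2), (0.4, 1e-9)]
labels = [classify(fc, p) for fc, p in examples]
print(labels)
```

The last example shows why both cutoffs matter: a tiny p-value with a small fold change still ends up as 'normal'.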
Visualise differential expression
- Produce volcano plots with dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...) to highlight key genes.
- Generate per-gene boxplots using dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...); adjust y-axis tick labels if required.
Perform pathway enrichment (optional)
- Download curated pathway libraries through ov.utils.download_pathway_database().
- Load genesets with ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...).
- Build the DEG gene list from dds.result.loc[dds.result['sig'] != 'normal'].index.
- Run enrichment with ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...). Encourage users without internet access to provide a background gene list.
- Visualise single-library results via ov.bulk.geneset_plot(...) and combine multiple ontologies using ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...).
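To show why the background gene list matters, the sketch below runs a one-sided hypergeometric over-representation test in plain Python. It is a generic illustration of this class of test with invented numbers; it is not the implementation behind ov.bulk.geneset_enrichment, and hypergeom_pvalue is a hypothetical helper.

```python
from math import comb


def hypergeom_pvalue(n_background, n_pathway, n_deg, n_overlap):
    """P(overlap >= n_overlap) when n_deg genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    total = comb(n_background, n_deg)
    upper = min(n_pathway, n_deg)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Invented numbers: 200 DEGs, a 100-gene pathway, 8 genes in common
p_whole_genome = hypergeom_pvalue(20000, 100, 200, 8)
p_expressed_only = hypergeom_pvalue(12000, 100, 200, 8)
print(p_whole_genome, p_expressed_only)
```

With the same overlap, the larger background gives the smaller p-value, so using a whole-genome background when only a subset of genes was detectable can overstate enrichment.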
Document outputs
- Suggest exporting dds.result and enrichment tables to CSV for downstream reporting.
- Encourage users to save figures generated by matplotlib (plt.savefig(...)) when running outside notebooks.

Defensive validation

# Before DEG: verify treatment/control sample labels exist as count-matrix columns.
# Check the count matrix itself rather than the DEG results table,
# whose columns are not expected to be sample names.
all_cols = set(counts_df.columns)
missing = [g for g in treatment_groups + control_groups if g not in all_cols]
assert not missing, f"Samples not found in count matrix columns: {missing}"
# Verify groups don't overlap
overlap = set(treatment_groups) & set(control_groups)
assert not overlap, f"Treatment and control groups must not overlap: {sorted(overlap)}"

Troubleshooting tips
- Ensure sample labels in treatment_groups/control_groups exactly match column names post-cleanup.
- Verify required packages (omicverse, pyComplexHeatmap, gseapy) are installed for enrichment visualisations.
- Remind users that internet access is required the first time they download gene mappings or pathway databases.

Examples

"I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
"Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
"Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."

References

Detailed walkthrough notebook: t_deg.ipynb
Sample count matrix example: see the tutorial page for downloadable example inputs: t_deg
Quick copy/paste commands: reference.md

Repository: omicverse/omicverse-skills


