Name: Bulktrajblend Trajectory Interpolation
Author: omicverse

// Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.

updated:March 12, 2026 at 00:26

name	bulktrajblend-trajectory-interpolation
title	BulkTrajBlend trajectory interpolation
description	Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.

BulkTrajBlend trajectory interpolation

Overview

Invoke this skill when users need to bridge gaps in single-cell developmental trajectories using matched bulk RNA-seq. It follows t_bulktrajblend.ipynb, which shows how BulkTrajBlend deconvolves bulk samples into synthetic single cells with a beta-VAE, identifies overlapping communities with a GNN, and interpolates "interrupted" cell states. The steps below use the dentate gyrus dataset.

Instructions

Prepare libraries and inputs
- Import omicverse as ov, scanpy as sc, and scvelo as scv, plus helpers such as mde (from omicverse.utils import mde); then run ov.plot_set().
- Load the reference scRNA-seq AnnData (scv.datasets.dentategyrus()) and raw bulk counts with ov.utils.read(...) followed by ov.bulk.Matrix_ID_mapping(...) for gene ID harmonisation.
Configure BulkTrajBlend
- Instantiate ov.bulk2single.BulkTrajBlend(bulk_seq=bulk_df, single_seq=adata, bulk_group=['dg_d_1','dg_d_2','dg_d_3'], celltype_key='clusters').
- Explain that bulk_group names correspond to raw bulk columns and the method expects unscaled counts.
Set beta-VAE expectations
- Call bulktb.vae_configure(cell_target_num=100) (or pass a dictionary) to define expected cell counts per cluster. Mention that omitting the argument triggers TAPE-based estimation.
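The dictionary form can be prepared from the reference labels before calling vae_configure. The following is a minimal sketch, assuming the dictionary maps each cluster label to its target cell count; the helper name build_cell_target_num and its overrides argument are illustrative and not part of omicverse.

```python
def build_cell_target_num(labels, default=100, overrides=None):
    """Map every cluster label to a target number of generated cells.

    labels    : iterable of per-cell cluster labels, e.g. adata.obs['clusters']
    default   : target count for clusters without an explicit override
    overrides : optional {cluster: count} for clusters needing another target
    """
    overrides = dict(overrides or {})
    clusters = sorted(set(labels))  # unique labels in a stable order
    unknown = sorted(set(overrides) - set(clusters))
    if unknown:
        raise KeyError(f"Overrides name clusters absent from the labels: {unknown}")
    return {cluster: int(overrides.get(cluster, default)) for cluster in clusters}


# Toy labels standing in for adata.obs['clusters'] (hypothetical values)
toy_labels = ["OPC", "OL", "OPC", "Astrocytes"]
targets = build_cell_target_num(toy_labels, default=100, overrides={"OPC": 200})
print(targets)
```

Raising on unknown override keys catches typos in cluster names before a long training run starts.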
Train or load the beta-VAE
- Use bulktb.vae_train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_btb_vae', generate_save_dir='...', generate_save_name='dg_btb').
- Highlight resuming with bulktb.vae_load('.../dg_btb_vae.pth') and the need to regenerate cells with consistent random seeds for reproducibility.
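One way to keep seeds consistent is to seed every random number generator available in the environment before training or regenerating cells. This is a sketch only: the helper seed_everything is hypothetical, and whether BulkTrajBlend draws exclusively from these generators is an assumption to verify against the installed omicverse version.

```python
import random


def seed_everything(seed=0):
    """Seed the RNGs importable here and return the names of those seeded."""
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # PyTorch default generators
        seeded.append("torch")
    except ImportError:
        pass
    return seeded


print(seed_everything(42))
```

Call it with the same seed before vae_train and again before any later regeneration so both runs start from the same generator state.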
Generate synthetic cells
- Produce filtered AnnData via bulktb.vae_generate(leiden_size=25) and inspect compositions with ov.bulk2single.bulk2single_plot_cellprop(...).
- Save outputs to disk for reuse (adata.write_h5ad).
Configure and train the GNN
- Call bulktb.gnn_configure(max_epochs=2000, use_rep='X', neighbor_rep='X_pca', gpu=0, ...) to set hyperparameters.
- Train using bulktb.gnn_train(); reload checkpoints with bulktb.gnn_load('save_model/gnn.pth').
- Generate overlapping community assignments through bulktb.gnn_generate().
Visualise community structure
- Create MDE embeddings: bulktb.nocd_obj.adata.obsm['X_mde'] = mde(bulktb.nocd_obj.adata.obsm['X_pca']).
- Plot clusters vs. discovered communities using sc.pl.embedding(..., color=['clusters','nocd_n'], palette=ov.utils.pyomic_palette()), then repeat the plot on a filtered subset that excludes the synthetic labels containing hyphens.
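The hyphen filter amounts to a boolean mask over the community labels. A small sketch on a plain list follows; the label values are made up for illustration, and the function name hyphen_free_mask is hypothetical.

```python
def hyphen_free_mask(labels):
    """Return a list of booleans, True where the label contains no hyphen."""
    return ["-" not in str(label) for label in labels]


# Toy community labels standing in for adata.obs['nocd_n'] (hypothetical values)
labels = ["OPC", "OPC-OL", "Astrocytes", "0-3"]
mask = hyphen_free_mask(labels)
kept = [label for label, keep in zip(labels, mask) if keep]
print(kept)
```

With an AnnData object the same mask can index the cells directly, for example adata[hyphen_free_mask(adata.obs['nocd_n'])], assuming nocd_n holds string labels.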
Interpolate missing states
- Run bulktb.interpolation('OPC') (replace 'OPC' with the target lineage) to add generated cells for the interrupted state, then preprocess the interpolated AnnData (HVG selection, scaling, PCA).
- Compute embeddings with mde, visualise with ov.pl.embedding, and compare to the original atlas.
Analyse trajectories
- Initialise ov.single.pyVIA on both the original and the interpolated data and derive pseudotime with get_pseudotime; then run ov.pp.neighbors, ov.utils.cal_paga, and ov.utils.plot_paga to validate the topology.
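The model-related calls above can be chained in one function that receives an already constructed BulkTrajBlend object. This is a sketch under assumptions: the wrapper run_bulktrajblend is hypothetical, the method names and the leiden_size and cell_target_num arguments are taken from the steps above, and the training and GNN keyword arguments are passed through untouched so nothing beyond those steps is assumed about their defaults.

```python
def run_bulktrajblend(bulktb, target="OPC", cell_target_num=100,
                      leiden_size=25, vae_kwargs=None, gnn_kwargs=None):
    """Run the beta-VAE, GNN, and interpolation calls in the documented order.

    bulktb     : object built with ov.bulk2single.BulkTrajBlend(...)
    vae_kwargs : keyword arguments forwarded to vae_train (batch_size, ...)
    gnn_kwargs : keyword arguments forwarded to gnn_configure (max_epochs, ...)
    """
    bulktb.vae_configure(cell_target_num=cell_target_num)
    bulktb.vae_train(**(vae_kwargs or {}))
    generated = bulktb.vae_generate(leiden_size=leiden_size)  # filtered cells
    bulktb.gnn_configure(**(gnn_kwargs or {}))
    bulktb.gnn_train()
    communities = bulktb.gnn_generate()  # overlapping community assignments
    interpolated = bulktb.interpolation(target)
    return generated, communities, interpolated


# Usage (requires omicverse plus the bulk_df and adata inputs from the first step):
# bulktb = ov.bulk2single.BulkTrajBlend(
#     bulk_seq=bulk_df, single_seq=adata,
#     bulk_group=['dg_d_1', 'dg_d_2', 'dg_d_3'], celltype_key='clusters')
# generated, communities, interpolated = run_bulktrajblend(
#     bulktb, vae_kwargs=dict(batch_size=512, learning_rate=1e-4,
#                             hidden_size=256, epoch_num=3500))
```

Keeping generation and GNN training in one call makes it harder to train the GNN on one generated dataset and later load the checkpoint against another.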

Defensive validation

# Values from the BulkTrajBlend configuration step; adjust to your data
bulk_group = ['dg_d_1', 'dg_d_2', 'dg_d_3']
celltype_key = 'clusters'
# Before BulkTrajBlend: verify bulk_group columns exist
for g in bulk_group:
    assert g in bulk_df.columns, f"Bulk group '{g}' not in bulk data columns"
# Verify celltype_key exists in reference
assert celltype_key in adata.obs.columns, f"Cell type column '{celltype_key}' not in reference AnnData"
# Verify gene name overlap (genes are the index of bulk_df)
shared = set(bulk_df.index) & set(adata.var_names)
assert len(shared) > 100, f"Only {len(shared)} shared genes; harmonize gene IDs first"
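Assertions stop at the first failure and are skipped entirely when Python runs with -O. A variant that collects every problem in one pass is sketched below; the function name check_bulktrajblend_inputs and the min_shared parameter are illustrative, and the inputs are plain iterables so the check can be tried without AnnData.

```python
def check_bulktrajblend_inputs(bulk_columns, bulk_genes, obs_columns, sc_genes,
                               bulk_group, celltype_key, min_shared=100):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    bulk_columns = set(bulk_columns)
    missing = [g for g in bulk_group if g not in bulk_columns]
    if missing:
        problems.append(f"bulk_group columns missing from bulk data: {missing}")
    if celltype_key not in set(obs_columns):
        problems.append(f"cell type column '{celltype_key}' not in reference obs")
    shared = set(bulk_genes) & set(sc_genes)
    if len(shared) <= min_shared:  # same threshold as the assertion above
        problems.append(f"only {len(shared)} shared genes; harmonize gene IDs first")
    return problems


# Toy inputs (hypothetical gene names) exercising a passing and a failing case
genes = [f"Gene{i}" for i in range(150)]
ok = check_bulktrajblend_inputs(
    bulk_columns=["dg_d_1", "dg_d_2", "dg_d_3"], bulk_genes=genes,
    obs_columns=["clusters"], sc_genes=genes,
    bulk_group=["dg_d_1", "dg_d_2", "dg_d_3"], celltype_key="clusters")
bad = check_bulktrajblend_inputs(
    bulk_columns=["dg_d_1"], bulk_genes=genes[:10],
    obs_columns=["leiden"], sc_genes=genes,
    bulk_group=["dg_d_1", "dg_d_2"], celltype_key="clusters")
print(ok, bad, sep="\n")
```

With real data the call would be check_bulktrajblend_inputs(bulk_df.columns, bulk_df.index, adata.obs.columns, adata.var_names, bulk_group, celltype_key).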

Troubleshooting tips
- If VAE training does not converge (the reconstruction loss stays high), lower learning_rate or reduce hidden_size.
- Use the same generated dataset for gnn_train and for any later gnn_load; regenerating cells changes the graph and can break checkpoint loading.
- Sparse clusters may need adjusted cell_target_num thresholds or a smaller leiden_size filter to retain rare populations.
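To notice when the generated dataset has changed between GNN training and checkpoint loading, a fingerprint of the dataset can be stored next to the checkpoint and compared later. The sketch below hashes cell and gene names only, so it detects changes in which cells and genes are present (and their order) but not changes in expression values; the helper dataset_fingerprint is hypothetical.

```python
import hashlib


def dataset_fingerprint(cell_ids, gene_ids):
    """Return a SHA-256 hex digest over ordered cell names and gene names."""
    digest = hashlib.sha256()
    for group in (cell_ids, gene_ids):
        for name in group:
            digest.update(str(name).encode("utf-8"))
            digest.update(b"\n")
        digest.update(b"--\n")  # separator between the cell and gene blocks
    return digest.hexdigest()


# Toy names standing in for adata.obs_names and adata.var_names
at_training = dataset_fingerprint(["c1", "c2", "c3"], ["g1", "g2"])
at_loading = dataset_fingerprint(["c1", "c3"], ["g1", "g2"])
print(at_training == at_loading)
```

If the two fingerprints differ, reload the saved generated AnnData instead of the regenerated one before calling gnn_load.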

Examples

"Train BulkTrajBlend on PDAC cohorts, then interpolate missing OPC states in the trajectory."
"Load saved beta-VAE and GNN weights to regenerate overlapping communities and plot cluster vs. nocd labels."
"Run VIA on interpolated cells and compare PAGA graphs with the original scRNA-seq trajectory."

References

Tutorial notebook: t_bulktrajblend.ipynb
Example datasets and checkpoints: omicverse_guide/docs/Tutorials-bulk2single/data/
Quick copy/paste commands: reference.md

