bulktrajblend-trajectory-interpolation
Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.
| Field | Value |
| --- | --- |
| name | bulktrajblend-trajectory-interpolation |
| title | BulkTrajBlend trajectory interpolation |
| description | Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states. |
Invoke this skill when users need to bridge gaps in single-cell developmental trajectories using matched bulk RNA-seq. It follows t_bulktrajblend.ipynb, showcasing how BulkTrajBlend deconvolves PDAC bulk samples, identifies overlapping communities with a GNN, and interpolates "interrupted" cell states.
1. Prepare libraries and inputs
   - Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, and helper functions like `from omicverse.utils import mde`; run `ov.plot_set()`.
   - Load the single-cell reference (`scv.datasets.dentategyrus()`) and raw bulk counts with `ov.utils.read(...)` followed by `ov.bulk.Matrix_ID_mapping(...)` for gene ID harmonisation.
2. Configure BulkTrajBlend
   - Instantiate `ov.bulk2single.BulkTrajBlend(bulk_seq=bulk_df, single_seq=adata, bulk_group=['dg_d_1','dg_d_2','dg_d_3'], celltype_key='clusters')`.
   - Explain that `bulk_group` names correspond to raw bulk columns and that the method expects unscaled counts.
   - Call `bulktb.vae_configure(cell_target_num=100)` (or pass a dictionary) to define expected cell counts per cluster. Mention that omitting the argument triggers TAPE-based estimation.
3. Train or load the beta-VAE
   - Train with `bulktb.vae_train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_btb_vae', generate_save_dir='...', generate_save_name='dg_btb')`.
   - Reuse saved weights with `bulktb.vae_load('.../dg_btb_vae.pth')`, and point out the need to regenerate cells with consistent random seeds for reproducibility.
4. Generate synthetic cells
   - Generate cells with `bulktb.vae_generate(leiden_size=25)` and inspect compositions with `ov.bulk2single.bulk2single_plot_cellprop(...)`.
   - Save the output to disk for reuse (`adata.write_h5ad`).
5. Configure and train the GNN
   - Call `bulktb.gnn_configure(max_epochs=2000, use_rep='X', neighbor_rep='X_pca', gpu=0, ...)` to set hyperparameters.
   - Train with `bulktb.gnn_train()`; reload checkpoints with `bulktb.gnn_load('save_model/gnn.pth')`.
   - Produce the overlapping community assignments with `bulktb.gnn_generate()`.
6. Visualise community structure
   - Compute an MDE embedding: `bulktb.nocd_obj.adata.obsm['X_mde'] = mde(bulktb.nocd_obj.adata.obsm['X_pca'])`.
   - Plot with `sc.pl.embedding(..., color=['clusters','nocd_n'], palette=ov.utils.pyomic_palette())`, and also plot filtered subsets that exclude synthetic labels containing hyphens.
7. Interpolate missing states
   - Run `bulktb.interpolation('OPC')` (replace with the target lineage) to synthesise continuity, then preprocess the interpolated AnnData (HVG selection, scaling, PCA).
   - Compute embeddings with `mde`, visualise with `ov.pl.embedding`, and compare to the original atlas.
8. Analyse trajectories
   - Run `ov.single.pyVIA` on both original and interpolated data to derive pseudotime, followed by `get_pseudotime`, `ov.pp.neighbors`, `ov.utils.cal_paga`, and `ov.utils.plot_paga` for topology validation.

Defensive checks to run before constructing `BulkTrajBlend`:

```python
# Before BulkTrajBlend: verify bulk_group columns exist
for g in bulk_group:
    assert g in bulk_df.columns, f"Bulk group '{g}' not in bulk data columns"
# Verify celltype_key exists in reference
assert celltype_key in adata.obs.columns, f"Cell type column '{celltype_key}' not in reference AnnData"
# Verify gene name overlap
shared = set(bulk_df.index) & set(adata.var_names)
assert len(shared) > 100, f"Only {len(shared)} shared genes; harmonize gene IDs first"
```
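The training and interpolation calls listed in the steps above can be chained in one place. The following is a minimal sketch, not the tutorial's own code: the function name `run_bulktrajblend_steps` and its parameters are hypothetical, it assumes `bulktb` is an already constructed `BulkTrajBlend` object, and it assumes that `vae_generate` and `interpolation` return the generated and interpolated AnnData objects. Only method names quoted in this document are called.

```python
def run_bulktrajblend_steps(
    bulktb,
    lineage="OPC",
    cell_target_num=100,
    leiden_size=25,
    vae_checkpoint=None,
    gnn_checkpoint=None,
    vae_train_kwargs=None,
    gnn_configure_kwargs=None,
):
    """Hypothetical helper: call the BulkTrajBlend methods in workflow order.

    `bulktb` is assumed to be a configured BulkTrajBlend instance.
    Checkpoint paths and keyword dictionaries are supplied by the caller;
    nothing here guesses file locations or hyperparameters.
    """
    # Beta-VAE stage: configure, then either train or load saved weights.
    bulktb.vae_configure(cell_target_num=cell_target_num)
    if vae_checkpoint is None:
        bulktb.vae_train(**(vae_train_kwargs or {}))
    else:
        bulktb.vae_load(vae_checkpoint)

    # Generate synthetic cells once and reuse them for the GNN stage.
    generated = bulktb.vae_generate(leiden_size=leiden_size)

    # GNN stage: configure, then either train or load a checkpoint.
    bulktb.gnn_configure(**(gnn_configure_kwargs or {}))
    if gnn_checkpoint is None:
        bulktb.gnn_train()
    else:
        bulktb.gnn_load(gnn_checkpoint)
    bulktb.gnn_generate()

    # Interpolate the target lineage (assumed to return an AnnData).
    interpolated = bulktb.interpolation(lineage)
    return generated, interpolated
```

Generating cells exactly once inside the helper keeps the GNN stage tied to a single generated dataset, which matters when a GNN checkpoint is reloaded later.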
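Troubleshooting:

- If beta-VAE training misbehaves, adjust `learning_rate` or reduce `hidden_size`.
- Keep the same generated cells in place before calling `gnn_train`; regenerating cells changes the graph and can break checkpoint loading.
- Sparse clusters may need adjusted `cell_target_num` thresholds or a smaller `leiden_size` filter to retain rare populations.

Resources:

- `t_bulktrajblend.ipynb`
- `omicverse_guide/docs/Tutorials-bulk2single/data/`
- `reference.md`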
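For the `cell_target_num` adjustment for sparse clusters, one option is to pass a per-cluster dictionary rather than a single number. The sketch below is illustrative only: `build_cell_targets`, its thresholds, and the choice to raise the target for rare clusters are assumptions, and it presumes the dictionary maps cluster labels to expected cell counts.

```python
from collections import Counter


def build_cell_targets(cluster_labels, default=100, rare_threshold=50, rare_target=200):
    """Hypothetical helper: build a {cluster: target cell count} dictionary.

    Clusters with fewer than `rare_threshold` cells in the reference get
    `rare_target`; all others get `default`. All numbers are placeholders.
    """
    counts = Counter(cluster_labels)
    return {
        cluster: (rare_target if n_cells < rare_threshold else default)
        for cluster, n_cells in sorted(counts.items())
    }


# Assumed usage (not executed here):
#   targets = build_cell_targets(adata.obs['clusters'])
#   bulktb.vae_configure(cell_target_num=targets)
```

Whether a rare population needs a higher or lower target depends on the dataset, so treat the defaults as starting points to tune.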