Wiring bulk outputs into ov.bulk.pyDEG (DESeq2)
Both paths end with a gene × sample integer count matrix that pyDEG consumes.
From Path A — already a pandas DataFrame:
dds = ov.bulk.pyDEG(counts)
From Path B — load per-sample h5ad, harmonise gene names, concatenate, transpose:
ad_dict = {}
for sra in ['SRR12544419', 'SRR12544421', 'SRR12544433', 'SRR12544435']:
    ad = ov.read(f'./results/{sra}/counts_unfiltered/adata.h5ad')
    # Gene symbols are stored alongside the matrix, one per line
    gene_name = ov.pd.read_csv(
        f'./results/{sra}/counts_unfiltered/cells_x_genes.genes.names.txt',
        header=None,
    )
    # Keep the original IDs in a column, then index by gene symbol
    ad.var['gene_name'] = gene_name[0].tolist()
    ad.var['gene_id'] = ad.var.index
    ad.var.index = ad.var['gene_name']
    ad.var_names_make_unique()
    ad.obs['sra'] = sra
    ad_dict[sra] = ad
adata = ov.concat(ad_dict)
adata.obs_names_make_unique()
# One label per sample, in the same order as the loop above
adata.obs['Group'] = ['no', 'no', 'yes', 'yes']
# AnnData is sample x gene; transpose to gene x sample for pyDEG
counts = adata.to_df().T
dds = ov.bulk.pyDEG(counts)
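The transpose at the end of Path B is easy to get backwards, so it can help to check the orientation on a small example. The following is a minimal pandas-only sketch, not the omicverse code path: the toy frame stands in for `adata.to_df()`, and the gene names and count values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for adata.to_df(): rows are samples, columns are genes.
# Sample IDs are the ones used above; gene names and counts are made up.
samples_by_genes = pd.DataFrame(
    [[10, 0, 3], [12, 1, 5], [40, 0, 2], [38, 2, 1]],
    index=['SRR12544419', 'SRR12544421', 'SRR12544433', 'SRR12544435'],
    columns=['GeneA', 'GeneB', 'GeneC'],
)

# Transpose to the gene x sample layout: genes as rows, samples as columns.
counts = samples_by_genes.T

print(counts.shape)          # (n_genes, n_samples)
print(list(counts.columns))  # sample IDs, later passed to deg_analysis
```

After the transpose, the column labels are the sample IDs, which is the form the group arguments further down refer to.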
Run DE + filter + volcano (same regardless of upstream path):
dds.drop_duplicates_index()
result = dds.deg_analysis(
    treatment_groups=[...],
    control_groups=[...],
    method='DEseq2',
)
result = result.loc[result['log2(BaseMean)'] > 1]
dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=10)
dds.plot_volcano(title='DEG Analysis', figsize=(4, 4),
                 plot_genes_num=8, plot_genes_fontsize=12)
- DESeq2 expects raw integer counts; do not log-normalize before
  deg_analysis.
- treatment_groups / control_groups are sample IDs (matrix columns), not group labels.
- For downstream GSEA / pathway / PPI, see the
gsea-enrichment, bulk-deg-analysis, bulk-stringdb-ppi skills.
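
The first two notes can be checked in plain pandas before calling deg_analysis. This is a minimal sketch under stated assumptions: `check_raw_counts` and `samples_for` are hypothetical helpers, not part of omicverse; the toy counts are invented; and treating 'yes' as the treatment label is an assumption, since the text above does not say which group is which.

```python
import pandas as pd


def check_raw_counts(counts: pd.DataFrame) -> None:
    """Hypothetical helper: raise if values are negative or non-integer."""
    values = counts.to_numpy(dtype=float)
    if (values < 0).any():
        raise ValueError('negative values found; expected raw counts')
    if (values != values.round()).any():
        raise ValueError('non-integer values found; matrix may be normalized')


def samples_for(groups: pd.Series, label: str) -> list:
    """Hypothetical helper: sample IDs (index entries) whose group is `label`."""
    return groups.index[groups == label].tolist()


# Group labels per sample, matching the 'Group' column built in Path B.
groups = pd.Series(
    ['no', 'no', 'yes', 'yes'],
    index=['SRR12544419', 'SRR12544421', 'SRR12544433', 'SRR12544435'],
)

# Toy gene x sample matrix; gene names and values are invented.
counts = pd.DataFrame(
    [[10, 12, 40, 38], [0, 1, 0, 2]],
    index=['GeneA', 'GeneB'],
    columns=groups.index,
)

check_raw_counts(counts)  # passes silently for raw integer counts

# Assumption: 'yes' marks treatment and 'no' marks control.
treatment_groups = samples_for(groups, 'yes')
control_groups = samples_for(groups, 'no')
print(treatment_groups, control_groups)
```

The two resulting lists contain column names of the count matrix, so they can be passed as treatment_groups and control_groups in place of the group labels themselves.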