| name | bio-machine-learning-atlas-mapping |
| description | Maps query single-cell data to reference atlases using scArches transfer learning with scVI and scANVI models. Transfers cell type labels without retraining on combined data. Use when annotating new single-cell datasets using pre-trained reference models. |
| tool_type | python |
| primary_tool | scvi-tools |
Version Compatibility
Reference examples tested with: anndata 0.10+, scanpy 1.10+, scvi-tools 1.1+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package> then help(module.function) to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
Transfer Learning for Single-Cell Data
"Map my scRNA-seq data onto a reference atlas" → Transfer cell type labels from a pre-trained reference model to query cells using architectural surgery (scArches) without retraining.
- Python:
scvi.model.SCVI.load_query_data() → get_latent_representation() → scanpy.tl.ingest()
scVI Reference Mapping (scArches)
Goal: Map query single-cell data onto a pre-trained reference model to obtain a shared latent embedding.
Approach: Load a pre-trained scVI model, prepare query data with matching gene sets, then perform surgical fine-tuning that updates only query-specific parameters.
import scvi
import scanpy as sc
adata_ref = sc.read_h5ad('reference.h5ad')
scvi.model.SCVI.setup_anndata(adata_ref, layer='counts', batch_key='batch')
ref_model = scvi.model.SCVI.load('reference_model/', adata=adata_ref)
adata_query = sc.read_h5ad('query.h5ad')
adata_query = adata_query[:, adata_ref.var_names].copy()
scvi.model.SCVI.prepare_query_anndata(adata_query, ref_model)
query_model = scvi.model.SCVI.load_query_data(adata_query, ref_model)
query_model.train(max_epochs=200, plan_kwargs={'weight_decay': 0.0})
adata_query.obsm['X_scVI'] = query_model.get_latent_representation()
scANVI for Label Transfer
Goal: Transfer cell type labels from a labeled reference atlas to unlabeled query data.
Approach: Train a semi-supervised scANVI model on the reference, then map query cells via surgical fine-tuning and predict labels with confidence scores.
import scvi
import scanpy as sc
adata_ref = sc.read_h5ad('reference_labeled.h5ad')
scvi.model.SCVI.setup_anndata(adata_ref, layer='counts', batch_key='batch')
ref_vae = scvi.model.SCVI(adata_ref, n_latent=30)
ref_vae.train(max_epochs=100)
scvi.model.SCANVI.setup_anndata(adata_ref, layer='counts', batch_key='batch', labels_key='cell_type', unlabeled_category='Unknown')
ref_scanvi = scvi.model.SCANVI.from_scvi_model(ref_vae, labels_key='cell_type', unlabeled_category='Unknown')
ref_scanvi.train(max_epochs=50)
ref_scanvi.save('reference_scanvi/')
adata_query = sc.read_h5ad('query.h5ad')
adata_query = adata_query[:, adata_ref.var_names].copy()
scvi.model.SCANVI.prepare_query_anndata(adata_query, ref_scanvi)
query_scanvi = scvi.model.SCANVI.load_query_data(adata_query, ref_scanvi)
query_scanvi.train(max_epochs=100, plan_kwargs={'weight_decay': 0.0})
adata_query.obs['predicted_cell_type'] = query_scanvi.predict()
adata_query.obsm['X_scANVI'] = query_scanvi.get_latent_representation()
Prediction Confidence
Goal: Assess reliability of transferred labels and flag cells that may represent novel types.
Approach: Extract soft prediction probabilities from the scANVI model and identify low-confidence assignments below a threshold.
soft_predictions = query_scanvi.predict(soft=True)
adata_query.obs['prediction_confidence'] = soft_predictions.max(axis=1)
low_conf = adata_query.obs['prediction_confidence'] < 0.5
print(f'Low confidence predictions: {low_conf.sum()} ({low_conf.mean():.1%})')
Joint Embedding Visualization
Goal: Visualize reference and query cells together to assess integration quality.
Approach: Concatenate reference and query datasets, compute UMAP from the shared latent representation, and color by dataset and cell type.
import scanpy as sc
adata_combined = adata_ref.concatenate(adata_query, batch_key='dataset', batch_categories=['reference', 'query'])
sc.pp.neighbors(adata_combined, use_rep='X_scVI')
sc.tl.umap(adata_combined)
sc.pl.umap(adata_combined, color=['dataset', 'cell_type'], save='_transfer.png')
Pre-trained Reference Atlases
| Atlas | Model | URL |
|---|
| Human Lung Cell Atlas | scANVI | cellxgene.cziscience.com |
| Tabula Sapiens | scVI | tabula-sapiens-portal.ds.czbiohub.org |
| Mouse Cell Atlas | scVI | bis.zju.edu.cn/MCA |
Training Parameters
| Parameter | Surgical | Full Retrain | Notes |
|---|
| weight_decay | 0.0 | 0.001 | 0.0 preserves reference |
| max_epochs | 100-200 | 200-400 | Less for surgery |
| early_stopping | True | True | Prevents overfitting |
Related Skills
- single-cell/cell-annotation - Manual annotation methods
- single-cell/batch-integration - Batch effect correction
- single-cell/preprocessing - Data preparation before transfer