// Comprehensive skill for scArches - Deep learning library for single-cell analysis of chromatin accessibility data. Use for scATAC-seq data processing, model training, batch correction, integration with scRNA-seq, and spatial chromatin analysis.
| name | scarches |
| description | Comprehensive skill for scArches - Deep learning library for single-cell analysis of chromatin accessibility data. Use for scATAC-seq data processing, model training, batch correction, integration with scRNA-seq, and spatial chromatin analysis. |
Comprehensive assistance with scArches development, generated from the official documentation.
This skill should be triggered when you need to:
Data Analysis & Processing:
Model Development & Training:
Advanced Applications:
Technical Implementation:
import scanpy as sc
import torch
import scarches as sca
from scarches.dataset.trvae.data_handling import remove_sparsity
import matplotlib.pyplot as plt
import numpy as np
import gdown
# Configure scanpy settings
sc.settings.set_figure_params(dpi=200, frameon=False)
sc.set_figure_params(figsize=(4, 4))
torch.set_printoptions(precision=3, sci_mode=False, edgeitems=7)
# Ensure count data is in adata.X for models using 'nb' or 'zinb' loss
# Convert a sparse adata.X to a dense array (note: this increases memory use)
adata = remove_sparsity(adata)
# Remove obsm and varm matrices to prevent downstream errors
adata.obsm = {}
adata.varm = {}
# Inspect the dtype of adata.X (a float dtype can still hold whole-number counts)
print(f"Data type: {adata.X.dtype}")
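The dtype alone does not show whether a matrix holds raw counts, since counts are often stored as floats. A minimal sketch of a stricter check follows. The helper name `looks_like_raw_counts` and the `n_check` parameter are hypothetical and are not part of scArches; the function only inspects the leading values of the matrix.

```python
import numpy as np

def looks_like_raw_counts(X, n_check=1000):
    """Return True if the first `n_check` values are non-negative whole numbers."""
    # scipy sparse matrices expose their stored values in `.data`;
    # dense input is flattened into a 1-D NumPy array.
    values = X.data if hasattr(X, "nnz") else np.asarray(X).ravel()
    values = np.asarray(values)[:n_check]
    if values.size == 0:
        return True  # nothing stored (all zeros) is compatible with counts
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))

# Toy data: raw counts stored as float32, and a log-normalized version of them.
counts = np.array([[0.0, 3.0], [12.0, 1.0]], dtype=np.float32)
log_normalized = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

print(looks_like_raw_counts(counts))          # float dtype, whole-number values
print(looks_like_raw_counts(log_normalized))  # fractional values
```

On a real dataset the same function could be called on `adata.X` before choosing between the `'nb'`/`'zinb'` and `'mse'` losses.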
# TRVAE model with early stopping
condition_key = 'study'
cell_type_key = 'cell_type'
target_conditions = ['Pancreas CelSeq2', 'Pancreas SS2']
trvae_epochs = 500
surgery_epochs = 500
early_stopping_kwargs = {
    "early_stopping_metric": "val_unweighted_loss",
    "threshold": 0,
    "patience": 20,
    "reduce_lr": True,
    "lr_patience": 13,
    "lr_factor": 0.1,
}
# Create TRVAE model with NB loss (default)
trvae_model = sca.models.TRVAE(
    adata=adata,
    condition_key=condition_key,
    recon_loss='nb',  # Use 'mse' for normalized data, 'zinb' for zero-inflated counts
    beta=1.0,  # Scaling factor for the MMD loss
    hidden_layer_sizes=[128, 128],
    latent_dim=10,
)
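# Register the AnnData object with scvi-tools before building the model
sca.models.SCVI.setup_anndata(adata, batch_key=condition_key)
# Create SCVI model with ZINB loss (default)
scvi_model = sca.models.SCVI(
    adata=adata,
    gene_likelihood='zinb',  # Use 'nb' for negative binomial loss
)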
# Train the model
scvi_model.train(max_epochs=400)
# Save for later use
scvi_model.save("scvi_model")
# Load pretrained model (pass adata unless it was saved alongside the model)
scvi_model = sca.models.SCVI.load("scvi_model", adata=adata)
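# Perform surgery to adapt to query data
query_model = sca.models.SCVI.load_query_data(
    adata_query,  # Query AnnData comes first
    scvi_model,  # Reference model (or the path to a saved one)
    freeze_classifier=True,  # Keep classifier weights fixed (relevant for SCANVI)
)
# Train on query data with fewer epochs (weight_decay=0.0 as in the scArches tutorials)
query_model.train(max_epochs=200, plan_kwargs={"weight_decay": 0.0})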
# Get latent representations
latent = query_model.get_latent_representation()
adata_query.obsm["X_scVI"] = latent
# For HLCA (Human Lung Cell Atlas) mapping
batch_key = 'batch' # How batches are identified
labels_key = 'cell_type' # Cell type annotations
unlabeled_category = 'unlabeled' # Category for unknown cells
# Set query batch (usually single batch per dataset)
adata_query.obs[batch_key] = 'my_dataset'
# Set all query cells as unlabeled for mapping
adata_query.obs[labels_key] = unlabeled_category
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score
# Train KNN on latent representations
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(reference_latent, reference_labels)
# Predict labels for query data
predicted_labels = knn.predict(query_latent)
adata_query.obs['predicted_cell_type'] = predicted_labels
# Calculate prediction confidence
confidence_scores = knn.predict_proba(query_latent)
adata_query.obs['prediction_confidence'] = np.max(confidence_scores, axis=1)
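Once a confidence score is available, low-confidence predictions can be set aside rather than trusted. The sketch below shows one way to do this with NumPy only. The helper `mask_uncertain`, the `0.5` threshold and the `"Unknown"` label are illustrative assumptions, not part of scArches or scikit-learn.

```python
import numpy as np

def mask_uncertain(labels, confidence, threshold=0.5, unknown="Unknown"):
    """Replace labels whose confidence is below `threshold` with `unknown`."""
    labels = np.asarray(labels, dtype=object)
    confidence = np.asarray(confidence, dtype=float)
    return np.where(confidence >= threshold, labels, unknown)

# Toy inputs standing in for knn.predict(...) and the row-wise max of
# knn.predict_proba(...).
predicted = ["alpha", "beta", "delta", "alpha"]
confidence = [0.93, 0.47, 0.60, 0.33]

final_labels = mask_uncertain(predicted, confidence)
print(final_labels.tolist())
```

The threshold is a judgment call; a stricter value labels fewer cells but reduces the risk of assigning a reference cell type to a population that is absent from the reference.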
# Model architecture recommendations
hidden_layers = [128, 128] # Value used in the scArches tutorials, good for most datasets
latent_dim = 10 # Default (10-20 recommended)
beta = 1.0 # MMD strength for TRVAE
# Loss function selection (has_raw_counts: True when adata.X holds raw integer counts)
if has_raw_counts:
    recon_loss = 'nb'  # For count data; use 'zinb' if the counts are zero-inflated
else:
    recon_loss = 'mse'  # For normalized data
# Early stopping for TRVAE
early_stopping_kwargs = {
    "early_stopping_metric": "val_unweighted_loss",  # Best for TRVAE
    "threshold": 0,
    "patience": 20,
    "reduce_lr": True,
    "lr_patience": 13,
    "lr_factor": 0.1,
}
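To build intuition for how `threshold`, `patience`, `lr_patience` and `lr_factor` interact, here is a toy, pure-Python simulation of patience-based early stopping with learning-rate reduction. It is a simplified sketch under assumed semantics (an epoch counts as an improvement only if the loss drops by more than `threshold`) and is not scArches' own implementation; the function name `simulate_early_stopping` is hypothetical.

```python
def simulate_early_stopping(val_losses, threshold=0.0, patience=20,
                            lr_patience=13, lr_factor=0.1, lr=1e-3):
    """Toy model of early stopping with LR reduction.

    Returns (epochs_run, final_lr).
    """
    best = float("inf")
    since_best = 0      # epochs since the last improvement (drives stopping)
    since_lr_drop = 0   # epochs without improvement since the last LR change
    for epoch, loss in enumerate(val_losses, start=1):
        if best - loss > threshold:
            best = loss
            since_best = 0
            since_lr_drop = 0
        else:
            since_best += 1
            since_lr_drop += 1
        if since_lr_drop >= lr_patience:
            lr *= lr_factor  # shrink the learning rate on a plateau
            since_lr_drop = 0
        if since_best >= patience:
            return epoch, lr  # stop: no improvement for `patience` epochs
    return len(val_losses), lr

# Validation loss improves for 10 epochs, then sits on a plateau.
losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 40
epochs_run, final_lr = simulate_early_stopping(losses)
print(epochs_run, final_lr)
```

With these settings the run stops at epoch 30: the learning rate is reduced once, 13 epochs into the plateau, and training ends 20 epochs after the last improvement.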
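A technique to adapt pre-trained reference models to new query data by freezing the trained reference weights and training only a small set of new, query-specific weights.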
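As a rough illustration of this idea, the NumPy sketch below extends a "trained" first-layer weight matrix with one column for a new query condition and applies a gradient step that updates only that column. All shapes, names and the random stand-in gradient are assumptions made for the example; this is not how scArches is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_ref_conditions, n_hidden = 5, 2, 3

# "Trained" first-layer weights of a reference model whose input is the
# expression vector concatenated with a one-hot condition vector.
W_ref = rng.normal(size=(n_hidden, n_genes + n_ref_conditions))

# Surgery: append one column per new query condition and mark only those
# columns as trainable.
n_query_conditions = 1
W_new = rng.normal(scale=0.01, size=(n_hidden, n_query_conditions))
W = np.concatenate([W_ref, W_new], axis=1)
trainable = np.zeros_like(W, dtype=bool)
trainable[:, -n_query_conditions:] = True

def sgd_step(W, grad, trainable, lr=0.1):
    """Gradient step that leaves frozen (non-trainable) weights untouched."""
    return W - lr * np.where(trainable, grad, 0.0)

grad = rng.normal(size=W.shape)  # stand-in for a real gradient
W_after = sgd_step(W, grad, trainable)

frozen_unchanged = np.array_equal(W_after[:, :-n_query_conditions], W_ref)
new_changed = not np.array_equal(W_after[:, -n_query_conditions:], W_new)
print(frozen_unchanged, new_changed)
```

Because the reference weights never change, the reference embedding stays intact while the query data is mapped into the same latent space.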
This skill includes comprehensive documentation in references/:
Contains detailed tutorials and walkthroughs:
Essential information for model development:
Advanced use cases and domain-specific methods:
Use `view` to read specific reference files when detailed information is needed.
Organized documentation extracted from official sources:
Add your automation scripts here:
Store templates and reference materials:
To refresh this skill with updated documentation:
For the most current information, always cross-reference with the official scArches documentation and GitHub repository.