| name | bio-proteomics-differential-abundance |
| description | Statistical testing for differentially abundant proteins between conditions. Covers limma and MSstats workflows with multiple testing correction. Use when identifying proteins with significant abundance changes between experimental groups. |
| tool_type | mixed |
| primary_tool | MSstats |
Version Compatibility
Reference examples tested with: R stats (base), ggplot2 3.5+, limma 3.58+, numpy 1.26+, pandas 2.2+, scipy 1.12+, statsmodels 0.14+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package> then help(module.function) to check signatures
- R:
packageVersion('<pkg>') then ?function_name to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
Differential Protein Abundance
"Find differentially abundant proteins between my conditions" → Perform statistical testing on quantified protein intensities to identify proteins with significant abundance changes between experimental groups.
- R:
MSstats::groupComparison() for feature-level mixed models
- R:
limma::eBayes() for empirical Bayes moderated t-tests on protein-level data
- Python:
scipy.stats.ttest_ind() with statsmodels FDR correction
MSstats Group Comparison (R stats (base)+)
Goal: Identify differentially abundant proteins between experimental conditions using feature-level mixed models or moderated t-tests.
Approach: Define contrast matrices for pairwise comparisons, run MSstats groupComparison (or limma eBayes for protein-level data), then filter results by adjusted p-value and log2 fold change thresholds.
library(MSstats)
comparison_matrix <- matrix(c(1, -1, 0, 0,
1, 0, -1, 0,
0, 1, -1, 0),
nrow = 3, byrow = TRUE)
rownames(comparison_matrix) <- c('Treatment1-Control', 'Treatment2-Control', 'Treatment1-Treatment2')
colnames(comparison_matrix) <- c('Control', 'Treatment1', 'Treatment2', 'Treatment3')
results <- groupComparison(contrast.matrix = comparison_matrix, data = processed)
sig_proteins <- results$ComparisonResult[results$ComparisonResult$adj.pvalue < 0.05 &
abs(results$ComparisonResult$log2FC) > 1, ]
limma for Proteomics (R stats (base)+)
library(limma)
design <- model.matrix(~ 0 + condition, data = sample_info)
colnames(design) <- levels(sample_info$condition)
fit <- lmFit(protein_matrix, design)
contrast_matrix <- makeContrasts(Treatment - Control, levels = design)
fit2 <- contrasts.fit(fit, contrast_matrix)
fit2 <- eBayes(fit2)
results <- topTable(fit2, number = Inf, adjust.method = 'BH')
sig_results <- results[results$adj.P.Val < 0.05 & abs(results$logFC) > 1, ]
QFeatures/proDA (Modern Alternative)
library(QFeatures)
library(proDA)
fit <- proDA(protein_matrix, design = ~ condition, data = sample_info)
results <- test_diff(fit, contrast = 'conditionTreatment')
results$adj_pval <- p.adjust(results$pval, method = 'BH')
sig_results <- results[results$adj_pval < 0.05 & abs(results$diff) > 1, ]
Python: scipy/statsmodels
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
def differential_test(intensities, group1_cols, group2_cols):
results = []
for protein in intensities.index:
g1 = intensities.loc[protein, group1_cols].dropna()
g2 = intensities.loc[protein, group2_cols].dropna()
if len(g1) >= 2 and len(g2) >= 2:
stat, pval = stats.ttest_ind(g1, g2)
log2fc = g2.mean() - g1.mean()
results.append({'protein': protein, 'log2FC': log2fc, 'pvalue': pval})
df = pd.DataFrame(results)
df['adj_pvalue'] = multipletests(df['pvalue'], method='fdr_bh')[1]
return df
sig = results[(results['adj_pvalue'] < 0.05) & (abs(results['log2FC']) > 1)]
Visualization (R stats (base)+)
library(ggplot2)
ggplot(results, aes(x = log2FC, y = -log10(adj.P.Val))) +
geom_point(aes(color = significant), alpha = 0.6) +
geom_hline(yintercept = -log10(0.05), linetype = 'dashed') +
geom_vline(xintercept = c(-1, 1), linetype = 'dashed') +
scale_color_manual(values = c('grey', 'red')) +
theme_minimal()
Related Skills
- quantification - Prepare normalized data for testing
- differential-expression/deseq2-basics - Similar concepts for RNA-seq
- data-visualization/specialized-omics-plots - Volcano plots, MA plots