| name | bio-differential-expression-edger-basics |
| description | Performs differential expression on bulk RNA-seq count data with edgeR's negative-binomial GLM and quasi-likelihood F-test framework. Covers DGEList construction, filterByExpr, TMM/TMMwsp normalization, robust dispersion estimation, glmQLFit/glmQLFTest, TREAT for magnitude-bounded hypotheses, contrasts via no-intercept designs, voom and voomWithQualityWeights for heterogeneous samples, and the edgeR v4 bias-corrected APL changes. Use when running bulk DE with edgeR, choosing edgeR over DESeq2 (small n, transcript DE via catchSalmon, large samples), needing TREAT for a fold-change-threshold hypothesis, troubleshooting v3-to-v4 reproducibility, building paired or interaction designs, or handling library-quality heterogeneity. |
| tool_type | r |
| primary_tool | edgeR |
Version Compatibility
Reference examples tested with: edgeR 4.0+, limma 3.58+, statmod 1.5+ (for voom internals), tximport 1.30+ (for catchSalmon path)
Before using code patterns, verify installed versions match. If versions differ:
- R:
packageVersion('<pkg>') then ?function_name to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
edgeR Basics
"Find differentially expressed genes between conditions" -> Fit a negative-binomial GLM per gene with empirical-Bayes-moderated dispersions, test coefficients with the quasi-likelihood F-test (proper finite-sample FPR control), and report ranked DE lists.
The Single Most Important Modern Insight -- edgeR v4 changed the QL framework and legacy=FALSE is now default
Chen, Chen, Lun, Baldoni, Smyth (2025) Nucleic Acids Res 53(2):gkaf018 introduced a bias-corrected adjusted profile likelihood (APL) for dispersion estimation that handles small/zero counts properly, and reworked the QL framework. glmQLFit() in v4 takes legacy=FALSE by default. Old (v3) pipelines that worked perfectly in 2023 now produce subtly different numbers in 2025 -- not wrong, just different. If a tutorial result doesn't match, check which version was used; set legacy=TRUE to reproduce v3 exactly OR (preferred) re-run with v4 defaults and accept the new -- better -- numbers.
Two other v4 changes worth knowing: (1) calcNormFactors() is deprecated in favor of normLibSizes() (same function, new name); (2) method='TMM' remains the documented default; method='TMMwsp' (TMM with singleton pairing) is an alternative introduced for samples with many zeros and is the preferred choice for sparse / low-count data. Pass method= explicitly for reproducibility. Old names still work, so old scripts run, but new code should use the new names.
The choice between Wald-equivalent (glmLRT) and QL-F (glmQLFTest) is not a stylistic preference: glmQLFTest is the modern default because it accounts for uncertainty in the dispersion estimate via an additional QL dispersion. glmLRT is anti-conservative with small n. The two p-values differ -- glmQLFTest p-values are typically >= glmLRT p-values for the same model (the second-stage QL dispersion correction is usually >= 1).
Algorithmic Taxonomy
| Test | What it does | When mandatory | Failure mode |
|---|
QL F-test (glmQLFit + glmQLFTest) | NB GLM with second-stage QL dispersion to model dispersion uncertainty | DEFAULT for any modern bulk DE | None within design assumptions |
LRT (glmFit + glmLRT) | Likelihood-ratio of nested GLMs using point dispersion estimate | Only when n>=10/group AND simple design AND no QL F-test available | Anti-conservative with small n -- inflated false positives |
Exact test (exactTest) | Conditional NB test for two groups, no covariates | Legacy; one-factor two-group ONLY | Cannot adjust for batch or covariates |
TREAT (glmTreat) | Tests H0: | LFC | <= tau vs HA: |
| voom + lmFit + eBayes | Linear model with empirical-Bayes moderation on voom-weighted log-CPM | Heterogeneous library sizes (>3x); samples with widely varying quality | Assumes log-normal-after-weighting; less direct count model |
| voomWithQualityWeights | voom + per-sample quality weights from arrayWeights (Liu 2015) | Some samples markedly worse quality (low RIN, contamination) | With very small n, sample weights are noisy |
Decision Tree by Scenario
| Scenario | Recommended path | Why |
|---|
| Standard bulk RNA-seq, n>=3/group, modern script | filterByExpr -> normLibSizes -> estimateDisp(robust=TRUE) -> glmQLFit(robust=TRUE) -> glmQLFTest | Modern default with proper FPR control |
| n=2-3/group | edgeR QL F-test (over DESeq2) | Schurch 2016 RNA 22:839 + edgeR v4 changes: QL is the tightest FPR control at small n |
| Library sizes vary >3x or RIN heterogeneous | voom or voomWithQualityWeights + lmFit + eBayes | Precision weights downweight noisy observations per sample |
| Pre-specified biologically meaningful fold-change threshold | glmTreat(fit, coef=..., lfc=log2(1.5)) | Post-hoc filtering by |
| Multi-group, "any change" omnibus | glmQLFTest(fit, coef=2:k) | Joint F-test on multiple coefficients |
| Multi-group, all pairwise contrasts | ~ 0 + group parameterization + makeContrasts | Clean, readable contrasts |
| Transcript-level DE (DTE) | catchSalmon() / catchKallisto() -> DGEList with overdispersion | Baldoni 2024 NAR 52:e13: properly handles inferential variance |
| Cross-tool sanity check | Run DESeq2 in parallel; expect >=70% overlap at top 500 | <60% overlap suggests modeling problem, not tool difference |
| Single-cell pseudobulk | Aggregate counts per donor; standard edgeR QL pipeline | Crowell 2020 Nat Commun 11:6077: pseudobulk avoids the FDR inflation of cell-level DE |
| Reproducing pre-2025 result | glmQLFit(legacy=TRUE) | edgeR v4 introduced bias-corrected APL with legacy=FALSE default; v3 was slightly biased |
Standard Workflow
Goal: Take a raw integer count matrix and group labels to a ranked DE table with proper finite-sample FPR control.
Approach: DGEList -> design-aware filtering -> TMM normalization (or TMMwsp for sparse data) -> robust dispersion estimation -> QL fit with robust dispersion shrinkage -> QL F-test on the coefficient of interest -> ranked table.
library(edgeR)
library(limma)
y <- DGEList(counts = counts, group = group)
design <- model.matrix(~ group)
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- normLibSizes(y)
y <- estimateDisp(y, design, robust = TRUE)
plotBCV(y)
fit <- glmQLFit(y, design, robust = TRUE)
qlf <- glmQLFTest(fit, coef = 2)
topTags(qlf, n = 20)
all_de <- topTags(qlf, n = Inf, sort.by = 'none')$table
The robust = TRUE flags propagate Phipson, Lee, Majewski, Alexander, Smyth (2016) Ann Appl Stat 10:946 robust hyperparameter estimation through both the NB dispersion shrinkage (estimateDisp) AND the QL dispersion shrinkage (glmQLFit). Set BOTH; setting only one is the single most common omission. The default robust=FALSE is a relic.
Filtering -- filterByExpr Internals
Goal: Remove genes with insufficient expression to reduce noise and multiple-testing burden, in a design-aware way.
Approach: filterByExpr(y, design) uses the smallest group size from the design matrix to set the minimum number of samples; threshold is CPM >= min.count / median.lib.size * 1e6 AND total count >= min.total.count.
Default parameters: min.count = 10, min.total.count = 15, large.n = 10, min.prop = 0.7.
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
Filter ONCE, before normalization and dispersion estimation. Filtering after estimateDisp invalidates the trended dispersion (the trend was fit on the now-smaller gene set). edgeR has no automatic independent filtering like DESeq2 -- skipping filterByExpr is the single biggest reason an edgeR analysis underperforms a comparable DESeq2 analysis.
Normalization -- TMM/TMMwsp Is an Offset, Not a Division
Goal: Correct for library composition bias (the "few genes consume disproportionate reads" problem) via a per-sample scaling factor.
Approach: normLibSizes(y) defaults to method='TMM' in v4 (same as v3); method='TMMwsp' is the preferred alternative for sparse / single-cell-pseudobulk data with many zeros. Result is a vector of normalization factors stored in y$samples$norm.factors; the effective library size is lib.size * norm.factors. Counts are NOT divided.
y <- normLibSizes(y)
y$samples$norm.factors
cpm_vis <- cpm(y, normalized.lib.sizes = TRUE)
log_cpm <- cpm(y, log = TRUE, prior.count = 2)
The factor enters the GLM as part of the offset. Running TMM on pre-normalized values is wrong (and silent -- gives wrong numbers without error). prior.count = 2 is the modern edgeR default for cpm(log=TRUE); smaller priors (0.25) make low-count log values noisy; larger priors (5-10) shrink them toward zero.
When TMM/TMMwsp assumptions fail (see Failure Modes: prokaryotic stress, MYC amplification, viral host shutoff): use method='upperquartile' or supply known stable reference genes via offsets.
QL F-test vs LRT vs Exact Test
fit_ql <- glmQLFit(y, design, robust = TRUE)
qlf <- glmQLFTest(fit_ql, coef = 2)
fit_lrt <- glmFit(y, design)
lrt <- glmLRT(fit_lrt, coef = 2)
y <- estimateDisp(y)
et <- exactTest(y)
QL F-test is the modern default. LRT is anti-conservative with small n because it does not account for dispersion uncertainty. The exact test handles only two-group one-factor designs and exists for backward compatibility -- use the GLM pipeline for anything with covariates or blocking.
In v4, the QL F-test is bias-corrected via the new APL when legacy=FALSE (the v4 default). To exactly reproduce a v3 result: glmQLFit(y, design, robust = TRUE, legacy = TRUE).
TREAT -- Testing Against a Fold-Change Threshold
Goal: Control FDR for the hypothesis "biologically meaningful fold change", not "fold change non-zero".
Approach: glmTreat(fit, coef=..., lfc=log2(tau)) tests H0: |LFC| <= tau vs HA: |LFC| > tau. The threshold tau MUST be biologically pre-specified -- choosing tau after seeing the data is p-hacking.
tr <- glmTreat(fit, coef = 2, lfc = log2(1.5))
topTags(tr)
The cardinal sin TREAT avoids: filtering padj < 0.05 & abs(logFC) > 1 after a vanilla QL F-test does NOT control FDR for "the LFC exceeds 1". Post-hoc filtering is FDR-controlled for the |LFC|>0 hypothesis only. If reviewers ask "what is the FDR of the '2-fold up' gene list", the only honest answers are TREAT (FDR controlled at the threshold) or "FDR is for non-zero only; the magnitude filter has no FDR guarantee".
Equivalent in DESeq2: results(dds, lfcThreshold = log2(1.5), altHypothesis = 'greaterAbs'). Both implement the McCarthy & Smyth 2009 Bioinformatics 25:765 idea.
Contrasts via No-Intercept Designs
Goal: Define clean pairwise or multi-group contrasts.
Approach: Use ~ 0 + group parameterization so each coefficient is the group mean; build makeContrasts(...) for any pairwise or weighted-average comparison.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)
con <- makeContrasts(
TreatedVsControl = treated - control,
DrugAVsDrugB = drugA - drugB,
ATvsBT = (treated_A - control_A) - (treated_B - control_B),
levels = design
)
qlf_t_vs_c <- glmQLFTest(fit, contrast = con[, 'TreatedVsControl'])
qlf_interaction <- glmQLFTest(fit, contrast = con[, 'ATvsBT'])
makeContrasts is more readable than numeric vectors for any non-trivial design.
voom and voomWithQualityWeights
Goal: Use limma's linear-model framework with empirical-Bayes moderation, applying voom precision weights for the mean-variance trend of log-CPM.
Approach: voom transforms counts to log-CPM with per-observation precision weights derived from the mean-variance trend; lmFit + eBayes proceeds as for microarrays.
v <- voom(y, design, plot = TRUE)
fit <- lmFit(v, design)
fit <- eBayes(fit, robust = TRUE)
tt <- topTable(fit, coef = 2, number = Inf)
voomWithQualityWeights (Liu, Holik, Su et al. 2015 NAR 43:e97) adds per-sample weights to downweight outlier samples. Use when:
- Library sizes vary >5x across samples
- RIN varies >2 units
- PCA shows one or more samples clearly off the main cluster
v <- voomWithQualityWeights(y, design, plot = TRUE)
fit <- lmFit(v, design)
fit <- eBayes(fit, robust = TRUE)
eBayes(robust = TRUE) uses Phipson 2016 robust hyperparameter estimation -- NOT the default but strongly recommended for RNA-seq.
Transcript-Level DE via catchSalmon
Goal: Test differential transcript expression with proper inferential variance from quantification bootstrap replicates.
Approach: catchSalmon() / catchKallisto() import per-transcript counts with overdispersion estimates from the bootstrap replicates; pass to DGEList; the rest of the pipeline is standard.
salmon <- catchSalmon(paths = file.path('salmon_out', samples$id))
y_tx <- DGEList(counts = salmon$counts / salmon$annotation$Overdispersion,
genes = salmon$annotation)
Baldoni, Chen, Hediyeh-zadeh et al. 2024 NAR 52:e13 ("Dividing out quantification uncertainty"): the per-transcript Overdispersion column captures the variance contribution from the Salmon EM. Dividing the counts by this scaling provides effective counts that limma/edgeR can model as if quantification was certain.
edgeR vs DESeq2 vs limma-voom
| Scenario | Recommended | Rationale |
|---|
| Modern default | DESeq2 or edgeR QL (both fine) | ~70-90% overlap on top hits; choose by ecosystem |
| n = 2-3/group | edgeR QL | Tightest FPR at small n |
| Library sizes heterogeneous | limma-voom or voomWithQualityWeights | Precision weights matter most here |
| Salmon/kallisto -> gene-level DGE | DESeq2 (DESeqDataSetFromTximport handles offsets natively) | Cleanest path |
| Salmon -> transcript-level DTE | edgeR catchSalmon | Baldoni 2024 framework |
| Need apeglm/ashr LFC shrinkage | DESeq2 | edgeR has no equivalent built-in |
| Need TREAT-style threshold testing | Both (glmTreat or lfcThreshold=) | Equivalent |
| Python-only environment | PyDESeq2 | No edgeR Python equivalent |
If DESeq2 and edgeR agree on >=70% of the top 500: results are robust. <60%: investigate filtering, normalization, dispersion, or a confounded covariate -- usually a modeling issue.
Per-Method Failure Modes
Forgot filterByExpr -- inflated multiple-testing burden
Trigger: edgeR pipeline run on a count matrix with all genes (~60k for human) including many with zero or near-zero counts; significant gene list smaller than expected.
Mechanism: edgeR's QL pipeline does NOT have DESeq2's automatic independent filtering. Every gene tested contributes to the BH denominator.
Symptom: Low-count genes dominate the tested set; tested-gene count is 60k instead of ~15-20k; padj distribution skewed toward 1.
Fix: filterByExpr(y, design) BEFORE normalization and dispersion estimation. Filtering after estimateDisp invalidates the trended dispersion.
Used glmLRT with n=3 -- inflated false positives
Trigger: Tutorial copy-paste of glmFit + glmLRT on small-n data; many "significant" genes that don't replicate.
Mechanism: glmLRT uses a point dispersion estimate; with small n, dispersion is uncertain and the test is anti-conservative.
Symptom: Many more DE genes than DESeq2 or limma-voom on the same data; p-value histogram has anti-conservative left tail.
Fix: Use glmQLFit + glmQLFTest with robust = TRUE on both.
TMM/RLE assumption broken -- size factors absorb biology
Trigger: Bacterial stress response, viral infection with host shutoff, MYC-amplified tumor vs normal -- biological systems where >50% of genes truly change.
Mechanism: TMM/TMMwsp assumes most genes are unchanged. When violated, the "trimmed reference" is dominated by DE genes; the size factor compensates by absorbing the biological shift.
Symptom: MA plot shows the bulk cloud shifted off zero; reported fold changes don't match orthogonal validation (qPCR, Western); known DE genes show muted LFC.
Fix: normLibSizes(y, method='upperquartile') is a partial fix; supply ERCC spike-in offsets or curated stable housekeeping genes via y$offset for the principled solution.
Reproducibility break between edgeR v3 and v4
Trigger: Pre-2025 script run on edgeR 4.0+ produces different DE genes than the published result.
Mechanism: v4 changed the QL framework (bias-corrected APL) and the default behavior of glmQLFit (legacy = FALSE). TMM remains the documented default for normLibSizes; TMMwsp is an added alternative for sparse data.
Symptom: Different significant gene set; absolute LFC differences of 5-15% on low-count genes; reviewer confusion.
Fix: For exact reproduction: glmQLFit(y, design, robust=TRUE, legacy=TRUE). The normalization default has not actually changed -- TMM remains the v4 default. Better: rerun and accept the v4 -- improved -- numbers, noting the version change in methods.
Common errors
| Error / symptom | Cause | Fix |
|---|
decidetestsDGE not found | Removed in edgeR v4 | Use decideTests(qlf) |
design matrix not full rank | Confounded covariates | Inspect with alias(design)$Complete |
No residual df | Too few replicates for the model | Reduce model complexity or get more samples |
Script expects $adj.P.Val but topTags returns $FDR | Tool-name column mix-up | edgeR uses $FDR; limma uses $adj.P.Val; DESeq2 uses $padj |
| QL F p-values numerically different from a 2023 paper | edgeR v4 default legacy=FALSE | Set legacy=TRUE to reproduce v3, or note the version change |
References
- Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139-140. doi:10.1093/bioinformatics/btp616
- McCarthy DJ, Chen Y, Smyth GK. 2012. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res 40(10):4288-4297. doi:10.1093/nar/gks042
- Chen Y, Lun ATL, Smyth GK. 2016. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Res 5:1438. doi:10.12688/f1000research.8987.2
- Chen Y, Chen L, Lun ATL, Baldoni PL, Smyth GK. 2025. edgeR v4: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets. Nucleic Acids Res 53(2):gkaf018. doi:10.1093/nar/gkaf018
- Lund SP, Nettleton D, McCarthy DJ, Smyth GK. 2012. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol 11(5):Article 8. doi:10.1515/1544-6115.1826
- Phipson B, Lee S, Majewski IJ, Alexander WS, Smyth GK. 2016. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann Appl Stat 10(2):946-963. doi:10.1214/16-AOAS920
- Law CW, Chen Y, Shi W, Smyth GK. 2014. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15(2):R29. doi:10.1186/gb-2014-15-2-r29
- Liu R, Holik AZ, Su S, Jansz N, Chen K, Leong HS, Blewitt ME, Asselin-Labat M-L, Smyth GK, Ritchie ME. 2015. Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res 43(15):e97. doi:10.1093/nar/gkv412
- McCarthy DJ, Smyth GK. 2009. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25(6):765-771. doi:10.1093/bioinformatics/btp053
- Baldoni PL, Chen Y, Hediyeh-zadeh S, Liao Y, Dong X, Ritchie ME, Shi W, Smyth GK. 2024. Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Res 52(3):e13. doi:10.1093/nar/gkad1167
- Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. 2015. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43(7):e47. doi:10.1093/nar/gkv007
- Schurch NJ et al. 2016. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22(6):839-851. doi:10.1261/rna.053959.115
- Robinson MD, Oshlack A. 2010. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11(3):R25. doi:10.1186/gb-2010-11-3-r25
Related Skills
- deseq2-basics - Cross-check, or use when LFC shrinkage (apeglm) needed
- de-results - FDR, IHW, GSEA preparation, padj column name variation
- de-visualization - BCV plot, MD plot, PCA via plotMDS
- batch-correction - Include batch in design vs correct-then-test cardinal sin
- timeseries-de - voom + splines for time-course
- expression-matrix/counts-ingest - Salmon/kallisto input via tximport or catchSalmon
- expression-matrix/normalization - TMM/TMMwsp/RLE mechanics
- expression-matrix/metadata-joins - Reference level, paired design, interaction parameterization
- expression-matrix/gene-id-mapping - Annotating DE results with symbols
- rna-quantification/tximport-workflow - Detailed tximport mechanics
- pathway-analysis/gsea - Ranked-list input from DE