Run any Skill in Manus with one click

bio-differential-expression-de-results

Stars943

Forks165

UpdatedMay 29, 2026 at 16:17

Extracts, filters, annotates, and exports differential expression results from DESeq2 or edgeR with proper handling of padj=NA (independent filtering, Cook's outliers, all-zero), multiple-testing correction choice (BH vs Storey q-value vs IHW vs lfsr), TREAT vs post-hoc fold-change filtering, p-value histogram diagnostics, gene annotation via org.db/biomaRt/mygene, GSEA preranked input, ORA background construction, replication reality (Schurch 2016 small-n result), and SABV/sex-stratified reporting. Use when extracting and interpreting DE results, troubleshooting padj=NA, choosing FDR method, preparing ranked lists for pathway analysis, annotating gene IDs, or comparing DESeq2 vs edgeR outputs.

Installation

Install with Codex or Claude Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

Run Skill in Manus

Source

GPTomics

GPTomics/bioSkills

View GitHub Repository View Creator Repositories

Download

Run Skill in Manus

Related occupationsSOC

Based on SOC occupation classification

Software DevelopersComputer and Mathematical Occupations·SOC 15-1252

File Explorer

5 files

SKILL.md

readonly

Version Compatibility

Reference examples tested with: DESeq2 1.42+, edgeR 4.0+, IHW 1.34+, qvalue 2.34+, ashr 2.2+, AnnotationDbi 1.66+, org.Hs.eg.db 3.18+, biomaRt 2.58+, mygene 1.38+ (Python), dplyr 1.1+, openxlsx 4.2+

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('<pkg>') then ?function_name to verify parameters
Python: pip show <package> then help(module.function) to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

DE Results

"What are my significant genes?" -> Extract DE estimates and p-values from the fitted model, handle missing padj correctly, apply FDR control appropriate to the design, and produce the table or ranked list the downstream tool actually needs.

The Single Most Important Modern Insight -- `padj = NA` has three distinct meanings

A NA in the padj column is not a missing value; it is a flag indicating which filter excluded the gene. The three causes -- independent filtering, Cook's distance outlier, and all-zero in a group -- have completely different remediations. Dropping all NA rows blindly silently discards real signal, most often from low-count master regulators (transcription factors expressed at ~10 counts) that pass biology but fail the data-driven baseMean threshold.

`padj = NA` cause	DESeq2 detection	What it means	Fix if undesired
Independent filtering	finite `pvalue`, `NA` `padj`, baseMean below auto threshold	Removed before BH adjustment to maximize rejections at `alpha`	`results(dds, independentFiltering = FALSE)` OR `filterFun = ihw`
Cook's distance outlier	`NA` `pvalue`, `NA` `padj`, baseMean > 0, group has >=3 reps	One sample has Cook's > `qf(0.99, p, m-p)`	`results(dds, cooksCutoff = FALSE)`
All-zero or near-zero in a group	`NA` `pvalue` AND baseMean very low	Insufficient information to test	Filter at preprocess time; or accept

Independent filtering (Bourgon, Gentleman, Huber 2010 PNAS 107:9546) chooses the baseMean threshold to maximize rejections. The filter MUST be independent of the test statistic under the null -- this is why baseMean (the across-sample mean) is the canonical choice. Using "min count in treatment group" as a filter VIOLATES the independence requirement and inflates type-I error. Most pipelines unknowingly do this; do not.

A second axis: at n>=7 per group, DESeq() also REPLACES outlier counts via replaceOutliers() and refits (default minReplicatesForReplace = 7). Cook's filtering is NOT computed for continuous covariates -- a continuous-covariate analysis has effectively no outlier filtering.

Algorithmic Taxonomy

Method	What it computes	When to use	Failure mode
BH (`p.adjust(method='BH')`, DESeq2 default `pAdjustMethod='BH'`)	FDR at fixed alpha; Benjamini-Hochberg 1995	Default for most RNA-seq DE	Assumes independence or PRDS; many overlapping tests violate
Storey q-value (`qvalue::qvalue`)	q-value using estimated pi_0	Genome-scale with many true nulls	Pi_0 estimation can fail at small test counts
IHW (`results(filterFun=ihw)`, Ignatiadis 2016 Nat Methods 13:577)	Weighted BH with covariate-informed weights	Modern default for DESeq2; +5-20% discoveries at same FDR	Covariate MUST be independent under null
ashr local false sign rate (`lfsr` / `svalue`, with `svalue=TRUE`)	P(sign of estimate is wrong)	When effect-direction certainty is what matters	Conservative lower bound on FDR; not interchangeable with padj
BY (`p.adjust(method='BY')`)	Benjamini-Yekutieli; arbitrary dependence	Strongly correlated tests	Uniformly conservative; rarely needed for DE
Holm / Bonferroni	FWER	Small confirmatory test sets	Far too conservative for genome-scale
TREAT / `lfcThreshold=`	FDR for "	LFC	> tau" hypothesis

Decision Tree by Scenario

Scenario	Recommended approach	Why
Standard bulk DE, two groups	DESeq2 results with default BH; report padj < 0.05	Default works
Want more power at same FDR	`results(dds, filterFun = ihw)`	IHW typically gains 5-20%
Pre-specified biological fold-change matters	`results(dds, lfcThreshold = log2(1.5), altHypothesis = 'greaterAbs')` OR `glmTreat(fit, lfc = log2(1.5))`	Post-hoc `padj<0.05 & abs(LFC)>1` does NOT control FDR for the magnitude claim
Ranking for GSEA preranked	`stat` (Wald Z) for DESeq2 OR shrunken LFC	Never use unshrunken LFC -- low-count noise dominates
ORA input	Subset by padj<0.05; background = ALL TESTED genes (post-independent-filtering)	Background = "all genes in genome" is wrong; pre-filtering already excluded many
Many NA padj including biologically interesting genes	Diagnose: independent filtering vs Cook's vs all-zero; turn off the offending filter only for that gene set	Blanket `na.omit` discards signal
Multi-condition design	LRT for "any change" first; pairwise per-level Wald for effect sizes	LRT padj is omnibus; LRT LFC is one specific coefficient
Small n (<=3/group)	Report as exploratory, top hits only	Schurch 2016: tools miss 20-40% of true positives at n=3
Human / mouse with mixed sexes	Include sex as covariate; run sex-stratified sensitivity	SABV mandate; sex effect is real and chromosomal
Prokaryotic	Use Prokka/Bakta GFF, KEGG strain code	Ensembl/org.db are eukaryote-only

Extracting Results

Goal: Pull DE estimates and p-values from a fitted DESeq2 or edgeR object into a usable data frame with explicit contrast naming.

Approach: results() (DESeq2) or topTags() (edgeR) with explicit name= / coef=; convert to data.frame; preserve row order if planning to join with annotation.

library(DESeq2)
library(dplyr)

resultsNames(dds)
res <- results(dds, name = 'condition_treated_vs_control', alpha = 0.05)
res_shrunk <- lfcShrink(dds, coef = 'condition_treated_vs_control', type = 'apeglm')

res_df <- as.data.frame(res)
res_df$gene <- rownames(res_df)

library(edgeR)
tt <- topTags(qlf, n = Inf, sort.by = 'none')$table
tt$gene <- rownames(tt)

sort.by = 'none' in topTags preserves the original gene order -- critical when joining with an annotation table by row index. Default is sort by p-value.

Column name reminder (a recurring cross-tool bug):

Tool	LFC column	Adjusted p-value column
DESeq2	`log2FoldChange`	`padj`
edgeR	`logFC`	`FDR`
limma `topTable`	`logFC`	`adj.P.Val`
limma `topTreat`	`logFC`	`adj.P.Val` (post-TREAT)

TREAT vs Post-hoc LFC Filtering

Goal: Make a defensible FDR claim about "biologically meaningful fold change" genes.

Approach: Use TREAT or lfcThreshold= to test a magnitude hypothesis with proper FDR control. Post-hoc filtering of padj<0.05 & abs(LFC)>tau does NOT control FDR for the magnitude claim.

res_treat <- results(dds, lfcThreshold = log2(1.5), altHypothesis = 'greaterAbs', alpha = 0.05)

# edgeR equivalent
tr <- glmTreat(fit, coef = 2, lfc = log2(1.5))

What a reviewer is really probing with a "200 genes >2x changed at FDR 5%" claim: is the FDR for the change>2x claim or for the change-non-zero claim? Post-hoc filtering controls FDR only for the latter. TREAT (or lfcThreshold=) controls FDR for the former. McCarthy & Smyth 2009 Bioinformatics 25:765 is the canonical citation.

IHW for Better Power

Goal: Gain 5-20% more discoveries at the same FDR by weighting p-values with a covariate (typically baseMean) that informs power but is independent of the null.

Approach: results(dds, filterFun = ihw) replaces independent filtering with Ignatiadis 2016 hypothesis weighting.

library(IHW)
res_ihw <- results(dds, filterFun = ihw, alpha = 0.05)

When IHW does NOT help:

Small number of tests (<5000 after filtering) -- not enough data to learn weights
Covariate is treatment-correlated (violates the null-independence requirement)
Covariate uninformative about test power

Storey q-value as an alternative (different framework -- estimates pi_0 fraction of true nulls):

library(qvalue)
qv <- qvalue(res$pvalue[!is.na(res$pvalue)])
res$qvalue <- NA
res$qvalue[!is.na(res$pvalue)] <- qv$qvalues

ashr lfsr (local false sign rate -- probability the estimated direction is wrong):

res_ashr <- lfcShrink(dds, coef = 'condition_treated_vs_control',
                       type = 'ashr', svalue = TRUE)
res_ashr$svalue  # FDR-like, based on lfsr; requires svalue=TRUE

svalue=TRUE is required to populate the svalue column; the default returns the standard pvalue/padj columns only. lfsr and padj are NOT interchangeable. lfsr asks "P(sign wrong)"; padj asks "expected fraction of false discoveries". When reporting, state which.

P-value Histogram Diagnostics

Goal: Diagnose model misspecification, hidden batch effects, or over-correction by inspecting the raw p-value distribution.

Approach: Plot raw p-values; under a correctly specified null, the histogram is uniform with an upward spike near zero (the true DE genes).

library(ggplot2)
ggplot(res_df, aes(x = pvalue)) +
    geom_histogram(bins = 50, fill = 'steelblue', color = 'white') +
    labs(x = 'P-value', y = 'Frequency', title = 'P-value distribution') +
    theme_bw()

Shape	Meaning	Action
Uniform + spike near 0	Correct: null genes uniform, true DE near 0	Proceed
Anti-conservative (U-shape; both ends spiked)	Hidden batch effect, unmodeled confounder, dispersion misspecified	Inspect PCA for batch; add covariate; check `plotDispEsts`
Conservative (depleted near 0, spike near 1)	Over-correction; too many covariates; wrong dispersion	Simplify model; check dispersion plot for excess shrinkage
Spike only at p = 1	Discrete artifact from very-low-count genes	Pre-filter more aggressively
Bimodal with spike at 0.5	Unusual; suggests a discrete categorical test masquerading	Investigate

The histogram is one of the cheapest sanity checks in a DE pipeline; always plot it before believing the gene list.

Filtering and Ordering

Goal: Subset to significant genes and rank by p-value, fold change, or expression level for downstream use.

Approach: dplyr-style filter + arrange; handle NA padj explicitly per the three-meanings table at the top.

sig <- res_df %>%
    filter(!is.na(padj), padj < 0.05, abs(log2FoldChange) > 1, baseMean > 10) %>%
    arrange(padj)

# Up- vs down-regulated
up   <- sig %>% filter(log2FoldChange > 0)
down <- sig %>% filter(log2FoldChange < 0)

# Summary
n_tested <- sum(!is.na(res$padj))
n_sig    <- sum(res$padj < 0.05, na.rm = TRUE)
cat(sprintf('Tested: %d   Significant (padj<0.05): %d   Up: %d   Down: %d\n',
            n_tested, n_sig, sum(sig$log2FoldChange > 0), sum(sig$log2FoldChange < 0)))

Gene Annotation

Goal: Map gene IDs to symbols, descriptions, and cross-database identifiers for human-readable results.

Approach: Prefer AnnotationDbi::mapIds with org.db (fast, local, version-pinned); fall back to biomaRt or mygene for symbols/aliases not in org.db; for prokaryotes, use Prokka/Bakta GFF.

library(org.Hs.eg.db)
library(AnnotationDbi)

res_df$symbol <- mapIds(org.Hs.eg.db, keys = sub('\\..*', '', res_df$gene),
                         keytype = 'ENSEMBL', column = 'SYMBOL', multiVals = 'first')
res_df$entrez <- mapIds(org.Hs.eg.db, keys = sub('\\..*', '', res_df$gene),
                         keytype = 'ENSEMBL', column = 'ENTREZID', multiVals = 'first')

The sub('\\..*', '', ...) strips the Ensembl version. CAUTION: this regex destroys the _PAR_Y suffix in GENCODE 25-43 PAR genes -- use sub('\\.[0-9]+(_PAR_Y)?$', '\\1', ...) to preserve. See expression-matrix/gene-id-mapping for full details.

For HGNC symbols changed since 2020 (SEPT1 -> SEPTIN1, MARCH1 -> MARCHF1, MARC1 -> MTARC1, DEC1 -> DELEC1) old symbol-keyed downstream tools silently drop genes. Always join on stable Ensembl or Entrez IDs; use symbols as display labels only.

For prokaryotes:

library(rtracklayer)
gff <- import('annotation.gff3')
gene_info <- as.data.frame(gff[gff$type == 'gene',
                                c('locus_tag', 'Name', 'product')])
res_annotated <- merge(res_df, gene_info, by.x = 'gene',
                        by.y = 'locus_tag', all.x = TRUE)

GSEA Preranked Input

Goal: Produce a ranked list of all genes (no significance filter) for fgsea / clusterProfiler GSEA.

Approach: Rank by Wald statistic (DESeq2 stat) or shrunken LFC. NEVER use a filtered set as GSEA input -- GSEA's permutation null requires the full background.

gsea_ranks <- res_df$stat
names(gsea_ranks) <- res_df$gene
gsea_ranks <- sort(gsea_ranks[!is.na(gsea_ranks)], decreasing = TRUE)

# edgeR equivalent
gsea_ranks_edger <- sign(tt$logFC) * -log10(tt$PValue)
names(gsea_ranks_edger) <- rownames(tt)
gsea_ranks_edger <- sort(gsea_ranks_edger[is.finite(gsea_ranks_edger)],
                         decreasing = TRUE)

stat (Wald Z) is preferred over raw LFC for GSEA because it combines effect and precision in one number. Unshrunken LFC is dominated by low-count noise.

ORA Input

Goal: Run over-representation analysis (enrichGO, enrichKEGG) on a significant gene list with the correct background.

Approach: Subset to padj<0.05; background = ALL TESTED genes (post-independent-filtering), NOT the genome.

library(clusterProfiler)

sig_entrez <- na.omit(res_df$entrez[res_df$padj < 0.05])
bg_entrez  <- na.omit(res_df$entrez[!is.na(res_df$padj)])

ora <- enrichGO(gene          = sig_entrez,
                universe      = bg_entrez,
                OrgDb         = org.Hs.eg.db,
                keyType       = 'ENTREZID',
                ont           = 'BP',
                pAdjustMethod = 'BH')

Common mistake: omitting universe= lets clusterProfiler default to "all annotated genes for this organism" -- which includes thousands of genes never tested. The resulting enrichment p-values are wrong (too small). The background MUST be the tested set.

Cross-Tool Concordance Check

deseq2_sig <- rownames(subset(deseq2_res, padj < 0.05))
edger_sig  <- rownames(subset(edger_tt,  FDR  < 0.05))

common      <- intersect(deseq2_sig, edger_sig)
deseq2_only <- setdiff(deseq2_sig, edger_sig)
edger_only  <- setdiff(edger_sig, deseq2_sig)

cat(sprintf('DESeq2 sig: %d   edgeR sig: %d   Common: %d (%.1f%%)\n',
            length(deseq2_sig), length(edger_sig), length(common),
            100 * length(common) / min(length(deseq2_sig), length(edger_sig))))

Concordance >70% at the top 500: robust. <60%: suspect filtering, normalization, or design difference -- not a tool difference. Run both pipelines with the same filtering and design to isolate.

Per-Method Failure Modes

Dropped a key gene by removing NAs

Trigger: Pipeline does res_df <- na.omit(res_df); downstream gene of interest is missing from results.

Mechanism: Gene was flagged by independent filtering OR Cook's distance; padj is NA but the biology is real.

Symptom: A gene with clear differential expression in the count matrix is absent from the results table.

Fix: Diagnose which filter fired (independent filtering vs Cook's vs all-zero); rerun results() with the appropriate filter off (independentFiltering = FALSE or cooksCutoff = FALSE).

Reported FDR on a magnitude-filtered gene set

Trigger: Methods section says "genes with padj < 0.05 and abs(LFC) > 1 (FDR < 5%)".

Mechanism: BH controls FDR for the |LFC| > 0 hypothesis, not the |LFC| > 1 hypothesis. The post-hoc filter adds no FDR control.

Symptom: Reviewer challenges the FDR claim; replication studies show many of the filtered genes are not the magnitude expected.

Fix: Use TREAT (glmTreat) or lfcThreshold= to test the magnitude hypothesis with proper FDR control. Re-do the methods sentence to match what was actually computed.

ORA universe wrong

Trigger: ORA p-values look implausibly small for a small significant gene set.

Mechanism: universe= argument omitted; clusterProfiler defaulted to all annotated genes in the organism, including thousands never in the tested set.

Symptom: Many enriched pathways at strict thresholds; results don't replicate; reviewer questions the background.

Fix: Explicitly pass universe = bg_entrez where bg_entrez is the set of tested gene IDs (i.e., those with non-NA padj).

"Significant" gene list in n=3 study doesn't replicate

Trigger: Small RNA-seq study finds 200 DE genes; validation in independent cohort recovers 60.

Mechanism: Schurch 2016 RNA 22:839: at n=3/group, all tools miss 20-40% of true positives compared to n=30. Variability of the gene list itself is high.

Symptom: 30-50% replication of the gene list across independent runs of the SAME data.

Fix: Frame the small-n DE list as hypothesis-generating, not as a stable set of facts. Validate top hits orthogonally before drawing conclusions. Use TREAT for biologically meaningful thresholds to require larger effects.

Sex-confounded design gives spurious chrX/chrY signal

Trigger: Mixed-sex cohort; sex not in the design; many chrY genes call as DE.

Mechanism: Sex distribution differs across the experimental groups; the "treatment effect" partially captures sex.

Symptom: chrY genes (DDX3Y, RPS4Y1, UTY) and XIST dominate the top DE list.

Fix: Include sex in the design (~ sex + condition); rerun. For chrX/chrY-specific analyses, sex MUST be in the model or the analysis is uninterpretable. Mauvais-Jarvis et al. 2020 Lancet 396:565 reviews the SABV requirement.

Common errors

Error / symptom	Cause	Fix
`$FDR` not found on DESeq2 result	DESeq2 uses `padj`; edgeR uses `FDR`	Check tool, use correct column
`summary(res)` shows different cutoff than `results(alpha=)`	`summary(res, alpha=)` defaults to 0.1	Pass `alpha` explicitly to `summary()`
All `padj` NA	All genes filtered (rare; usually a data problem)	Check `independentFilteringResults(res)`; inspect baseMean distribution
Direction of LFC reversed	Reference level not set; alphabetical default	`relevel()` BEFORE `DESeq()`
Gene symbol mapping rate <50%	Mixed Ensembl versions; recent HGNC renames	Verify Ensembl release, check for SEPT/MARCH/MARC renames
`enrichGO` reports thousands of pathways	Wrong `universe=`	Pass `universe = bg_entrez` (tested set, not genome)

References

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57(1):289-300. doi:10.1111/j.2517-6161.1995.tb02031.x
Storey JD. 2003. The positive false discovery rate: a Bayesian interpretation and the q-value. Ann Stat 31(6):2013-2035. doi:10.1214/aos/1074290335
Ignatiadis N, Klaus B, Zaugg JB, Huber W. 2016. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nat Methods 13(7):577-580. doi:10.1038/nmeth.3885
Bourgon R, Gentleman R, Huber W. 2010. Independent filtering increases detection power for high-throughput experiments. PNAS 107(21):9546-9551. doi:10.1073/pnas.0914005107
Stephens M. 2017. False discovery rates: a new deal. Biostatistics 18(2):275-294. doi:10.1093/biostatistics/kxw041
McCarthy DJ, Smyth GK. 2009. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25(6):765-771. doi:10.1093/bioinformatics/btp053
Schurch NJ et al. 2016. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22(6):839-851. doi:10.1261/rna.053959.115
Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15(12):550. doi:10.1186/s13059-014-0550-8
Mauvais-Jarvis F et al. 2020. Sex and gender: modifiers of health, disease, and medicine. Lancet 396(10250):565-582. doi:10.1016/S0140-6736(20)31561-0
Bruford EA et al. 2020. Guidelines for human gene nomenclature. Nat Genet 52:754-758. doi:10.1038/s41588-020-0669-3
Ziemann M, Eren Y, El-Osta A. 2016. Gene name errors are widespread in the scientific literature. Genome Biol 17:177. doi:10.1186/s13059-016-1044-7
Wu T et al. 2021. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2(3):100141. doi:10.1016/j.xinn.2021.100141

Related Skills

deseq2-basics - Generate DESeq2 results; design, contrasts, LRT, shrinkage
edger-basics - Generate edgeR results; QL F-test, TREAT, voom
de-visualization - P-value histogram, MA plot, volcano with shrunken LFC, heatmap, sample distance
batch-correction - Include batch in design (vs Nygaard 2016 cardinal sin)
timeseries-de - LRT-with-reduced-model patterns for time
expression-matrix/gene-id-mapping - ID conversion, HGNC renames, ortholog mapping
expression-matrix/metadata-joins - Sex covariate, paired design, sample swap detection
pathway-analysis/go-enrichment - ORA with proper background
pathway-analysis/gsea - GSEA preranked input from stat or shrunken LFC
pathway-analysis/kegg-pathways - KEGG with strain-specific organism codes
data-visualization/volcano-and-ma-plots - Custom volcano with apeglm-shrunken LFC

name	bio-differential-expression-de-results
description	Extracts, filters, annotates, and exports differential expression results from DESeq2 or edgeR with proper handling of padj=NA (independent filtering, Cook's outliers, all-zero), multiple-testing correction choice (BH vs Storey q-value vs IHW vs lfsr), TREAT vs post-hoc fold-change filtering, p-value histogram diagnostics, gene annotation via org.db/biomaRt/mygene, GSEA preranked input, ORA background construction, replication reality (Schurch 2016 small-n result), and SABV/sex-stratified reporting. Use when extracting and interpreting DE results, troubleshooting padj=NA, choosing FDR method, preparing ranked lists for pathway analysis, annotating gene IDs, or comparing DESeq2 vs edgeR outputs.
tool_type	r
primary_tool	DESeq2

bio-differential-expression-de-results

More from this repository

Version Compatibility

DE Results

The Single Most Important Modern Insight -- padj = NA has three distinct meanings

Algorithmic Taxonomy

Decision Tree by Scenario

Extracting Results

TREAT vs Post-hoc LFC Filtering

IHW for Better Power

P-value Histogram Diagnostics

Filtering and Ordering

Gene Annotation

GSEA Preranked Input

ORA Input

Cross-Tool Concordance Check

Per-Method Failure Modes

Dropped a key gene by removing NAs

Reported FDR on a magnitude-filtered gene set

ORA universe wrong

"Significant" gene list in n=3 study doesn't replicate

Sex-confounded design gives spurious chrX/chrY signal

Common errors

References

Related Skills

Version Compatibility

DE Results

The Single Most Important Modern Insight -- padj = NA has three distinct meanings

Algorithmic Taxonomy

Decision Tree by Scenario

Extracting Results

TREAT vs Post-hoc LFC Filtering

IHW for Better Power

P-value Histogram Diagnostics

Filtering and Ordering

Gene Annotation

GSEA Preranked Input

ORA Input

Cross-Tool Concordance Check

Per-Method Failure Modes

Dropped a key gene by removing NAs

Reported FDR on a magnitude-filtered gene set

ORA universe wrong

"Significant" gene list in n=3 study doesn't replicate

Sex-confounded design gives spurious chrX/chrY signal

Common errors

References

Related Skills

More from this repository

The Single Most Important Modern Insight -- `padj = NA` has three distinct meanings

The Single Most Important Modern Insight -- `padj = NA` has three distinct meanings