Run any Skill in Manus with one click

$pwd:

bio-causal-genomics-pleiotropy-detection

Name: Bio Causal Genomics Pleiotropy Detection
Author: GPTomics

// Detect and adjust for horizontal pleiotropy in two-sample Mendelian randomization by distinguishing uncorrelated (UHP) from correlated (CHP) pleiotropy and choosing among Egger, MR-PRESSO, MR-RAPS, CAUSE, LHC-MR, LCV, MR-Clust, MR-Mix, and contamination-mixture methods. Use when validating an MR causal claim, running the STROBE-MR sensitivity battery, suspecting a shared heritable confounder, working under weak-instrument or polygenic-exposure regimes, or reconciling discordant estimates across robust methods.

Run Skill in Manus

$ git log --oneline --stat

stars:604

forks:110

updated:May 12, 2026 at 20:23

File Explorer

7 files

SKILL.md

readonly

package.json

"author": "GPTomics"

"repository": "GPTomics/bioSkills"

View GitHub Repository

$ install --globalskills.sh

$ download --local

Run Skill in Manus

[HINT] Download the complete skill directory including SKILL.md and all related files

Run any Skill with one click

name	bio-causal-genomics-pleiotropy-detection
description	Detect and adjust for horizontal pleiotropy in two-sample Mendelian randomization by distinguishing uncorrelated (UHP) from correlated (CHP) pleiotropy and choosing among Egger, MR-PRESSO, MR-RAPS, CAUSE, LHC-MR, LCV, MR-Clust, MR-Mix, and contamination-mixture methods. Use when validating an MR causal claim, running the STROBE-MR sensitivity battery, suspecting a shared heritable confounder, working under weak-instrument or polygenic-exposure regimes, or reconciling discordant estimates across robust methods.
tool_type	r
primary_tool	TwoSampleMR

Version Compatibility

Reference examples tested with: TwoSampleMR 0.5.11+, MendelianRandomization 0.9.0+, MR-PRESSO 1.0+, CAUSE 1.2.0+, MR-Clust 0.1.0+, MRMix 0.1+, mr.raps 0.4.1+ (GitHub), LHC-MR 0.0.0.9000+ (GitHub), LCV (script-based, no version tag), simex 1.8+.

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('<pkg>') then ?function_name to verify parameters
For GitHub-only packages, check the repo HEAD vs the local install date

If code throws errors, introspect the installed package and adapt the example rather than retrying.

Pleiotropy Detection in Mendelian Randomization

"Validate my MR result against pleiotropic bias" -> Decompose violations of the exclusion-restriction assumption into uncorrelated horizontal pleiotropy (UHP, addressable by Egger / median / mode / MR-PRESSO) and correlated horizontal pleiotropy (CHP, addressable only by CAUSE / LHC-MR / LCV), then run a method battery whose assumptions span both regimes.

R: TwoSampleMR::mr() (IVW + Egger + median + mode), mr_pleiotropy_test(), mr_heterogeneity(), mr_leaveoneout(), directionality_test()
R: MRPRESSO::mr_presso() for UHP outlier removal + distortion test
R: cause::cause() for CHP-aware estimation; mrclust::mr_clust_em() for mechanism-heterogeneous instruments
R: MendelianRandomization::mr_conmix() for contamination mixture; MRMix::MRMix() for mixture-of-distributions

UHP vs CHP: The Central Postdoc-Grade Distinction

Horizontal pleiotropy comes in two regimes, and most "standard" MR sensitivity methods address only one of them.

Regime	Definition	InSIDE assumption	Methods that handle it
UHP (uncorrelated horizontal pleiotropy)	Pleiotropic effect alpha_j independent of instrument-exposure effect gamma_j	Holds	IVW (balanced UHP only), MR-Egger, weighted median, weighted mode, MR-PRESSO, MR-RAPS, MR-Mix, contamination mixture
CHP (correlated horizontal pleiotropy)	alpha_j correlates with gamma_j through a shared upstream factor (heritable confounder, network mediator)	Violated	CAUSE, LHC-MR, LCV, MR-Clust (partial), Steiger-filtered MR (partial)

InSIDE = INstrument Strength Independent of Direct Effect (Bowden 2015 IJE 44:512). Plain English: across SNPs, the per-SNP pleiotropic effect alpha and per-SNP instrument-exposure effect gamma are treated as independent random variables. CHP is the case where they covary because both flow from a shared upstream genetic factor.

The trap (Morrison 2020 Nat Genet 52:740): IVW, MR-Egger, MR-PRESSO, and GSMR are all blind to CHP. Under a shared heritable confounder they each return a plausible-looking corrected causal estimate that is systematically biased in the direction of the confounder. The MR-PRESSO global test does not flag CHP because correlated pleiotropy is not an outlier pattern, it is a population mean shift in the alpha distribution conditional on gamma.

Operational rule: If genetic correlation rg(exposure, outcome) is high (LDSC >= 0.3) or biology strongly suggests a shared upstream factor, the IVW / Egger / PRESSO triple is insufficient. Add CAUSE (preferred when sig SNPs >= 100) or LHC-MR (preferred for polygenic genome-wide IVs).

Operational Decision Flow (4 Steps)

Compute genetic correlation (LDSC). Run ldsc.py --rg <exposure.sumstats.gz>,<outcome.sumstats.gz> (see causal-genomics/genetic-correlation). If |rg| > 0.3, CHP is plausible -> flag for Step 3 escalation. If the LDSC rg standard error spans zero broadly, treat low-rg evidence as weak rather than confirming absence of CHP.
Standard battery. IVW (random-effects when Cochran Q p < 0.05) + MR-Egger (with NOME I^2_GX check) + weighted median + weighted mode + MR-PRESSO (NbDistribution >= 10000 for stringent reporting). Report all five with point estimate, SE, p, 95% CI, and n_SNPs_used. Compute Egger I^2_GX; apply SIMEX if I^2_GX < 0.9 (see examples/simex_egger_correction.R).
CHP escalation. Trigger when rg > 0.3 OR PRESSO global p < 0.05 with > 50% nominal outliers OR Egger / median / mode disagree by > 2 SE. Run CAUSE (if >= 100 significant SNPs after pruning) or LHC-MR (any N; uses genome-wide sumstats). Report ELPD delta + z + q (CHP fraction) + gamma (CHP-adjusted causal estimate).
Triangulate. Pre-MR Steiger filter; bidirectional MR (examples/bidirectional_mr.R); LCV gcp; LDSC rg report. Consensus across methods supports a publication-ready claim. Disagreement requires narrowing the scope (e.g., subgroup, cis-MR, time-varying analysis) rather than reporting a single point estimate.

Algorithmic Taxonomy

Method	Models	UHP-robust	CHP-robust	Min #SNPs	Fails when	Citation
Inverse-variance weighted (IVW)	Weighted regression through origin	Balanced UHP only	No	3	Directional UHP; CHP; weak IV bias; heterogeneity	Burgess 2013 Genet Epidemiol 37:658
MR-Egger intercept + slope	IVW + free intercept	Directional UHP	No	>=10 for power	NOME violated (I^2_GX < 0.9); <10 SNPs; CHP	Bowden 2015 IJE 44:512
Weighted median	Median of Wald ratios	Up to 50% invalid	No	>=10	>50% invalid; CHP	Bowden 2016 Genet Epidemiol 40:304
Weighted mode (MBE)	Mode of estimate density	Plurality valid	Partial	>=10	Multimodal estimates from CHP clusters	Hartwig 2017 IJE 46:1985
Cochran Q	Heterogeneity across Wald ratios	Total heterogeneity flag, not direction-specific	No	3	Cannot distinguish UHP from heterogeneity from CHP	Greco 2015 Stat Med 34:2926
MR-PRESSO	Detect + remove UHP outliers via RSS-out	Yes (assumes majority valid)	No	>=4	>50% pleiotropic; any CHP; small n	Verbanck 2018 Nat Genet 50:693
GSMR + HEIDI-outlier	Outlier removal via single-instrument estimate heterogeneity	Yes	No	>=10	CHP (HEIDI-outlier is heterogeneity-driven)	Zhu 2018 Nat Commun 9:224
MR-RAPS	Profile likelihood with overdispersion + Huber/Tukey loss	Yes; weak-IV robust	Partial via overdispersion	>=10	Strong CHP	Zhao 2020 Ann Stat 48:1742
MR-Mix	Mixture-of-distributions over valid + invalid	Yes	Partial	>=20	Few SNPs; very heterogeneous CHP	Qi & Chatterjee 2019 Nat Commun 10:1941
Contamination mixture	Profile likelihood over contamination fraction	Yes	Partial	>=20	Few SNPs	Burgess 2020 Stat Med 39:711
MR-Clust	k-means over Wald estimates with NULL cluster	Yes	Diagnostic for CHP via clusters	>=20	Single-mechanism exposure (no clustering signal)	Foley 2020 Bioinformatics 37:531
CAUSE	Bayesian mixture: shared causal + shared-factor (CHP) components	Yes	Yes (explicit)	>=100 sig SNPs at p<5e-8	<100 sig SNPs; non-overlapping GWAS samples	Morrison 2020 Nat Genet 52:740
LHC-MR	Latent heritable confounder + bidirectional + heritability	Yes	Yes	Genome-wide GWAS sumstats	Heritability mis-estimated; severe sample overlap	Darrous 2021 Nat Commun 12:7274
LCV (latent causal variable)	gcp parameter on genome-wide rg	N/A (not an MR method)	Diagnostic	Genome-wide GWAS sumstats	Heritability low; non-Gaussian effect distribution	O'Connor & Price 2018 Nat Genet 50:1728

Methodology evolves; verify against the Burgess & Thompson textbook (2nd ed 2021), Hemani 2018 (basic four-method battery), and Sanderson 2022 Nat Rev Methods Primers 2:6 before locking a sensitivity battery.

Decision Tree by Scenario

Scenario	Primary estimator	Sensitivity / triangulation
Many strong IVs, no biological shared trait suspected	IVW + Egger + weighted median + weighted mode + PRESSO	Cochran Q; leave-one-out; F-stat; Steiger filtering
LDSC rg(exposure, outcome) `>= 0.3` or strong shared-factor biology	CAUSE (if sig SNPs `>= 100`) OR LHC-MR	LCV gcp; cross-check IVW after Steiger filter
Many weak IVs (mean F < 20)	MR-RAPS with overdispersion + robust Huber loss	MR-Mix or contamination mixture; report F-stat range
Suspected heterogeneous causal mechanisms (e.g. LDL on CHD via multiple lipoprotein pathways)	MR-Clust; report per-cluster IVW	Pathway annotation of cluster instruments; Bayesian mixture
Cis-MR drug target (single locus, few SNPs in LD)	Colocalization (causal-genomics/colocalization-analysis) + Steiger	PWCoCo; conditional analysis; not Egger (low SNP count)
Polygenic exposure (heritability spread genome-wide; few significant loci)	LHC-MR (uses all SNPs)	LDSC rg; genome-wide IVW with weak-IV-aware methods (RAPS)
Reverse causation suspected	Bidirectional MR with Steiger filter; LHC-MR (jointly estimates both directions)	directionality_test; effect-size r2 comparison
Population-level summary discordant with biology	Re-examine instrument selection; check Winner's curse; LD pruning settings	Triangulate with cis-MR; family-based MR if available

Per-Method Failure Modes

MR-Egger NOME violation

Trigger: I^2_GX = (Q_GX - df) / Q_GX is below 0.9, indicating measurement-error attenuation of the Egger slope (NOME = "no measurement error" in the exposure GWAS effect sizes).

Mechanism: MR-Egger regresses outcome effects on exposure effects with a free intercept. Imprecise exposure effects (high beta.exposure SE relative to beta.exposure variability across instruments) introduce regression dilution that pulls the Egger slope toward the null and inflates the intercept.

Symptom: Egger slope much closer to zero than IVW, weighted median, and weighted mode estimates; large Egger SE.

Fix: Apply SIMEX correction (Bowden 2016 IJE 45:1961; Cook & Stefanski 1994 JASA 89:1314 SIMEX framework) using the simex package on the Egger regression, treating beta.exposure SE as measurement error. See examples/simex_egger_correction.R. Alternative: use MR-RAPS, which models the exposure-effect error explicitly via profile likelihood and does not suffer the NOME failure.

MR-PRESSO majority-outlier breakdown

Trigger: More than 50% of instruments are pleiotropic (UHP), e.g. when instrument set was loosely selected (genome-wide significant but unfiltered).

Mechanism: MR-PRESSO's global RSS-out statistic and outlier detection both assume a majority-valid set; outliers are defined relative to that majority. With a pleiotropic majority, PRESSO removes the valid minority.

Symptom: PRESSO-corrected estimate is similar in magnitude (and sign) to the uncorrected estimate even after dropping nominally "outlier" SNPs; distortion-test p-value paradoxically non-significant; few or no outliers detected despite obvious global-test significance.

Fix: Do not trust PRESSO corrected estimate. Re-examine instrument selection (drop loose p-thresholds, prune LD harder); switch to CAUSE or LHC-MR; consider weighted-mode estimator which is plurality-valid rather than majority-valid.

MR-PRESSO false negative under CHP

Trigger: Strong shared heritable confounder (high rg) producing CHP. Confirmed by significant LDSC rg or LCV gcp.

Mechanism: Correlated pleiotropy is a population-level mean shift in alpha conditional on gamma; it is not an outlier pattern. PRESSO's RSS-out distance is invariant under such a mean shift, so the global test is not powered against CHP.

Symptom: PRESSO global p > 0.05 (no detected pleiotropy) while a CHP-aware method (CAUSE, LHC-MR) returns a substantially different (often null) causal estimate.

Fix: When CHP is plausible, ALWAYS run CAUSE or LHC-MR in addition to PRESSO; do not rely on PRESSO global non-significance as evidence of no pleiotropy.

MR-Egger underpowered with few SNPs

Trigger: Fewer than 10 instruments.

Mechanism: Egger's intercept variance is driven by the spread of beta.exposure across instruments; with few SNPs the intercept CI is so wide that even strongly pleiotropic data give non-significant intercepts.

Symptom: Non-significant Egger intercept p-value alongside obviously discordant IVW and weighted-median estimates.

Fix: Report intercept point estimate and CI rather than a binary "pleiotropy present / absent" verdict; do not use Egger as the only sensitivity method when SNP count is low; weight evidence toward weighted-median, weighted-mode, and CAUSE / LHC-MR.

Steiger filter inverted by exposure measurement error (Hemani & Tilling 2022 IJE)

Trigger: Exposure is imprecisely measured (lower heritability ascertained in the exposure GWAS) and outcome is well-measured.

Mechanism: Steiger compares r^2_GX vs r^2_GY per SNP. Measurement error in the exposure underestimates r^2_GX; well-measured outcome captures r^2_GY accurately. Per-SNP, the inequality can flip even when the true causal direction is exposure -> outcome.

Symptom: A large fraction of instruments fail Steiger (steiger_dir == FALSE) in a direction that conflicts with biological plausibility.

Fix: Interpret Steiger as one signal among many, not a hard gate; cross-check with bidirectional MR; verify exposure GWAS heritability and sample size; switch to LHC-MR which models both directions jointly and accounts for heritability.

CAUSE underpowered with few significant SNPs

Trigger: Fewer than 100 genome-wide-significant instruments (p < 5e-8) after harmonization and LD pruning.

Mechanism: CAUSE fits a Bayesian mixture model over a shared-factor (CHP) component, a shared-causal component, and a null component. Posterior identification of the mixture weights requires substantial signal across many SNPs.

Symptom: CAUSE delta_ELPD CI crosses zero; Pareto-k diagnostic flags unstable points; posterior intervals on q (CHP fraction) span [0, 1].

Fix: Use LCV gcp for genome-wide directional inference (does not require many significant SNPs); use LHC-MR if heritability and sumstats are available; or report CAUSE alongside an explicit caveat about its underpowered regime.

LCV gcp under non-Gaussian effect distributions

Trigger: Highly polygenic trait with substantial sparsity in true effects (mixture of large-effect and zero-effect loci).

Mechanism: LCV assumes a bivariate normal model for effect sizes after LDSC adjustment. Sparse architectures (e.g. immune traits with HLA dominance) violate this and bias gcp estimates.

Symptom: LCV gcp point estimate appears extreme but heritability LDSC z-scores are modest; partitioned heritability shows extreme HLA enrichment.

Fix: Exclude HLA region from LDSC inputs; complement with CAUSE / LHC-MR; report gcp with awareness of the polygenicity caveat.

Quantitative Thresholds

Metric	Threshold	Source / rationale
F-statistic per SNP	`>=10` strong; `<10` weak	Burgess 2011 IJE 40:755 (rule of thumb); weak-IV bias toward observational confounded estimate
I^2_GX (NOME)	`>=0.9` Egger reliable	Bowden 2016 IJE 45:1961
I^2_GX (NOME) intermediate	0.6-0.9 SIMEX-corrected Egger	Bowden 2016 IJE
I^2_GX (NOME) severe	`<0.6` drop Egger; use MR-RAPS or CAUSE	Bowden 2016 IJE; SIMEX unreliable below 0.6
Egger min SNP count for adequate power	`>=10`	Bowden 2015 IJE 44:512
Cochran Q significance	p < 0.05 indicates heterogeneity	Greco 2015 Stat Med 34:2926
MR-PRESSO NbDistribution	1000 exploratory; `>=5000` publication; `>=10000` stringent	Verbanck 2018 Nat Genet 50:693 (Methods)
MR-PRESSO global test p	< 0.05 -> heterogeneity / outliers present	Verbanck 2018 Nat Genet
MR-PRESSO distortion test p	< 0.05 -> outliers materially shifted estimate; if `>= 0.05` report uncorrected IVW	Verbanck 2018 Nat Genet
MR-PRESSO min instruments	`>=4` to run; `>=10` for non-degenerate global test	Verbanck 2018 Nat Genet
MR-PRESSO SignifThreshold	0.05 default	Verbanck 2018 Nat Genet
MR-PRESSO majority-valid breakdown	Fails when `>50%` instruments pleiotropic	Verbanck 2018 Nat Genet (theoretical limit)
Weighted median validity	Robust to `<=50%` invalid IVs	Bowden 2016 Genet Epidemiol 40:304
Weighted mode validity	Plurality-valid (largest valid subset is most common estimate)	Hartwig 2017 IJE 46:1985
CAUSE min #SNPs	`>=100` p < 5e-8 SNPs after pruning	Morrison 2020 Nat Genet 52:740 (Supplement)
CAUSE delta_ELPD criterion	one-sided p < 0.05; z = delta_elpd / se(delta_elpd); z > 1.96 standard; z > 3.0 stringent	Morrison 2020 Nat Genet
LDSC rg suggesting CHP	`>= 0.3` flags need for CAUSE / LHC-MR	Operational rule; see causal-genomics/genetic-correlation
Steiger r^2 difference	Reverse-causal flag at any per-SNP r2_GY > r2_GX	Hemani 2017 PLoS Genet 13:e1007081
Standard sensitivity battery	IVW + Egger + median + mode + PRESSO + Steiger + LOO	Hemani 2018 eLife 7:e34408 / STROBE-MR 2021

LCV gcp interpretation thresholds (0, 0.5, 0.6, 1) are tabulated in usage-guide.md.

Standard Sensitivity Battery (Working Reference)

Goal: Run the canonical UHP-focused MR sensitivity suite on harmonized two-sample data.

Approach: Compute IVW + Egger + median + mode side-by-side; test Egger intercept and heterogeneity; run MR-PRESSO with >=5000 distributions for publication or >=10000 for stringent reporting; apply Steiger filter; leave-one-out; report all estimates.

library(TwoSampleMR)
library(MRPRESSO)

methods <- c('mr_ivw', 'mr_egger_regression', 'mr_weighted_median', 'mr_weighted_mode')
res_mr <- mr(dat, method_list = methods)

het <- mr_heterogeneity(dat)
pleio <- mr_pleiotropy_test(dat)
loo <- mr_leaveoneout(dat)
steiger <- directionality_test(dat)

isq <- Isq(dat$beta.exposure, dat$se.exposure)
nome_pass <- isq >= 0.9

presso <- mr_presso(
    BetaOutcome='beta.outcome', BetaExposure='beta.exposure',
    SdOutcome='se.outcome', SdExposure='se.exposure',
    OUTLIERtest=TRUE, DISTORTIONtest=TRUE,
    data=dat, NbDistribution=10000, SignifThreshold=0.05)

global_p <- presso$`MR-PRESSO results`$`Global Test`$Pvalue
outlier_p <- presso$`MR-PRESSO results`$`Outlier Test`$Pvalue
distortion_p <- presso$`MR-PRESSO results`$`Distortion Test`$Pvalue
n_outliers <- sum(outlier_p < 0.05, na.rm=TRUE)

Full working pipeline incl SIMEX, MR-RAPS, contamination mixture, and STROBE-MR table: examples/sensitivity_battery.R.

CAUSE for CHP-Aware Estimation

Goal: Distinguish causal from shared-factor (correlated horizontal pleiotropy) explanations of an exposure-outcome association.

Approach: Fit nuisance parameters (LD pruning + rho_GWAS sample-overlap correction) on a random SNP set; fit the sharing and causal posterior; compare ELPD (expected log predictive density) via Pareto-k smoothed importance sampling.

library(cause)
params <- est_cause_params(dat_cause, variants = pruned_subset_snps)
res_cause <- cause(X = dat_cause, variants = pruned_snps, param_ests = params)
elpd <- summary(res_cause)$tab

Full posterior extraction + reporting: examples/cause_analysis.R.

Interpreting CAUSE output:

q: posterior CHP fraction; 0 = no CHP, 1 = all instruments operate via the shared factor
eta: shared-factor effect on Y (the "confounder pathway" magnitude)
gamma: posterior causal effect of E on Y after partialling out CHP; report median + 95% credible interval
delta_ELPD (sharing - causal): negative -> causal model preferred; z = delta_elpd / se(delta_elpd); z > 1.96 standard, z > 3.0 stringent; one-sided p reported alongside posterior gamma
Pareto-k > 0.7 indicates unstable posterior on those points; if more than 10% of points are unstable, treat the posterior as unreliable; remediation: add more SNPs (loosen p-threshold one notch then re-prune in LD) or re-fit excluding flagged outliers

CAUSE requires sumstats from both exposure and outcome GWAS in matched effect-allele coding. The pruning step typically retains 100-5000 signature SNPs at LD r^2 < 0.01 in a 1 Mb window; nuisance estimation should use a larger random SNP subset (>= 100,000 genome-wide SNPs) to fit rho (sample overlap) stably.

MR-RAPS Loss Function and Overdispersion

Trigger: Weak instruments (mean F < 20) and/or suspected UHP requiring outlier-resistant estimation.

over.dispersion = TRUE always for MR (horizontal-pleiotropy variance is real, not noise; turning this off underestimates SE)
loss.function = 'huber' (default; outlier-resistant; suited to mild to moderate UHP)
loss.function = 'tukey' (more aggressive; downweights extreme outliers more; choose when many obvious outliers suspected)
loss.function = 'l2' (non-robust; equivalent to weighted least squares; do not use when UHP suspected)

Tukey is preferable when leave-one-out reveals 2+ SNPs single-handedly shifting the IVW estimate by > 1 SE.

MR-Clust for Mechanism Heterogeneity

Goal: When a single causal estimate is misleading because instruments operate through multiple causal mechanisms (e.g. LDL on CHD via multiple lipoprotein subfractions), identify clusters of instruments with similar per-SNP Wald ratios.

library(mrclust)
ratio_hat <- dat$beta.outcome / dat$beta.exposure
ratio_se <- abs(dat$se.outcome / dat$beta.exposure)
res_mc <- mr_clust_em(theta=ratio_hat, theta_se=ratio_se,
                     bx=dat$beta.exposure, by=dat$beta.outcome,
                     bxse=dat$se.exposure, byse=dat$se.outcome,
                     obs_names=dat$SNP)
per_cluster <- res_mc$results$best

Clusters with cluster_class = 'null' are pleiotropy-only instruments. Per-cluster IVW estimates may differ substantially; biological annotation of the SNPs in each cluster (pathway, target gene) is the interpretation step.

LHC-MR Workflow

Goal: Jointly estimate forward causal effect, reverse causal effect, and the heritable-confounder contribution from genome-wide sumstats (not just significant SNPs).

library(lhcMR)
ld <- list(ld_score_path='ldsc/eur_w_ld_chr/', hapmap_snp_path='ldsc/w_hm3.snplist')
sp_list <- sp_estimate(trait.df=list(X=df_x, Y=df_y), ld=ld)
res_lhc <- lhc_mr(sp_list, sample_overlap=overlap_matrix, n_cores=4)

LHC-MR is computationally heavy (hours on full sumstats) but among the most rigorous CHP-aware estimators when both GWAS are well-powered. Output includes axx, ayy, hxy (confounder effect on each trait), and bidirectional alpha_xy, alpha_yx.

Choosing CAUSE vs LHC-MR (Darrous 2021):

Condition	Preferred method
`>= 100` genome-wide significant SNPs after pruning	CAUSE (Bayesian; CHP-explicit; mature posterior diagnostics)
Polygenic exposure with few significant loci	LHC-MR (uses genome-wide signal, not just significant SNPs)
Severe sample overlap between exposure and outcome GWAS	LHC-MR (jointly models overlap); CAUSE's rho correction is exposed to misspecification at high overlap
Bidirectionality of central interest	LHC-MR (jointly estimates alpha_xy and alpha_yx); CAUSE only models forward
Limited compute / quick turnaround	CAUSE (minutes to hours); LHC-MR may be > 24h on full sumstats

When both apply, report both with the agreement / disagreement explicit in the discussion.

Bidirectional MR Procedure

Forward MR: instrument exposure E, test effect on outcome Y (primary)
Reverse MR: instrument outcome Y, test effect on exposure E (using outcome-direction instruments)
Steiger pre-filter both directions: steiger_filtering(dat); drop SNPs where outcome r^2 > exposure r^2 before primary IVW
Compare estimates: null reverse + significant forward strengthens the forward causal claim; bidirectional significance flags feedback / shared confounder / reciprocal causation
LCV gcp orthogonal check: genome-wide directional inference independent of the instrument set

Working code: examples/bidirectional_mr.R.

Interpretation cheat-sheet:

Forward p	Reverse p	Reading
significant	non-significant	Forward causal claim strengthened
non-significant	significant	Re-examine instrument-exposure assignment; the "outcome" may causally drive the "exposure"
significant	significant	Feedback loop, shared confounder, or reciprocal causation; resolve with LHC-MR
non-significant	non-significant	No evidence of causation in either direction

When forward and reverse both clear Steiger and both IVW p < 0.05, run LHC-MR jointly rather than reporting two univariable estimates.

LCV (Latent Causal Variable)

LCV uses LDSC-merged genome-wide sumstats and reports gcp (genetic causality proportion) on [-1, 1]. It is a complement to, not a replacement for, MR; gcp ~ 0 with high LDSC rg implies pure genetic correlation without partial causation.

source('LCV/R/RunLCV.R')
res_lcv <- RunLCV(ldscores$L2, x$Z, y$Z)
# res_lcv$gcp; res_lcv$pval.gcpzero.2tailed

Full gcp interpretation table is in usage-guide.md.

Required Supplementary Tables

Instrument table (one row per SNP retained for primary analysis):

Column	Content
rsID, chr, pos	Variant identifier and genome position
EA, OA, EAF	Effect allele, other allele, effect allele frequency in exposure GWAS
beta_E, se_E, p_E, F	Exposure-side estimate, SE, p-value, per-SNP F-statistic
beta_Y, se_Y, p_Y	Outcome-side estimate, SE, p-value (harmonized to EA)
harmonization_action	1=kept, 2=flipped, 3=dropped (palindromic ambiguity)
palindromic_flag	TRUE / FALSE; tracked per Hartwig 2016 IJE 45:1717
Steiger_direction	forward / reverse / inconclusive
Steiger_p	per-SNP directionality p-value

Sensitivity-battery table (one row per method):

Column	Content
method	IVW / Egger / WM / WMode / PRESSO (raw + corrected) / RAPS / CAUSE
estimate, se, p, 95% CI	Point estimate and inference
n_SNPs_used	Post-harmonization, post-Steiger SNP count
heterogeneity_p	Q for IVW; Q' for Egger; global p for PRESSO
intercept_p	Egger only (directional UHP test)
ELPD_delta + z	CAUSE only (sharing - causal; negative +

Reconciliation Across Methods

Pattern	Likely cause	Action
IVW and Egger agree (small Egger intercept); median and mode agree	Likely true causal; minimal pleiotropy	Report all; STROBE-MR; emphasize agreement
IVW significant; Egger non-significant with similar slope; PRESSO global p > 0.05	Egger underpowered (few SNPs) OR Egger NOME violated	Check I^2_GX; SIMEX-correct if 0.6 < I^2_GX < 0.9
IVW shifted relative to Egger / median / mode; Egger intercept significant	Directional UHP	Trust Egger slope; PRESSO-corrected IVW; mode estimator
IVW + Egger + median + mode + PRESSO all agree but CAUSE delta_ELPD non-significant or in opposite direction	CHP via shared confounder	Trust CAUSE; report all five UHP methods alongside but flag the discordance; check LDSC rg
All UHP methods agree; LCV gcp ~ 0	Genetic correlation only, not causation	Causation evidence is weak; report rg explicitly; consider colocalization (cis-MR) instead
MR-Clust shows >=2 distinct non-null clusters	Heterogeneous mechanisms	Report per-cluster estimates; do not summarize as a single effect
Steiger fails on a substantial fraction of instruments	Reverse causation OR exposure measurement error	Run bidirectional MR; check exposure GWAS heritability; LHC-MR
MR-PRESSO global p < 0.05 but corrected estimate similar to uncorrected	>50% pleiotropic OR CHP masquerading as UHP	Re-prune instruments; switch to weighted-mode / CAUSE / LHC-MR

Operational rule for publication: Report IVW (primary), Egger slope + intercept, weighted median, weighted mode, MR-PRESSO global + distortion + corrected, Cochran Q, Steiger directionality, F-statistic distribution, I^2_GX (for Egger validity), and at least one CHP-aware method (CAUSE or LHC-MR) when rg >= 0.3 or biology suggests shared upstream. Failure to report a CHP-aware result when CHP is plausible is a reviewer-flagged red flag since 2020.

Anticipated Reviewer Pushback

Pushback	Standard response
"Did you check for CHP?"	LDSC rg reported (causal-genomics/genetic-correlation); if rg > 0.3, CAUSE or LHC-MR ran; q posterior reported
"Why CAUSE and not LHC-MR?"	CAUSE preferred when `>= 100` significant SNPs available (Morrison 2020). LHC-MR preferred when significant-SNP set is small or polygenic, using genome-wide sumstats (Darrous 2021)
"Egger NOME?"	I^2_GX computed; if 0.6 <= I^2_GX < 0.9, SIMEX correction applied; if < 0.6, Egger dropped in favor of MR-RAPS
"PRESSO doesn't catch CHP?"	Confirmed (Morrison 2020); CAUSE / LHC-MR reported alongside PRESSO for that reason
"Steiger filter applied pre-MR or post?"	Pre-MR: SNPs failing per-SNP Steiger directionality dropped before primary IVW
"Why no replication cohort?"	Two-sample design uses independent exposure and outcome cohorts; if same biobank, MRlap used or noted as a limitation
"Why not just trust the IVW?"	IVW assumes balanced UHP and no CHP; both violated routinely; sensitivity battery is the standard since STROBE-MR 2021
"Effect size is implausibly large"	Re-examine F-statistic distribution for weak IV bias; check Winner's curse; consider Wald ratio at a single strong instrument as sanity check

STROBE-MR Reporting (Skrivankova 2021)

Item	Required content
1-3	Title / abstract / background indicates this is an MR study; pre-registered protocol
4-7	Study design, data sources, instrument selection criteria (p-threshold, LD clumping, MAF)
8-11	Harmonization, palindrome handling, allele alignment
12-14	F-statistic distribution; weak-instrument bias mitigation
15-17	Primary MR method + all sensitivity methods + CHP-aware method when relevant
18-19	Pleiotropy tests, Steiger, heterogeneity
20	Discussion of remaining assumption violations; limitations

Sub-items (30 total) detail per-method reporting. The full statement (JAMA 326:1614) and explanation (BMJ 375:n2233) are now reviewer-required at most cardiovascular and psychiatric journals since 2022.

Common Errors

Error / symptom	Cause	Solution
MR-PRESSO crashes with `Not enough intrumental variables`	Fewer than 4 SNPs	Need >=4 for PRESSO; for cis-MR with few SNPs use colocalization
Egger intercept p < 0.05 but I^2_GX = 0.5	NOME violated; intercept is artifactually inflated	SIMEX-correct or do not trust Egger; use MR-RAPS instead
`Isq()` not found	TwoSampleMR version where Isq is unexported	Compute manually: Q_GX = sum((beta_GX/se_GX)^2); I2 = (Q_GX - (n-1))/Q_GX, clipped to [0,1]
MR-RAPS `package not found` after CRAN install	CRAN-archived 2025-03-01	`remotes::install_github('qingyuanzhao/mr.raps')`; call via `TwoSampleMR::mr_raps()` wrapper (MendelianRandomization does NOT export `mr_raps`)
CAUSE delta_ELPD CI spans zero; Pareto-k > 0.7	<100 sig SNPs OR severe sample overlap	Use LHC-MR; or report CAUSE with the explicit caveat
Steiger labels most instruments reverse-causal	Exposure GWAS imprecise OR sample size mismatch	Treat as one signal; cross-check with bidirectional MR
LHC-MR runtime > 24h	Default n_cores=1 on full sumstats	Use n_cores >= 4; restrict to LDSC-overlapping SNPs first
MR-PRESSO outliers all on same chromosome	Genome-wide LD not properly pruned; clumping window too narrow	Re-clump at r^2 < 0.001 in 10 Mb window
MR-Mix returns NA	Few SNPs OR no variation in mixture support	Need >=20 SNPs; default mixture grid may need tuning

References

Bowden J et al 2015 Int J Epidemiol 44:512 (MR-Egger; InSIDE)
Bowden J et al 2016 Int J Epidemiol 45:1961 (NOME, I^2_GX, SIMEX)
Cook JR & Stefanski LA 1994 JASA 89:1314 (SIMEX framework)
Burgess S 2020 Stat Med 39:711 (contamination mixture)
Burgess S & Thompson SG 2021 (Mendelian Randomization 2nd ed)
Darrous L et al 2021 Nat Commun 12:7274 (LHC-MR)
Foley CN et al 2020 Bioinformatics 37:531 (MR-Clust)
Hartwig FP et al 2017 Int J Epidemiol 46:1985 (weighted mode)
Hemani G et al 2017 PLoS Genet 13:e1007081 (Steiger orientation)
Hemani G et al 2018 eLife 7:e34408 (TwoSampleMR framework)
Hemani G & Tilling K 2022 Int J Epidemiol (Steiger limitations under measurement error)
Morrison J et al 2020 Nat Genet 52:740 (CAUSE; CHP)
O'Connor LJ & Price AL 2018 Nat Genet 50:1728 (LCV)
Sanderson E et al 2022 Nat Rev Methods Primers 2:6 (MR Primer)
Skrivankova VW et al 2021 JAMA 326:1614 (STROBE-MR statement); BMJ 375:n2233 (explanation)
Verbanck M et al 2018 Nat Genet 50:693 (MR-PRESSO)
Zhao Q et al 2020 Ann Stat 48:1742 (MR-RAPS)

Related Skills

causal-genomics/mendelian-randomization - Primary causal estimation that this sensitivity battery validates
causal-genomics/genetic-correlation - LDSC rg required for Step 1 of the decision flow; CHP escalation trigger
causal-genomics/colocalization-analysis - Required for cis-MR drug-target signals where instruments are too few for Egger
causal-genomics/fine-mapping - Identify causal variants underlying instrument loci
causal-genomics/mediation-analysis - Multivariable MR for mediator-adjusted causal estimates
population-genetics/association-testing - GWAS summary statistics underlying MR instruments
clinical-biostatistics/effect-measures - Translate MR estimates to clinical effect measures

name	bio-causal-genomics-pleiotropy-detection
description	Detect and adjust for horizontal pleiotropy in two-sample Mendelian randomization by distinguishing uncorrelated (UHP) from correlated (CHP) pleiotropy and choosing among Egger, MR-PRESSO, MR-RAPS, CAUSE, LHC-MR, LCV, MR-Clust, MR-Mix, and contamination-mixture methods. Use when validating an MR causal claim, running the STROBE-MR sensitivity battery, suspecting a shared heritable confounder, working under weak-instrument or polygenic-exposure regimes, or reconciling discordant estimates across robust methods.
tool_type	r
primary_tool	TwoSampleMR