| name | dowhy |
| description | Causal inference framework for answering "does X cause Y?" beyond correlation. DoWhy (Microsoft Research) provides the identify-estimate-refute loop: define a causal graph (DAG), identify the causal effect using backdoor/frontdoor/instrumental variable criteria, estimate treatment effects with multiple estimators, and validate results with automated refutation tests. Use when: distinguishing causation from correlation, estimating treatment effects (ATE, ATT, CATE), designing and analyzing A/B tests with confounders, using instrumental variables, performing counterfactual reasoning ("what would have happened if..."), validating causal claims with sensitivity analysis, working with observational data where randomization is impossible, or any analysis where the question is "what is the CAUSAL effect of X on Y" rather than just "how do X and Y relate?" |
DoWhy — Causal Inference
DoWhy answers the question every analyst actually wants answered: "Does X cause Y, or is it just correlated?" Correlation is everywhere. Causation requires structure — a causal graph that encodes which variables influence which. DoWhy's workflow is three steps: Identify (is the effect estimable from this graph?) → Estimate (compute the effect) → Refute (is this estimate robust?).
Core Mental Model
CORRELATION: X and Y move together. Could be:
X → Y (X causes Y)
Y → X (Y causes X)
X ← C → Y (C confounds both — spurious!)
CAUSATION: We need to know WHY they move together.
A causal graph (DAG) encodes our assumptions.
Then math tells us: "Given this graph,
CAN we estimate the causal effect from data?"
→ If yes: which variables to control for?
→ If no: what additional data do we need?
When to Use
- "Does this ad campaign actually increase sales, or do people who see ads already buy more?"
- "Does smoking cause cancer?" (observational data, can't randomize)
- "What would revenue have been if we hadn't changed the pricing?" (counterfactual)
- Any analysis where confounders exist and you have a theory about the causal structure.
When NOT to use: Pure prediction (use sklearn). Randomized controlled trials with no confounders (simple A/B test suffices). When you have no theory about the causal structure — you need at least a hypothesis about the DAG.
Reference Documentation
DoWhy docs: https://dowhy.readthedocs.io/en/latest/
GitHub: https://github.com/py-why/dowhy
Causal graph tutorials: https://dowhy.readthedocs.io/en/latest/tutorials.html
Search patterns: CausalModel, identify_effect, estimate_effect, refute_estimate
Core Principles
The Causal Graph (DAG)
A Directed Acyclic Graph where arrows mean "causes". A → B means A is a cause of B. This is your assumptions about the world — not learned from data. You draw it based on domain knowledge. The graph is what makes causal inference possible.
The Identify-Estimate-Refute Loop
- Identify: Given the DAG, is the causal effect of treatment on outcome estimable from observational data? Which variables must be controlled? (Backdoor criterion, frontdoor criterion, instrumental variables.)
- Estimate: Compute the effect using the identified strategy and a statistical method.
- Refute: Test robustness — would the estimate survive if our assumptions were wrong?
Confounders Are the Enemy
A confounder C is a common cause of both treatment and outcome: T ← C → Y. It creates spurious correlation. The entire point of causal inference is to block these backdoor paths.
Treatment Effect Types
- ATE (Average Treatment Effect): Effect across the whole population.
- ATT (Average Treatment Effect on the Treated): Effect on those who actually received treatment.
- CATE (Conditional ATE): Effect varies by subgroup — heterogeneous treatment effects.
Quick Reference
Installation
pip install dowhy networkx matplotlib pandas scikit-learn
Standard Imports
import dowhy
from dowhy import CausalModel
import pandas as pd
import numpy as np
Basic Pattern — Full Causal Pipeline
import numpy as np
import pandas as pd
from dowhy import CausalModel
np.random.seed(42)
n = 2000
C = np.random.randn(n)
T = (C + np.random.randn(n) > 0).astype(int)
Y = 2.0 * T + 1.5 * C + np.random.randn(n)
data = pd.DataFrame({'T': T, 'Y': Y, 'C': C})
graph = """
digraph {
C -> T;
C -> Y;
T -> Y;
}
"""
model = CausalModel(
data=data,
treatment=['T'],
outcome='Y',
graph=graph
)
identified_effect = model.identify_effect()
print(identified_effect)
estimate = model.estimate_effect(
identified_effect,
estimation_method='backdoor.linear_regression'
)
print(f"Estimated causal effect of T on Y: {estimate.value:.3f}")
refutation = model.refute_estimate(
estimate,
refutation_method='refute_placebo_treatment'
)
print(refutation)
Critical Rules
✅ DO
- Draw the graph BEFORE looking at data — The DAG encodes your causal assumptions. It must come from domain knowledge, not from data patterns. Drawing it after seeing correlations defeats the purpose.
- Include ALL known confounders in the graph — Missing a confounder = biased estimate. When in doubt, include it.
- Always run at least 2 refutation tests — Placebo treatment + random common cause is the minimum. A single estimate without refutation is untrustworthy.
- Use multiple estimators and compare — If backdoor.linear_regression and backdoor.propensity_score give wildly different answers, your model assumptions may be wrong.
- Check positivity (overlap) — Propensity score methods fail if treatment and control groups don't overlap in covariate space. Visualize distributions.
- Report confidence intervals, not just point estimates — Use
estimate.get_confidence_intervals().
- Interpret ATE in context — An ATE of 0.5 means nothing without knowing the scale of Y.
❌ DON'T
- Don't assume correlation = causation and skip the graph — This is the entire reason DoWhy exists.
- Don't use DoWhy without a causal hypothesis — If you have no theory about why X might cause Y, you cannot draw a valid DAG. Go do qualitative research first.
- Don't ignore refutation failures — If placebo treatment gives a non-zero effect, your estimate is suspect. Investigate, don't just report.
- Don't confuse "not statistically significant" with "no causal effect" — Underpowered studies fail to detect real effects. Sample size matters.
- Don't use backdoor adjustment when backdoor criterion isn't satisfied —
identify_effect() tells you which strategy is valid. Trust it.
- Don't assume linearity —
backdoor.linear_regression assumes linear relationships. If the truth is nonlinear, use backdoor.propensity_score_weighting or backdoor.propensity_score_matching.
Anti-Patterns (NEVER)
import numpy as np
import pandas as pd
from dowhy import CausalModel
data = pd.DataFrame({'T': [0,1,0,1], 'Y': [1,3,2,4], 'C': [0.5,1.2,0.8,1.5]})
graph = "digraph { C -> T; C -> Y; T -> Y; }"
model = CausalModel(data=data, treatment=['T'], outcome='Y', graph=graph)
identified = model.identify_effect()
estimate = model.estimate_effect(identified, estimation_method='backdoor.linear_regression')
print(f"Effect = {estimate.value}")
est_lr = model.estimate_effect(identified, estimation_method='backdoor.linear_regression')
est_psm = model.estimate_effect(identified, estimation_method='backdoor.propensity_score_matching')
est_psw = model.estimate_effect(identified, estimation_method='backdoor.propensity_score_weighting')
print(f"Linear Regression: {est_lr.value:.3f}")
print(f"Propensity Matching: {est_psm.value:.3f}")
print(f"Propensity Weighting: {est_psw.value:.3f}")
model.refute_estimate(est_lr, refutation_method='refute_placebo_treatment')
model.refute_estimate(est_lr, refutation_method='refute_random_common_cause')
model.refute_estimate(est_lr, refutation_method='refute_data_subset')
identified = model.identify_effect()
identified = model.identify_effect()
if identified.get_backdoor_variables():
estimate = model.estimate_effect(identified, estimation_method='backdoor.linear_regression')
elif identified.get_instrumental_variables():
estimate = model.estimate_effect(identified, estimation_method='iv.instrumental_variable')
else:
print("Effect is NOT identifiable from this graph. Need more data or stronger assumptions.")
Causal Graphs — Building Blocks
DAG Syntax
from dowhy import CausalModel
graph_simple = """
digraph {
C -> T;
C -> Y;
T -> Y;
}
"""
graph_multi = """
digraph {
C1 -> T; C1 -> Y;
C2 -> T; C2 -> Y;
C3 -> Y;
T -> Y;
}
"""
graph_mediator = """
digraph {
T -> M;
M -> Y;
T -> Y;
C -> T; C -> Y;
}
"""
graph_collider = """
digraph {
T -> S;
Y -> S;
T -> Y;
}
"""
graph_iv = """
digraph {
Z -> T;
T -> Y;
C -> T; C -> Y;
}
"""
Graph Visualization
from dowhy import CausalModel
import pandas as pd
import numpy as np
data = pd.DataFrame({'T': [0,1]*100, 'Y': np.random.randn(200),
'C': np.random.randn(200), 'Z': np.random.randn(200)})
graph = "digraph { C -> T; C -> Y; Z -> T; T -> Y; }"
model = CausalModel(data=data, treatment=['T'], outcome='Y', graph=graph)
model.view_graph()
Identification Strategies
Backdoor Criterion — Control for Confounders
import numpy as np
import pandas as pd
from dowhy import CausalModel
np.random.seed(0)
n = 3000
C1 = np.random.randn(n)
C2 = np.random.randn(n)
T = (0.5*C1 + 0.3*C2 + np.random.randn(n) > 0).astype(int)
Y = 3.0*T + 1.0*C1 - 0.5*C2 + np.random.randn(n)
data = pd.DataFrame({'T': T, 'Y': Y, 'C1': C1, 'C2': C2})
graph = "digraph { C1 -> T; C1 -> Y; C2 -> T; C2 -> Y; T -> Y; }"
model = CausalModel(data=data, treatment=['T'], outcome='Y', graph=graph)
identified = model.identify_effect()
print("Backdoor variables:", identified.get_backdoor_variables())
estimate = model.estimate_effect(
identified,
estimation_method='backdoor.linear_regression'
)
print(f"Causal effect (backdoor): {estimate.value:.3f}")
print(f"95% CI: {estimate.get_confidence_intervals()}")
Frontdoor Criterion — When You Can't Control Confounders
import numpy as np
import pandas as pd
from dowhy import CausalModel
np.random.seed(42)
n = 5000
U = np.random.randn(n)
T = (U + np.random.randn(n) > 0).astype(int)
M = 2.0 * T + np.random.randn(n)
Y = 1.5 * M + U + np.random.randn(n)
data = pd.DataFrame({'T': T, 'Y': Y, 'M': M})
graph = """
digraph {
U -> T;
U -> Y;
T -> M;
M -> Y;
}
"""
model = CausalModel(data=data, treatment=['T'], outcome='Y', graph=graph)
identified = model.identify_effect()
print("Frontdoor variables:", identified.get_frontdoor_variables())
estimate = model.estimate_effect(
identified,
estimation_method='frontdoor.two_stage_linear_regression'
)
print(f"Causal effect (frontdoor): {estimate.value:.3f}")
Instrumental Variables — Exogenous Variation
import numpy as np
import pandas as pd
from dowhy import CausalModel
np.random.seed(42)
n = 5000
Z = np.random.randn(n)
U = np.random.randn(n)
T = 1.0*Z + 0.8*U + np.random.randn(n)
Y = 2.5*T + 1.2*U + np.random.randn(n)
data = pd.DataFrame({'T': T, 'Y': Y, 'Z': Z})
graph = """
digraph {
Z -> T;
U -> T;
U -> Y;
T -> Y;
}
"""
model = CausalModel(data=data, treatment=['T'], outcome='Y', graph=graph)
identified = model.identify_effect()
print("Instrumental variables:", identified.get_instrumental_variables())
estimate = model.estimate_effect(
identified,
estimation_method='iv.instrumental_variable',
method_params={'instrument_variables': ['Z']}
)
print(f"Causal effect (IV): {estimate.value:.3f}")
Estimation Methods
from dowhy import CausalModel
est = model.estimate_effect(identified, estimation_method='backdoor.linear_regression')
est = model.estimate_effect(identified, estimation_method='backdoor.propensity_score_matching')
est = model.estimate_effect(identified, estimation_method='backdoor.propensity_score_weighting')
est = model.estimate_effect(identified, estimation_method='iv.instrumental_variable',
method_params={'instrument_variables': ['Z']})
def compare_estimators(model, identified, methods):
"""Run multiple estimators, return comparison table."""
import pandas as pd
results = []
for method in methods:
try:
est = model.estimate_effect(identified, estimation_method=method)
ci = est.get_confidence_intervals()
results.append({
'method': method.split('.')[-1],
'estimate': round(est.value, 4),
'ci_lower': round(ci[0][0], 4) if ci else None,
'ci_upper': round(ci[0][1], 4) if ci else None,
})
except Exception as e:
results.append({'method': method.split('.')[-1], 'error': str(e)})
return pd.DataFrame(results)
Refutation Tests — Validating Estimates
Refutation is the third pillar. Each test asks: "Would our estimate survive if assumption X were wrong?"
from dowhy import CausalModel
refute_placebo = model.refute_estimate(
estimate,
refutation_method='refute_placebo_treatment'
)
print(refute_placebo)
refute_random = model.refute_estimate(
estimate,
refutation_method='refute_random_common_cause',
method_params={
'num_simulations': 100,
'confidence_level': 0.95
}
)
print(refute_random)
refute_subset = model.refute_estimate(
estimate,
refutation_method='refute_data_subset',
method_params={
'fraction_removed': 0.2,
'num_simulations': 50
}
)
print(refute_subset)
refute_random_treat = model.refute_estimate(
estimate,
refutation_method='refute_treatment_regression_discontinuity'
)
Counterfactual Reasoning
import numpy as np
import pandas as pd
from dowhy import CausalModel
np.random.seed(42)
n = 1000
C = np.random.randn(n)
T = (C + np.random.randn(n) > 0).astype(int)
Y = 2.0 * T + 1.5 * C + np.random.randn(n)
data = pd.DataFrame({'T': T, 'Y': Y, 'C': C})
graph = "digraph { C -> T; C -> Y; T -> Y; }"
model = CausalModel(data=data, treatment=['T'], outcome='Y', graph=graph)
identified = model.identify_effect()
estimate = model.estimate_effect(identified, estimation_method='backdoor.linear_regression')
treated_mask = data['T'] == 1
Y_factual = data.loc[treated_mask, 'Y']
Y_counterfactual = Y_factual - estimate.value * 1
individual_effect = Y_factual.values - Y_counterfactual.values
print(f"Average Treatment Effect on Treated (ATT): {individual_effect.mean():.3f}")
print(f"ATT std across treated units: {individual_effect.std():.3f}")
print(f"Min / Max individual effect: {individual_effect.min():.3f} / {individual_effect.max():.3f}")
Y_all_treated = data['Y'] + estimate.value * (1 - data['T'])
revenue_lift = Y_all_treated.mean() - data['Y'].mean()
print(f"\nEstimated revenue lift from universal treatment: {revenue_lift:.3f}")
Heterogeneous Treatment Effects (CATE)
import numpy as np
import pandas as pd
from dowhy import CausalModel
from sklearn.tree import DecisionTreeRegressor
np.random.seed(42)
n = 3000
age = np.random.uniform(18, 70, n)
C = np.random.randn(n)
T = (C + np.random.randn(n) > 0).astype(int)
true_effect = np.where(age > 40, 5.0, 1.0)
Y = true_effect * T + 1.5 * C + np.random.randn(n)
data = pd.DataFrame({'T': T, 'Y': Y, 'C': C, 'age': age})
graph = "digraph { C -> T; C -> Y; T -> Y; age -> Y; }"
model = CausalModel(data=data, treatment=['T'], outcome='Y', graph=graph)
identified = model.identify_effect()
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict
features = ['C', 'age']
X = data[features].values
treated_mask = data['T'] == 1
model_1 = GradientBoostingRegressor(random_state=42)
model_0 = GradientBoostingRegressor(random_state=42)
model_1.fit(X[treated_mask], data.loc[treated_mask, 'Y'])
model_0.fit(X[~treated_mask], data.loc[~treated_mask, 'Y'])
Y_hat_1 = model_1.predict(X)
Y_hat_0 = model_0.predict(X)
CATE = Y_hat_1 - Y_hat_0
data['CATE'] = CATE
print("CATE by age group:")
print(data.groupby(pd.cut(data['age'], bins=[18, 30, 40, 50, 60, 70]))['CATE'].mean())
top_quartile = data.nlargest(int(0.25 * n), 'CATE')
print(f"\nTop 25% CATE: mean effect = {top_quartile['CATE'].mean():.2f}")
print(f"Bottom 75% CATE: mean effect = {data.nsmallest(int(0.75*n), 'CATE')['CATE'].mean():.2f}")
Practical Workflows
1. Full Causal Analysis — Marketing Attribution
import numpy as np
import pandas as pd
from dowhy import CausalModel
def causal_marketing_analysis(data: pd.DataFrame,
treatment: str,
outcome: str,
confounders: list[str],
graph: str) -> dict:
"""
Full causal pipeline for marketing:
identify → estimate (3 methods) → refute (3 tests) → report.
"""
model = CausalModel(data=data, treatment=[treatment], outcome=outcome, graph=graph)
identified = model.identify_effect()
print(f"Backdoor variables: {identified.get_backdoor_variables()}")
methods = [
'backdoor.linear_regression',
'backdoor.propensity_score_matching',
'backdoor.propensity_score_weighting'
]
estimates = {}
for method in methods:
est = model.estimate_effect(identified, estimation_method=method)
name = method.split('.')[-1]
ci = est.get_confidence_intervals()
estimates[name] = {
'value': round(est.value, 4),
'ci': (round(ci[0][0], 4), round(ci[0][1], 4)) if ci else None
}
print(f" {name:35s} effect = {est.value:.4f}")
primary_est = model.estimate_effect(identified, estimation_method='backdoor.linear_regression')
print("\nRefutation tests:")
refutations = {}
ref_placebo = model.refute_estimate(primary_est, refutation_method='refute_placebo_treatment')
refutations['placebo'] = ref_placebo
print(f" Placebo treatment: {ref_placebo}")
ref_random = model.refute_estimate(primary_est, refutation_method='refute_random_common_cause',
method_params={'num_simulations': 50})
refutations['random_cause'] = ref_random
print(f" Random common cause: {ref_random}")
ref_subset = model.refute_estimate(primary_est, refutation_method='refute_data_subset',
method_params={'fraction_removed': 0.2, 'num_simulations': 30})
refutations['data_subset'] = ref_subset
print(f" Data subset: {ref_subset}")
effect = primary_est.value
ci = primary_est.get_confidence_intervals()
print(f"\n{'='*60}")
print(f" CAUSAL EFFECT OF {treatment} ON {outcome}")
print(f"{'='*60}")
print(f" Estimate: {effect:.4f}")
if ci: print(f" 95% CI: [{ci[0][0]:.4f}, {ci[0][1]:.4f}]")
print(f" Direction: {'Positive ↑' if effect > 0 else 'Negative ↓'}")
agreement = len(set(round(v['value'], 1) for v in estimates.values())) == 1
print(f" Estimator agreement: {'✅ All methods agree' if agreement else '⚠️ Methods disagree — investigate'}")
return {
'effect': effect,
'ci': ci,
'estimates': estimates,
'refutations': refutations,
'model': model
}
2. A/B Test with Confounders
import numpy as np
import pandas as pd
from dowhy import CausalModel
def causal_ab_test(data: pd.DataFrame,
variant_col: str,
outcome_col: str,
confounders: list[str],
graph: str) -> dict:
"""
A/B test where assignment was NOT perfectly random —
e.g., users self-selected, or assignment leaked through confounders.
Standard A/B test would be biased. Causal adjustment fixes it.
"""
data = data.copy()
data['T'] = (data[variant_col] == 'B').astype(int)
model = CausalModel(data=data, treatment=['T'], outcome=outcome_col, graph=graph)
identified = model.identify_effect()
naive_diff = data.groupby('T')[outcome_col].mean()
naive_effect = naive_diff[1] - naive_diff[0]
causal_est = model.estimate_effect(identified, estimation_method='backdoor.propensity_score_weighting')
print(f"Naive A/B difference (biased): {naive_effect:.4f}")
print(f"Causal effect (adjusted): {causal_est.value:.4f}")
print(f"Bias from confounding: {naive_effect - causal_est.value:.4f}")
model.refute_estimate(causal_est, refutation_method='refute_placebo_treatment')
return {
'naive_effect': naive_effect,
'causal_effect': causal_est.value,
'bias': naive_effect - causal_est.value
}
3. Instrumental Variable — Natural Experiment
import numpy as np
import pandas as pd
from dowhy import CausalModel
def iv_analysis(data: pd.DataFrame,
treatment: str,
outcome: str,
instrument: str,
graph: str) -> dict:
"""
IV analysis for when confounders are unobserved.
Classic example: "Does education cause higher income?"
Instrument: distance to nearest university (affects education, not income directly)
"""
model = CausalModel(data=data, treatment=[treatment], outcome=outcome, graph=graph)
identified = model.identify_effect()
iv_vars = identified.get_instrumental_variables()
print(f"Identified instruments: {iv_vars}")
if not iv_vars:
print("ERROR: No valid instrument found. Check the graph.")
return None
estimate = model.estimate_effect(
identified,
estimation_method='iv.instrumental_variable',
method_params={'instrument_variables': iv_vars}
)
print(f"\nIV Causal Effect of {treatment} on {outcome}: {estimate.value:.4f}")
from sklearn.linear_model import LinearRegression
import numpy as np
first_stage = LinearRegression()
first_stage.fit(data[[instrument]], data[treatment])
r2 = first_stage.score(data[[instrument]], data[treatment])
print(f"First stage R²: {r2:.4f} ({'Strong' if r2 > 0.1 else 'WEAK — estimates unreliable'})")
model.refute_estimate(estimate, refutation_method='refute_placebo_treatment')
return {'effect': estimate.value, 'first_stage_r2': r2}
4. Sensitivity Analysis — How Robust Is the Estimate?
import numpy as np
import pandas as pd
from dowhy import CausalModel
def sensitivity_analysis(data: pd.DataFrame,
treatment: str,
outcome: str,
graph: str,
n_simulations: int = 100) -> pd.DataFrame:
"""
Systematic sensitivity analysis:
How much would a hidden confounder need to bias our estimate to zero?
"""
model = CausalModel(data=data, treatment=[treatment], outcome=outcome, graph=graph)
identified = model.identify_effect()
estimate = model.estimate_effect(identified, estimation_method='backdoor.linear_regression')
original_effect = estimate.value
print(f"Original causal estimate: {original_effect:.4f}")
print("\n1. Sensitivity to unobserved confounders:")
ref = model.refute_estimate(
estimate,
refutation_method='refute_random_common_cause',
method_params={
'num_simulations': n_simulations,
'confidence_level': 0.95
}
)
print(f" {ref}")
print("\n2. Stability across data subsets:")
ref_sub = model.refute_estimate(
estimate,
refutation_method='refute_data_subset',
method_params={
'fraction_removed': 0.3,
'num_simulations': 50
}
)
print(f" {ref_sub}")
print("\n3. Placebo treatment test:")
ref_placebo = model.refute_estimate(
estimate,
refutation_method='refute_placebo_treatment',
method_params={'num_simulations': 50}
)
print(f" {ref_placebo}")
print(f"\n{'='*50}")
print(f" SENSITIVITY SUMMARY")
print(f" Original effect: {original_effect:.4f}")
print(f" If all refutations PASS → estimate is robust")
print(f" If any FAIL → revisit causal assumptions")
return {'original_effect': original_effect, 'model': model}
Common Pitfalls and Solutions
Collider Bias — Conditioning Opens Bad Paths
graph_bad = "digraph { T -> S; Y -> S; T -> Y; }"
Graph Specification Errors
graph_bad = "digraph { T <-> Y; }"
graph_good = "digraph { U -> T; U -> Y; T -> Y; }"
graph_bad = "digraph { T -> Y; }"
graph_good = "digraph { age -> T; age -> Y; T -> Y; }"
Effect Not Identifiable
from dowhy import CausalModel
graph = "digraph { U -> T; U -> Y; T -> Y; }"
DoWhy's power is the structure: you don't just throw data at a model and hope. You articulate your causal assumptions as a graph, let the math tell you whether the effect is identifiable, estimate it with the right method, and then systematically test whether your answer would survive if your assumptions were wrong. The identify → estimate → refute loop is the closest thing science has to a rigorous protocol for extracting causation from observational data.