| name | aer-identification |
| description | Use when selecting, implementing, or stress-testing the causal identification strategy for an empirical economics manuscript — difference-in-differences (including staggered designs), instrumental variables (including weak-IV-robust inference), regression discontinuity, synthetic control, or shift-share / Bartik. Apply before writing the introduction or results. |
AER Identification
Overview
In modern AER-track empirical economics, identification is the paper. A weak design cannot be rescued by clever writing, more controls, or a larger sample. This skill walks through the five canonical design-based strategies, the modern defaults that have replaced naive textbook implementations, and the referee-anticipating tests each demands.
If the identification strategy is fragile, return to aer-topic-selection. There is no point polishing an indefensible empirical strategy.
When to Use
- Designing the empirical strategy for a new project
- The current strategy relies on TWFE, a first-stage F rule of thumb, or a naive RDD, and a referee will flag it
- A prior submission was rejected on identification grounds and the design needs rebuilding
- Choosing between two candidate identification strategies for the same question
Master Decision Tree
Is treatment assignment plausibly random conditional on observables?
├── Yes, by design (RCT, lottery) → run the RCT analysis; register PAP via AEA RCT Registry
└── No → identification must come from variation
    ├── Sharp threshold in a running variable → RDD (sharp or fuzzy)
    ├── Discrete policy change in some units, not others, over time → DiD
    │   ├── Single treatment date → canonical 2×2 DiD
    │   └── Staggered adoption → Callaway-Sant'Anna or Borusyak-Jaravel-Spiess
    ├── Endogenous regressor + plausibly exogenous shifter → IV
    │   ├── Shifter × pre-existing exposure shares → shift-share / Bartik
    │   └── Single instrument → weak-IV-robust inference if F < 50
    ├── One treated unit / aggregate intervention → synthetic control
    └── None of the above → reconsider the question
Difference-in-Differences
Canonical 2×2 (single treatment date, two groups)
Use TWFE only if:
- Treatment timing is simultaneous for all treated units
- The control group is never treated
- Treatment-effect heterogeneity is implausible
Otherwise, TWFE can produce biased and, in some cases, wrong-signed estimates, because already-treated units end up serving as controls.
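As a point of reference for the canonical case, the 2×2 estimate is just a difference of four cell means. The sketch below is a minimal illustration on hypothetical toy data; the function name did_2x2 and the row layout are assumptions made for this example, not part of any package.

```python
from statistics import mean

def did_2x2(rows):
    """Difference-in-differences from four cell means.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    """
    cells = {(g, t): [] for g in (False, True) for t in (False, True)}
    for treated, post, y in rows:
        cells[(treated, post)].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(True, True)] - m[(True, False)]
    change_control = m[(False, True)] - m[(False, False)]
    return change_treated - change_control

# Hypothetical toy data: common trend of +1, treatment effect of +2.
toy = [
    (False, False, 10.0), (False, False, 12.0),
    (False, True, 11.0), (False, True, 13.0),
    (True, False, 20.0), (True, False, 22.0),
    (True, True, 23.0), (True, True, 25.0),
]
did_estimate = did_2x2(toy)
print(did_estimate)
```

With one treatment date and a never-treated control group, the TWFE coefficient reproduces this number, which is why the canonical case is the one place TWFE remains safe.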
Staggered Adoption (most modern applications)
Do not use TWFE. Use one of:
- Callaway and Sant'Anna (2021): csdid (Stata), did (R). Identifies group-time average treatment effects, ATT(g,t); the estimator is doubly robust; supports event-study aggregation.
- Borusyak, Jaravel, and Spiess (2024): imputation estimator.
- de Chaisemartin and D'Haultfœuille (2020): did_multiplegt.
- Sun and Abraham (2021): interaction-weighted estimator for event studies.
Required diagnostics:
- Goodman-Bacon decomposition to show the share of weight from "forbidden" comparisons under TWFE
- Event-study plot with the imputation or Callaway-Sant'Anna estimator
- Pre-trends reported as a formal joint test, not only as a visual check
- Heterogeneity by treatment cohort
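To make the group-time building block concrete, the sketch below computes an unconditional ATT(g,t) by comparing a cohort with never-treated units only, using period g-1 as the base. It is a simplified illustration, not the csdid or did implementation: there are no covariates (so nothing here is doubly robust), no inference, and the panel layout and the names att_gt and make_panel are assumptions made for this example.

```python
from statistics import mean

def att_gt(panel, g, t):
    """Unconditional ATT(g, t) against never-treated units, base period g - 1.

    panel: list of (unit, cohort, period, y); cohort is the first treated
    period, or None for never-treated units.
    """
    def cell_mean(cohort, period):
        return mean(y for (_, c, p, y) in panel if c == cohort and p == period)

    base = g - 1
    change_cohort = cell_mean(g, t) - cell_mean(g, base)
    change_never = cell_mean(None, t) - cell_mean(None, base)
    return change_cohort - change_never

def make_panel():
    """Hypothetical noise-free panel with cohort-specific, growing effects."""
    cohorts = {"a": 2, "b": 2, "c": 3, "d": 3, "e": None, "f": None}
    unit_fe = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0, "f": 6.0}
    slope = {2: 1.0, 3: 3.0}  # effect per period of exposure, by cohort
    panel = []
    for unit, g in cohorts.items():
        for t in range(1, 5):
            treated = g is not None and t >= g
            effect = slope[g] * (t - g + 1) if treated else 0.0
            panel.append((unit, g, t, unit_fe[unit] + 0.5 * t + effect))
    return panel

panel = make_panel()
estimates = {(g, t): att_gt(panel, g, t) for g in (2, 3) for t in range(g, 5)}
print(estimates)
```

Because every comparison uses never-treated units, no already-treated cohort ever serves as a control, which is the "forbidden" comparison the Goodman-Bacon decomposition flags under TWFE.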
Pre-Trends
A flat pre-trend is necessary but not sufficient. Report:
- Visual event-study plot with 95% confidence intervals
- Formal joint test of pre-period coefficients (p-value)
- Honest DiD (Rambachan-Roth 2023) sensitivity bounds for the post-period
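The formal joint test can be written as a Wald statistic on the pre-period event-study coefficients. The notation below is an assumption made for illustration: \hat{\beta}_{\text{pre}} stacks the q estimated pre-period coefficients (with the reference period omitted) and \hat{V}_{\text{pre}} is their cluster-robust covariance block.

```latex
H_0:\ \beta_{\text{pre}} = 0,
\qquad
W = \hat{\beta}_{\text{pre}}^{\top}\, \hat{V}_{\text{pre}}^{-1}\, \hat{\beta}_{\text{pre}}
\ \overset{a}{\sim}\ \chi^2_{q}
```

Failing to reject is weak evidence on its own, since the test can have low power; that is the motivation for pairing it with the sensitivity bounds.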
Instrumental Variables
Weak Instruments
The first-stage F > 10 rule is obsolete. Modern conventions:
- For just-identified models: report Anderson-Rubin (AR) confidence sets as the primary inference. AR has correct size regardless of instrument strength.
- For F < 50: 2SLS confidence intervals are unreliable; AR is required, not optional.
- Stock-Yogo critical values for 2SLS bias assume homoskedasticity and are rarely valid in modern clustered settings.
Report the Montiel Olea-Pflueger effective F statistic instead, available through weakivtest (Stata) or ivDiag (R).
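The logic of an Anderson-Rubin confidence set can be sketched by test inversion: for each candidate value beta0, regress y - beta0 * x on the instrument and keep the values where the instrument is insignificant. The example below is a minimal sketch for one instrument and one endogenous regressor under homoskedasticity, on simulated data; the function names, the grid, and the data-generating process are assumptions made for this illustration, and a clustered application would need a cluster-robust variance instead.

```python
import random
from statistics import mean

CHI2_1_CRIT_95 = 3.841  # 95% critical value, chi-square with 1 degree of freedom

def ar_stat(y, x, z, beta0):
    """Squared t-statistic on z in a regression of (y - beta0 * x) on (1, z)."""
    n = len(y)
    r = [yi - beta0 * xi for yi, xi in zip(y, x)]
    zbar, rbar = mean(z), mean(r)
    szz = sum((zi - zbar) ** 2 for zi in z)
    szr = sum((zi - zbar) * (ri - rbar) for zi, ri in zip(z, r))
    slope = szr / szz
    rss = sum((ri - rbar - slope * (zi - zbar)) ** 2 for zi, ri in zip(z, r))
    var_slope = rss / (n - 2) / szz  # homoskedastic variance of the slope
    return slope ** 2 / var_slope

def ar_confidence_set(y, x, z, grid):
    """Grid points not rejected at the 5% level; may be disjoint or hit the grid edge."""
    return [b for b in grid if ar_stat(y, x, z, b) <= CHI2_1_CRIT_95]

def cov(a, b):
    abar, bbar = mean(a), mean(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / len(a)

# Simulated data: true beta = 1.0, x is endogenous through the shared error u.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 * xi + ui for xi, ui in zip(x, u)]

beta_iv = cov(z, y) / cov(z, x)  # just-identified IV point estimate
grid = [i / 100 for i in range(-300, 501)]
ar_set = ar_confidence_set(y, x, z, grid)
print(beta_iv, min(ar_set), max(ar_set))
```

The AR statistic is zero at the IV point estimate by construction, so the set always contains it; with a weak instrument the set widens and can become unbounded, which is the honest answer in that case.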
Exclusion Restriction
The IV's credibility depends on a story, not a test. State the exclusion restriction in one sentence in the introduction and defend it with:
- Institutional narrative (one paragraph)
- A placebo regression showing that the instrument does not predict an outcome it should not affect
- Sensitivity analysis: how much exclusion-restriction violation would overturn the result (Conley et al. 2012)
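The sensitivity exercise can be framed in a stylized just-identified model where the instrument is allowed a direct effect \gamma on the outcome. The notation is an assumption made for illustration; the union-of-intervals form below is one of the approaches in that literature, not the only one.

```latex
\begin{aligned}
y_i &= \beta x_i + \gamma z_i + \varepsilon_i, \qquad x_i = \pi z_i + v_i \\
\operatorname{plim}\ \hat{\beta}_{IV} &= \beta + \frac{\gamma}{\pi} \\
\mathrm{CI}(\delta) &= \bigcup_{\gamma_0 \in [-\delta,\, \delta]}
  \mathrm{CI}_{1-\alpha}\!\left(\hat{\beta}_{IV}(\gamma_0)\right),
\quad \hat{\beta}_{IV}(\gamma_0)\ \text{estimated from}\ y_i - \gamma_0 z_i
\end{aligned}
```

The bias term \gamma / \pi shows why a weak first stage magnifies even small exclusion-restriction violations; reporting the smallest \delta at which \mathrm{CI}(\delta) includes zero answers the "how much violation" question directly.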
Shift-Share / Bartik
Two valid sources of identification, with very different implications:
- Exogenous shares (Goldsmith-Pinkham, Sorkin, Swift 2020) — argue that pre-existing exposure shares are conditionally exogenous; report the Rotemberg weights and inspect the top-5 industries driving identification.
- Exogenous shocks (Borusyak, Hull, Jaravel 2022; Adão, Kolesár, Morales 2019) — argue that aggregate shocks are as-good-as-random; report shock-level inference.
Pick one explicitly. Do not hand-wave between the two.
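Whichever argument is made, the instrument itself is the same inner product of exposure shares and aggregate shocks. The sketch below builds it on hypothetical data; the function name, the location and industry labels, and the numbers are all made up for illustration.

```python
def bartik_instrument(shares, shocks):
    """Shift-share instrument: sum over industries of share * shock.

    shares: {location: {industry: pre-period exposure share}}
    shocks: {industry: aggregate shock, e.g. national growth rate}
    """
    return {
        location: sum(share * shocks[industry] for industry, share in by_industry.items())
        for location, by_industry in shares.items()
    }

# Hypothetical pre-period shares and aggregate shocks.
shares = {
    "A": {"manufacturing": 0.5, "services": 0.5},
    "B": {"manufacturing": 0.2, "services": 0.8},
}
shocks = {"manufacturing": -0.10, "services": 0.05}

instrument = bartik_instrument(shares, shocks)
print(instrument)
```

The construction makes the two identification arguments visible: variation across locations comes only from the shares, and variation across industries comes only from the shocks, so the paper has to say which of the two is doing the work.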
Regression Discontinuity
Modern Defaults
- Local linear regression with a triangular kernel. High-order polynomials (cubic and above) are discouraged (Gelman-Imbens 2019).
- MSE-optimal bandwidth (Calonico-Cattaneo-Titiunik 2014) with the robust bias-corrected confidence interval. Use rdrobust.
- Donut RDD if bunching near the cutoff is a concern.
- Covariate adjustment for efficiency; main result must hold without it.
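The point estimate behind these defaults is a pair of kernel-weighted linear fits, one on each side of the cutoff. The sketch below shows that step only, with a fixed, user-supplied bandwidth on hypothetical noise-free data; it does not select an MSE-optimal bandwidth or compute robust bias-corrected intervals, which is what rdrobust is for. Function names are assumptions made for this example.

```python
def wls_intercept(xs, ys, ws):
    """Intercept from weighted least squares of y on (1, x)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar - (sxy / sxx) * xbar

def rd_local_linear(running, outcome, cutoff, bandwidth):
    """Sharp RD estimate: difference in fitted values at the cutoff."""
    left = ([], [], [])
    right = ([], [], [])
    for r, y in zip(running, outcome):
        d = r - cutoff                    # centre the running variable
        w = 1.0 - abs(d) / bandwidth      # triangular kernel weight
        if w <= 0:
            continue                      # outside the bandwidth
        xs, ys, ws = right if d >= 0 else left
        xs.append(d)
        ys.append(y)
        ws.append(w)
    return wls_intercept(*right) - wls_intercept(*left)

# Hypothetical data: linear in the running variable with a jump of 2 at zero.
running = [i / 100 for i in range(-100, 101)]
outcome = [1.0 + 0.5 * r + (2.0 if r >= 0 else 0.0) for r in running]

rd_estimate = rd_local_linear(running, outcome, cutoff=0.0, bandwidth=0.5)
print(rd_estimate)
```

Rerunning the same function over several bandwidths is a quick way to preview the bandwidth-sensitivity table before producing the final estimates with rdrobust.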
Required Diagnostics
- McCrary (2008) / Cattaneo-Jansson-Ma (2020) density test for manipulation of the running variable
- Balance tests on predetermined covariates at the cutoff
- Placebo cutoffs away from the true threshold
- Bandwidth sensitivity — show the estimate across at least three bandwidths
- Visual RD plot using rdplot with the binning method explicitly stated
Synthetic Control
When Appropriate
- One (or few) treated units
- Long pre-treatment outcome series (≥ 10 periods)
- A large donor pool of plausibly comparable untreated units
- Aggregate intervention (policy at the country, state, city level)
Modern Extensions
- Generalized synthetic control (Xu 2017) for multiple treated units
- Augmented synthetic control (Ben-Michael, Feller, Rothstein 2021) for bias correction
- Synthetic DiD (Arkhangelsky et al. 2021) combining SCM and DiD weighting
Required Diagnostics
- Placebo (in-time): apply SCM to pre-treatment fake intervention dates
- Placebo (in-space): apply SCM to every donor as if it were treated; report the distribution of placebo effects
- Permutation inference / Fisher exact p-value
- Weight vector reported in the appendix; donors with > 10% weight discussed
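The in-space placebo and the permutation p-value can be combined into one statistic: the ratio of post- to pre-treatment RMSPE for each unit, ranked across the treated unit and all donors. The sketch below assumes the gaps (actual minus synthetic) have already been computed for every unit by some SCM routine; the numbers, unit labels, and function names are hypothetical.

```python
from math import sqrt

def rmspe(gaps):
    """Root mean squared prediction error of a list of gaps."""
    return sqrt(sum(g * g for g in gaps) / len(gaps))

def placebo_p_value(gaps_by_unit, treated, n_pre):
    """Share of units whose post/pre RMSPE ratio is at least the treated unit's.

    gaps_by_unit: {unit: [actual - synthetic, one entry per period]}
    n_pre: number of pre-treatment periods at the start of each list
    """
    ratios = {
        unit: rmspe(gaps[n_pre:]) / rmspe(gaps[:n_pre])
        for unit, gaps in gaps_by_unit.items()
    }
    at_least_as_extreme = sum(1 for r in ratios.values() if r >= ratios[treated])
    return at_least_as_extreme / len(ratios)

# Hypothetical gaps: three pre-treatment periods, two post-treatment periods.
gaps_by_unit = {
    "treated": [0.1, -0.1, 0.1, 2.0, 2.5],
    "donor1": [0.2, -0.2, 0.1, 0.3, -0.2],
    "donor2": [0.1, 0.1, -0.2, -0.1, 0.2],
    "donor3": [-0.3, 0.2, 0.1, 0.4, 0.1],
    "donor4": [0.2, 0.2, -0.1, -0.2, -0.3],
}

p_value = placebo_p_value(gaps_by_unit, treated="treated", n_pre=3)
print(p_value)
```

With four donors the smallest attainable p-value is 1/5, which illustrates why a large donor pool matters for inference and not only for fit.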
Field Experiments and RCTs
If the paper uses a field experiment:
- Register with the AEA RCT Registry before the intervention begins. AEA journals require registration prior to submission.
- Pre-analysis plan (PAP) posted before unblinding. Per Olken and others, keep the PAP moderate in scope — pre-specify primary outcomes and the analysis specification, leave exploratory work clearly labeled as such.
- Power calculations in the manuscript or appendix.
- Multiple-hypothesis correction if more than one primary outcome.
- Attrition documented and tested for differential attrition by treatment arm.
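For the multiple-hypothesis item, one common choice is the Holm step-down adjustment, which controls the family-wise error rate. The checklist above does not prescribe a specific method, so treat this as one option among several; the function name and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), cap at 1,
        # and enforce monotonicity across the ordered hypotheses.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical unadjusted p-values for three primary outcomes.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

Reporting both the raw and adjusted p-values for each pre-specified primary outcome keeps the table transparent about what the correction changes.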
Mechanism vs. Identification
A common confusion: identification answers whether X causes Y; mechanism answers why. Mechanism evidence should not weaken the identification of the main effect. Run:
- Subgroup heterogeneity (does the effect concentrate where theory predicts?)
- Mediation analysis only if the mediator is itself plausibly exogenous (rare)
- Auxiliary outcomes consistent with the proposed channel
Red Flags for Referees
- TWFE on staggered data with no Goodman-Bacon decomposition
- First-stage F = 12 cited as evidence of instrument strength
- RDD with a polynomial of order 4
- Synthetic control with no placebo inference
- DiD with a "control group" of eventually-treated units
- IV exclusion restriction defended only by "we control for X"
- Citing Angrist-Pischke as a substitute for showing the diagnostic
Repository Resources
When working from the AER-skills repository or plugin bundle, load only the relevant resource:
- Staggered DiD implementation: templates/stata/03_main_did.do, templates/r/03_main_did.R, or templates/python/main_did.py
- Classic design examples: examples/aer-exemplars.md
Handoff
STRATEGY: <DiD | IV | RDD | SCM | shift-share | RCT>
MODERN ESTIMATOR USED: <yes / no / which>
REQUIRED DIAGNOSTICS REPORTED: <list>
INFERENCE METHOD: <robust / cluster-robust / AR / wild bootstrap / permutation>
WEAK-IV / TWFE / POLY-ORDER RED FLAGS: <list or "none">
NEXT SKILL: aer-robustness
Anti-Patterns
- Defending an old design ("the prior literature used TWFE") when modern estimators exist
- Reporting OLS-with-controls as the main specification and IV/RD as "robustness"
- Using more than one identification strategy as if they were independent confirmations when they share identifying variation
- Footnoting the identifying assumption instead of stating it in the introduction