aer-identification

Name: Aer Identification
Author: brycewang-stanford

Use when selecting, implementing, or stress-testing the causal identification strategy for an empirical economics manuscript — difference-in-differences (including staggered designs), instrumental variables (including weak-IV-robust inference), regression discontinuity, synthetic control, or shift-share / Bartik. Apply before writing the introduction or results.

stars:2

forks:0

updated: 25 May 2026, 06:13

AER Identification

Overview

In modern AER-track empirical economics, identification is the paper. A weak design cannot be rescued by clever writing, more controls, or a larger sample. This skill walks through the five canonical design-based strategies, the modern defaults that have replaced naive textbook implementations, and the referee-anticipating tests each demands.

If the identification strategy is fragile, return to aer-topic-selection. There is no point polishing an indefensible empirical strategy.

When to Use

Designing the empirical strategy for a new project
The current strategy relies on TWFE, a first-stage F rule of thumb, or a naive RDD, and a referee is likely to flag it
A prior submission was rejected on identification grounds and the design needs rebuilding
Choosing between two candidate identification strategies for the same question

Master Decision Tree

Is treatment assignment plausibly random conditional on observables?
├── Yes, by design (RCT, lottery) → run the RCT analysis; register PAP via AEA RCT Registry
└── No → identification must come from variation
    ├── Sharp threshold in a running variable → RDD (sharp or fuzzy)
    ├── Discrete policy change in some units, not others, over time → DiD
    │     ├── Single treatment date → canonical 2×2 DiD
    │     └── Staggered adoption → Callaway-Sant'Anna or Borusyak-Jaravel-Spiess
    ├── Endogenous regressor + plausibly exogenous shifter → IV
    │     ├── Shifter × pre-existing exposure shares → shift-share / Bartik
    │     └── Single instrument → weak-IV-robust inference if F < 50
    ├── One treated unit / aggregate intervention → synthetic control
    └── None of the above → reconsider the question

Difference-in-Differences

Canonical 2×2 (single treatment date, two groups)

Use TWFE if and only if:

Treatment timing is simultaneous for all treated units
The control group is never treated
Treatment-effect heterogeneity is implausible

Otherwise, TWFE can produce biased estimates, in some cases with the wrong sign.
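With one treated group, one never-treated control group, and a single treatment date, the DiD estimate reduces to a difference of four cell means. A minimal sketch of that arithmetic follows; the function name, the row layout, and the toy numbers are illustrative assumptions, not part of the skill's templates.

```python
from statistics import mean

def did_2x2(rows):
    """Canonical 2x2 DiD from (treated, post, outcome) rows.

    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {(g, p): [] for g in (False, True) for p in (False, True)}
    for treated, post, y in rows:
        cells[(bool(treated), bool(post))].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change

# Hypothetical toy data: (treated, post, outcome)
rows = [
    (True, False, 10.0), (True, False, 12.0),
    (True, True, 15.0), (True, True, 17.0),
    (False, False, 8.0), (False, False, 10.0),
    (False, True, 9.0), (False, True, 11.0),
]
estimate = did_2x2(rows)  # treated change 5.0 minus control change 1.0
```

The sketch returns a point estimate only; inference (for example cluster-robust standard errors) is left out.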

Staggered Adoption (most modern applications)

Do not use TWFE. Use one of:

Callaway and Sant'Anna (2021) — csdid (Stata), did (R). Identifies group-time average treatment effects (ATT(g,t)); the default estimator is doubly robust; supports event-study aggregation.
Borusyak, Jaravel, and Spiess (2024) — imputation estimator.
de Chaisemartin and D'Haultfœuille (2020) — did_multiplegt.
Sun and Abraham (2021) — interaction-weighted estimator for event studies.
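To make the group-time estimand concrete, here is a minimal sketch of an unconditional ATT(g,t) that compares cohort g with never-treated units, using the last pre-treatment period g-1 as the base. It is a simplification under stated assumptions (balanced panel, no covariates, no doubly robust adjustment, no inference) and is not the csdid or did implementation; the data layout and names are hypothetical.

```python
from statistics import mean

def att_gt(panel, cohort, g, t):
    """Unconditional ATT(g, t) with never-treated controls.

    panel:  {unit: {period: outcome}}
    cohort: {unit: first treated period, or None if never treated}
    """
    base = g - 1  # last period before cohort g is treated
    treated = [u for u, c in cohort.items() if c == g]
    never = [u for u, c in cohort.items() if c is None]

    def change(u):
        return panel[u][t] - panel[u][base]

    return mean(change(u) for u in treated) - mean(change(u) for u in never)

# Hypothetical toy panel: common trend of +1 per period,
# effect of 2 for the cohort treated in period 2, 5 for the cohort treated in period 3.
panel = {
    "a": {1: 10.0, 2: 13.0, 3: 14.0, 4: 15.0},  # treated from period 2
    "b": {1: 20.0, 2: 21.0, 3: 27.0, 4: 28.0},  # treated from period 3
    "c": {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0},      # never treated
}
cohort = {"a": 2, "b": 3, "c": None}

att_2_2 = att_gt(panel, cohort, g=2, t=2)
att_3_4 = att_gt(panel, cohort, g=3, t=4)
```

Because each cohort is compared only with never-treated units, already-treated units never serve as controls, which is the comparison that contaminates TWFE under staggered timing.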

Required diagnostics:

Goodman-Bacon decomposition to show the share of weight from "forbidden" comparisons under TWFE
Event-study plot with the imputation or Callaway-Sant'Anna estimator
Pre-trends test reported as the joint test, not just the visual
Heterogeneity by treatment cohort

Pre-Trends

A flat pre-trend is necessary but not sufficient. Report:

Visual event-study plot with 95% confidence intervals
Formal joint test of pre-period coefficients (p-value)
Honest DiD (Rambachan-Roth 2023) sensitivity bounds for the post-period
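The formal joint test of pre-period coefficients is commonly a Wald test: W = b' V^{-1} b, compared against a chi-square distribution with as many degrees of freedom as pre-period coefficients. The sketch below assumes you already have the pre-period coefficient vector and its covariance matrix from the event-study estimator; the function names and example numbers are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def chi2_sf(x, df):
    """Upper tail probability of a chi-square with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # df = 2 closed form
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # df = 1 closed form
    while a < df / 2.0 - 1e-9:                # recurrence in the shape parameter
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def pretrend_wald(coefs, vcov):
    """Wald statistic and p-value for H0: all pre-period coefficients are zero."""
    w = solve(vcov, coefs)
    stat = sum(b * wi for b, wi in zip(coefs, w))
    return stat, chi2_sf(stat, len(coefs))

# Hypothetical pre-period estimates and covariance matrix
stat, p_value = pretrend_wald([1.0, 2.0], [[1.0, 0.0], [0.0, 4.0]])
```

Failing to reject this test does not establish parallel trends, which is why the sensitivity bounds are reported alongside it.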

Instrumental Variables

Weak Instruments

The first-stage F > 10 rule is obsolete. Modern conventions:

For just-identified models: report Anderson-Rubin (AR) confidence sets as the primary inference. AR has correct size regardless of instrument strength.
For F < 50: 2SLS confidence intervals are unreliable; AR is required, not optional.
Stock-Yogo critical values for 2SLS bias assume homoskedasticity and are rarely valid in modern clustered settings.

Report the Montiel Olea-Pflueger effective F statistic, computed with weakivtest (Stata) or ivDiag (R).
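An Anderson-Rubin confidence set can be built by test inversion: for each candidate beta0, regress y - beta0*x on the instrument and keep beta0 if the instrument is insignificant. The sketch below covers only the simplest case (one endogenous regressor, one instrument, no controls, homoskedastic errors, asymptotic chi-square(1) critical value); the function names and simulated data are assumptions, and a clustered application would need a cluster-robust variance instead.

```python
import random

def slope_f_stat(z, e):
    """Squared t-statistic on z in an OLS regression of e on a constant and z,
    using the homoskedastic standard error."""
    n = len(z)
    z_bar, e_bar = sum(z) / n, sum(e) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sze = sum((zi - z_bar) * (ei - e_bar) for zi, ei in zip(z, e))
    slope = sze / szz
    rss = sum((ei - e_bar - slope * (zi - z_bar)) ** 2 for zi, ei in zip(z, e))
    return slope ** 2 / (rss / (n - 2) / szz)

def ar_stat(y, x, z, beta0):
    """AR statistic for H0: beta = beta0."""
    residual = [yi - beta0 * xi for yi, xi in zip(y, x)]
    return slope_f_stat(z, residual)

def ar_confidence_set(y, x, z, grid, crit=3.841):
    """Grid points not rejected at the 5% level (chi-square(1) critical value)."""
    return [b for b in grid if ar_stat(y, x, z, b) <= crit]

# Simulated data (hypothetical): x is endogenous through u, z is the instrument.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 * xi + ui for xi, ui in zip(x, u)]

grid = [i / 100 for i in range(-100, 301)]
accepted = ar_confidence_set(y, x, z, grid)
# With a weak instrument the accepted set can be very wide or unbounded,
# so a finite grid may only show part of it.
```

In the just-identified case the AR statistic is zero at the IV point estimate, so the estimate always lies inside the set.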

Exclusion Restriction

The IV's credibility depends on a story, not a test. State the exclusion restriction in one sentence in the introduction and defend it with:

Institutional narrative (one paragraph)
A placebo regression showing that the instrument does not predict an outcome it should not affect
Sensitivity analysis: how much exclusion-restriction violation would overturn the result (Conley et al. 2012)
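The sensitivity idea can be sketched for point estimates: suppose the instrument has a direct effect gamma on the outcome, subtract gamma*z from y, and re-estimate. This is a simplified illustration in the spirit of the sensitivity analysis listed above (single instrument, no controls, no confidence intervals); the function names and toy data are hypothetical.

```python
def cov(a, b):
    """Sample covariance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def iv_estimate(y, x, z):
    """Just-identified IV estimate: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

def beta_given_gamma(y, x, z, gamma):
    """IV estimate after removing an assumed direct effect gamma of z on y."""
    y_adj = [yi - gamma * zi for yi, zi in zip(y, z)]
    return iv_estimate(y_adj, x, z)

# Hypothetical toy data
z = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 3.0, 2.0, 5.0, 6.0]
y = [2.0, 4.0, 5.0, 9.0, 10.0]

baseline = beta_given_gamma(y, x, z, 0.0)   # exclusion restriction holds exactly
shifted = beta_given_gamma(y, x, z, 0.3)    # modest violation
# Direct effect that would drive the estimate to zero:
# it equals the reduced-form coefficient of y on z.
breakeven_gamma = cov(z, y) / cov(z, z)
```

Reporting the break-even gamma next to a plausible range for the direct effect lets the reader judge how much violation the result can absorb.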

Shift-Share / Bartik

Two valid sources of identification, with very different implications:

Exogenous shares (Goldsmith-Pinkham, Sorkin, Swift 2020) — argue that pre-existing exposure shares are conditionally exogenous; report the Rotemberg weights and inspect the top-5 industries driving identification.
Exogenous shocks (Borusyak, Hull, Jaravel 2022; Adão, Kolesár, Morales 2019) — argue that aggregate shocks are as-good-as-random; report shock-level inference.

Pick one explicitly. Do not hand-wave between the two.
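Under either argument the instrument is built the same way: each location's pre-existing exposure shares are multiplied by the aggregate shifters and summed. A minimal sketch, with hypothetical location names, industries, and numbers:

```python
def bartik_instrument(shares, shocks):
    """Shift-share instrument: sum over industries of share * shock.

    shares: {location: {industry: pre-period exposure share}}
    shocks: {industry: aggregate shifter, e.g. national growth}
    """
    return {
        loc: sum(share * shocks[industry] for industry, share in by_industry.items())
        for loc, by_industry in shares.items()
    }

# Hypothetical toy inputs
shares = {
    "A": {"manufacturing": 0.5, "services": 0.5},
    "B": {"manufacturing": 0.2, "services": 0.8},
}
shocks = {"manufacturing": -0.10, "services": 0.05}

instrument = bartik_instrument(shares, shocks)
```

Which of the two identification arguments applies determines the diagnostics and the level of inference, not the construction itself.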

Regression Discontinuity

Modern Defaults

Local linear regression with a triangular kernel. Higher-order polynomials are discouraged, global high-order polynomials in particular (Gelman-Imbens 2019).
MSE-optimal bandwidth (Calonico-Cattaneo-Titiunik 2014) with the robust bias-corrected confidence interval. Use rdrobust.
Donut RDD if bunching near the cutoff is a concern.
Covariate adjustment for efficiency; main result must hold without it.
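The first default can be sketched directly: fit a weighted line on each side of the cutoff with triangular kernel weights and take the difference of the two intercepts. This sketch takes the bandwidth h as given and returns a point estimate only; it does not implement the MSE-optimal bandwidth or the robust bias-corrected interval, for which rdrobust remains the tool. Function names and data are hypothetical.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def rd_local_linear(running, outcome, cutoff, h):
    """Sharp RD estimate: jump in the local linear fit at the cutoff."""
    left, right = ([], [], []), ([], [], [])
    for r, y in zip(running, outcome):
        d = r - cutoff
        w = 1.0 - abs(d) / h          # triangular kernel weight
        if w <= 0:
            continue                   # outside the bandwidth
        side = right if d >= 0 else left
        side[0].append(d)
        side[1].append(y)
        side[2].append(w)
    a_right, _ = wls_line(*right)      # intercept = fitted value at the cutoff
    a_left, _ = wls_line(*left)
    return a_right - a_left

# Hypothetical noise-free data: jump of 3 at zero, different slopes on each side.
running = [i / 10 for i in range(-20, 21)]
outcome = [1 + 2 * r if r < 0 else 4 + 0.5 * r for r in running]

jump = rd_local_linear(running, outcome, cutoff=0.0, h=1.0)
```

Rerunning the function over several values of h gives the bandwidth-sensitivity table that referees expect.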

Required Diagnostics

McCrary (2008) / Cattaneo-Jansson-Ma (2020) density test for manipulation of the running variable
Balance tests on predetermined covariates at the cutoff
Placebo cutoffs away from the true threshold
Bandwidth sensitivity — show the estimate across at least three bandwidths
Visual RD plot using rdplot with the binning method explicitly stated

Synthetic Control

When Appropriate

One (or few) treated units
Long pre-treatment outcome series (≥ 10 periods)
A large donor pool of plausibly comparable untreated units
Aggregate intervention (policy at the country, state, city level)

Modern Extensions

Generalized synthetic control (Xu 2017) for multiple treated units
Augmented synthetic control (Ben-Michael, Feller, Rothstein 2021) for bias correction
Synthetic DiD (Arkhangelsky et al. 2021) combining SCM and DiD weighting

Required Diagnostics

Placebo (in-time): apply SCM to pre-treatment fake intervention dates
Placebo (in-space): apply SCM to every donor as if it were treated; report the distribution of placebo effects
Permutation inference / Fisher exact p-value
Weight vector reported in the appendix; donors with > 10% weight discussed
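The in-space placebo and permutation steps can be sketched once the gaps (actual minus synthetic outcome) have been computed for the treated unit and for every donor treated as a placebo. The test statistic used here, the ratio of post- to pre-treatment RMSPE, is one common choice and is an assumption of this sketch; unit names and gap values are hypothetical.

```python
import math

def rmspe(gaps):
    """Root mean squared prediction error of a list of gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def placebo_p_value(gaps_by_unit, treated, n_pre):
    """Permutation p-value from post/pre RMSPE ratios.

    gaps_by_unit: {unit: [gap per period]}, first n_pre periods are pre-treatment.
    Returns (p_value, ratios). The treated unit is included in the ranking.
    """
    ratios = {
        u: rmspe(g[n_pre:]) / rmspe(g[:n_pre])
        for u, g in gaps_by_unit.items()
    }
    at_least_as_large = sum(1 for r in ratios.values() if r >= ratios[treated])
    return at_least_as_large / len(ratios), ratios

# Hypothetical gaps: two pre-treatment and two post-treatment periods per unit.
gaps = {
    "treated": [0.1, -0.1, 2.0, 2.0],
    "donor_1": [0.2, -0.2, 0.2, 0.2],
    "donor_2": [0.1, 0.1, 0.3, -0.3],
    "donor_3": [0.5, -0.5, 0.5, 0.5],
}
p_value, ratios = placebo_p_value(gaps, "treated", n_pre=2)
```

With J donors the smallest attainable p-value is 1/(J+1), so a small donor pool caps how strong the permutation evidence can be.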

Field Experiments and RCTs

If the paper uses a field experiment:

Register with AEA RCT Registry before the intervention begins. AEA journals require this prior to submission.
Pre-analysis plan (PAP) posted before unblinding. Per Olken and others, keep the PAP moderate in scope — pre-specify primary outcomes and the analysis specification, leave exploratory work clearly labeled as such.
Power calculations in the manuscript or appendix.
Multiple-hypothesis correction if more than one primary outcome.
Attrition documented and tested for differential attrition by treatment arm.
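For the multiple-hypothesis item, the checklist does not name a specific correction. As one illustration, the sketch below applies the Holm step-down adjustment, which controls the family-wise error rate; the choice of Holm and the example p-values are assumptions, and a pre-analysis plan may specify a different procedure.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for three primary outcomes
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
```

Here the smallest p-value is multiplied by 3, the next by 2, and the largest inherits the running maximum, so only the first outcome stays below 0.05.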

Mechanism vs. Identification

A common confusion: identification answers whether X causes Y; mechanism answers why. Mechanism evidence should not weaken the identification of the main effect. Run:

Subgroup heterogeneity (does the effect concentrate where theory predicts?)
Mediation analysis only if the mediator is itself plausibly exogenous (rare)
Auxiliary outcomes consistent with the proposed channel

Red Flags for Referees

TWFE on staggered data with no Goodman-Bacon decomposition
First-stage F = 12 cited as evidence of instrument strength
RDD with a polynomial of order 4
Synthetic control with no placebo inference
DiD with a "control group" of eventually-treated units
IV exclusion restriction defended only by "we control for X"
Citing Angrist-Pischke as a substitute for showing the diagnostic

Repository Resources

When working from the AER-Skills repository or plugin bundle, load only the relevant resource:

Staggered DiD implementation: templates/stata/03_main_did.do, templates/r/03_main_did.R, or templates/python/main_did.py
Classic design examples: examples/aer-exemplars.md

Handoff

STRATEGY: <DiD | IV | RDD | SCM | shift-share | RCT>
MODERN ESTIMATOR USED: <yes / no / which>
REQUIRED DIAGNOSTICS REPORTED: <list>
INFERENCE METHOD: <robust / cluster-robust / AR / wild bootstrap / permutation>
WEAK-IV / TWFE / POLY-ORDER RED FLAGS: <list or "none">
NEXT SKILL: aer-robustness

Anti-Patterns

Defending an old design ("the prior literature used TWFE") when modern estimators exist
Reporting OLS-with-controls as the main specification and IV/RD as "robustness"
Using more than one identification strategy as if they were independent confirmations when they share identifying variation
Footnoting the identifying assumption instead of stating it in the introduction

related-skills.json

same repository

aer-introduction.md

from "brycewang-stanford/AER-Skills"

Use when drafting or rewriting the introduction of an economics manuscript targeted at AER, AER:Insights, or an AEJ, or when compressing an abstract to the mandatory 100-word limit. Implements the Keith Head / Bellemare five-paragraph formula and AER-specific formatting conventions.

2026-05-25

aer-rebuttal.md

from "brycewang-stanford/AER-Skills"

Use when responding to a Revise & Resubmit decision from AER, AER:Insights, or an AEJ, and a point-by-point response letter plus aligned manuscript revisions are needed. Handles triage, the concede / clarify / push-back decision, and the response-letter format that editors actually read.

2026-05-25

aer-replication.md

from "brycewang-stanford/AER-Skills"

Use when assembling the AEA Data and Code Availability deposit for an AER, AER:Insights, or AEJ acceptance, writing the README, or auditing a replication package before the AEA Data Editor review. Implements the current AEA policy, including the February 2026 Data and Code Availability Policy.

2026-05-25

aer-tables-figures.md

from "brycewang-stanford/AER-Skills"

Use when constructing or revising regression tables, descriptive statistics tables, or figures for an AER, AER:Insights, or AEJ manuscript. Implements AER booktabs house style, the standard regression-table layout, and the figure-notes convention.

2026-05-25

aer-robustness.md

from "brycewang-stanford/AER-Skills"

Use when the main empirical results exist but the manuscript lacks the robustness, heterogeneity, mechanism, and placebo checks that AER referees will demand. Apply after aer-identification and before aer-introduction so that the value-added paragraph can reference these tests.

2026-05-25

aer-submission.md

from "brycewang-stanford/AER-Skills"

Use when running the final pre-submission audit for an AER, AER:Insights, or AEJ manuscript — length, format, cover letter, per-author disclosure statements, file packaging, and routing among the AEA journal family. Apply immediately before clicking submit.

2026-05-25

package.json

"author": "brycewang-stanford"

"repository": "brycewang-stanford/AER-Skills"
