| name | aer-replication |
| description | Use when assembling the AEA Data and Code Availability deposit for an AER, AER:Insights, or AEJ acceptance, writing the README, or auditing a replication package before the AEA Data Editor review. Implements the current AEA policy, including the February 2026 Data and Code Availability Policy. |
AER Replication
Overview
Prior to acceptance, every empirical, simulation, or experimental paper submitted to an AEA journal must provide data, code, and a complete README sufficient for an independent researcher to reproduce every reported result. The AEA Data Editor's Office now performs a computational reproducibility check before final acceptance, and a failed check delays publication by weeks or months.
This skill produces an AEA-compliant deposit on the first try.
When to Use
- A conditional acceptance arrived and the deposit deadline is set
- Drafting the README at any point in the project (recommended: from day one)
- An AEA Data Editor report flagged the deposit
- Preparing a deposit for openICPSR
Current AEA Data and Code Availability Policy (February 2026)
Three pillars:
- Data deposit ā all data the paper uses, or full provenance documentation for data that cannot be public
- Code deposit ā all code that produces every reported number, table, and figure
- README ā a single document instructing a replicator on how to run everything
Materials must be deposited in an openly accessible trusted repository. The strongly encouraged repository is the AEA Data and Code Repository at openICPSR, which gives the Data Editor automatic access to draft deposits.
Forms Required
- Data and Code Information Form
- Data and Code Availability Form (signed)
- Data and Code Archive Agreement Form (signed)
These forms are provided by the editorial office at the appropriate stage. Do not invent substitute forms.
Repository Structure
replication-package/
āāā README.md (or README.pdf)
āāā LICENSE (commonly MIT or CC-BY for code; data per source license)
āāā data/
ā āāā raw/ (original files as obtained, never modified)
ā āāā intermediate/ (cleaned analytic datasets)
ā āāā codebook/ (variable definitions, source mapping)
āāā code/
ā āāā 00_setup.do (or .R, .py)
ā āāā 01_clean.do
ā āāā 02_analysis.do
ā āāā 03_tables.do
ā āāā 04_figures.do
āāā output/
ā āāā tables/
ā āāā figures/
āāā docs/
āāā data_appendix.pdf
āāā computing_environment.txt
README Required Sections
The AEA Data Editor's office publishes a template. Required sections:
1. Overview
One paragraph: what the package does, what paper it accompanies, what software is required.
2. Data Availability and Provenance Statement
For every dataset used, state:
- Source (with URL or institution)
- Citation
- License or terms of access
- Date accessed
- Whether the data is included in the deposit, and if not, why not
If data cannot be deposited (proprietary, restricted, IRB), the author must:
- Commit to preserving data and code for ā„ 5 years
- Commit to assisting reasonable replication requests
- Make the code publicly available
- Document the data source thoroughly enough for an independent researcher to obtain it
3. Dataset List
A table of every file in data/, with columns: Filename | Description | Source | Notes.
4. Computational Requirements
Operating system, software version, packages, estimated runtime, peak memory. Example:
Software: Stata 18.0 MP, R 4.4.1
Stata packages: reghdfe (6.12), ftools (2.49), did (1.3.1), rdrobust (9.2)
R packages: fixest (0.12.1), did (2.1.2), modelsummary (2.1.1)
Hardware: standard laptop (8 GB RAM); full run completes in ~25 minutes.
5. Description of Programs / Code
What each script does and the order to run them. State the master script (e.g., run_all.do).
6. Instructions to Replicators
A literal step-by-step:
1. Download this archive and unzip to a working directory.
2. Open Stata 18; install dependencies via `setup/install_packages.do`.
3. Edit line 12 of `00_setup.do` to point to the working directory.
4. Run `do run_all.do`. Total runtime: ~25 minutes.
5. All tables and figures are written to `output/`.
7. List of Tables and Programs
A second table mapping each exhibit in the published paper to the script that produces it:
| Table/Figure | Script | Line in Script | Output File |
|---|
| Table 1 | 03_tables.do | 12 | output/tables/tab1.tex |
| Table 2 | 03_tables.do | 67 | output/tables/tab2.tex |
| Figure 1 | 04_figures.do | 8 | output/figures/fig1.pdf |
Coding Discipline for Reproducibility
Adopt from day one. Retrofitting is expensive.
General
- One master script that runs the entire pipeline end-to-end
- Relative paths only ā
data/raw/foo.csv, never /Users/yourname/...
- Set the random seed explicitly in any stochastic procedure
- Version-pin packages: in Stata,
setup script with ssc install ... , version; in R, use renv or groundhog; in Python, requirements.txt with exact versions
- Log files for every run, saved to
logs/
Stata
version 18.0 at the top of every do-file
set seed before any randomization
- Avoid
set more off and other display tweaks in shared scripts
- Use
reghdfe over xtreg, fe for performance and clarity
- Comment heavily ā Julian Reif's Stata Coding Guide is a reasonable standard
R
set.seed() at the top of every analysis script
renv::snapshot() to lock the package environment
- Avoid loading data with
read.csv and writing with write.csv ā explicit data.table::fread / fwrite with column types
Python
requirements.txt with == pins
numpy.random.default_rng(seed=...) for new-style RNG
- Jupyter notebooks acceptable only if accompanied by an equivalent
.py script
AEA Data Editor ā What They Actually Check
- Does the README exist and follow the template?
- Does the master script run end-to-end without errors?
- Do the produced numbers match the published paper?
- Are all data sources documented with provenance?
- Is the data deposit complete, or properly justified if restricted?
If yes to all: Approved, paper proceeds to publication.
If no: Revisions required, with a public report.
Common failure modes flagged in published Data Editor reports:
- Hardcoded absolute paths
- Missing package versions
- Master script that errors on a clean machine
- Tables in the deposit that differ from the published version
- Restricted-data papers with no provenance documentation
- README that says "see paper" instead of explaining the steps
openICPSR Workflow
- Create deposit on https://www.openicpsr.org/openicpsr/aea
- Upload files so the data and code are unzipped inside the repository; avoid one opaque top-level ZIP except for rare repository-approved exceptions
- Fill metadata: citation, abstract, keywords, geographic coverage, time period
- Submit; the AEA Data Editor's office reviews
- Address feedback in the deposit and resubmit
- Approval triggers DOI assignment and publication
Restricted Data Special Cases
If data is restricted:
- Provide access instructions in the README ā exact dataset name, agency, application process, approximate cost, expected turnaround
- Provide code that would run if the data were available
- Provide sample synthetic data with the same schema if possible, so the code can be smoke-tested
- Consider depositing intermediate aggregates (cell-suppressed counts) that allow partial replication
Repository Resources
When working from the AER-skills repository or plugin bundle, load only the relevant resource:
- Deposit skeleton:
examples/replication-package-skeleton/
- Language template:
templates/stata/, templates/r/, or templates/python/
- Classic replication examples and repository links:
examples/aer-exemplars.md
Pre-Deposit Checklist
Handoff
DEPOSIT TYPE: <public-data | restricted-data | hybrid>
README STATUS: <complete | incomplete>
REPRODUCIBILITY CHECK: <self-tested clean / not yet>
RESTRICTED DATA HANDLING: <n/a | provenance documented>
NEXT SKILL: aer-submission
Anti-Patterns
- Waiting until acceptance to start the deposit
- Treating the README as documentation for collaborators, not for an unknown replicator
- Depositing the JEL appendix tables but not the regressions that produced them
- "All code available from the authors upon request" ā does not satisfy the policy
- Uploading one giant ZIP whose contents the repository cannot display
- A deposit that runs only with packages from 2018 ā pin versions but use current stable releases