| name | StatsPAI_skill |
| description | Use when the user asks to run a full empirical / causal analysis in Python — by default in the style of an applied economics paper (AER / QJE / JPE / ReStud / AEJ) with DID / RD / IV / SCM / DML / matching, written-out estimating equation + identifying assumption, Table 1 / Table 2 / event-study figure / robustness gauntlet — OR in epidemiology / public health style (target-trial emulation, IPTW + g-formula + TMLE triplet, Mendelian randomization, KM/AFT survival, E-value sensitivity, STROBE/TRIPOD reporting) — OR in ML causal inference style (DML, S/T/X/R/DR meta-learners, causal forest, Dragonnet/TARNet/CEVAE, BCF, CATE distribution, policy learning, conformal causal, fairness audit, causal discovery). Also covers exporting multi-column regression tables to Word / Excel / LaTeX (Stata outreg2 / esttab / R modelsummary equivalent) and bundling an entire replication appendix into one .docx / .xlsx / .tex file. Triggers on keywords "StatsPAI", "statspai", "AER empirical analysis", "applied micro pipeline", "Table 1 balance", "event study", "first-stage F", "Oster bound", "honest_did", "spec_curve", "callaway_santanna", "dragonnet", "text as treatment", "outreg2 in Python", "regression table to Word/Excel", "sp.regtable", "sp.collect", "sp.paper_tables", "sp.feols", "summary_col", "modelsummary", "AER style table", "QJE style table", "epidemiology pipeline", "target trial emulation", "g-formula", "IPTW", "TMLE", "Mendelian randomization", "STROBE", "TRIPOD", "公共健康", "流行病学", "DML", "double machine learning", "causal forest", "meta-learner", "CATE", "conformal causal", "policy learning", "因果机器学习", "ML causal". |
| triggers | ["causal inference in python","applied microeconomics pipeline","AER empirical analysis","QJE style robustness","DID IV RD SCM","callaway_santanna","synthetic control","double machine learning","causal forest","event study plot","first stage F-statistic","Oster bound","honest_did","spec_curve","estimand-first DSL","LLM-assisted DAG discovery","text as treatment","export regression table to Word","export regression table to Excel","regression table docx","regression table xlsx","outreg2 in Python","summary_col equivalent","modelsummary equivalent","AER house style table","QJE house style table","journal template regression","Stata collect equivalent","replication bundle","sp.regtable","sp.collect","sp.paper_tables","sp.feols","sp.cite","high-dim fixed effects","two-way clustering","StatsPAI","statspai","fmt auto regression table","magnitude-adaptive coefficient formatting","mixed magnitude coefficients","sumstats by_labels","Control Treated auto labels","epidemiology pipeline","public health causal inference","target trial emulation","g-formula","IPTW marginal structural model","TMLE doubly robust","HAL-TMLE","Mendelian randomization","MR-Egger weighted median","STROBE TRIPOD reporting","E-value sensitivity","Kaplan-Meier AFT survival","流行病学","公共健康","ML causal inference","double machine learning DML","meta-learner S T X R DR","causal forest GRF","Dragonnet TARNet CEVAE","Bayesian causal forest BCF","CATE distribution","policy tree","off-policy evaluation","conformal causal prediction","fairness audit","causal discovery PC NOTEARS","因果机器学习"] |
StatsPAI: Agent-Native Causal Inference & AER-Style Empirical Workflow
StatsPAI is the agent-native Python package for causal inference and applied econometrics: one import statspai as sp, 900+ functions behind a self-describing API, and CausalResult objects that export to LaTeX / Word / Excel / BibTeX.
This skill drives StatsPAI through the canonical pipeline of an applied AER empirical paper. Each step maps to a section of the published paper and emits a paper-ready artifact (Table 1, event-study figure, Table 2 main results, robustness panel, replication stamp).
Why for Agents
- Self-describing:
sp.list_functions() / sp.describe_function(name) / sp.function_schema(name) — every public symbol is discoverable without doc lookup.
- Unified result: every estimator returns
CausalResult with .summary(), .plot(), .diagnostics, .to_latex(), .to_word(), .cite().
- One import, full pipeline: data contract → Table 1 → estimand-first DSL → identification graphs → main table → heterogeneity → mechanisms → robustness → replication package.
- Estimand-first:
sp.causal_question(...).identify() forces the "DID vs RD vs IV?" decision before estimation, with the identifying assumption written down — the way a referee expects to read it.
The AER-style empirical pipeline
The skill mirrors the canonical sections of an applied AER / QJE / AEJ paper. Each step below is one paper section and one set of artifacts on disk.
Paper section Step StatsPAI moves
─────────────────────────── ───── ────────────────────────────────────────────────
Pre-Analysis Plan −1 sp.power.* + freeze IdentificationPlan to disk
§1. Data 0 data_contract + sample-construction log (footnote 4)
§1.1 Descriptives (Table 1) 1 sp.sumstats · sp.balance_table · sp.describe
§2. Empirical Strategy 2 write equation + identifying assumption + sp.causal_question
(LLM-DAG addendum) 2.5 sp.llm_dag_propose · validate · constrained
§3. Identification graphics 3 event-study · first-stage F · McCrary · love plot
§4. Main Results (Table 2) 4 progressive controls + FE (sp.regtable / sp.causal)
§5. Heterogeneity (Table 3) 5 sp.subgroup_analysis · sp.continuous_did · CATE
§6. Mechanisms 6 sp.mediation · sp.decompose
§7. Robustness gauntlet 7 placebo · Oster · honest_did · E-value · 2-way / Conley SE · spec_curve
§8. Replication package 8 .to_latex() · .plot() · reproducibility stamp
All code blocks below share one running example (training → wage, with worker_id / firm_id / year / age / edu / tenure) purely for readability. Column names, population, estimand, and design values are illustrative — substitute the user's actual columns and research question. Only sp.* function names and argument shapes are normative.
Three domain modes (default = AER econ; alternates = epi & ML-causal)
The default playbook above is AER-style applied econometrics — the AEA convention: written-out estimating equation, identifying assumption table, design horse-race, full robustness gauntlet. The skill also ships two parallel sub-pipelines for the other two big causal-inference traditions, each reusing the same export stack (sp.regtable / sp.collect / sp.paper_tables) and result objects:
| Mode | Reader convention | Identification stack | Reporting stack | Jump to |
|---|
| Default — Applied Econ (AER / QJE / AEJ) | "Show the equation + identifying assumption + design horse-race; controls visible; clustered SE" | DID / IV / RD / SCM / matching / feols HDFE | AER house-style multi-column regtable + 8-section paper layout | §−1 → §8 (entire playbook above) |
| Mode A — Epidemiology / Public Health | "STROBE / TRIPOD-AI; target trial protocol; doubly-robust estimand; absolute & relative risk; KM survival" | Target-trial emulation · IPTW · g-formula · TMLE · Mendelian randomization · KM/AFT | Same regtable + collect, with risk-difference / hazard-ratio / E-value rows | §A. Epidemiology pipeline |
| Mode B — ML Causal Inference | "DML / meta-learners / causal forest / DR-learner; CATE distribution; policy value" | DML · S/T/X/R/DR-Learner · GRF causal forest · Dragonnet/TARNet/CEVAE · BCF · matrix completion | regtable ML horse-race + cate_plot + policy-value table + conformal_causal PI | §B. ML causal pipeline |
How to invoke a non-default mode (Claude / agent picks this up from the user's wording):
| User says... | Mode the skill switches to |
|---|
| "Run a DID / IV / RD / event study", "AER table", "applied micro" | Default (AER econ) |
| "Target trial emulation", "g-formula", "IPTW", "TMLE", "Mendelian randomization", "STROBE / TRIPOD", "公共健康 / 流行病学", "epi pipeline", "RWE study", "cohort study", "case-control" | Mode A (Epi) |
| "DML", "double machine learning", "causal forest", "meta-learner", "CATE", "Dragonnet", "BCF", "policy learning", "conformal causal", "ML causal", "uplift modeling", "因果机器学习" | Mode B (ML causal) |
| "Mix" (e.g. "estimate DID + then ML CATE on the heterogeneity") | Default + Mode B in sequence — every estimator returns the same CausalResult, drop them all into one sp.regtable(...) for the horse-race column |
The three modes share the same export stack, the same CausalResult interface, and the same sp.causal_question(...).identify() estimand-first DSL — switching modes only changes which Step 4 estimators you reach for, not the surrounding scaffolding. If you only want descriptive stats / Table 1 / a balance check, the AER sp.sumstats / sp.mean_comparison / sp.collect calls work in all three modes.
Paper-ready figure & table inventory (what to produce by section)
A modern AER paper has 5–7 figures and 3–5 main tables + an appendix robustness table. Every step below should leave at least one numbered artifact on disk. Default file names assume parallel .tex / .docx / .xlsx exports (the agent should produce all three so co-authors can edit in Word / Excel and the build system can use LaTeX):
| § | Artifact | StatsPAI primitive | Filenames (write all three) |
|---|
| §1 | Figure 1: raw trends / treatment rollout | sp.parallel_trends_plot · sp.treatment_rollout_plot | figures/fig1_trends.png |
| §1 | Table 1: summary stats (full / treated / control + Δ) | sp.sumstats + sp.mean_comparison(...).to_word()/.to_excel() (or sp.collect().add_summary().add_balance()) | tables/table1_summary.{tex,docx,xlsx} |
| §3 | Figure 2: identification graphic (event-study / first-stage / McCrary / RD scatter / SCM trajectory) | sp.enhanced_event_study_plot · sp.binscatter · sp.rdplot · sp.rddensity().plot() · sp.synthdid_plot | figures/fig2_identification.png |
| §4 | Table 2: main results — progressive controls | rt = sp.regtable(M1...M5, template="aer"); rt.to_word(...); rt.to_excel(...) | tables/table2_main.{tex,docx,xlsx} |
| §4 | Table 2-bis: design horse-race (OLS / IV / DID / DML) | sp.regtable(ols, iv, did, dml, ...).to_word/.to_excel | tables/table2b_designs.{tex,docx,xlsx} |
| §4 | Figure 3 (optional): coefficient plot across specs | sp.coefplot(M1, M2, M3, M4) | figures/fig3_coef.png |
| §5 | Table 3: heterogeneity by subgroup | sp.regtable(g_full, g_male, g_fem, g_q1...q4).to_word/.to_excel | tables/table3_heterogeneity.{tex,docx,xlsx} |
| §5 | Figure 4: dose-response / CATE | sp.dose_response(...).plot() · sp.cate_plot · sp.cate_group_plot | figures/fig4_cate.png |
| §6 | Table 4: mechanisms (mediation / decomposition) | sp.regtable(total, direct, indirect).to_word/.to_excel | tables/table4_mechanisms.{tex,docx,xlsx} |
| §7 | Table A1: robustness master (one row per check) | sp.regtable(rob1...robN, panel_labels=[...]).to_word/.to_excel — or sp.paper_tables(robustness=[...]).to_docx() | tables/tableA1_robustness.{tex,docx,xlsx} |
| §7 | Figure 5: spec curve | sp.spec_curve(...).plot() | figures/fig5_spec_curve.png |
| §7 | Figure 6: sensitivity dashboard / Cinelli–Hazlett contour | sp.sensitivity_dashboard · sp.sensitivity_plot | figures/fig6_sensitivity.png |
| §8 | Replication bundle: all tables in one Word/Excel/LaTeX file | sp.collect("Paper").add_summary(...).add_regression(...)...save("paper.{docx,xlsx,tex}") — or sp.paper_tables(main=, heterogeneity=, robustness=, placebo=).to_docx/.to_xlsx | replication/paper.{docx,xlsx,tex} |
Every CausalResult and OLS model can be passed straight into sp.regtable(...), sp.coefplot(...), and sp.collect(). Don't hand-roll LaTeX, and don't render Word/Excel from pandas — the export functions apply book-tab borders, AER-style stars, and the right SE label automatically.
Export cookbook — Word / Excel / LaTeX in one line
StatsPAI's export stack is the agent-native equivalent of Stata's outreg2 / esttab / collect and R's modelsummary / gtsummary. Three tiers, picked by scope of what you're exporting:
| Tier | Use when | API | Hot kwargs |
|---|
1. Single multi-column table (the outreg2 / summary_col equivalent) | Exporting one Table 2 / Table 3 / Table A1 with progressive columns | rt = sp.regtable(M1, M2, ..., template="aer", title=...) (default: all coefs incl. intercept)
rt.to_word("table2.docx")
rt.to_excel("table2.xlsx")
rt.to_latex() · rt.to_markdown() | template, coef_labels, model_labels, panel_labels, dep_var_labels, stats, stars, add_rows; opt-in filters: drop=["Intercept"] (suppress constant), keep=[focal] (focal-only) |
| 2. Multi-panel paper format (Tables 2 + 3 + A1 + A2 in one file) | Producing the paper-tables block — main + heterogeneity + robustness + placebo as a single document | pt = sp.paper_tables(main=[M1...M5], heterogeneity=[H1,H2,H3], robustness=[R1...Rn], placebo=[P1,P2], template="aer")
pt.to_docx("paper_tables.docx")
pt.to_xlsx("paper_tables.xlsx")
pt.to_latex(...) | main, heterogeneity, robustness, placebo, template, coef_labels, model_labels_<panel>, keep |
3. Full session bundle (Stata 15 collect equivalent) | Replication appendix that mixes summary stats + balance + multiple regression tables + headings + prose in one file | c = sp.collect("Paper title", template="aer")
c.add_heading("§1. Descriptives")
c.add_summary(df, vars=...)
c.add_balance(df, treatment=, variables=...)
c.add_regression(M1, M2, ..., title="Table 2")
c.add_text("Notes ...")
c.save("paper.docx") (auto-detect by extension; .xlsx/.tex/.md/.html/.txt all work) | add_heading(level=), add_summary(stats=, labels=), add_balance(weights=, test=), add_regression(**regtable_kwargs), add_table(result), add_text(...) |
Journal templates (apply the right SE label, star levels, and notes automatically):
sp.list_journal_templates()
rt = sp.regtable(M1, M2, M3, template="qje")
rt.to_word("table2_qje.docx")
sp.get_journal_template("aer")
Inline citations in prose (drop a coefficient straight into a sentence):
sp.cite(M3, "training")
sp.cite(M3, "training", output="latex")
Naming gotcha: sp.regtable(..., output="docx") is invalid — the enum is {"text", "latex", "tex", "html", "markdown", "md", "qmd", "quarto", "word", "excel"}. Use output="word" / "excel", or — simpler — drop output= and call .to_word(filename) / .to_excel(filename) on the result.
Notebook setup — CJK fonts + retina DPI
Run once at the top of every analysis script / notebook, before any matplotlib-backed plot (sp.regtable.to_* exporters do not need this — only .savefig / sp.coefplot / sp.binscatter / sp.cate_plot / etc.). Two failures it fixes in one shot:
- CJK labels render as ▢▢▢ tofu — the matplotlib default
DejaVu Sans carries no Chinese / Japanese / Korean glyphs, so ax.set_title("教育回报") silently degrades into squares.
- Plots look fuzzy on hi-DPI displays — matplotlib's default
figure.dpi=100 is half the density of a Retina / 4K screen.
Drop-in snippet
import matplotlib as mpl
import matplotlib.pyplot as plt
def setup_plot(retina: bool = True) -> None:
"""One-shot matplotlib boilerplate: CJK font fallback + retina DPI.
Idempotent — safe to call multiple times. Call BEFORE any plotting.
"""
mpl.rcParams["font.sans-serif"] = [
"PingFang SC", "Heiti SC", "Hiragino Sans GB",
"Microsoft YaHei", "SimHei", "SimSun",
"Noto Sans CJK SC", "Source Han Sans SC",
"WenQuanYi Micro Hei",
"Arial Unicode MS",
"DejaVu Sans",
]
mpl.rcParams["axes.unicode_minus"] = False
if retina:
mpl.rcParams["figure.dpi"] = 144
mpl.rcParams["savefig.dpi"] = 300
try:
from IPython import get_ipython
ipy = get_ipython()
if ipy is not None:
ipy.run_line_magic("config", "InlineBackend.figure_format = 'retina'")
except Exception:
pass
setup_plot()
Smoke test (5 seconds, run once after setup_plot())
fig, ax = plt.subplots(figsize=(4, 2.5))
ax.plot([0, 1, 2], [-1, 0, 1])
ax.set_title("中文标题测试 — Card (1995) 教育回报")
ax.set_xlabel("受教育年数 (years)")
fig.tight_layout()
fig.savefig("figures/_font_smoke_test.png", dpi=300)
If the saved PNG shows Chinese characters cleanly and the y-axis tick -1 is a real minus sign (not a square), the setup is good. Otherwise see troubleshooting below.
Troubleshooting
| Symptom | Fix |
|---|
Title still shows ▢▢▢ tofu after setup_plot() | Host has none of the listed fonts. Install one — macOS: pre-installed (no action). Linux: sudo apt install fonts-noto-cjk (Debian/Ubuntu) or sudo dnf install google-noto-sans-cjk-fonts (Fedora/RHEL). Windows: pre-installed. Then clear matplotlib's font cache: rm -rf ~/.cache/matplotlib (Linux/macOS) / %LOCALAPPDATA%\matplotlib (Windows), and restart the Python / Jupyter kernel. |
| Negative numbers render as ▢ | axes.unicode_minus = False was overridden by a later plt.style.use(...) or mpl.rcParams.update(...). Re-call setup_plot() after any style change. |
Plot blurry inside VSCode .ipynb | VSCode's notebook UI ignores figure.dpi for inline rendering. Either switch the cell output to "Open in Image Viewer", or use %matplotlib inline before setup_plot(). The saved .png (driven by savefig.dpi=300) is sharp regardless. |
sp.<plot>(...) output still shows tofu | The sp.* plotters honor global rcParams, so this only happens when setup_plot() was called after the plot was drawn. Move the call to the very top of the script. |
| Need to verify which font matplotlib picked | mpl.font_manager.findfont(mpl.font_manager.FontProperties(family=mpl.rcParams["font.sans-serif"])) returns the resolved file path — if it ends in DejaVuSans.ttf despite Chinese labels, no CJK font is installed. |
Persist as project default (optional)
Drop the same rcParams into a project-level matplotlibrc next to pyproject.toml so co-authors and CI runners pick it up without calling setup_plot():
# matplotlibrc — committed to the repo
font.sans-serif: PingFang SC, Heiti SC, Microsoft YaHei, SimHei, Noto Sans CJK SC, Arial Unicode MS, DejaVu Sans
axes.unicode_minus: False
figure.dpi: 144
savefig.dpi: 300
The setup_plot() function above is the in-script fallback when a project matplotlibrc is not present.
Step −1 — Pre-Analysis Plan (pre-data; AEA RCT Registry style)
sp.power(design, n=..., effect_size=..., power_target=...) is a unified dispatcher — leave one argument None to solve for it (sample size, MDE, or power). Convenience wrappers: sp.power_rct, sp.power_did, sp.power_rd, sp.power_iv, sp.power_cluster_rct, sp.power_ols.
sp.power("rct", effect_size=0.3, power_target=0.80)
sp.power("did", n=200, effect_size=0.15, power_target=0.80,
n_periods=4, n_treated_periods=2)
sp.power("cluster_rct", cluster_size=50, icc=0.05,
effect_size=0.2, power_target=0.80)
sp.pretrends_power(result)
Persist the PowerResult next to data_contract.json and empirical_strategy.md — a referee will ask whether the design was powered before data collection, not after.
Step 0 — Sample construction & data contract (Section "Data")
An AER §1 Data section has three jobs: (a) describe sources, (b) document every sample restriction (the "footnote 4" sample log), (c) lock the panel structure. StatsPAI assumes an analysis-ready DataFrame — do ETL (imputation, type coercion, merges, transforms) in pandas first, then run the 5-check contract.
0.1 Sample-construction log (footnote 4)
sample_log = []
df0 = df_raw.copy(); sample_log.append(("0. raw", len(df0)))
df1 = df0.dropna(subset=["wage"]); sample_log.append(("1. drop missing wage", len(df1)))
df2 = df1[df1["age"].between(18, 65)]; sample_log.append(("2. drop age outside 18-65", len(df2)))
df3 = df2[df2["industry"].isin(MANUF_CODES)]; sample_log.append(("3. keep manufacturing", len(df3)))
df = df3
import json; json.dump(sample_log, open("artifacts/sample_construction.json", "w"), indent=2)
Paste this log verbatim as footnote 4 of your paper. AER reviewers use it to reconstruct the analysis sample.
0.2 Five-check data contract (go / no-go gate)
import pandas as pd, numpy as np, statspai as sp
def data_contract(df, *, y, treatment, id=None, time=None, covariates=()):
"""Return a go/no-go dict. Stop the pipeline if any required check fails."""
keys = [y, treatment] + ([id, time] if id and time else []) + list(covariates)
c = {
"n_obs": len(df),
"dtypes": df[keys].dtypes.astype(str).to_dict(),
"n_missing": df[keys].isna().sum().to_dict(),
"n_dupes_on_keys": 0,
"panel_balanced": None,
"cohort_sizes": None,
}
if id and time:
c["n_dupes_on_keys"] = int(df.duplicated([id, time]).sum())
balanced = sp.balance_panel(df, entity=id, time=time)
c["panel_balanced"] = len(balanced) == len(df)
c["n_dropped_by_balance"] = len(df) - len(balanced)
if "first_treat_year" in df.columns:
c["cohort_sizes"] = (
df.drop_duplicates(id).groupby("first_treat_year").size().to_dict()
)
c["y_range"] = (float(df[y].min()), float(df[y].max()))
c["treatment_share"] = float(df[treatment].mean())
from scipy import stats
miss_y = df[y].isna()
c["mcar_hint"] = "likely MCAR (listwise OK)"
if miss_y.any() and (~miss_y).any():
for cov in covariates:
if df[cov].dtype.kind in "fi":
_, p = stats.ttest_ind(df.loc[miss_y, cov].dropna(),
df.loc[~miss_y, cov].dropna(),
equal_var=False)
if p < 0.05:
c["mcar_hint"] = f"NOT MCAR (y-miss differs on {cov}, p={p:.3f}) → use MI / IPW"
break
return c
contract = data_contract(df, y="wage", treatment="training",
id="worker_id", time="year",
covariates=["age", "edu", "tenure"])
assert contract["n_dupes_on_keys"] == 0, "duplicate (id, time) — fix before panel methods"
assert all(v == 0 for v in contract["n_missing"].values()), \
f"NaNs on keys: {contract['n_missing']}"
If any assertion fires, stop and fix it in pandas — StatsPAI estimators silently drop NaN rows, the most common source of "mysterious sample-size shrinkage" bugs. Persist:
import json; json.dump(contract, open("artifacts/data_contract.json", "w"), indent=2, default=str)
Step 1 — Descriptive statistics (Table 1)
The signature AER Table 1 has three column blocks plus a difference column:
| | (1) Full | (2) Treated | (3) Control | (4) Δ (t-test) |
The Imbens–Rubin rule of thumb: a normalized difference |Δ| / √((s²₁+s²₀)/2) > 0.25 flags substantive imbalance and should trigger matching / reweighting before you trust an OLS comparison.
print(sp.sumstats(df, vars=["wage","edu","exp","tenure","age"],
by="training", output="text"))
mc = sp.mean_comparison(df,
["age","edu","tenure","firm_size"],
group="training",
test="ttest",
title="Table 1. Summary statistics by treatment status")
mc.to_word ("tables/table1_summary.docx")
mc.to_excel("tables/table1_summary.xlsx")
open("tables/table1_summary.tex", "w").write(mc.to_latex())
sp.describe(df).to_markdown("references/codebook.md")
1.1 Multi-panel Table 1 (AER convention)
Group rows into Panel A: Outcomes, Panel B: Treatment intensity, Panel C: Controls, Panel D: Sample composition. The cleanest path is to push each panel into a sp.collect() bundle — one .save("file.docx") call then writes the whole multi-panel Table 1 with AER book-tab borders, in Word and Excel and LaTeX from one source.
panels = {
"A. Outcomes": ["wage", "log_wage", "weeks_employed"],
"B. Treatment": ["training", "training_hours"],
"C. Demographic controls": ["age", "edu", "female", "married"],
"D. Labor market": ["tenure", "firm_size", "industry_id"],
}
c1 = sp.collect("Table 1. Summary statistics", template="aer")
for label, vs in panels.items():
c1.add_heading(f"Panel {label}", level=2)
c1.add_summary(df, vars=vs, stats=["mean", "sd", "n"])
c1.save("tables/table1_summary.docx")
c1.save("tables/table1_summary.xlsx")
c1.save("tables/table1_summary.tex")
import io; buf = io.StringIO()
for label, vs in panels.items():
buf.write(f"\n% Panel {label}\n")
buf.write(sp.sumstats(df, vars=vs, by="training",
stats=["mean", "sd", "n"], output="latex"))
open("tables/table1_summary_flat.tex", "w").write(buf.getvalue())
1.2 Figure 1 — raw trends / treatment rollout
For DID / event-study designs, the first figure of an applied paper is almost always either (a) raw treated-vs-control means over time, or (b) the staggered rollout heat-strip showing which units are treated when. Both are one-liners:
sp.parallel_trends_plot(df, y="wage", time="year", treat="training",
treat_time=2015, ci=True,
labels={"treated":"Trained", "control":"Untrained"})\
.savefig("figures/fig1a_raw_trends.png", dpi=300)
sp.treatment_rollout_plot(df, time="year", treat="training", id="worker_id",
sort_by="first_treat_year",
title="Figure 1. Treatment timing")\
.savefig("figures/fig1b_rollout.png", dpi=300)
For matching designs, also produce a love plot of standardized differences pre/post matching (Step 3.4).
Step 2 — Empirical strategy (Section "Identification")
This is the heart of an AER paper. Before any code, write down the equation explicitly and state the identifying assumption. Vague identification language is the single most common reason a referee rejects an applied paper.
2.1 Equation × identifying assumption table
| Design | Estimating equation | Identifying assumption |
|---|
| 2×2 DID | Y_it = α_i + λ_t + β·D_it + X'γ + ε_it | parallel trends conditional on X |
| Event-study (CS / SA) | Y_it = α_i + λ_t + Σ_{e≠-1} β_e · 1{t-G_i = e} + ε_it | no anticipation + group-time PT |
| 2SLS | Y_i = α + β·D_i + X'γ + ε_i; D_i = π·Z_i + X'δ + u_i | exclusion + relevance + monotonicity |
| Sharp RD | Y_i = α + β·1{X_i ≥ c} + f(X_i) + ε_i (local poly) | continuity of E[Y(0)|X] at c, no manipulation |
| SCM | Ŷ_1t(0) = Σ_j ŵ_j Y_jt, τ_t = Y_1t − Ŷ_1t(0) for t≥T_0 | pre-period fit + interpolation validity |
| DML / unconfoundedness | Y_i = m(X_i) + β·D_i + ε_i (Robinson partialling-out) | unconfoundedness | X + overlap |
2.2 Design picker
When design="auto" is too opaque, use this decision tree:
┌─ running var + cutoff ───────────────── RDD (sp.rdrobust)
│
├─ exogenous instrument Z ─────────────── IV (sp.ivreg, sp.dml)
data + question ─┤
├─ pre/post × treat/control ─┬ 2 periods ── 2×2 DID (sp.did)
│ └ staggered ── CS / SA (sp.callaway_santanna)
│
├─ 1 treated unit + donor pool + long pre ── SCM (sp.synth, sp.sdid)
│
├─ high-dim X, selection-on-observables ── DML / Causal Forest
│
└─ none of the above ──────────────────── matching + E-value (sp.match, sp.evalue)
2.3 Estimand-first DSL = pre-registration
sp.causal_question declares the five-tuple (population, treatment, outcome, estimand, design) and .identify() picks the estimator with its assumptions written down. Treat the IdentificationPlan as your pre-registration artifact — freeze it before running q.estimate() so the analysis plan is a dated document, not a post-hoc rationalization.
q = sp.causal_question(
treatment="training", outcome="wage", data=df,
population="manufacturing workers, 2010–2020",
estimand="ATT",
design="auto",
time_structure="panel", time="year", id="worker_id",
covariates=["age", "edu", "tenure"],
)
plan = q.identify()
print(plan.summary())
print(plan.identification_story)
from pathlib import Path
bullets = lambda xs: "\n".join(f"- {x}" for x in xs) if xs else "- (none)"
Path("artifacts/empirical_strategy.md").write_text(
f"# Empirical Strategy (pre-registration)\n\n"
f"**Population**: {q.population}\n"
f"**Treatment**: `{q.treatment}` **Outcome**: `{q.outcome}`\n"
f"**Estimand**: {plan.estimand}\n"
f"**Estimator**: `sp.{plan.estimator}`\n\n"
f"## Estimating equation (paste from §2.1 row matching `{plan.estimator}`)\n"
f"```\n<paste here>\n```\n\n"
f"## Identification story\n{plan.identification_story}\n\n"
f"## Identifying assumptions (must defend in §2)\n{bullets(plan.assumptions)}\n\n"
f"## Auto-flagged warnings\n{bullets(plan.warnings)}\n\n"
f"## Fallback estimators (Step 7 robustness)\n{bullets(plan.fallback_estimators)}\n"
)
Path("artifacts/causal_question.yaml").write_text(q.to_yaml())
result = q.estimate()
2.5 (Optional) LLM-assisted DAG addendum
Useful when the user wants an explicit DAG to defend in §2 or §7. Pipe the discovered DAG into sp.causal(..., dag=...).
proposal = sp.llm_dag_propose(
variables=df.columns.tolist(),
domain="labor economics: training, wages, tenure",
client=my_llm_client,
)
validation = sp.llm_dag_validate(proposal, df, alpha=0.05)
print(validation.edge_evidence)
discovered = sp.llm_dag_constrained(
df,
descriptions={"wage": "monthly wage USD", "training": "0/1 program"},
oracle=my_llm_client.suggest_edges,
max_iter=3,
)
Step 3 — Identification graphics (Section "Identification, graphical evidence")
AER convention: the identification figure precedes the regression table. The reader should see graphical evidence that PT holds / first stage is strong / RD jumps cleanly before you ask them to trust your point estimate.
3.1 Event-study plot + numerical pre-trends test (DID identification)
Pre-period coefficients ≈ 0 (with the −1 reference period normalized to zero) is the visual evidence for parallel trends. Pair the figure with a numerical pre-trends test so reviewers don't have to eyeball it.
es = sp.event_study(df, y="wage", treat_time="first_treat_year",
time="year", unit="worker_id",
window=(-4, 4), ref_period=-1,
covariates=["age", "edu"])
sp.enhanced_event_study_plot(
es, shade_pre=True,
title="Figure 2a. Event-study coefficients (95% CI; ref. period = −1)")\
.savefig("figures/fig2a_event_study.png", dpi=300)
print(sp.pretrends_summary(es))
bd = sp.bacon_decomposition(df, y="wage", treat="training",
time="year", id="worker_id")
sp.bacon_plot(bd, title="Figure 2a-bis. Goodman-Bacon weights")\
.savefig("figures/fig2a2_bacon.png", dpi=300)
cs = sp.callaway_santanna(df, y="wage", g="first_treat_year",
t="year", i="worker_id",
x=["age", "edu"])
sp.did_summary_plot(cs, title="Figure 2a-ter. Dynamic ATT (Callaway–Sant'Anna)")\
.savefig("figures/fig2a3_csdid.png", dpi=300)
sp.bjs_pretrend_joint(cs, df, y="wage", group="first_treat_year",
time="year", first_treat="first_treat_year",
controls=["age", "edu"])
3.2 First-stage F-statistic + scatter (IV identification)
Rule of thumb: first-stage F ≥ 10 for OLS-style inference; F ≥ 23 for AR-equivalent inference (Stock–Yogo / Lee 2022).
iv = sp.ivreg("wage ~ (training ~ Z1 + Z2) + age + edu", df, cluster="firm_id")
print(iv.summary())
sp.binscatter(df, y="training", x="Z1",
controls=["age", "edu"],
n_bins=20, ci=True)\
.savefig("figures/fig_first_stage.png", dpi=300)
3.3 RD: McCrary density + canonical RD plot + binscatter
The signature RD figure is sp.rdplot (CCT-style binned scatter with local-polynomial fit on each side), paired with the McCrary manipulation test. Together they answer: (a) is there a visual jump? (b) is the density continuous at the cutoff?
sp.rdplot(df, y="y", x="running_var", c=0,
p=4, kernel="triangular", binselect="esmv",
shade_ci=True, ci_level=0.95)\
.savefig("figures/fig2b_rdplot.png", dpi=300)
sp.rddensity(df, x="running_var", c=0).plot()\
.savefig("figures/fig2b2_mccrary.png", dpi=300)
sp.binscatter(df, y="age", x="running_var", n_bins=40, ci=True)\
.savefig("figures/fig2b3_cov_binscatter.png", dpi=300)
3.4 Matching: love plot (standardized differences)
m = sp.match(df, y="wage", treat="training",
covariates=["age", "edu", "tenure"], method="nearest")
m.plot()\
.savefig("figures/fig2c_love_plot.png", dpi=300)
3.5 SCM: synthetic-control trajectory + gap plot
For synthetic-control designs the canonical Figure 2 is the treated-vs-synthetic time-series with treatment time annotated. synthdid_plot does this in one line.
sc = sp.synth(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000)
sc.plot().savefig("figures/fig2d_synth_trajectory.png", dpi=300)
sd = sp.sdid(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000)
sp.synthdid_plot(sd, title="Figure 2d. Synthetic DID")\
.savefig("figures/fig2d2_sdid.png", dpi=300)
3.6 Generic pre-flight (identification-independent)
sp.diagnose(df, y="wage", x=["age", "edu", "tenure"])
Identification-specific checks (PT for DID, weak-IV F, density for RD, common support for matching) are also auto-run inside sp.causal(...) in Step 4 — don't duplicate the numerics here, but DO produce the figures: a referee scans the figures first.
Step 4 — Main results (multi-regression tables, AER style)
This is the densest section of an applied paper. A modern AER §4 typically contains 2–3 multi-regression tables and one coefficient plot:
- Table 2 (main): progressive controls, 4–6 columns
- Table 2-bis (design horse race): same coefficient under OLS / 2SLS / DID / DML
- Table 2-ter (multi-outcome): same treatment, several outcomes side-by-side
- Figure 3 (coefplot): visual summary of β̂ and 95% CI across specs
Estimator routing (memorize this — getting it wrong silently produces nonsense):
- No FE →
sp.regress("y ~ x1 + x2", df, cluster="firm_id")
- High-dim FE →
sp.feols("y ~ x1 + x2 | fe1 + fe2", df, vcov={"CRV1":"firm_id"})
- Two-way cluster →
sp.feols(..., vcov={"CRV1":"firm_id+year"})
- 2SLS / IV →
sp.ivreg("y ~ (x ~ z) + controls", df, cluster=...)
- DID / event-study →
sp.callaway_santanna(...) / sp.sun_abraham(...)
Never write sp.regress("y ~ x | firm_id") — sp.regress does not parse | and silently treats x | firm_id as a single variable name. Use sp.feols for any formula containing |.
sp.regtable(*models, ...) is the workhorse. Useful kwargs:
keep : list of coef names to display (e.g. ["training"])
drop : list of coef names to suppress (controls)
model_labels : column labels ["(1) Baseline", "(2) +Demog", ...]
dep_var_labels : dep-var-row labels (for multi-outcome tables)
panel_labels : panel-A / panel-B layout for stacked tables
coef_labels : pretty-print names for coefficients
stars : "aer" → * 0.10 ** 0.05 *** 0.01 (or "default", "none")
stats : footer rows ["N","R2","Cluster","FE","DV mean", ...]
output : "latex" | "html" | "markdown" | "text"
filename : path to write the table
4.1 Pattern A — Progressive controls (the canonical Table 2)
Stable β̂ across columns ⇒ less concern that selection on observables is driving the estimate (Oster 2019 selection-stability logic; quantified in Step 7.5). sp.regtable(*models) is the StatsPAI equivalent of Stata outreg2 / esttab and R modelsummary::msummary / summary_col — it consolidates N models into ONE table with one column per model.
| (1) Baseline | (2) +Demographics | (3) +Labor-market | (4) +Region×Industry FE | (5) +Worker FE |
|---|
| Controls | none | age, edu | + tenure, firm_size | high-dim FE | individual FE |
M1 = sp.regress("wage ~ training", df, cluster="firm_id")
M2 = sp.regress("wage ~ training + age + edu", df, cluster="firm_id")
M3 = sp.regress("wage ~ training + age + edu + tenure + firm_size", df, cluster="firm_id")
M4 = sp.feols ("wage ~ training + age + edu + tenure + firm_size | region + industry + year",
df, vcov={"CRV1": "firm_id"})
M5 = sp.feols ("wage ~ training + age + edu + tenure + firm_size | worker_id + year",
df, vcov={"CRV1": "firm_id"})
rt = sp.regtable(M1, M2, M3, M4, M5,
template="aer",
coef_labels={"training": "Job training"},
model_labels=["(1) Baseline", "(2) +Demog.", "(3) +Labor-mkt",
"(4) Region×Ind. FE", "(5) Worker FE"],
stats=["N", "R2", "Cluster", "FE", "DV mean"],
title="Table 2. Effect of training on wages")
rt.to_word ("tables/table2_main.docx")
rt.to_excel("tables/table2_main.xlsx")
open("tables/table2_main.tex", "w").write(rt.to_latex())
4.2 Pattern B — Design horse race (Table 2-bis)
Show the same coefficient of interest under multiple identification strategies. This is the AER credibility move: convergent evidence across designs each making different identifying assumptions.
ols = sp.feols ("wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
ivr = sp.ivreg("wage ~ (training ~ Z1 + Z2) + age + edu + tenure",
df, cluster="firm_id")
did = sp.callaway_santanna(df, y="wage", g="first_treat_year",
t="year", i="worker_id",
x=["age","edu","tenure"])
dml = sp.dml(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"], model="plr")
mtch = sp.match(df, y="wage", treat="training",
covariates=["age","edu","tenure"], method="nearest")
rt = sp.regtable(ols, ivr, did, dml, mtch,
template="aer",
coef_labels={"training": "Job training (β̂)"},
model_labels=["(1) OLS+FE", "(2) 2SLS", "(3) CS-DID",
"(4) DML-PLR", "(5) PSM"],
stats=["Estimator", "Identifying assumption",
"N", "R2 / Pseudo-R2", "Cluster"],
title="Table 2-bis. Convergent evidence across designs")
rt.to_word ("tables/table2b_design_race.docx")
rt.to_excel("tables/table2b_design_race.xlsx")
open("tables/table2b_design_race.tex", "w").write(rt.to_latex())
4.3 Pattern C — Multi-outcome table (same X, several Y's)
A single treatment, several outcomes. Use dep_var_labels so each column carries the Y name.
ys = ["wage", "log_wage", "weeks_employed", "left_firm", "promoted"]
multi_y = [sp.feols(f"{y} ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
for y in ys]
rt = sp.regtable(*multi_y,
template="aer",
dep_var_labels=ys,
model_labels=["(1)","(2)","(3)","(4)","(5)"],
stats=["N","R2","DV mean","Cluster"],
title="Table 2-ter. Effect of training on multiple outcomes")
rt.to_word ("tables/table2c_multi_outcome.docx")
rt.to_excel("tables/table2c_multi_outcome.xlsx")
open("tables/table2c_multi_outcome.tex", "w").write(rt.to_latex())
4.4 Pattern D — Stacked Panel A / Panel B table
Same model family, two horizons (short-run / long-run) or two samples (pre-2015 / post-2015) stacked vertically. Use panel_labels.
panelA = [sp.feols("wage_t1 ~ training + X | industry + year", df, vcov={"CRV1":"firm_id"}),
sp.feols("wage_t1 ~ training + X | worker_id + year", df, vcov={"CRV1":"firm_id"})]
panelB = [sp.feols("wage_t5 ~ training + X | industry + year", df, vcov={"CRV1":"firm_id"}),
sp.feols("wage_t5 ~ training + X | worker_id + year", df, vcov={"CRV1":"firm_id"})]
rt = sp.regtable(*panelA, *panelB,
template="aer",
panel_labels=["Panel A. Short-run (1 year)",
"Panel A. Short-run (1 year)",
"Panel B. Long-run (5 years)",
"Panel B. Long-run (5 years)"],
model_labels=["(1) Industry FE","(2) Worker FE"]*2,
stats=["N","R2"],
title="Table 2-quater. Short- vs long-run effects")
rt.to_word ("tables/table2d_horizons.docx")
rt.to_excel("tables/table2d_horizons.xlsx")
open("tables/table2d_horizons.tex", "w").write(rt.to_latex())
4.5 Pattern E — IV reporting triplet (first-stage / reduced-form / 2SLS)
The textbook AER IV table presents the first stage, the reduced form, and the 2SLS in three columns so the reader can verify Wald-ratio = RF / FS.
fs = sp.feols ("training ~ Z + age + edu | industry + year", df, vcov={"CRV1":"firm_id"})
rf = sp.feols ("wage ~ Z + age + edu | industry + year", df, vcov={"CRV1":"firm_id"})
iv = sp.ivreg ("wage ~ (training ~ Z) + age + edu | industry+year",
df, cluster="firm_id")
rt = sp.regtable(fs, rf, iv,
template="aer",
keep=["Z", "training"],
dep_var_labels=["training", "wage", "wage"],
model_labels=["(1) First stage", "(2) Reduced form", "(3) 2SLS"],
stats=["First-stage F", "N", "R2", "Cluster"],
title="Table 2-quinto. IV reporting triplet")
rt.to_word ("tables/table2e_iv_triplet.docx")
rt.to_excel("tables/table2e_iv_triplet.xlsx")
open("tables/table2e_iv_triplet.tex", "w").write(rt.to_latex())
4.6 Pattern F — Causal-design main via sp.causal(...)
For DID / IV / RD / SCM mains, the sp.causal(...) orchestrator returns a CausalResult plus diagnostics and an automatic robustness preview. Pipe .result into regtable:
w = sp.causal(df, y="wage", treatment="training",
id="worker_id", time="year", design="did",
covariates=["age", "edu", "tenure"],
dag=discovered.dag)
print(w.diagnostics)
print(w.recommendation)
print(w.result.summary())
print(w.robustness_findings)
4.7 Figure 3 — coefficient plot of the main table
Replace one of the wall-of-numbers tables with a coefplot in the body, push the table to the appendix. Modern AER papers increasingly do this.
sp.coefplot(M1, M2, M3, M4, M5,
model_names=["(1)","(2)","(3)","(4)","(5)"],
variables=["training"],
title="Figure 3. β̂ on training across specifications (95% CI)",
alpha=0.05)\
.savefig("figures/fig3_coefplot.png", dpi=300)
Reporting checklist for the Table 2 footnote (AER house style)
- Standard-error cluster level (and whether it's two-way / Conley)
- Fixed-effects absorbed —
regtable auto-adds one footer row per FE name (e.g. Industry FE: Yes / Year FE: Yes / Worker_id FE: No) whenever any column comes from sp.feols(... | fe1 + fe2 ...). Don't hand-roll these rows.
- Sample size and number of clusters
- Estimator (OLS / 2SLS / CS-DID / SCM / DML)
- Stars convention
* 0.10 ** 0.05 *** 0.01
- Mean of dependent variable in the estimation sample (so β̂ can be read as a % of the base rate)
Step 5 — Heterogeneity (Table 3 + Figure 4)
The AER §5 Heterogeneity combines (a) a subgroup regression table with one column per subgroup (binary moderators + interaction terms), and (b) a CATE / dose-response figure for continuous moderators. Both should appear; they answer different questions.
5.1 Pattern G — Subgroup regtable (Table 3)
One column per subgroup, with the same specification re-run on each slice. Clean, easy to read, expected by referees.
slices = {
"(1) All": df,
"(2) Female": df[df["female"] == 1],
"(3) Male": df[df["female"] == 0],
"(4) Low skill": df[df["skill_quartile"].isin([1, 2])],
"(5) High skill": df[df["skill_quartile"].isin([3, 4])],
"(6) Small firm": df[df["firm_size"] < 100],
"(7) Large firm": df[df["firm_size"] >= 100],
}
gmodels = [sp.feols("wage ~ training + age + edu + tenure | industry + year",
d, vcov={"CRV1": "firm_id"}) for d in slices.values()]
rt = sp.regtable(*gmodels,
template="aer",
coef_labels={"training": "Training"},
model_labels=list(slices),
stats=["N","R2","DV mean"],
title="Table 3. Heterogeneous effects of training")
rt.to_word ("tables/table3_heterogeneity.docx")
rt.to_excel("tables/table3_heterogeneity.xlsx")
open("tables/table3_heterogeneity.tex", "w").write(rt.to_latex())
5.2 Interaction-form heterogeneity (alternative Table 3)
Test moderation formally with interaction terms — referees often ask whether the gap between subgroups is statistically significant, which requires the interaction p-value.
H1 = sp.feols("wage ~ training*female + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
H2 = sp.feols("wage ~ training*C(skill_quartile) + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
H3 = sp.feols("wage ~ training*log_firm_size + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
rt = sp.regtable(H1, H2, H3,
template="aer",
keep=["training", "training:female",
"training:C(skill_quartile)[T.2]",
"training:C(skill_quartile)[T.3]",
"training:C(skill_quartile)[T.4]",
"training:log_firm_size"],
model_labels=["(1) ×Female", "(2) ×Skill quartile", "(3) ×log(Firm size)"],
stats=["N","R2"],
title="Table 3-bis. Interaction-form heterogeneity")
rt.to_word ("tables/table3b_interactions.docx")
rt.to_excel("tables/table3b_interactions.xlsx")
open("tables/table3b_interactions.tex", "w").write(rt.to_latex())
5.3 Figure 4 — dose-response (continuous treatment)
dr = sp.dose_response(df, y="wage", treat="training_hours",
covariates=["age","edu","tenure","firm_size"],
n_dose_points=20)
dr.plot(title="Figure 4a. Dose-response: training hours → wage")\
.savefig("figures/fig4a_dose_response.png", dpi=300)
sp.continuous_did(df, y="wage", dose="training_hours",
time="year", id="worker_id").plot()\
.savefig("figures/fig4a2_continuous_did.png", dpi=300)
5.4 Figure 4-bis — CATE distribution (DR-Learner / causal forest)
The CATE plotters need a result that exposes per-row conditional effects.
sp.causal_forest returns a summary result without .cate_estimates, so
for the CATE histogram and grouped bar chart use a meta-learner (or any
DR-/X-/R-learner) and pass its CATE table to cate_group_plot.
ml = sp.metalearner(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"], learner="dr")
sp.cate_plot(ml, kind="hist",
title="Figure 4b. Distribution of conditional ATE")\
.savefig("figures/fig4b_cate_hist.png", dpi=300)
g = sp.cate_by_group(ml, df, by="skill_quartile", n_groups=4)
sp.cate_group_plot(g, title="Figure 4c. CATE by skill quartile")\
.savefig("figures/fig4c_cate_by_group.png", dpi=300)
print(sp.cate_summary(ml))
print(g)
5.5 Subgroup-analysis dispatcher (one-liner)
sp.subgroup_analysis(df, formula="wage ~ training + age + edu + tenure",
x="training",
by={"gender": "female", "skill": "skill_quartile"},
robust="hc1")
For continuous moderators or many subgroups, prefer:
sp.continuous_did(...) — dose-response under DID
sp.metalearner(..., learner="dr") + sp.cate_plot / sp.cate_by_group — DR-Learner CATE (recommended for plotting)
sp.causal_forest(formula="wage ~ training | X", data=df) — CATE summary only (no per-row .cate_estimates)
Step 6 — Mechanisms / channels
sp.mediation(df, y="wage", d="training", m="hours_worked",
X=["age", "edu", "tenure"])
sp.decompose(...)
Step 7 — Robustness gauntlet (the AER referee gauntlet)
The seven canonical robustness blocks of an applied paper. A modern AER paper expects most of these in the body or appendix — assemble a Table A1-style robustness panel from the outputs.
7.1 Placebo tests
sp.rdplacebo(df, y="y", x="running_var", c=0,
placebo_cutoffs=[-2, -1, 1, 2])
sp.synth_time_placebo(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000,
n_placebo_times=10)
sp.synthdid_placebo(...)
7.2 Alternative samples
result_no_outliers = sp.causal(df.query("wage < wage.quantile(0.99)"), ...)
result_drop_early = sp.causal(df.query("first_treat_year > 2008"), ...)
result_balanced = sp.causal(sp.balance_panel(df, entity="worker_id", time="year"), ...)
7.3 Alternative specifications (spec curve)
sp.spec_curve(df, y="wage", x="training",
controls=[["age"], ["age", "edu"], ["age", "edu", "tenure"]],
subsets={"all": None, "manuf": df["industry"].eq("manufacturing")})
7.4 Alternative standard errors
Cluster-level choice is itself a robustness check — show the result is not driven by an over-narrow cluster.
sp.twoway_cluster(M3, df, cluster1="firm_id", cluster2="year")
sp.conley(M3, df, lat="lat", lon="lon",
dist_cutoff=100, kernel="uniform")
sp.feols("y ~ x | firm_id + year", df,
vcov={"CRV1": "firm_id+year"})
7.5 Oster (2019) selection bound
"How big would unobserved selection have to be for β to flip sign / vanish?" The Oster δ tells you whether the bound on selection on unobservables, relative to selection on observables, has to exceed an implausible value to overturn the result.
sp.oster_bounds(data=df, y="wage", treat="training",
controls=["age", "edu", "tenure"],
r_max=1.3)
sp.oster_delta(data=df, y="wage",
x_base=["training"],
x_controls=["age", "edu", "tenure"],
r_max=1.3)
7.6 Honest DID — Rambachan–Roth (2023) PT sensitivity
honest_did only consumes a CS / SA / did_multiplegt event-study result
(or aggte(result, type='dynamic')). Pass the cs object built in §3.1,
not a generic OLS/FE main-table result:
sp.honest_did(cs, method="smoothness")
7.7 E-value & unified sensitivity (unmeasured confounding)
sp.evalue(estimate=result.params["training"],
ci=tuple(result.conf_int().loc["training"]),
measure="RR")
sp.unified_sensitivity(result, r2_treated=0.05,
r2_controlled=0.10,
include_oster=True)
sp.sensitivity_dashboard(result)
7.8 RD-specific bandwidth / kernel sensitivity
sp.rdbwsensitivity(df, y="y", x="running_var", c=0,
bw_grid=[0.5, 1.0, 1.5, 2.0])
7.9 TWFE diagnostic (staggered DID)
Goodman-Bacon decomposition flags when the TWFE estimate is contaminated by forbidden 2×2's (already-treated as control).
sp.bacon_decomposition(df, y="y", treat="training",
time="year", id="worker_id")
7.10 Sequential confounder blocks (Oster-style robustness table)
blocks = {
"M1 base": [],
"M2 +demographics": ["age", "edu"],
"M3 +labor-market": ["age", "edu", "tenure", "firm_size"],
"M4 +psychosocial": ["age", "edu", "tenure", "firm_size", "motivation"],
}
models = [sp.regress(f"wage ~ training + {' + '.join(c) or '1'}",
df, cluster="firm_id")
for c in blocks.values()]
rt = sp.regtable(*models,
template="aer",
model_labels=list(blocks),
title="Table 7. Selection-stability across confounder blocks")
rt.to_word ("tables/table_robust_blocks.docx")
rt.to_excel("tables/table_robust_blocks.xlsx")
open("tables/table_robust_blocks.tex", "w").write(rt.to_latex())
7.11 Pattern H — Robustness master table (Table A1, one row per check)
The canonical AER appendix Table A1 stacks every robustness specification next to the baseline so reviewers see at a glance that β̂ survives. sp.regtable accepts any mix of EconometricResults / CausalResult, so build the list dynamically:
baseline = sp.feols("wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
rob = {
"(1) Baseline": baseline,
"(2) Drop top 1% wage": sp.feols("wage ~ training + age + edu + tenure | industry + year",
df.query("wage < wage.quantile(0.99)"),
vcov={"CRV1": "firm_id"}),
"(3) Balanced panel": sp.feols("wage ~ training + age + edu + tenure | industry + year",
sp.balance_panel(df, entity="worker_id", time="year"),
vcov={"CRV1": "firm_id"}),
"(4) Drop early cohorts": sp.feols("wage ~ training + age + edu + tenure | industry + year",
df.query("first_treat_year > 2008"),
vcov={"CRV1": "firm_id"}),
"(5) Worker FE": sp.feols("wage ~ training + age + edu + tenure | worker_id + year",
df, vcov={"CRV1": "firm_id"}),
"(6) 2-way cluster": sp.feols("wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id+year"}),
"(7) Conley spatial SE": sp.conley(baseline, df,
lat="lat", lon="lon", dist_cutoff=100),
"(8) Log outcome": sp.feols("log_wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"}),
"(9) IHS outcome": sp.feols("ihs_wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"}),
"(10) PSM-weighted": sp.match(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"],
method="nearest"),
"(11) Entropy balance": sp.ebalance(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"]),
"(12) DML-PLR": sp.dml(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"], model="plr"),
}
rt = sp.regtable(*rob.values(),
template="aer",
coef_labels={"training": "Training (β̂)"},
model_labels=list(rob),
stats=["N", "R2", "Cluster", "FE"],
title="Table A1. Robustness of the main estimate")
rt.to_word ("tables/tableA1_robustness.docx")
rt.to_excel("tables/tableA1_robustness.xlsx")
open("tables/tableA1_robustness.tex", "w").write(rt.to_latex())
sp.paper_tables(main=[M1, M2, M3, M4, M5],
robustness=list(rob.values()),
template="aer",
coef_labels={"training": "Training"},
model_labels_main=["(1)","(2)","(3)","(4)","(5)"],
model_labels_robustness=list(rob),
).to_docx("tables/paper_tables.docx")
7.12 Figure 5 — coefficient forest plot of all robustness specs
A single visual summary that an AER referee can parse in 5 seconds: every β̂ and 95% CI on one axis. Confirms the estimate is not knife-edge.
sp.coefplot(*rob.values(),
model_names=list(rob),
variables=["training"],
title="Figure 5. β̂ on training across robustness specifications",
alpha=0.05)\
.savefig("figures/fig5_robustness_forest.png", dpi=300)
7.13 Figure 5-bis — spec curve
The Simonsohn et al. (2020) specification curve plots β̂ across every combination of {controls × subsamples × outcome transforms × SE types}. Useful when you want to head off "what about specification X?" referee letters.
sc = sp.spec_curve(df, y="wage", x="training",
controls=[["age"], ["age","edu"], ["age","edu","tenure"],
["age","edu","tenure","firm_size"]],
se_types=["robust", "cluster_firm_id", "cluster_firm_id_year"],
y_transforms=["identity", "log", "ihs"],
subsets={"all": None,
"manuf": df["industry"].eq("manufacturing"),
"no99": df["wage"] < df["wage"].quantile(0.99)},
cluster_var="firm_id")
sc.plot(title="Figure 5-bis. Specification curve")\
.savefig("figures/fig5b_spec_curve.png", dpi=300)
7.14 Figure 6 — sensitivity dashboard
One-page Cinelli–Hazlett + Oster + E-value summary for the §7 closing argument.
sens = sp.unified_sensitivity(baseline,
r2_treated=0.05, r2_controlled=0.10,
include_oster=True)
sp.sensitivity_plot(sens.results,
original_estimate=baseline.params["training"],
original_ci=tuple(baseline.conf_int().loc["training"]),
title="Figure 6. Sensitivity to unobserved confounding")\
.savefig("figures/fig6_sensitivity.png", dpi=300)
sp.sensitivity_dashboard(baseline)\
.savefig("figures/fig6b_sensitivity_dashboard.png", dpi=300)
7.15 One-stop robustness reporter
sp.diagnose_result(result)
sp.robustness_report(df, formula="wage ~ training + age + edu",
x="training", cluster_var="firm_id")
sp.estat(result, test="all")
Step 8 — Replication package
The agent's job at §8 is to produce a single artifact a co-author can open in Word, Excel, or LaTeX without further StatsPAI calls. There are three packaging tiers, picked by what you need to ship:
8.1 Per-result export (one estimator → one Word/Excel file)
result.to_docx("tables/main_result.docx",
title="Table 2. Main result")
result.to_latex(caption="Main result", label="tab:main")
result.plot().savefig("figures/main.png", dpi=300)
print(sp.cite(result, "training"))
8.2 Per-table export (already covered in Steps 4 / 5 / 7)
Every sp.regtable(*models) returns a RegtableResult with .to_word() / .to_excel() / .to_latex() / .to_markdown() / .to_html(). Use these in §4–§7 so that by the time you reach §8 the tables/ folder already has parallel .docx / .xlsx / .tex for every numbered table.
8.3 Multi-panel paper-format (Tier 2 — Tables 2 + 3 + A1 + A2 in one file)
sp.paper_tables(
main = [M1, M2, M3, M4, M5],
heterogeneity = [g_full, g_fem, g_male],
robustness = list(rob.values()),
placebo = [pb1, pb2],
template = "aer",
coef_labels = {"training": "Training"},
keep = ["training"],
).to_docx("replication/paper_tables.docx")
8.4 Full session bundle (Tier 3 — the Stata collect equivalent)
The single most efficient §8 deliverable: descriptives + balance + main + heterogeneity + robustness + prose in one Word file. sp.collect() is the agent-native counterpart of Stata 15's collect and R's gtsave.
c = sp.collect("Effect of Training on Wages — Replication", template="aer")
c.add_heading("§1. Descriptive statistics", level=1)
c.add_summary(df, vars=["wage","age","edu","tenure"],
stats=["mean","sd","n"],
title="Table 1. Summary statistics")
c.add_balance(df, treatment="training",
variables=["age","edu","tenure","firm_size"],
title="Table 1b. Balance by treatment")
c.add_heading("§4. Main results", level=1)
c.add_regression(M1, M2, M3, M4, M5,
model_labels=["(1)","(2)","(3)","(4)","(5)"],
stats=["N","R2","Cluster","FE"],
title="Table 2. Effect of training on wages")
c.add_heading("§5. Heterogeneity", level=1)
c.add_regression(*gmodels,
model_labels=list(slices),
title="Table 3. Heterogeneous effects")
c.add_heading("§7. Robustness", level=1)
c.add_regression(*rob.values(),
model_labels=list(rob),
title="Table A1. Robustness")
c.add_text(
"Standard errors clustered at the firm level. *** p<0.01, ** p<0.05, * p<0.10. "
"Sample restrictions and full variable definitions are documented in "
"artifacts/sample_construction.json and artifacts/data_contract.json.",
title="Notes",
)
c.save("replication/paper.docx")
c.save("replication/paper.xlsx")
c.save("replication/paper.tex")
c.save("replication/paper.md")
Inspect the bundle before saving:
print(c)
print(c.list())
8.5 Reproducibility stamp
import json
json.dump({
"statspai": sp.__version__,
"seed": 42,
"n_obs": result.data_info["n_obs"],
"estimand": result.estimand,
"estimate": float(result.params["training"]),
"ci95": list(result.conf_int().loc["training"]),
"pre_registration": "artifacts/empirical_strategy.md",
"data_contract": "artifacts/data_contract.json",
"sample_log": "artifacts/sample_construction.json",
"paper_bundle": "replication/paper.docx",
}, open("artifacts/result.json", "w"), indent=2)
For full-draft generation (abstract + methods + results + bibliography), see sp.paper(result, ...) — out of scope for this skill; call it only when the user explicitly asks for a paper draft.
Regtable cookbook (one-page recipe index)
sp.regtable(*models, ...) is the single primitive behind every multi-regression table in an AER paper. The eight patterns above map to:
| Pattern | What varies across columns | Step |
|---|
| A. Progressive controls | covariate set / FE depth | 4.1 — Table 2 |
| B. Design horse race | identification strategy (OLS / 2SLS / DID / DML / PSM) | 4.2 — Table 2-bis |
| C. Multi-outcome | dependent variable Y | 4.3 — Table 2-ter |
| D. Stacked Panel A / B | horizon / sample (panel rows × spec columns) | 4.4 — Table 2-quater |
| E. IV reporting triplet | first stage / reduced form / 2SLS | 4.5 — Table 2-quinto |
F. sp.causal(...) orchestrator | 1 column, full diagnostics | 4.6 |
| G. Subgroup table | subsample (full / female / male / Q1…Q4) | 5.1 — Table 3 |
| H. Robustness master | every robustness check stacked | 7.11 — Table A1 |
Default sp.regtable settings for AER house style — and the export pipeline
(produce .docx + .xlsx + .tex from the same RegtableResult):
rt = sp.regtable(*models,
template="aer",
coef_labels={"training": "Training"},
model_labels=[...],
stats=["N", "R2", "Cluster", "FE", "DV mean"],
title="Table N. ...")
rt.to_word ("tables/tableN.docx")
rt.to_excel("tables/tableN.xlsx")
open("tables/tableN.tex", "w").write(rt.to_latex())
print(rt.to_text())
For pyfixest-style native output, sp.etable(*models, ...) is the alternative; for stacking many tables in one .docx, use sp.paper_tables(...) (Tier 2) or sp.collect() (Tier 3) — see Step 8.
Figure factory (the 12 standard AER figures)
| # | Figure | StatsPAI call | Section |
|---|
| 1a | Raw trends (DID Figure 1) | sp.parallel_trends_plot(df, y, time, treat, treat_time, ci=True) | §1 |
| 1b | Treatment rollout heatmap | sp.treatment_rollout_plot(df, time, treat, id) | §1 |
| 2a | Event-study coefficients | sp.enhanced_event_study_plot(sp.event_study(...)) | §3 |
| 2a' | Bacon weights | sp.bacon_plot(sp.bacon_decomposition(...)) | §3 |
| 2a'' | CS-DID dynamic effects | sp.did_summary_plot(sp.callaway_santanna(...)) | §3 |
| 2b | RD canonical plot | sp.rdplot(df, y, x, c) | §3 |
| 2b' | McCrary density | sp.rddensity(df, x, c).plot() | §3 |
| 2c | Matching love plot | sp.match(...).plot() | §3 |
| 2d | SCM trajectory | sp.synth(...).plot() · sp.synthdid_plot(sp.sdid(...)) | §3 |
| 3 | Coefficient plot of main specs | sp.coefplot(M1...M5, variables=["x"]) | §4 |
| 4a | Dose-response | sp.dose_response(...).plot() | §5 |
| 4b | CATE histogram | sp.cate_plot(ml, kind="hist") (ml = sp.metalearner(..., learner='dr')) | §5 |
| 4c | CATE by group bar | g = sp.cate_by_group(ml, df, by=..., n_groups=4); sp.cate_group_plot(g) | §5 |
| 5 | Robustness forest plot | sp.coefplot(*rob.values(), variables=["x"]) | §7 |
| 5b | Specification curve | sp.spec_curve(...).plot() | §7 |
| 6 | Sensitivity dashboard | sp.sensitivity_dashboard(result) · sp.sensitivity_plot(...) | §7 |
| 7 | Final result.plot() | result.plot() (estimator-specific) | §8 |
Every plotting function above accepts ax= so panels can be combined with matplotlib subplots, and returns a Figure that supports .savefig(path, dpi=300) for publication output.
§A. Epidemiology / public health pipeline (Mode A)
Convention: STROBE (observational) / TRIPOD-AI (prediction) reporting. The modern epi gold standard is target-trial emulation (Hernán & Robins) — write the protocol of the hypothetical RCT first, then emulate it with observational data using a doubly-robust estimator. Outcomes are commonly risk differences, risk ratios, hazard ratios, or restricted mean survival time, not just OLS coefficients. The skill mirrors the AER 8-section flow but swaps the Step-4 estimator stack and adds survival/MR-specific reporting rows.
Running example: statin_initiation → 5-yr_MACE in an EHR cohort (patient_id / index_date / age / sex / ldl_baseline / comorbidity_index / followup_days / event). The exposure is time-varying, confounders are time-varying, and competing-risk censoring matters — the canonical setting where naïve OLS / Cox-with-baseline-adjustment is biased.
A.0 Cohort construction & target-trial protocol
import statspai as sp
protocol = sp.target_trial.TargetTrialProtocol(
eligibility = "adults 40-75, LDL ≥ 130, no prior MI/stroke, no statin in 12mo washout",
treatment_strategies = ["initiate statin within 30d of index", "no statin within 30d"],
assignment = "observational; emulate randomization via IPTW + g-formula",
time_zero = "index_date (first eligible cardiology visit)",
followup_end = "first MACE / death / disenrollment / index_date + 5yr",
outcome = "first MACE (composite: MI, stroke, cardiovascular death)",
causal_contrast = "per-protocol risk difference at 5 years",
analysis_plan = "IPTW-MSM + g-formula + TMLE triplet; report all three with CIs",
baseline_covariates = ["age","sex","ldl_baseline","comorbidity_index","smoker"],
time_varying_covariates = ["ldl_current"],
)
cohort = sp.target_trial_emulate(df, protocol=protocol, id="patient_id", time="followup_days",
treat="statin_initiation", event="mace")
A.1 Table 1 — baseline characteristics by exposure
mc = sp.mean_comparison(cohort, ["age","sex","ldl_baseline","comorbidity_index","smoker"],
group="statin_initiation", test="ttest",
title="Table 1. Baseline characteristics by statin initiation")
mc.to_word ("tables/table1_epi.docx")
mc.to_excel("tables/table1_epi.xlsx")
A.2 Identification — DAG, propensity overlap, KM curves
dag = sp.dag(["age","sex","ldl_baseline","comorbidity_index","statin_initiation","mace"])
dag.add_edges([("age","ldl_baseline"),("age","statin_initiation"),
("ldl_baseline","statin_initiation"),("statin_initiation","mace"),
("ldl_baseline","mace"),("comorbidity_index","statin_initiation"),
("comorbidity_index","mace")])
adj = dag.adjustment_set(treatment="statin_initiation", outcome="mace")
ps = sp.propensity_score(cohort, treatment="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
method="logit")
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6,4))
ax.hist(ps[cohort["statin_initiation"]==1], bins=40, alpha=0.5, label="Treated")
ax.hist(ps[cohort["statin_initiation"]==0], bins=40, alpha=0.5, label="Control")
ax.set_xlabel("Estimated propensity score"); ax.legend()
fig.savefig("figures/figA1_ps_overlap.png", dpi=300)
km = sp.kaplan_meier(cohort, duration="followup_days", event="mace", group="statin_initiation")
km.plot().savefig("figures/figA2_km.png", dpi=300)
A.3 Main estimate — IPTW · g-formula · TMLE triplet (the modern epi standard)
Report all three in one regtable so the reader sees convergent doubly-robust evidence — this is the epi equivalent of the AER design horse race:
iptw = sp.msm(cohort, y="mace", treat="statin_initiation",
id="patient_id", time="month",
time_varying=["ldl_current","comorbidity_index"],
baseline=["age","sex"])
gcomp = sp.gformula(cohort, y="mace", treat="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
time_varying=["ldl_current"],
intervention="always_treat", reference="never_treat")
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
sl_lib = [LogisticRegression(max_iter=1000),
GradientBoostingClassifier(),
RandomForestClassifier()]
tmle = sp.tmle(cohort, y="mace", treat="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
outcome_library=sl_lib, propensity_library=sl_lib)
hal = sp.hal_tmle(cohort, y="mace", treat="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
variant="ate")
rt = sp.regtable(iptw, gcomp, tmle, hal,
model_labels=["(1) IPTW-MSM","(2) g-formula","(3) TMLE","(4) HAL-TMLE"],
stats=["N","Effect type","Risk diff. (RD)","Risk ratio (RR)"],
title="Table 2. Effect of statin initiation on 5-yr MACE — convergent estimators")
rt.to_word ("tables/table2_epi.docx"); rt.to_excel("tables/table2_epi.xlsx")
A.4 Survival outcomes — KM / AFT / restricted mean
aft = sp.aft("Surv(followup_days, mace) ~ statin_initiation + age + sex + ldl_baseline",
cohort, family="weibull")
rt_surv = sp.regtable(aft,
stats=["N","Events","Median survival","RMST (5yr)","HR (PH)"],
title="Table 3. Survival analysis (Weibull AFT)")
rt_surv.to_word("tables/table3_survival.docx")
A.5 Mendelian randomization (genetic IV — when relevant)
ivw = sp.mr_ivw (beta_exposure, beta_outcome, se_exposure, se_outcome)
egger = sp.mr_egger (beta_exposure, beta_outcome, se_exposure, se_outcome)
median = sp.mr_median(beta_exposure, beta_outcome, se_exposure, se_outcome, penalized=True)
rt_mr = sp.regtable(ivw, egger, median,
model_labels=["IVW","MR-Egger","Weighted median"],
title="Table 4. Mendelian randomization — sensitivity stack")
rt_mr.to_word("tables/table4_mr.docx")
A.6 Robustness — E-value, bounds, principal stratification
ev = sp.evalue(estimate=tmle.point_estimate, ci=tmle.ci, measure="RR")
bds = sp.bounds(cohort, y="mace", treat="statin_initiation", method="manski")
ps_strat = sp.principal_strat(cohort, y="mace", treat="statin_initiation",
instrument="zip_pharmacy_density",
strata="compliance_type")
A.7 Reporting checklist (epi-specific footer for notes=)
When producing the Table-2 footer, include — in addition to the AER stars/SE language:
- Cohort size, person-years of follow-up, event count
- Adjustment set (variables in the back-door set, not just "controls")
- Positivity diagnostic (PS truncation rule, % of cohort with extreme weights)
- E-value for the main effect and its CI bound
- For survival: proportional-hazards check (Schoenfeld residuals p-value) or "PH violated, RMST reported instead"
- STROBE checklist completion (cite as a supplementary file)
Output path stays identical: every estimator above returns a CausalResult and slots straight into sp.regtable(...) / sp.collect(...) / sp.paper_tables(...). Doubly-robust estimators (TMLE, HAL-TMLE, AIPW) are preferred over single-robust IPTW or g-formula alone — report all three for transparency, but treat TMLE as the primary.
§B. ML causal inference pipeline (Mode B)
Convention: estimand-first, doubly-robust, ML-nuisance-learned, with CATE distribution + policy value as first-class outputs (not just a single ATE). The skill mirrors the AER skeleton but the Step-4 estimator stack is DML + meta-learners + causal forest + neural-causal + BCF, and Step-5 always reports a CATE distribution. Uncertainty is quantified by conformal prediction (sp.conformal_causal), not just normal-approximation SE.
Running example: a marketing uplift study — treatment = personalized_offer, outcome = revenue_30d, with 80+ covariates including text features (prior_browsing_text).
B.0 Prep + nuisance super-learner
import statspai as sp
from sklearn.model_selection import train_test_split
train, holdout = train_test_split(df, test_size=0.2, stratify=df["treatment"], random_state=42)
from sklearn.linear_model import LogisticRegression, LassoCV
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier, RandomForestRegressor, RandomForestClassifier
sl_outcome = sp.super_learner(X=train[X_cols].values, y=train["revenue_30d"].values,
library=[LassoCV(), GradientBoostingRegressor(), RandomForestRegressor()],
n_folds=5, task="regression")
sl_treat = sp.super_learner(X=train[X_cols].values, y=train["treatment"].values,
library=[LogisticRegression(max_iter=1000),
GradientBoostingClassifier(), RandomForestClassifier()],
n_folds=5, task="binary")
B.1 Estimand & DAG learning (Step 2 + 2.5 in ML key)
q = sp.causal_question(treatment="treatment", outcome="revenue_30d",
population="marketed users", estimand="ate")
plan = q.identify(strategy="ignorability_under_X", X=X_cols)
proposed = sp.llm_dag_propose(variables=X_cols + ["treatment","revenue_30d"],
domain="e-commerce uplift")
constrained = sp.pc_algorithm(train[X_cols + ["treatment","revenue_30d"]],
variables=X_cols + ["treatment","revenue_30d"], alpha=0.05)
validated = sp.llm_dag_validate(dag=proposed, data=train, alpha=0.05)
B.2 Estimator stack — DML / meta-learner / GRF / neural / Bayesian
dml = sp.dml(train, y="revenue_30d", d="treatment", X=X_cols,
model="plr",
ml_g=sl_outcome, ml_m=sl_treat, n_folds=5)
ml_dr = sp.metalearner(train, y="revenue_30d", treat="treatment", covariates=X_cols,
learner="dr",
outcome_model="xgb", propensity_model="xgb")
cf = sp.causal_forest("revenue_30d ~ treatment | " + " + ".join(X_cols),
train, n_estimators=4000, honest=True)
dn = sp.dragonnet(train, y="revenue_30d", treat="treatment", covariates=X_cols,
repr_layers=(200,100), head_layers=(100,))
tar = sp.tarnet (train, y="revenue_30d", treat="treatment", covariates=X_cols)
bcf = sp.bcf(train, y="revenue_30d", treat="treatment", covariates=X_cols,
n_trees_mu=200, n_trees_tau=50)
mc = sp.matrix_completion(panel_df, y="revenue", d="treatment", unit="user_id", time="week")
rt = sp.regtable(dml, ml_dr, cf, dn, bcf,
model_labels=["(1) DML-PLR","(2) DR-Learner","(3) Causal forest",
"(4) Dragonnet","(5) BCF"],
stats=["N","ATE","CATE 5–95% range","Cross-fit folds","Nuisance R²"],
title="Table 2. ATE — ML estimator horse race")
rt.to_word ("tables/table2_ml.docx"); rt.to_excel("tables/table2_ml.xlsx")
B.3 CATE distribution & subgroup view (the ML-causal headline)
sp.cate_plot(ml_dr, kind="hist",
title="Figure B1. CATE distribution — DR-Learner") \
.savefig("figures/figB1_cate_dist.png", dpi=300)
g = sp.cate_by_group(ml_dr, train, by="customer_value_quartile", n_groups=4)
sp.cate_group_plot(g, title="Figure B2. CATE by customer-value quartile") \
.savefig("figures/figB2_cate_group.png", dpi=300)
cf.local_effects().plot(...).savefig("figures/figB3_local.png", dpi=300)
B.4 Policy learning + off-policy evaluation
pol_tree = sp.policy_tree(train, y="revenue_30d", d="treatment", X=X_cols, max_depth=3)
pol_tree.plot().savefig("figures/figB4_policy.png", dpi=300)
safe = sp.offline_safe_policy(holdout, state=X_cols, action="treatment",
reward="revenue_30d", cost="offer_cost", cost_threshold=2.50)
import numpy as np
X_test = holdout[X_cols].values
A_test = holdout["treatment"].values
R_test = holdout["revenue_30d"].values
pi_b = sl_treat.predict_proba(X_test)[:, 1]
pi_e = pol_tree.predict(X_test)
opv = sp.ope.doubly_robust(X_test, A_test, R_test, pi_b=pi_b, pi_e=pi_e,
reward_model=sl_outcome)
print(f"Policy value (DR): {opv.value:.3f} ± {opv.se:.3f}")
B.5 Uncertainty + fairness + robustness
cp = sp.conformal_causal.conformal_cate(train, y="revenue_30d", treat="treatment",
covariates=X_cols, alpha=0.10)
holdout = holdout.assign(pred=ml_dr.predict(holdout[X_cols]))
fair = sp.fairness.fairness_audit(holdout, predictions="pred",
protected="gender", labels="revenue_30d",
threshold=0.10)
sd = sp.sensitivity_dashboard(dml, train,
dimensions=["unmeasured_confounding","positivity","model_misspec"])
sd.plot().savefig("figures/figB5_sensitivity.png", dpi=300)
sc = sp.spec_curve(train, y="revenue_30d", x="treatment",
controls=[["age"],["age","gender"],X_cols],
se_types=["robust","cluster"])
sc.plot().savefig("figures/figB6_spec_curve.png", dpi=300)
B.6 Reporting checklist (ML-causal-specific footer)
When producing the Table-2 footer, include — in addition to the AER stars/SE language:
- Nuisance learners used (e.g., "outcome: SuperLearner[xgb, rf, lasso, nn]; treatment: same")
- Cross-fitting: number of folds, sample-splitting scheme
- Overlap diagnostic: PS distribution range,
% trimmed
- CATE summary: mean / 5–95% range / share with CATE > 0
- Policy value: off-policy DR value vs. random / vs. always-treat baselines
- Conformal coverage: empirical coverage of nominal 1−α PI on holdout
- Fairness audit: subgroup CATE gaps vs. acceptable thresholds
Doubly-robust DML / DR-Learner / TMLE are preferred over single-robust S- or T-learner alone. Report S- or T-learner only as a baseline in the horse race. Always check overlap before reporting any IPW-flavored estimator.
Method Catalog
Classical
Choose by FE structure:
- No FE / single low-cardinality FE →
sp.regress (statsmodels OLS wrapper)
- High-dim FE absorption (
y ~ x | fe1 + fe2) → sp.feols (pyfixest backend, AER workhorse)
- Two-way panel (entity × time) →
sp.panel(...) (linearmodels backend, standard panel diagnostics)
sp.regress("y ~ x1 + x2", df, cluster="firm_id")
sp.feols ("y ~ x1 + x2 | firm_id + year", df, vcov={"CRV1":"firm_id"})
sp.feols ("y ~ x1 + x2 | firm_id", df, vcov={"CRV1":"firm_id+year"})
sp.fepois ("count ~ x1 + x2 | firm_id", df, vcov={"CRV1":"firm_id"})
sp.feglm ("y ~ x1 + x2 | firm_id", df, family="logit", vcov={"CRV1":"firm_id"})
sp.ivreg ("y ~ (x1 ~ z1 + z2) + x2", df, cluster="state")
sp.panel (df, "y ~ x1 + x2", entity="firm", time="year", method="fe")
sp.heckman(df, y="wage", x=["age", "edu"],
select="in_labor_force", z=["marital", "kids"])
sp.qreg (df, formula="y ~ x1 + x2", quantile=0.5)
sp.regress does NOT parse | as a FE separator — it forwards the formula to statsmodels which treats edu | firm_id as a single garbage variable name. Use sp.feols (or sp.panel) whenever your formula has |. Models from sp.regress, sp.feols, sp.ivreg, sp.panel, sp.fepois, sp.feglm, sp.qreg, sp.heckman all flow through sp.regtable / sp.coefplot / sp.collect / sp.paper_tables — mix freely in the same table.
Difference-in-Differences
sp.did(df, y="y", treat="treated", time="post")
sp.callaway_santanna(df, y="y", g="first_treat_year", t="year", i="firm_id")
sp.sun_abraham(df, y="y", g="first_treat_year", t="year", i="firm_id")
sp.bacon_decomposition(df, y="y", treat="treated", time="year", id="firm_id")
sp.continuous_did(df, y="y", dose="dose", time="year", id="firm_id")
sp.honest_did(cs_result, method="smoothness")
sp.event_study(df, y="y", treat_time="first_treat_year",
time="year", unit="firm_id", window=(-4, 4))
Regression Discontinuity
sp.rdrobust(df, y="y", x="running_var", c=0)
sp.rdrobust(df, y="y", x="running_var", c=0, fuzzy="treatment")
sp.rddensity(df, x="running_var", c=0)
sp.rdmc(df, y="y", x="running_var", cutoffs=[0, 5, 10])
sp.rkd(df, y="y", x="running_var", c=0)
sp.rdplacebo(df, y="y", x="running_var", c=0,
placebo_cutoffs=[-2, -1, 1, 2])
sp.rdbwsensitivity(df, y="y", x="running_var", c=0,
bw_grid=[0.5, 1.0, 1.5, 2.0])
Matching & Reweighting
sp.match(df, y="wage", treat="training", covariates=["age", "edu"], method="nearest")
sp.match(df, y="wage", treat="training", covariates=["age", "edu"], method="cem")
sp.ebalance(df, y="wage", treat="training", covariates=["age", "edu"])
Synthetic Control
sp.synth(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000)
sp.sdid(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000)
sp.synth_time_placebo(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000,
n_placebo_times=10)
ML Causal
sp.dml(df, y="wage", treat="training", covariates=["age", "edu"], model="plr")
sp.causal_forest(formula="wage ~ training | age + edu", data=df)
sp.metalearner(df, y="wage", treat="training", covariates=["age", "edu"], learner="dr")
sp.tmle(df, y="wage", treat="training", covariates=["age", "edu"])
sp.aipw(df, y="wage", treat="training", covariates=["age", "edu"])
Neural Causal
sp.tarnet(df, y="wage", treat="training", covariates=["age", "edu"])
sp.cfrnet(df, y="wage", treat="training", covariates=["age", "edu"])
sp.dragonnet(df, y="wage", treat="training", covariates=["age", "edu"])
Text Causal (v1.6 P1, experimental)
sp.causal_text.text_treatment_effect(
df, text_col="doc", outcome="y", treatment="t",
covariates=["age", "edu"], embedder="hash", n_components=20)
sp.causal_text.llm_annotator_correct(
annotations_llm=df["t_llm"],
annotations_human=df["t_true"],
outcome=df["y"], covariates=df[["age", "edu"]],
method="hausman")
Mechanisms / Decomposition
sp.mediation(df, y="wage", d="training", m="hours_worked",
X=["age", "edu"])
sp.decompose(...)
Robustness, Sensitivity & Inference
sp.spec_curve(df, y="wage", x="training",
controls=[["age"], ["age", "edu"], ["age", "edu", "tenure"]])
sp.robustness_report(df, formula="wage ~ training + age + edu",
x="training", cluster_var="firm_id")
sp.subgroup_analysis(df, formula="wage ~ training + age + edu",
x="training", by={"gender": "female", "age_bin": "age_quartile"})
sp.oster_bounds(df, y="wage", treat="training",
controls=["age", "edu"], r_max=1.3)
sp.unified_sensitivity(result, r2_treated=0.05, r2_controlled=0.10,
include_oster=True)
sp.sensitivity_dashboard(result)
sp.evalue(estimate=..., ci=(..., ...), measure="RR")
sp.twoway_cluster(result, df, cluster1="firm_id", cluster2="year")
sp.conley(result, df, lat="lat", lon="lon", dist_cutoff=100)
fig = result.plot()
sp.interactive(fig)
Common Mistakes
| Anti-pattern | Correct form |
|---|
| Reporting Table 2 without writing the estimating equation | Step 2 — write the equation + identifying assumption to artifacts/empirical_strategy.md before estimating |
| Skipping the event-study figure and going straight to the DID coefficient | Step 3.1 — sp.event_study(...) + sp.enhanced_event_study_plot(...) precedes the regression table |
| Reporting IV without first-stage F | Step 3.2 — iv.summary() reports first-stage F; bench-mark F ≥ 10 (≥ 23 for AR-equivalent inference) |
| Reporting RD without McCrary + binscatter | Step 3.3 — sp.rddensity + sp.binscatter |
| Single-spec main result with no robustness panel | Step 7 — placebo, Oster, honest_did, alt-SE, spec_curve are expected, not optional |
| Cluster at observation level when treatment is at firm/state level | Cluster at the level of treatment assignment; use sp.twoway_cluster if multi-dim |
| Raw panel → staggered DID without balance check | Run Step 0 data_contract; inspect sp.balance_panel output and cohort sizes |
spec_curve(controls=["a","b","c"]) (flat list) | controls=[["a"], ["a","b"], ["a","b","c"]] — each inner list = one spec |
sp.rdrobust(..., cutoff=0) | Kwarg is c=0 across rdrobust / rkd / rdplacebo / rdbwsensitivity |
sp.evalue(result) | sp.evalue(estimate=<point>, ci=(lo, hi), measure="RR") |
sp.match(df, treat="t", y="y", ...) | Signature is (df, y, treat, covariates, ...) — y before treat |
sp.sun_abraham(df, y, g, t) — no unit id | Staggered DID requires i=<unit_id> |
sp.synth(..., treated_period=2000) | Kwarg is treatment_time= (singular) |
sp.panel(df, formula, fe=True) | Kwarg is method="fe" |
sp.robustness_report(result, ...) | Takes (data, formula, x, ...) — not a result object |
sp.mediation(df, y, treat, mediator) | Kwargs are (df, y, d, m, X) — d for treatment, m for mediator |
Pre-computed embeddings to text_treatment_effect | Pass text_col=<column_name>; control vectorisation via embedder= |
llm_annotator_correct(df) | Takes aligned pd.Series (not DataFrame); NaN for unlabelled rows |
sp.callaway_santanna(..., covariates=[...]) | Kwarg is x=[...], not covariates= |
sp.subgroup_analysis(..., cluster=...) | Kwarg is robust='hc1' (or 'hc0'/'hc2'/'hc3'); no cluster slot |
sp.oster_delta(..., treat=, controls=, r_max=) | Real signature: (data, y, x_base, x_controls, r_max) |
sp.power_did(..., power_target=...) | Wrappers don't auto-solve. Use dispatcher: sp.power('did', ..., power_target=..., n_periods=, n_treated_periods=) |
sp.power_cluster_rct(n_clusters=..., power_target=...) | Use dispatcher: sp.power('cluster_rct', cluster_size=, icc=, effect_size=, power_target=) |
sp.cate_group_plot(forest, group=...) | Takes a DataFrame: g = sp.cate_by_group(ml, df, by=..., n_groups=4); sp.cate_group_plot(g). Forest result lacks per-row CATEs — use sp.metalearner(..., learner='dr') |
sp.cate_plot(causal_forest_result, ...) | Same — needs metalearner (or any X/DR/R-learner) result that exposes .cate_estimates |
sp.bjs_pretrend_joint(es) | Real signature: (cs_or_sa_result, data, y=, group=, time=, first_treat=, controls=) — NOT event_study() output |
sp.honest_did(ols_result, ...) | Only accepts CS / SA / did_multiplegt / aggte(..., 'dynamic') results — pass a callaway_santanna object |
sp.sumstats(df, groups={...}, ...) | No groups= kwarg; loop sp.sumstats(vars=v_panel, ...) per panel and concat |
sp.sumstats(..., by="treat") always shows numeric "0" / "1" panel headers | Binary 0/1 by= auto-renders as Control / Treated (no kwarg needed). For non-binary or alternative wording, pass by_labels={0:"Untrained", 1:"Trained"} |
Fixing fmt="%.0f" (or any fixed format) on a regtable that mixes dollar-magnitude (~$1500) and elasticity-magnitude (~0.09) coefficients | Silently rounds the elasticities to 0 while stars survive — the LaLonde precision trap. Use fmt="auto" for magnitude-adaptive precision: thousands separator for ≥1000, integer for ≥100, 1 dp for ≥10, 2 dp for ≥1, 3 dp below |
plan.population / plan.equation / plan.threats | Not exposed on IdentificationPlan. Available: assumptions / estimand / estimator / fallback_estimators / identification_story / warnings / summary(). Use q.population / q.treatment / q.outcome from the CausalQuestion |
sp.regtable(..., output="docx") / output="xlsx" | Enum is {"text","latex","tex","html","markdown","md","qmd","quarto","word","excel"}. Either use output="word"/"excel" or — preferred — drop output= and call .to_word(filename) / .to_excel(filename) on the result |
sp.sumstats(..., output="docx") returns plain text | sumstats doesn't natively emit binary docx/xlsx. For Word/Excel use sp.collect().add_summary(...).save("file.docx") or convert via sp.mean_comparison(...).to_word(...) |
Hand-rolling Word from pandas.DataFrame.to_string() / writing LaTeX manually | RegtableResult.to_word/.to_excel/.to_latex/.to_markdown/.to_html already apply book-tab borders, AER stars, and the right SE label. sp.collect() bundles many such tables into one file |
Forgetting template="aer" (or qje/econometrica/restat/jf/jpe/restud/aeja) on regtable | Without template=, you lose the journal-correct SE label, star levels, and notes. List presets via sp.list_journal_templates() |
Saving each regression to its own .tex and stitching by hand in LaTeX | Use sp.paper_tables(main=, heterogeneity=, robustness=, placebo=) for a single multi-panel .docx / .xlsx, or sp.collect() for a full Word/Excel/Markdown bundle (Step 8.4) |
sp.regtable(..., keep=[focal_var]) (or drop=["Intercept"]) as the default for every table | AER convention is to show every estimated parameter verbatim — controls AND the intercept so the reader can verify the full spec. regtable() does this when you pass NEITHER keep= NOR drop=. Reserve drop=["Intercept"] for when you actively want to suppress the constant; reserve keep=[focal] for intentionally focal-only tables (IV first-stage triplet, interaction-form heterogeneity) — each with a comment explaining why |
sp.regress("y ~ x | firm_id", df, cluster="firm_id") for FE | Silently produces wrong numbers — sp.regress is a thin statsmodels OLS wrapper that does NOT parse | as a FE separator; it interprets x | firm_id as a single garbage variable name. Use sp.feols("y ~ x | firm_id", df, vcov={"CRV1":"firm_id"}) for any formula containing |. Two-way cluster: vcov={"CRV1":"firm_id+year"} |
sp.feols(..., cluster="firm_id") | feols uses pyfixest convention: vcov={"CRV1":"firm_id"} (one-way) or vcov={"CRV1":"firm_id+year"} (two-way). The cluster= kwarg is for sp.regress / sp.ivreg (statsmodels) only |
sp.twoway_cluster(feols_result, df, cluster1=, cluster2=) | sp.twoway_cluster consumes statsmodels-backed results only. For feols, pass two-way directly: sp.feols(..., vcov={"CRV1":"firm_id+year"}) |
| Trusting SEs without checking convergence / weak-IV / overlap | Always read result.summary() warnings and result.diagnostics |
Agent Integration Pattern
import statspai as sp
sp.list_functions()
info = sp.describe_function("callaway_santanna")
schema = sp.function_schema("callaway_santanna")
result = sp.callaway_santanna(df, y="y",
g="first_treat_year", t="year", i="firm_id")
print(result.summary())
result.to_latex("tables/did_results.tex")
When to Use StatsPAI vs Alternatives
| Scenario | Use StatsPAI | Alternative |
|---|
| One-stop EDA → estimand → DAG → estimate → robustness pipeline | ✅ single import covers all eight AER sections | assemble pyfixest + econml + causalml + differences + ... |
| Agent-driven analysis with self-describing API | ✅ list_functions / describe_function / function_schema | statsmodels / pyfixest (no agent API) |
| Estimand-first "DID vs RD vs IV?" decision | ✅ sp.causal_question + sp.causal | manual judgement call |
| Stata → Python migration (same API names) | ✅ sp.regress, sp.estat, sp.sumstats, sp.feols, sp.panel (Stata xtreg → `sp.feols("y ~ x | id + year", df)orsp.panel(..., method="fe"/"re")`) |
| Full AER-style robustness gauntlet from one package | ✅ Oster / honest_did / E-value / Conley / 2-way / spec_curve / placebo all in sp.* | manually wire 5+ packages |
| Epidemiology / public health (target-trial emulation, IPTW + g-formula + TMLE triplet, MR, KM/AFT survival, E-value, STROBE/TRIPOD reporting) | ✅ sp.target_trial.TargetTrialProtocol + sp.target_trial_emulate + sp.gformula + sp.msm + sp.tmle + sp.hal_tmle + sp.mendelian (sp.mr_ivw/sp.mr_egger/sp.mr_median) + sp.kaplan_meier + sp.aft + sp.evalue + sp.principal_strat — see §A. | hand-stitched zEpid + lifelines + statsmodels + manual MR scripts |
| ML causal inference (DML / S/T/X/R/DR-Learner / causal forest / Dragonnet / TARNet / CEVAE / BCF / matrix completion / policy learning / OPE / conformal CATE / fairness audit / DAG learning) | ✅ sp.dml + sp.metalearner + sp.causal_forest + sp.dragonnet/tarnet/cevae + sp.bcf + sp.matrix_completion + sp.policy_tree + sp.offline_safe_policy + sp.ope.* + sp.conformal_causal.* + sp.fairness.fairness_audit + sp.causal_discovery/pc_algorithm/notears/llm_dag_propose+llm_dag_validate — see §B. | EconML + DoWhy + CausalML + GRF + zEpid + dowhy-gcm assembled by hand |