一键在 Manus 中运行任何 Skill

did-event-study

Run a staggered difference-in-differences / event-study analysis to the Sant'Anna practitioner standard — drives the canonical packages (R `did`/`DRDID`/`didFF`/`contdid`; Stata `csdid`/`drdid`), enforces the doubly-robust default, a mandatory diagnostic + sensitivity suite, uniform-band inference, replicate-and-verify-against-source discipline, and ends in a graded credibility verdict. Use when user says "run a DiD", "event study", "staggered adoption", "Callaway Sant'Anna", "att_gt", "csdid", "did with multiple periods", or points at panel data with a treatment-timing variable. NEVER reimplements an estimator.

在 Manus 中运行

星标1,263

分支2,569

更新时间2026年6月9日 19:11

来源

pedrohcgs

pedrohcgs/claude-code-my-workflow

打开 GitHub 仓库查看创作者相关仓库

安装命令

下载

在 Manus 中运行

适用职业SOC

软件开发工程师计算机与数学类职业15-1252L4

SKILL.md

readonly

同仓库更多 Skills

同仓库

submission-disclosures

pedrohcgs/claude-code-my-workflow

Generate the submission-time disclosure block for a manuscript — the AI-use disclosure statement matched to the target journal's policy, CRediT author-contribution roles, conflict-of-interest statement, and data-availability statement. Use when the user says "AI disclosure", "disclosure statement", "do I need to disclose Claude", "CRediT roles", "conflict of interest statement", "data availability statement", or is preparing a submission package. NOT statistical-disclosure screening of restricted-data outputs — that is /disclosure-check.

2026-06-101.3k

diagnose

pedrohcgs/claude-code-my-workflow

Root-cause a failing or wrong empirical result with a disciplined reproduce → minimise → hypothesise → instrument → fix loop, instead of guessing-and-poking. Use when the user says "why is my regression wrong", "this number changed", "my script errors out", "the result won't reproduce", "debug this", "this estimate looks wrong", or "it worked yesterday". Tuned for research code (R/Stata/Python): type coercion, NA/merge blow-ups, factor levels, clustering/SE choices, weighting, collinearity/convergence, seeds, package-version drift. Use `--no-fix` to localize the root cause without editing shared or load-bearing files.

2026-06-091.3k

audit-reproducibility

pedrohcgs/claude-code-my-workflow

Enforce the replication-protocol.md rule by cross-checking numeric claims in a manuscript against the actual R / Stata / Python outputs. Report PASS/FAIL per claim against tolerance thresholds. Use before submission and before releasing a replication package.

2026-06-091.3k

deep-audit

pedrohcgs/claude-code-my-workflow

Deep consistency audit of the entire repository infrastructure. Launches 4 parallel specialist agents to find factual errors, code bugs, count mismatches, and cross-document inconsistencies. Then fixes all issues and loops until clean. Use when: after making broad changes, before releases, or when user says "audit", "find inconsistencies", "check everything".

2026-06-091.3k

create-lecture

pedrohcgs/claude-code-my-workflow

Create a new Beamer lecture `.tex` from source papers and materials, with notation consistency checks and the project's preamble wired in. Use when user says "create a lecture on X", "new lecture from these papers", "start a deck on topic Y", "scaffold a new Beamer file", "build me a lecture from these PDFs". Scaffolds the full deck — NOT for compiling existing `.tex` (use `/compile-latex`).

2026-06-091.3k

review-paper

pedrohcgs/claude-code-my-workflow

Comprehensive manuscript review with three modes: single-pass (default), --adversarial critic-fixer loop, and --peer [journal] simulated peer-review pipeline (editor + 2 dispositioned referees + editorial decision, calibrated to a target journal). R&R continuation via --peer --r2/--r3; hostile-editor stress test via --peer --stress; reviewer-disposition variance reporting via --peer --variance N. Auto-invokes /review-r + /audit-reproducibility on referenced scripts unless --no-cross-artifact.

2026-06-091.3k

name	did-event-study
description	Run a staggered difference-in-differences / event-study analysis to the Sant'Anna practitioner standard — drives the canonical packages (R `did`/`DRDID`/`didFF`/`contdid`; Stata `csdid`/`drdid`), enforces the doubly-robust default, a mandatory diagnostic + sensitivity suite, uniform-band inference, replicate-and-verify-against-source discipline, and ends in a graded credibility verdict. Use when user says "run a DiD", "event study", "staggered adoption", "Callaway Sant'Anna", "att_gt", "csdid", "did with multiple periods", or points at panel data with a treatment-timing variable. NEVER reimplements an estimator.
argument-hint	[data path] [--outcome --unit --time --gvar] [--control nevertreated\|notyettreated] [--continuous] [--stata]
allowed-tools	["Read","Grep","Glob","Write","Bash"]
effort	high

/did-event-study — DiD / event study, Sant'Anna practitioner standard

This is a thin orchestrator over the canonical packages — it never reimplements an estimator. It walks the practitioner workflow from Difference-in-Differences with Multiple Time Periods (Callaway & Sant'Anna 2021), the Doubly Robust DiD estimators (Sant'Anna & Zhao 2020), and the "What's Trending in DiD?" synthesis (Roth, Sant'Anna, Bilinski & Poe 2023), and it follows the replicate-and-verify-against-source discipline.

Actor → Critic. The skill is the Actor: it runs your packages and the diagnostics. It then puts on the Critic hat for Phase 8 — a graded credibility verdict, never a binary "passes." A mismatch with a pre-test is evidence on credibility, not a gate. (This actor/critic + mandatory-diagnostic + graded-credibility shape mirrors .claude/rules/orchestrator-protocol.md and the verification posture of audit-reproducibility.)

Read first: .claude/rules/did-conventions.md — the HARD standards this skill enforces (data coding, DR default, control group, inference, aggregation, verification, and the pitfalls to avoid). Then the canonical resources in §Resources.

The methodological defaults below reflect Pedro Sant'Anna's sign-off: notyettreated control default, HonestDiD led by relative-magnitudes Mbar (also report M), staggered as an option with att_gt the workhorse, TWFE benchmark-only.

When to use

Staggered or 2×T adoption with panel or repeated cross-sections; a binary absorbing treatment, or a continuous dose.
Any time someone reaches for a TWFE event study under staggered timing — route here instead.

When NOT to use

A single 2×2 with one pre / one post and no covariates is a one-liner — still use DRDID::drdid(), but you don't need the full pipeline.
Reversible / switching treatments (units turning on and off): these packages assume absorbing treatment. Stop and reconsider the design.

Workflow (fixed order)

Phase 0 — Reproducibility setup (gate before any estimation)

set.seed(...) is REQUIRED — all inference is bootstrap-based. (The JEL replication uses a fixed seed; pick one and pin it.)
Pin software (renv::restore(prompt = FALSE)), use here::here() for paths (no hard-coded machine paths — the git-guardrails hook blocks them in .R/.do), one master script runs the pipeline end-to-end.
Resolve namespace conflicts explicitly (conflicted::conflict_prefer("select","dplyr"), …("filter","dplyr")).

Phase 1 — Design / estimand

Reshape to LONG: one row per unit-period (tidyr::pivot_longer).
Required columns: yname (outcome), tname (time), idname (time-invariant, numeric unit id), gname (group = first period treated; never-treated coded EXACTLY 0).
Tabulate the roll-out (share of units/population by cohort) to make the design explicit: 2×2 → 2×T → staggered G×T.
Pick the estimand up front. The recommended single summary is the Overall ATT from aggte(type = "group"); dynamics via type = "dynamic".

Phase 2 — Estimator selection

Follow the decision logic in §Estimator selection. Output: which estimator, est_method/estMethod, control_group, panel vs RC, covariates yes/no.

Phase 3 — Estimation (drive the package; do not reimplement)

2×2 (one pre / one post):
```
DRDID::drdid(yname, tname, idname, dname, xformla = ~covs, data, panel = TRUE, estMethod = "imp")
```
IPW-only: DRDID::ipwdid(..., normalized = TRUE); OR-only: DRDID::ordid(...).
- Pre-flight (learned from validating Card–Krueger): panel = TRUE requires idname unique within each period AND a balanced panel. Real datasets often aren't — check first:
```
stopifnot(nrow(dplyr::count(data, .data[[idname]], .data[[tname]]) |> dplyr::filter(n > 1)) == 0)  # idname unique by period
balanced <- all(table(data[[idname]]) == length(unique(data[[tname]])))
```
- If unbalanced (the common case): either (a) reproduce the full-sample textbook 2×2 with panel = FALSE and a row-unique id (data$rowid <- seq_len(nrow(data)); this equals feols(y ~ d*post) to ~1e-10 — but RC SEs treat the waves as independent, so for a true panel report the clustered/panel SE separately); or (b) balance the panel (keep ids present in all periods) and use panel = TRUE — but this is a different estimand (the balanced subpopulation), so record it as a named alternative (EXPLAINED), e.g. "full-sample 2×2 = 2.914; balanced-panel DR = 2.972 (19 attriting stores dropped)."
- DR with no covariates reduces to the simple 2×2 — DRDID earns its keep once xformla adds covariates.

Staggered / multi-period (G×T or 2×T):

out <- did::att_gt(
  yname, tname, idname, gname,
  xformla     = NULL,             # or ~ x1 + x2 (time-invariant / baseline covariates)
  data        = mydata,
  panel       = TRUE,             # FALSE for repeated cross-sections (idname ignored)
  control_group = "notyettreated",# staggered; "nevertreated" for a clean 2×T design
  est_method  = "dr",             # doubly robust DEFAULT (only used when xformla is set)
  base_period = "universal",      # REQUIRED for a readable event study + HonestDiD
  bstrap = TRUE, cband = TRUE, biters = 1000,   # publication: biters = 25000
  clustervars = NULL,             # ≤ 2, one must equal idname
  weightsname = NULL              # design-relevant weights if any
)

att_gt builds every ATT(g,t) from a clean drdid 2×2 — that is why it avoids the forbidden already-treated-as-control comparisons that bias TWFE.

TWFE event study — a benchmark/sanity-check, never the headline under heterogeneity: fixest::feols(y ~ i(time_to_treat, treat, ref = -1) | id + year, cluster = ~id). Confirm att_gt(est_method = "reg") matches it in simple cases (SEs differ only because of the bootstrap) so any divergence is attributable to design, not a coding bug.
Continuous dose [ALPHA — API may change]: contdid::cont_did(yname, dname, gname, tname, idname, data, target_parameter = "level"|"slope", aggregation = "dose"|"eventstudy"). dname is the time-invariant real dose (its actual value pre-treatment, not 0); gname = 0 for never-treated. level → ATT(d), slope → ACRT(d).
Stata twins (--stata) — R is the benchmark; Stata must match it. csdid y covs, ivar(id) time(t) gvar(g) method(dripw) notyet asinr (the asinr option = "as in R") must reproduce did::att_gt to 1e-6; estat event / estat simple; drdid for the 2×2. Any Python port is held to the same: match R.

Phase 4 — Mandatory diagnostics (none skippable)

Pre-trends (a PRE-TEST, not a test): read pre-treatment ATT(g,t) for t<g and the Wald p-value from summary(out); in event-study form all e<0 ≈ 0, with e = -1 ≈ 0. Passing is evidence on credibility, not proof PT holds where you need it. Do NOT pre-test with a TWFE event study — under selective timing it can reject PT even when it holds.
Event study: aggte(out, type = "dynamic") → ggdid() (red = pre pseudo-ATTs, blue = post; set ylim so panels compare). Pseudo-ATTs are valid only under no-anticipation.
Negative-weights / forbidden-comparison check: satisfied by design via att_gt/csdid; flag negative TWFE weights as the reason to prefer the ATT(g,t) building block.
DR overlap: inspect propensity-score overlap (the JEL Figure 1 idea). PS trimming default trim.level = 0.995; ps.flag reports IPT convergence.

Phase 5 — Sensitivity (ROBUSTNESS, never a pass/fail pre-test)

HonestDiD (Rambachan & Roth) — lead with the relative-magnitudes Mbar breakdown (headline), also report smoothness. Requires base_period = "universal". honest_did() is a NON-exported internal S3 method in HonestDiD (bare honest_did() errors); use HonestDiD:::honest_did(es, …), or the direct path below — validated on mpdta:

es    <- aggte(out, type = "dynamic")                 # universal base period
IF    <- es$inf.function$dynamic.inf.func.e           # influence function
sigma <- crossprod(IF) / n_units^2                    # IF-based covariance
HonestDiD::createSensitivityResults_relativeMagnitudes(
  betahat = es$att.egt, sigma = sigma,
  numPrePeriods = sum(es$egt < 0), numPostPeriods = sum(es$egt >= 0),
  Mbarvec = c(0, 0.5, 1))                             # report the breakdown Mbar

didFF functional-form sensitivity (Roth & Sant'Anna 2023): didFF::didFF(...); where the implied counterfactual density of Y(0) dips below 0, parallel-trends-for-all-functional-forms is violated. Small p → reject insensitivity. (Parallel trends is not invariant to levels vs logs — the functional form is a substantive identification choice.)
He argues formal sensitivity should be standard practice. Pair it with substantive reasoning about which time-varying confounders could break PT and how large a plausible violation is.

Phase 6 — Inference

Multiplier bootstrap, bstrap = TRUE, cband = TRUE → uniform/simultaneous bands robust to multiple testing. biters = 25000 for publication. Never ship pointwise-only (bstrap = FALSE, cband = FALSE) as the headline.
clustervars ≤ 2 (one = idname); cluster TWFE benchmarks at the unit level. Few-treated-cluster settings need care (e.g. a wild-cluster bootstrap via fwildclusterboot/boottest).
Report design-relevant weights (weightsname) AND report weighted and unweighted.

Phase 7 — Aggregation & reporting

aggte(out, type = "dynamic", min_e =, max_e =, balance_e =, bstrap = TRUE, biters = 25000, na.rm = TRUE) for the event study; type = "group" for the headline Overall ATT; type = "calendar" per period. Always pass type explicitly; avoid type = "simple" (overweights early-treated).
Report the e ∈ {0,…,K} average with overall.att/overall.se/CI, BOTH simultaneous and pointwise bands on the plot, and map every coefficient/figure to its generating script + line.

Phase 8 — Credibility verdict (graded, honest — the Critic)

Synthesize the diagnostics into a graded verdict (Strong / Moderate / Weak / Not-credible) with explicit reasons — never a binary "passes":

Design — gname coded right (0 = never-treated)? absorbing treatment? clean control group exists?
Pre-trends — Wald p + visual e<0 ≈ 0 (state: evidence, not proof).
Sensitivity — HonestDiD breakdown Mbar; didFF p-value.
Overlap — DR/PS overlap acceptable; trimming not heavily binding.
Inference — uniform bands; seed set; weights reported both ways.

Estimator selection

Continuous dose?            → contdid::cont_did(...)            [ALPHA]
else 2 groups × 2 periods?  → DRDID::drdid(..., estMethod="imp")
else many periods/cohorts?  → did::att_gt(...)   (wraps drdid per ATT(g,t))
repeated cross-sections?    → att_gt(panel=FALSE) / drdid(panel=FALSE)

Doubly-robust is the default (est_method="dr" / estMethod="imp": IPT propensity score + WLS outcome regression — doubly robust for inference). est_method matters only with covariates.
Control group: notyettreated is the default for staggered G×T (a larger, time-varying comparison; it imposes stronger cross-group PT — "no free lunch"); use nevertreated for a clean 2×T design or when a credible never-treated pool is the right comparison.
Under limited overlap, prefer OR/regression-adjustment over DR.
Heterogeneity-robust estimators usually agree (CS, Sun–Abraham, BJS, dCDH) — the first-order priority is a transparent target parameter + transparent comparison group, not agonizing over the package. Under (quasi-)random rollout timing, the efficient Roth–Sant'Anna staggered estimator is worth considering, but att_gt (Callaway–Sant'Anna) stays the workhorse default.

Verification / replication standard (from `DiD_book`)

Translate from, and verify against, the original author code — benchmark against the actual Stata esttab/outreg outputs, not printed paper numbers.
Match the source to abs_diff < 1e-6 on BOTH point estimate AND SE; loosen only deliberately and document the scope. "Replication first — match original numbers before extending."
Mandatory infra: renv.lock + renv::restore(), here::here(), set.seed, one master script, machine-readable outputs (.rds, .csv coefficients, a per-analysis verification_against_stata.csv).
R is the benchmark; other languages match R. His R packages (did/DRDID/didFF/contdid) are the canonical implementations — Stata (csdid/drdid via asinr = "as in R") and Python ports must reproduce R to 1e-6 (point + analytic SE; bootstrap-SE and cosmetic graphing differences excepted). (This is distinct from replicating a published paper, where that paper's original author code — often Stata — is the truth for its numbers.)

Resources (canonical, public)

did-resources hub: https://psantanna.com/did-resources/ — the curated list (the JEL Practitioner's Guide, What's Trending, the 14-lecture course, the DiD checklist, all packages). Lead here.
Packages: did https://bcallaway11.github.io/did/ · DRDID https://psantanna.com/DRDID/ · didFF · contdid · staggered · Stata csdid/drdid · Python drdid/csdid.
Papers: Callaway & Sant'Anna (2021) https://doi.org/10.1016/j.jeconom.2020.12.001 · Sant'Anna & Zhao (2020) https://doi.org/10.1016/j.jeconom.2020.06.003 · Roth & Sant'Anna (2023, Econometrica) https://doi.org/10.3982/ECTA19402 · Rambachan & Roth (2023, HonestDiD) · continuous treatment https://arxiv.org/abs/2107.02637.

Output

Write to scripts/R/_outputs/ (and scripts/Stata/ if --stata): the master script, the ATT(g,t) + aggregations (.rds), the event-study figure (simultaneous + pointwise bands), the HonestDiD/didFF sensitivity, the verification_against_stata.csv, and a did_credibility_verdict.md (the Phase 8 graded verdict + every table→script:line map).

Exit behavior

Exit 0 with the graded verdict. A Not-credible verdict or a failed source-verification (abs_diff ≥ 1e-6) is surfaced prominently — never silently passed.
Pairs with /audit-reproducibility (numeric claims ↔ outputs) and /replication-package (the deposit).

What this skill does NOT do

Reimplement any estimator — it drives your packages; if a number looks implausible, debug the wrapper / sample / weights / clustering / data construction / engine before interpreting it.
Handle reversible treatments, or use TWFE as the headline under staggered timing.
Consult any private vault — this skill is self-contained and public-resource-only.
Replace your judgment — the credibility verdict is advisory; you are the auditor.

Flags

--outcome --unit --time --gvar — map columns to yname/idname/tname/gname.
--control <nevertreated|notyettreated> — comparison group (default per §Estimator selection).
--continuous — continuous-dose mode (contdid, ALPHA).
--stata — also run the Stata twin (csdid/drdid) for the dual-software cross-check.

Cross-references

.claude/rules/did-conventions.md — the enforceable standards.
.claude/skills/audit-reproducibility/SKILL.md · .claude/skills/replication-package/SKILL.md · .claude/skills/power-analysis/SKILL.md · .claude/skills/simulation-study/SKILL.md.

did-event-study

同仓库更多 Skills

同仓库更多 Skills

/did-event-study — DiD / event study, Sant'Anna practitioner standard

When to use

When NOT to use

Workflow (fixed order)

Phase 0 — Reproducibility setup (gate before any estimation)

Phase 1 — Design / estimand

Phase 2 — Estimator selection

Phase 3 — Estimation (drive the package; do not reimplement)

Phase 4 — Mandatory diagnostics (none skippable)

Phase 5 — Sensitivity (ROBUSTNESS, never a pass/fail pre-test)

Phase 6 — Inference

Phase 7 — Aggregation & reporting

Phase 8 — Credibility verdict (graded, honest — the Critic)

Estimator selection

Verification / replication standard (from DiD_book)

Resources (canonical, public)

Output

Exit behavior

What this skill does NOT do

Flags

Cross-references

/did-event-study — DiD / event study, Sant'Anna practitioner standard

When to use

When NOT to use

Workflow (fixed order)

Phase 0 — Reproducibility setup (gate before any estimation)

Phase 1 — Design / estimand

Phase 2 — Estimator selection

Phase 3 — Estimation (drive the package; do not reimplement)

Phase 4 — Mandatory diagnostics (none skippable)

Phase 5 — Sensitivity (ROBUSTNESS, never a pass/fail pre-test)

Phase 6 — Inference

Phase 7 — Aggregation & reporting

Phase 8 — Credibility verdict (graded, honest — the Critic)

Estimator selection

Verification / replication standard (from DiD_book)

Resources (canonical, public)

Output

Exit behavior

What this skill does NOT do

Flags

Cross-references

Verification / replication standard (from `DiD_book`)

Verification / replication standard (from `DiD_book`)