원클릭으로 Manus에서 모든 스킬 실행

$pwd:

audit-reproducibility

Name: Audit Reproducibility
Author: pedrohcgs

// Enforce the replication-protocol.md rule by cross-checking numeric claims in a manuscript against the actual R / Stata / Python outputs. Report PASS/FAIL per claim against tolerance thresholds. Use before submission and before releasing a replication package.

Manus에서 실행

$ git log --oneline --stat

stars:1,183

forks:2,424

updated:2026년 5월 20일 15:43

SKILL.md

readonly

related-skills.json

같은 저장소

stata-replication.md

from "pedrohcgs/claude-code-my-workflow"

End-to-end Stata replication pipeline — scaffolds numbered `.do` files in `scripts/stata/`, executes them via the `stata-mcp` MCP server, captures logs and outputs to `scripts/stata/_outputs/`, and produces publication-ready tables (esttab) and figures (graph export). Mirrors `/data-analysis` for R-first projects. Use when user says "stata replication", "set up Stata pipeline", "scaffold the .do files", "run Stata analysis", "AEA replication package in Stata", or when a project's analysis language is Stata not R.

2026-05-201.2k

compress-session.md

from "pedrohcgs/claude-code-my-workflow"

Distill the current conversation into a structured note (decisions made, open questions, file pointers with line numbers, next 1–3 actions) and save to `quality_reports/session_logs/` before auto-compression. Differs from `/checkpoint` (explicit stop-point snapshot) and from auto-compaction (which truncates rather than distills). Use when context is approaching auto-compact threshold, when a long pipeline has accumulated many decisions, or when the user says "compress", "distil this session", "before we hit auto-compact", "structured handoff before context resets".

2026-05-201.2k

promote-memory.md

from "pedrohcgs/claude-code-my-workflow"

Review candidate `[LEARN]` entries in `.claude/state/personal-memory.md` (gitignored) and run them through a five-critic council in parallel: generality, staleness, redundancy, evidence, format. Majority vote (3+ of 5) promotes the entry to MEMORY.md. Use when user says "promote memory", "review my learnings", "what should graduate to MEMORY.md", "five-critic council", or as monthly memory maintenance.

2026-05-201.2k

prompt-only.md

from "pedrohcgs/claude-code-my-workflow"

Reformat an informal or dictated request into a structured prompt (Role / Task / Context / Constraints / Output format / Bookend) and emit it as a reusable artifact — does NOT execute. Companion to `/prompt`. Use when user says "format this prompt", "give me a clean version of this ask", "save this prompt for later", "I want a reusable prompt", or wants a prompt to use elsewhere (different model, different conversation, recurring task).

2026-05-201.2k

prompt.md

from "pedrohcgs/claude-code-my-workflow"

Reformat an informal or dictated request into a structured prompt (Role / Task / Context / Constraints / Output format / Bookend) at Light / Standard / Deep depth, then execute it immediately. Use when user has a conversational ask that would benefit from being made specific before Claude runs it, or says "prompt", "structure this", "format this ask", "reshape this request", "make this rigorous". Single-shot input shaping — distinct from `/interview-me` (multi-turn project specification).

2026-05-201.2k

verify-claims.md

from "pedrohcgs/claude-code-my-workflow"

Run Chain-of-Verification (CoVe) on a draft or a block of text with factual claims. Spawns the `claim-verifier` agent in a forked (fresh) context so it never sees the draft — then reports which claims are supported, contradicted, or unverifiable. Use when user says "verify these citations", "check the claims in X", "did I hallucinate anything", "fact-check this draft", "run CoVe on this", or after any text generation that asserts facts about papers, datasets, or numerical results. NOT for style/grammar review (use `/proofread`) or substance review (use `/review-paper`).

2026-05-201.2k

package.json

"author": "pedrohcgs"

"repository": "pedrohcgs/claude-code-my-workflow"

GitHub 저장소 열기 Creator 저장소 보기

$ install --global

$ download --local

Manus에서 실행

$ useful --forSOC

데이터 과학자컴퓨터 및 수학직15-2051L4

name	audit-reproducibility
description	Enforce the replication-protocol.md rule by cross-checking numeric claims in a manuscript against the actual R / Stata / Python outputs. Report PASS/FAIL per claim against tolerance thresholds. Use before submission and before releasing a replication package.
argument-hint	[manuscript path] [outputs-dir] (outputs-dir defaults to scripts/R/_outputs/)
allowed-tools	["Read","Grep","Glob","Write","Bash","Task","Monitor"]
effort	high

Audit Reproducibility

Compare numeric claims in a manuscript (point estimates, standard errors, p-values, counts) against the actual outputs produced by the analysis pipeline. Report PASS / FAIL per claim against the tolerance thresholds defined in .claude/rules/replication-protocol.md.

Core principle: If the paper says ATT = -1.632 (0.584) and the code produces -1.628 (0.591), we verify — numerically — that the difference is within the documented tolerance. No more "looks close enough" eyeballing.

When to use

Before submission. Catches the "I updated the analysis but forgot to update Table 2" bug.
Before releasing a replication package. Verifies the code actually reproduces the paper.
After a major revision. Ensures the paper still matches the latest code.
Quality-gate in /commit. Pair with a pre-commit invocation on manuscript + analysis changes.

Inputs

$0 — path to the manuscript (.tex, .qmd, .md, .pdf). Required.
$1 — path to the outputs directory. Defaults to scripts/R/_outputs/. Recognised alternatives: scripts/stata/_outputs/ (Stata pipelines built by /stata-replication), _targets/objects/ (R targets workflows), any directory the user-specified outputs live in.

Workflow

Phase 0: Pre-flight

Read replication-protocol.md for the tolerance thresholds currently in effect.
Verify the outputs directory exists and is non-empty. If empty or stale (older than the manuscript), prompt the user to re-run their pipeline (e.g., Rscript scripts/R/00_run_all.R) before auditing.
Ensure a sessionInfo.txt or equivalent environment capture exists in the outputs dir.

Phase 1: Extract claims from the manuscript

Parse the manuscript for numeric claims. Patterns to match:

Point-estimate + SE: ATT = -1.632 (0.584), $\beta = 0.342$ (0.091), hat{\tau} = 1.28** with starred significance
Table cells: & -1.632$^{***}$ & 0.584 & in LaTeX table environments
Counts: our sample of 2,847 firms, $N = 2{,}847$
Summary stats: mean = 0.423, SD = 0.087
P-values: p < 0.01, $p = 0.003$

Record each claim as a tuple:

{
  claim_id: "Table2_col3_ATT",
  location: "Table 2, Column 3, row 'Treatment'",
  kind: "point_estimate" | "standard_error" | "p_value" | "count" | "percentage",
  reported_value: -1.632,
  uncertainty: 0.584,              # only for point estimates
  significance_stars: 3,            # 0-3 or None
  raw_context: "the ATT estimate of -1.632 (0.584) indicates..."
}

Write the extracted claims to quality_reports/reproducibility_claims_[manuscript-name].json so the user can review the extraction before audit.

Phase 2: Extract results from outputs

Scan $1 for corresponding values. Priority order:

.rds files — readRDS(path)$coef[["treatment"]] style lookups. Can use Rscript -e "saveRDS(summary(readRDS(...)), '/tmp/audit.rds')" to extract.
.tex tables — parse LaTeX table cells directly; match on column headers + row labels.
.csv summary files — pandas/readr parse, key-value lookup.
.out / .log files (Stata, regress output) — regex extraction.
.json — direct key lookup.

Record each extracted result:

{
  source: "scripts/R/_outputs/results.rds",
  lookup_key: "fit_main$coefficients['treated']",
  value: -1.628,
  uncertainty: 0.591,
  p_value: 0.005
}

Phase 3: Match claims to results

Use fuzzy heuristics when exact labels don't match:

Name similarity ("treatment effect" ~ "ATT" ~ "treated")
Magnitude similarity (if two candidates have values within 10% of the reported, prefer the one with closer SE)
Context hints from the claim's raw_context field (table number, row label, description)

For every claim, produce a match candidate with a confidence score. Claims below 0.7 confidence get flagged as "UNMATCHED — manual review needed" rather than silently passing.

Phase 4: Tolerance check

For each matched claim, apply the thresholds from replication-protocol.md:

Kind	Tolerance	Example
Integers (N, counts)	Exact	2,847 must equal 2,847
Point estimates	`abs(reported - computed)` < 0.01	-1.632 vs -1.628 → diff = 0.004 → PASS
Standard errors	`abs(reported - computed)` < 0.05	0.584 vs 0.591 → diff = 0.007 → PASS
P-values	Same significance level	p<0.01 and p<0.01 → PASS; p<0.01 and p=0.03 → FAIL
Percentages	±0.1pp	42.3% vs 42.35% → PASS

Respect any tolerance overrides the user has written into their replication-protocol.md fork (they may loosen for MC noise or tighten for administrative data).

Phase 5: Report

Write quality_reports/reproducibility_audit_[manuscript-name].md:

# Reproducibility Audit: [Manuscript Title]

**Date:** [YYYY-MM-DD]
**Manuscript:** [path]
**Outputs directory:** [path]
**Tolerance source:** .claude/rules/replication-protocol.md

## Summary

| Status | Count |
|---|---|
| PASS | N |
| FAIL (diff > tolerance) | M |
| UNMATCHED (manual review) | K |
| **Overall verdict** | **PASS / FAIL** |

## PASS (all within tolerance)
| Claim | Reported | Computed | Diff | Tolerance |
|---|---|---|---|---|
| Table2_col3_ATT | -1.632 (0.584) | -1.628 (0.591) | 0.004 / 0.007 | 0.01 / 0.05 |

## FAIL (outside tolerance — BLOCKER)
| Claim | Reported | Computed | Diff | Tolerance | Location in paper |
|---|---|---|---|---|---|

## UNMATCHED (manual review)
| Claim | Raw context | Candidate sources |
|---|---|---|

## Environment
[sessionInfo excerpt]

## Next steps
1. Fix any FAIL rows — either update the manuscript or rerun analysis.
2. Review UNMATCHED rows — add explicit lookup keys or widen the search scope.
3. After zero FAILs, the paper is replication-ready.

Exit behavior

All PASS: exit 0, summary printed.
Any FAIL: exit 1, summary printed to stderr. This makes the skill usable as a /commit pre-commit gate — see replication-protocol.md for the enforcement pattern.
UNMATCHED > 0 (with 0 FAIL): exit 0 with warning — user must manually review.

Source-language coverage

The skill compares manuscript claims against outputs in three source-language ecosystems:

Source	Default outputs dir	Read-output via	Common claim sources
R (default)	`scripts/R/_outputs/`	`readRDS()`, `arrow::read_parquet()`, `vroom::vroom()`	`.rds` / `.parquet` / `.csv` / `tinytable` `.tex`
Stata (v1.9.0)	`scripts/stata/_outputs/`	`haven::read_dta()` from R, or `pyreadstat.read_dta()` from Python	`.dta` / `esttab` `.tex` / `.smcl` log values
Python	`scripts/python/_outputs/` (or `_targets/`)	`pandas.read_parquet`, `pickle.load`	`.parquet` / `.pickle` / `.csv`

Stata-specific notes (v1.9.0):

.dta outputs are read via haven::read_dta() (R), pyreadstat.read_dta() (Python), or by parsing the corresponding esttab .tex if the table-cell value is what the manuscript cites.
Manuscript cell \input{scripts/stata/_outputs/tab_main.tex} is the strongest provenance signal — the cell value comes mechanically from the .do file. Match the location in the .tex to the regression call in 03_analyze.do.
Clustering df adjustments can differ between reghdfe and base reg, cluster(). If a SE mismatches at the 2nd decimal, the tolerance in replication-protocol.md covers it; if it mismatches at the 1st decimal, investigate the df adjustment.

Passport-mode (v1.9.0)

When quality_reports/passports/<paper-slug>.yaml exists, the skill operates in passport mode: instead of emitting a one-shot report, it reads, updates, and rewrites the passport file in place.

For each claims: entry in the passport, perform the same numeric audit as the default mode (extract reported value from manuscript at location, locate computed value at source_file:source_line / output_file:output_field, compare against tolerance:).
After each claim is audited, update status in place:
- PASS → claim within tolerance.
- FAIL → claim outside tolerance. Record the discrepancy in the entry's notes.
- STALE → if source_file or output_file modification time is later than last_verified_on, mark STALE and re-run the audit logic (after the rerun, status becomes PASS or FAIL — STALE is transient).
Update last_verified_on and last_verified_by: "/audit-reproducibility" per claim.
Update paper.last_audit at the top level.

If a claim in the manuscript is detected that has no matching passport entry, emit an UNVERIFIED warning — the author should add it (passport scope is author-curated, not auto-populated, to avoid bad inferences).

Passport mode does NOT delete passport entries. If a claim disappears from the manuscript, the passport entry remains with a STALE status — the author decides whether to delete (claim retracted) or update the entry's location (claim moved).

See .claude/rules/replication-protocol.md "Claims Provenance: passport.yaml" for the full schema and integration points (/commit, /review-paper).

Cross-references

.claude/rules/replication-protocol.md — the tolerance contract + passport schema.
templates/passport-template.yaml — starter file to copy for a new paper.
.claude/skills/review-r/SKILL.md — catches code-style issues; this skill catches NUMERICAL reproducibility.
.claude/skills/review-paper/SKILL.md — content review; pair with this skill for a full pre-submission audit.

What this skill does NOT do

Re-run your analysis. The skill compares CURRENT outputs against manuscript claims. If the outputs are stale, re-run your pipeline first (the pre-flight phase will warn).
Catch wrong specifications. A regression that compiles cleanly and produces a reproducible -1.632 is reproducible. Whether -1.632 is the RIGHT estimand is a review-paper / domain-reviewer question.
Check external package versions. The sessionInfo.txt capture lets a reviewer see the env; pinning versions is on the user (via renv.lock or a DESCRIPTION file).

Long batch reruns: use the Monitor tool (Apr 2026)

When /audit-reproducibility is asked to verify all numeric claims in a paper, the safest approach is to re-run the full pipeline (00_run_all.R or equivalent) and compare the regenerated outputs to the manuscript values. For pipelines that take more than a couple of minutes, background-launch the rerun and use Anthropic's Monitor tool (Apr 2026 Week 15) to stream stdout. The audit can react to errors mid-stream rather than waiting for the entire pipeline to finish before noticing a failed step.

audit-reproducibility

이 저장소의 다른 Skills

Audit Reproducibility

When to use

Inputs

Workflow

Phase 0: Pre-flight

Phase 1: Extract claims from the manuscript

Phase 2: Extract results from outputs

Phase 3: Match claims to results

Phase 4: Tolerance check

Phase 5: Report

Exit behavior

Source-language coverage

Passport-mode (v1.9.0)

Cross-references

What this skill does NOT do

Long batch reruns: use the Monitor tool (Apr 2026)

Audit Reproducibility

When to use

Inputs

Workflow

Phase 0: Pre-flight

Phase 1: Extract claims from the manuscript

Phase 2: Extract results from outputs

Phase 3: Match claims to results

Phase 4: Tolerance check

Phase 5: Report

Exit behavior

Source-language coverage

Passport-mode (v1.9.0)

Cross-references

What this skill does NOT do

Long batch reruns: use the Monitor tool (Apr 2026)

이 저장소의 다른 Skills