تشغيل أي مهارة في Manus بنقرة واحدة

code-sweep

End-of-milestone audit of a research project's `Code/` folder against the paper and outputs. Surfaces drift between scripts, YAML headers, the script registry, outputs, raw data, and the paper. Categorises findings as MUST-FIX or SUGGESTED and proposes fixes one at a time for the user to confirm. Use this skill when the user says "code sweep", "sweep the code", "run a code sweep", "audit my code folder", "check code freshness", "is everything up to date", "any stale outputs", "audit code consistency", or similar — typically before a paper submission, milestone, or share. Extends `script-registry`; reuses its `registry:` YAML block and `Code/_Claude Scripts/build_registry.R` builder rather than redefining either.

تشغيل في Manus

النجوم١٣٠

التفرعات٢٦

آخر تحديث٢ يونيو ٢٠٢٦ في ٠٩:٥٨

المصدر

aspi6246

aspi6246/Claude-Code-Skills-for-Academics

فتح مستودع GitHub عرض مستودعات المنشئ

أمر التثبيت

تنزيل

تشغيل في Manus

مفيد لـSOC

مطوّرو البرمجياتمهن الحاسوب والرياضيات15-1252L4

SKILL.md

readonly

المزيد من هذا المستودع

نفس المستودع

script-registry

aspi6246/Claude-Code-Skills-for-Academics

Use when creating a new `.Rmd` script in `Code/`, editing the `registry:` YAML block of an existing script in `Code/`, moving a `Code/*.Rmd` into `_Archive/`, or when the user says "update registry", "show registry", or "which script makes X". Before writing any new `Code/*.Rmd` script, read `Code/REGISTRY.md` first and surface existing matches before starting fresh. After any YAML change, run `Code/_Claude Scripts/build_registry.R` to refresh `Code/REGISTRY.md`.

2026-06-02130

glossary

aspi6246/Claude-Code-Skills-for-Academics

Per-project glossary of key definitions, abbreviations, and command-phrases, stored in `GLOSSARY.md` at the project root. Use this skill when the user defines or asks about a project-specific term — variable names, dataset or database names, acronyms — or sets up a command-phrase (a phrase that maps to an action, e.g. "push" = commit and push the paper to GitHub). Triggers include "add to glossary", "glossary: X means Y", "define X for this project", "what does X mean here", "what does X stand for", "from now on X means Y", "show the glossary", "what's in our glossary", and "remove X from glossary". Also use proactively: offer to capture a term when the user defines one in passing, or when you hit an undefined abbreviation or variable name in their code or data. Loaded at session start by `/spin-up` so command-phrases stay active.

2026-05-28130

spin-up

aspi6246/Claude-Code-Skills-for-Academics

Start-of-session orientation routine that briefs Claude on the current state of a project before work begins. Use this skill when the user says "spin up", "spin it up", "let's go", "start up", "spin up the project", "punch it chewy", or any variant signalling they want a session kickoff briefing. Performs: git status check (unpushed commits, remote divergence), CLAUDE.md and README read, most-recent session log read (focusing on "Where we left off"), PINBOARD.md open items, GLOSSARY.md load, and a short synthesis of project state. Ends by asking what to work on today.

2026-05-28130

edmans-audit

aspi6246/Claude-Code-Skills-for-Academics

Audit a finance paper against Alex Edmans' "Learnings From 1,000 Rejections" (2025, Financial Management) three-part framework: Contribution, Execution, Exposition. Reads the paper's current LaTeX, applies the framework with per-research-question execution checks, and generates a structured report with severity-labeled findings. Use when the user says "edmans audit", "audit my paper", "run the Edmans check", "referee readiness", "submission audit", "check paper against Edmans", or "pre-submission check".

2026-05-27130

canvas-lms-api

aspi6246/Claude-Code-Skills-for-Academics

Use this skill whenever working with the Canvas LMS REST API — including creating or updating modules, pages, and files for a course, building sync scripts, managing course structure programmatically, or uploading files and linking them in pages. Trigger this skill any time the user mentions Canvas, LMS, course sync, module pages, or file uploads to Canvas. Also use when writing or debugging any Python script that calls the Canvas API.

2026-04-24130

wrap-up

aspi6246/Claude-Code-Skills-for-Academics

End-of-session cleanup routine that captures all session work before context is lost. Use this skill when the user says "wrap up", "wrap-up", "let's wrap it up", "let's wrap up", "wind down", "end session", "that's it for today", "save and close", "let's call it", "close out", "done for today", or any indication they are finishing a work session and want everything documented before starting fresh.

2026-04-24130

name

code-sweep

description

Code Sweep

Audit a research project for drift between scripts, outputs, raw data, and the paper. Propose fixes one at a time; the user confirms each. This skill extends script-registry — it reuses that skill's YAML schema, builder, and folder conventions. The only YAML additions Code Sweep introduces are status: and seed:.

Invocation

/code-sweep — full sweep (all checks)
/code-sweep --quick — structural checks only; skips dry-run + paper cross-ref
/code-sweep --sample N — override dry-run sample size (default 1000)
/code-sweep --no-dryrun — skip just the dry-run step
/code-sweep --no-paper — skip just the paper cross-ref step

If invoked via trigger phrase, default to full sweep unless the user qualifies ("quick code sweep", etc.).

Step 1 — Layout detection

Read the project's CLAUDE.md (in the current working directory) for folder paths. Default conventions (inherited from script-registry):

Purpose	Default path
Scripts	`Code/`
Working/purled scripts	`Code/_Claude Scripts/`
Archived scripts	`Code/_Archive/` and `Code/_Claude Scripts/_Archive/`
Figures	`Output/Figures/`
Tables	`Output/Tables/`
Processed data	`Output/Processed Data/`
Raw data	`Data/`
Paper	`Paper/`
Main paper file	`Paper/main.tex`
Session logs	`Code/_Claude Session Logs/`
Registry view	`Code/REGISTRY.md`
Registry builder	`Code/_Claude Scripts/build_registry.R`

If the project's CLAUDE.md declares different paths, use those. If no Code/ folder exists, abort with: "This doesn't look like a research project (no Code/ folder). Code Sweep is designed for empirical-finance research projects."

Step 2 — YAML schema extensions

script-registry already defines the registry: block (purpose, inputs, outputs, paper_target, notes). Code Sweep adds two optional fields:

registry:
  purpose: "..."
  inputs:
    - Data/Processed/foo.parquet
  outputs:
    - Output/Figures/fig_descriptive_stats.pdf
    - Output/Tables/tab_main_results.tex
  paper_target: "Table 1, Figure 3"
  notes: "..."
  status: active           # NEW — active | deprecated. Default: active if absent.
  seed: 42                 # NEW — required iff script uses randomness.

produces_figures / produces_tables are not added — they're derivable from outputs: by extension (.pdf/.png → figure, .tex → table).

Step 3 — Naming patterns

Inherit script-registry's convention:

Scripts: ^\d+[a-z]?\. Claude_[A-Za-z0-9_]+\.Rmd$ (e.g. 3. Claude_Clean_Data.Rmd, 12b. Claude_Run_Models.Rmd)
Figures: ^fig_[a-z0-9_]+\.(pdf|png|jpg)$ in Output/Figures/
Tables: ^tab_[a-z0-9_]+\.tex$ in Output/Tables/

If the project's existing files don't follow these patterns, prefer the project's actual convention — sample a few existing files to infer the pattern and check the rest against that, rather than rigidly enforcing the default.

Step 4 — Checks

Run in this order. Tag each finding MUST-FIX or SUGGESTED. Skip checks gated by --quick / --no-dryrun / --no-paper.

MUST-FIX (paper can't reproduce / pipeline broken)

M1. registry: block completeness — for every .Rmd in Code/ (non-recursive, excluding _Archive/):

registry: block exists
purpose, inputs, outputs are present (inputs/outputs may be empty lists)
All inputs: paths actually exist on disk
All outputs: paths actually exist on disk (the script may not have been run yet — if status is active and outputs are missing, flag MUST-FIX; if status: deprecated, demote to SUGGESTED)

M2. Paper cross-reference (skip on --quick or --no-paper) — parse the paper:

Start with Paper/main.tex (or whatever CLAUDE.md designates).
Recursively follow \input{...} and \include{...} to sub-files.
Resolve \graphicspath{{path1/}{path2/}} if present.
Extract every \includegraphics[...]{stem} and \input{...} reference (ignore commented lines).
For each referenced figure stem, the file must exist in Output/Figures/ (try .pdf, .png, .jpg) AND some active script's outputs: must list it.
For each referenced table stem, the file must exist in Output/Tables/ (.tex) AND some active script's outputs: must list it.

Missing file OR missing producing script → MUST-FIX. Don't compile the paper — this is parse-based.

M3. Dry-run viability (skip on --quick or --no-dryrun) — for each status: active script:

Purl: Rscript -e "knitr::purl('Code/<script>.Rmd', output=tempfile(fileext='.R'), quiet=TRUE)"

Wrap the purled code with a header that sets:

options(.codesweep_sample = <N from --sample>)
options(.codesweep_dryrun = TRUE)

Run the wrapped script via Rscript --vanilla with a 120s timeout.
Capture stderr; any non-zero exit or timeout → MUST-FIX with the error message.

Scripts that respect .codesweep_sample should sample their data loads. Scripts that don't will run on full data — slower but still informative. Document this convention in the project's CLAUDE.md if it's not already.

M4. Registry sync — run Rscript "Code/_Claude Scripts/build_registry.R" (or whatever the project uses). After it completes:

Check the "Drift Warnings" section in the rebuilt Code/REGISTRY.md.
Each drift warning (undeclared inputs/outputs, unused inputs/outputs) → MUST-FIX with the proposed YAML fix.
Check the "Scripts Without Registry Metadata" section — any entries → MUST-FIX (propose adding the block).

If build_registry.R doesn't exist, suggest invoking the script-registry skill to set it up, then abort this check.

M5. Reproducibility — set.seed() — grep each active .Rmd for randomness markers: sample(, rnorm(, runif(, rbinom(, rpois(, rbeta(, rgamma(, kmeans(, boot(, bootstrap(, Sys.setenv.*SEED. If any are present:

seed: must be declared in the registry: block
The script must call set.seed(<the registry seed>) somewhere before the randomness

Missing seed OR mismatched seed → MUST-FIX.

SUGGESTED (housekeeping)

S1. Stale outputs — for each declared output, compare its mtime to each declared input's mtime. If any input is newer than this output → flag staleness. Report only ("fig_x.pdf is older than data/raw_y.parquet"). Don't propose re-running. The user knows whether the staleness matters.

S2. Orphan outputs — files in Output/Figures/, Output/Tables/, Output/Processed Data/ not declared in any active script's outputs: → SUGGESTED. Propose: move to Output/_Archive/ (create if missing) OR delete (only if user explicitly says delete).

S3. Orphan inputs — files under Data/ not referenced in any active script's inputs: → SUGGESTED. Don't propose deletion (raw data is precious); just flag.

S4. Naming-convention violations — files in Code/, Output/Figures/, Output/Tables/ that don't match the patterns in Step 3 → SUGGESTED with proposed rename. Do not auto-rename — every rename also requires updating outputs: declarations and possibly the paper's \includegraphics references. Propose the rename and the corresponding YAML/paper edits as one bundle.

S5. Deprecated or DAG-orphan scripts — flag a script as a candidate for Code/_Archive/ if:

status: deprecated, OR
No other active script's inputs: references any of its outputs, AND none of its outputs appear in the paper cross-ref set from M2

→ SUGGESTED move to Code/_Archive/. Per project rule: move, never delete.

S6. Loose .R files in Code/ — any .R file directly under Code/ (not in _Claude Scripts/) that isn't the pipeline orchestrator (Code/00_run_pipeline.R or whatever CLAUDE.md designates) → SUGGESTED move to Code/_Claude Scripts/. Likely a purled or temp script that didn't get filed.

S7. TODO/FIXME inventory — grep \b(TODO|FIXME|XXX|HACK)\b across Code/ (excluding _Archive/). Report each match as file:line:text, grouped by file. Informational only — no proposed fix.

S8. Environment manifest sync — collect every library(pkg), require(pkg), and pkg::fn reference across Code/ (excluding _Archive/). Compare to renv.lock if present, or DESCRIPTION / requirements.txt. Packages referenced but not in the lockfile → SUGGESTED. If no lockfile exists, suggest creating one with renv::init().

Step 5 — Report

Append the sweep results to today's session log at Code/_Claude Session Logs/YYYY-MM-DD.md (the same convention as wrap-up: one file per day, multiple sessions append with a horizontal-rule separator). If the project's CLAUDE.md designates a different log path or filename pattern, follow that instead. Create the log file if it does not exist.

If the day's log already contains content, append with a leading horizontal rule (---) and timestamped section header. The sweep does not replace any prior content.

Format:

---

## Code Sweep — <YYYY-MM-DD HH:MM>

**Invocation:** `/code-sweep <args>`
**Mode:** full | quick | partial (note skipped checks)

### Summary
- MUST-FIX: <N> findings
- SUGGESTED: <M> findings

### MUST-FIX

#### M1.a — `registry:` block missing `inputs:` in `3. Claude_Clean_Data.Rmd`
**Issue:** YAML block missing required field.
**Proposed fix:** add `inputs: [Data/raw/firms.parquet]` based on detected `read_parquet()` call at line 42.
**Status:** pending

(... one block per finding ...)

### SUGGESTED

(... same format ...)

### Outcome
- Applied: <N>
- Declined: <M> (reasons below)
- Deferred to PINBOARD.md: <K>

Update each finding's Status: line as the user responds: pending → applied | declined: <reason> | deferred.

Step 6 — Confirmation flow

Present findings one at a time, MUST-FIX first, in this order:

Print the finding (with file path, line numbers where relevant, proposed fix).
Ask: "Apply this fix? [y/n/skip/defer/explain]"
- y — apply the fix, update Status to applied.
- n — ask for a one-line reason, record declined: <reason>.
- skip — leave Status pending, revisit at end of pass.
- defer — append to PINBOARD.md (use the pinboard skill if available), Status deferred.
- explain — give more context on why this is flagged, then re-ask.
Move to the next finding.

Batch shortcut: when 3+ consecutive findings share the same check ID (e.g., multiple S4 naming violations), after the first one offer: "Apply this same fix to all remaining S4 findings? [y/n]". If y, apply en bloc and report the count.

When all findings are processed:

Print the final summary (N applied, M declined, K deferred).
Finalise the report section in the session log.
If any MUST-FIX were declined or deferred, surface them as a one-line warning: "⚠ MUST-FIX findings unresolved — paper may not reproduce."

Important Rules

Never auto-apply any fix without confirmation. Even "safe" ones (e.g., adding a status: active default). The whole skill is propose-and-confirm.
Never delete code or data. Moves to _Archive/ only. This is a project ground rule.
Never re-run scripts to "fix" stale outputs. S1 flags staleness; the user decides what to re-run.
Never compile the paper. M2 is parse-based. LaTeX compile is the user's job.
Never edit _Archive/ contents. It's the dumping ground.
Don't duplicate script-registry's logic. Use its builder, read its REGISTRY.md, trust its drift detection. Code Sweep adds checks that span beyond the registry (paper cross-ref, dry-run, orphans, naming, TODOs, env manifest).
Don't call other skills inline (e.g., the pinboard skill). If the user wants pinboard entries, just write directly to PINBOARD.md using the existing format the project uses.

Known limitations

Dry-run (M3) only catches errors that surface within 120s on sample data. Scaling bugs, OOM issues, and long-tail failures may slip through.
Paper cross-ref (M2) doesn't follow conditional \if...\else...\fi branches. Figures inside dead branches will look "missing" — user can decline those findings.
Orphan detection (S2, S3) assumes registry: declarations are accurate. Stale or incomplete declarations produce false positives. Run M4 (registry sync) first so drift warnings surface before orphan checks.
Naming patterns (S4) are inferred from existing files if the defaults don't match. If the project mixes conventions, sweep may pick the wrong "standard" — user can decline.

Quick reference — when to invoke

Run Code Sweep:

Before sharing the code with a coauthor or RA
Before a paper submission / re-submission
After a large refactor of Code/
After a data refresh (raw data updated → check downstream staleness)
When the user is unsure what state the project is in

Don't run Code Sweep mid-development on a single script — it's a milestone tool, not a per-commit linter.