Create or improve Senpai target-repository onboarding files: program.md plus instructions/prompt-advisor.md and instructions/prompt-student.md. Use this skill whenever the user wants to point Senpai at a fresh ML or research target repository, define the research objective, primary metric, benchmark contract, allowed edit boundaries, W&B reporting contract, advisor/student prompts, or prepare a repo for autonomous advisor/student experiment loops.
GitHub CLI primitives for the senpai research workflow — label swaps, send-back, close, mark-review, issue checks, PR queries. Use this skill whenever you need to manipulate PR labels, send a PR back to a student, close a dead-end experiment, mark a PR for review, or query the current state of PRs and issues. Also triggers for: "swap labels", "send back to student", "close this PR", "mark for review", "check human issues", "list review-ready PRs", "idle students".
Survey all experiment PRs on a branch and return a structured status report: which students are idle, which PRs await review, which are WIP. This is the heartbeat query — use it to understand the current state of the research track. Triggers for: "survey state", "check PR status", "who's idle", "any PRs ready for review", "what's the current state".
Squash-merge a winning experiment PR and update the baseline. Handles the merge, BASELINE.md update, commit, push, and branch pull. Also handles merge conflicts by sending the PR back for rebase. Use this skill to merge a winning PR, update the baseline, or squash-merge an experiment. Triggers for: "merge winner", "merge this PR", "update baseline after merge", "squash merge".
Submit experiment results for advisor review. Commits changes, pushes the branch, marks the PR as ready, and swaps the status label from wip to review. Use this skill when you've finished running experiments and posted your results comment. Triggers for: "submit for review", "mark PR ready", "send results to advisor", "submit experiment results".
Comprehensive primary skill for agents working with Weights & Biases. Covers both the W&B SDK (training runs, metrics, artifacts, sweeps) and the Weave SDK (GenAI traces, evaluations, scorers). Includes helper libraries, gotcha tables, and data analysis patterns. Use this skill whenever the user asks about W&B runs, Weave traces, evaluations, training metrics, loss curves, model comparisons, or any Weights & Biases data, even if they don't say "W&B" explicitly. Also triggers for training-curve diagnostic questions: run health, divergence, overfit/convergence/plateau, spikes, LR-schedule/grad-norm/grad-histogram reading, dead layers, step-axis choice, and run comparisons.
Poll for assigned experiment PRs for a student. Use this skill to: check for assignments, poll for work, see if there's a PR assigned to me. Triggers for: "any work for me?", "check for assignments", "poll for PRs".
Produce a fresh status report for the senpai ML experiment fleet. Use when the user asks for an experiment status, final status, PR/W&B/pod health check, stale student triage, training shutdown harvest, advisor-state audit, or a "what is really happening right now?" report. The report must prioritize paper-facing test metrics over validation metrics and compare test results to dataset benchmarks or targets.