run-ferries
Launch, monitor, and seal Marin canary and daily ferry runs.
| name | run-ferries |
| description | Launch, monitor, and seal Marin canary and daily ferry runs. |
Two ferry lanes:
- canary: fast, low-cost always-on health check
- daily: higher-scale integration run with bounded changes

Both keep core data assumptions aligned and share the same monitoring/triage discipline.
Templates:
- experiments/ferries/canary_ferry.py (MoE canary, TPU and GPU via CANARY_ACCELERATOR)
- experiments/ferries/daily.py

Intent:
Shared baseline:
- nemotron_mix baseline
- us-central1 (zone us-central1-a)
- docs/experiments/daily-ferry-log.md

Daily baseline defaults:
- llama_150m)

Canary runs normally do not require a proposal cycle or PR. For daily, collect:
If objective is ambiguous, ask before editing.
- SUCCEEDED/FAILED/STOPPED); do not stop early. Full ferry monitoring often takes 4-5 hours.
- ferry, ferry-daily, ferry-log-only, ferry-sealed.
- ferry/daily/YYYYMMDD/<run_slug>
- docs/experiments/daily-ferry-log.md, keep detailed debug/run narrative in the issue.

Check the latest entries in docs/experiments/daily-ferry-log.md.
LAST_FERRY_SHA=<last_ferry_commit_sha>
LAST_FERRY_DATE=<YYYY-MM-DD>
git log --oneline "${LAST_FERRY_SHA}..HEAD" -- experiments/ lib/ scripts/
gh issue list \
  --label experiment \
  --search "updated:>=${LAST_FERRY_DATE}" \
  --limit 100
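The two queries above share one boundary (the last ferry commit and its date), so they can be assembled together. The following is a minimal sketch using only the Python standard library; `build_change_queries` is a hypothetical helper name, not something in the repository, and the SHA and date in the demo are placeholders. It only builds argument lists and does not run `git` or `gh`.

```python
import shlex
from typing import Dict, List


def build_change_queries(last_ferry_sha: str, last_ferry_date: str) -> Dict[str, List[str]]:
    """Return argv lists mirroring the git/gh commands above (hypothetical helper)."""
    if not last_ferry_sha or not last_ferry_date:
        raise ValueError("both the last ferry SHA and the last ferry date are required")
    return {
        # Commits since the last ferry, limited to the same three directories.
        "git_log": [
            "git", "log", "--oneline", f"{last_ferry_sha}..HEAD",
            "--", "experiments/", "lib/", "scripts/",
        ],
        # Experiment-labelled issues updated since the last ferry date.
        "gh_issues": [
            "gh", "issue", "list",
            "--label", "experiment",
            "--search", f"updated:>={last_ferry_date}",
            "--limit", "100",
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute the real SHA/date from the ferry log.
    for name, argv in build_change_queries("abc1234", "2025-01-01").items():
        print(name, "->", shlex.join(argv))
```

Passing each list to `subprocess.run` avoids shell quoting issues around the `updated:>=` search term.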
Treat GitHub-tagged ferry PRs/issues as source of truth. Use "since last ferry run" rather than fixed wall-clock boundaries.
- experiments/ferries/daily.py
- FERRY_DATE in launch env: daily-125m-YYYY-MM-DD style).

In the run issue, record:
- low/medium/high),

Then push the launch commit (no proposal PR by default).
Confirm requester approval in-thread unless they already gave explicit "launch without asking" permission.
uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
  -- python -m experiments.ferries.daily
After launch, capture and post to the issue:
- iris job run, form /<user>/iris-run-job-YYYYMMDD-HHMMSS)

Optional deterministic daily rerun name:
uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
  -e FERRY_DATE "$(date +%Y%m%d-%H%M%S)-daily-ferry" \
  -- python -m experiments.ferries.daily
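The `FERRY_DATE` value in the command above is a timestamp followed by a fixed suffix. As a rough sketch, the same string can be produced in Python; `ferry_date_value` is a hypothetical helper, and it reproduces only the shell expression shown above, not anything `experiments/ferries/daily.py` does with the value.

```python
from datetime import datetime
from typing import Optional


def ferry_date_value(now: Optional[datetime] = None, suffix: str = "daily-ferry") -> str:
    """Mirror "$(date +%Y%m%d-%H%M%S)-daily-ferry" from the command above."""
    now = now or datetime.now()  # local time, like `date` without -u
    return f"{now.strftime('%Y%m%d-%H%M%S')}-{suffix}"


if __name__ == "__main__":
    # Compute once and reuse the same string if the name must stay stable.
    print(ferry_date_value())
```

Computing the value once and recording it in the run issue keeps the name reproducible if the launch command has to be repeated.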
Follow the babysit-job skill with job_id, cluster, experiment=<ferry script path>.
Post in the ferry issue: final status, key metrics/regressions, Iris job ID and W&B link(s), recommendation for next ferry. Optional: post a manual Discord update for major run state changes.
For daily-log metric fields, extract canonical final keys with:
uv run python scripts/ferries/daily_analysis.py \
  --run <wandb_run_url_or_path> \
  --format markdown
Required terminal issue comment template:
Final status: <SUCCEEDED|FAILED|STOPPED>
Iris job id: <job_id>
W&B link: <url>
Final eval summary: <short summary + key metrics>
Experiment link: <experiment JSON/browser link>
Recommendation / victory decision: <next action>
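The template above can be filled in mechanically once the run reaches a terminal state. This is a minimal sketch; `render_terminal_comment` and its parameter names are hypothetical, and the field labels are copied from the template. It rejects any status outside the three terminal states so a comment is not posted for a run that is still in progress.

```python
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "STOPPED")


def render_terminal_comment(
    status: str,
    job_id: str,
    wandb_url: str,
    eval_summary: str,
    experiment_link: str,
    recommendation: str,
) -> str:
    """Render the terminal issue comment (hypothetical helper)."""
    if status not in TERMINAL_STATES:
        raise ValueError(f"status must be one of {TERMINAL_STATES}, got {status!r}")
    lines = [
        f"Final status: {status}",
        f"Iris job id: {job_id}",
        f"W&B link: {wandb_url}",
        f"Final eval summary: {eval_summary}",
        f"Experiment link: {experiment_link}",
        f"Recommendation / victory decision: {recommendation}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    print(render_terminal_comment(
        "SUCCEEDED", "<job_id>", "<url>", "<short summary + key metrics>",
        "<experiment JSON/browser link>", "<next action>",
    ))
```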
- experiments/ferries/daily.py used for the run).
- docs/experiments/daily-ferry-log.md, following .agents/skills/commit/SKILL.md for description format.
- ferry, ferry-daily, ferry-log-only, ferry-sealed.

Default mode: launch the existing canary script as-is and monitor. Do not run the daily proposal/PR loop unless intentionally changing canary. Even for unchanged runs, ask the requester before launch unless they explicitly waived that requirement.
Launch (TPU):
uv run iris --config=lib/iris/config/marin.yaml \
  job run --memory=16G --disk=16G --cpu=1 --extra=tpu \
  -- python -m experiments.ferries.canary_ferry
Launch (GPU / CoreWeave):
uv run iris --config=lib/iris/config/coreweave.yaml \
  job run --memory=16G --disk=16G --cpu=1 --extra=cpu \
  -e MARIN_PREFIX s3://marin-na/marin \
  -e CANARY_ACCELERATOR gpu \
  -- python -m experiments.ferries.canary_ferry
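The TPU and GPU launches differ only in the config file, the `--extra` value, and two environment variables. The sketch below makes that difference explicit; `canary_launch_argv` is a hypothetical helper that only rebuilds the two commands shown above and does not call Iris.

```python
from typing import List


def canary_launch_argv(accelerator: str) -> List[str]:
    """Build the canary launch command for "tpu" or "gpu" (hypothetical helper)."""
    if accelerator == "tpu":
        config, extra = "lib/iris/config/marin.yaml", "tpu"
        env: List[str] = []
    elif accelerator == "gpu":
        # GPU runs use the CoreWeave config and need both env overrides.
        config, extra = "lib/iris/config/coreweave.yaml", "cpu"
        env = [
            "-e", "MARIN_PREFIX", "s3://marin-na/marin",
            "-e", "CANARY_ACCELERATOR", "gpu",
        ]
    else:
        raise ValueError(f"accelerator must be 'tpu' or 'gpu', got {accelerator!r}")
    return [
        "uv", "run", "iris", f"--config={config}",
        "job", "run", "--memory=16G", "--disk=16G", "--cpu=1", f"--extra={extra}",
        *env,
        "--", "python", "-m", "experiments.ferries.canary_ferry",
    ]


if __name__ == "__main__":
    import shlex
    print(shlex.join(canary_launch_argv("gpu")))
```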
If the canary fails: triage and identify the root cause first; only then open a focused PR, and only if a canary script/config change is necessary; then relaunch and monitor to terminal state.
- profile_summary.json is in the workflow logs (default params: --warmup-steps 5, --breakdown-mode exclusive_per_track, --hot-op-limit 25). The step summary has pointers to the raw trace artifact and W&B run.
- --run-target: see .agents/skills/profile-training/.
- step_time.all_steps.count before drawing conclusions from steady-state stats.
- exclusive_per_track (the default) can hide device stalls that overlap across tracks. Use exclusive_global when investigating stall-heavy profiles.

If a daily variant is clearly better holistically, promote it as the new default daily recipe/template.
Promotion signals:
When promoting: open a follow-up PR updating experiments/ferries/daily.py and this skill; include a concise before/after metrics table.
- docs/experiments/daily-ferry-log.md.
- docs/experiments/daily-ferry-log.md
- .agents/skills/babysit-job/SKILL.md
- .agents/projects/ferry_framework.md
- .agents/skills/run-research/