run-ferries
Launch, monitor, and seal Marin canary and daily ferry runs.
| name | run-ferries |
| description | Launch, monitor, and seal Marin canary and daily ferry runs. |
Two ferry lanes:
- canary: fast, low-cost always-on health check
- daily: higher-scale integration run with bounded changes
Both keep core data assumptions aligned and share the same monitoring/triage discipline.
Templates:
- experiments/ferries/canary_ferry.py (MoE canary, TPU and GPU via CANARY_ACCELERATOR)
- experiments/ferries/daily.py
Intent:
Shared baseline:
- nemotron_mix baseline
- us-central1 (zone us-central1-a)
- docs/experiments/daily-ferry-log.md
Daily baseline defaults:
- llama_150m)
Canary runs normally do not require a proposal cycle or PR. For daily, collect:
If the objective is ambiguous, ask before editing.
- SUCCEEDED/FAILED/STOPPED); do not stop early. Full ferry monitoring often takes 4-5 hours.
- ferry, ferry-daily, ferry-log-only, ferry-sealed.
- ferry/daily/YYYYMMDD/<run_slug>
- docs/experiments/daily-ferry-log.md, keep detailed debug/run narrative in the issue.
Check the latest entries in docs/experiments/daily-ferry-log.md.
LAST_FERRY_SHA=<last_ferry_commit_sha>
LAST_FERRY_DATE=<YYYY-MM-DD>
git log --oneline "${LAST_FERRY_SHA}..HEAD" -- experiments/ lib/ scripts/
gh issue list \
  --label experiment \
  --search "updated:>=${LAST_FERRY_DATE}" \
  --limit 100
Treat GitHub-tagged ferry PRs/issues as the source of truth. Use "since last ferry run" rather than fixed wall-clock boundaries.
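The "since last ferry run" review can also be assembled programmatically. This is a minimal sketch, assuming a hypothetical helper named `since_last_ferry_commands` that is not part of the repository. It only builds argument lists for the same `git log` and `gh issue list` invocations shown above and does not execute them.

```python
from typing import List


def since_last_ferry_commands(last_sha: str, last_date: str) -> List[List[str]]:
    """Build argv lists for the 'since last ferry' review.

    Hypothetical helper for illustration; the flags mirror the shell
    commands in this skill and nothing is executed here.
    """
    # Commits touching the watched directories since the last ferry commit.
    git_cmd = [
        "git", "log", "--oneline", f"{last_sha}..HEAD",
        "--", "experiments/", "lib/", "scripts/",
    ]
    # Experiment-labelled issues updated on or after the last ferry date.
    gh_cmd = [
        "gh", "issue", "list",
        "--label", "experiment",
        "--search", f"updated:>={last_date}",
        "--limit", "100",
    ]
    return [git_cmd, gh_cmd]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` as-is. Using argv lists rather than a shell string avoids quoting problems with the `>=` in the search query.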
- experiments/ferries/daily.py
- FERRY_DATE in launch env: daily-125m-YYYY-MM-DD style).
In the run issue, record:
- low/medium/high),
Then push the launch commit (no proposal PR by default).
Confirm requester approval in-thread unless they already gave explicit "launch without asking" permission.
uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
  -- python -m experiments.ferries.daily
After launch, capture and post to the issue:
- iris job run, form /<user>/iris-run-job-YYYYMMDD-HHMMSS)
Optional deterministic daily rerun name:
uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
  -e FERRY_DATE "$(date +%Y%m%d-%H%M%S)-daily-ferry" \
  -- python -m experiments.ferries.daily
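When the launch is driven from a script rather than a shell, the rerun name can be generated in Python. This is a minimal sketch that mirrors the `$(date +%Y%m%d-%H%M%S)-daily-ferry` expression above; the function name `rerun_ferry_date` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def rerun_ferry_date(now: Optional[datetime] = None) -> str:
    """Return a FERRY_DATE value of the form YYYYMMDD-HHMMSS-daily-ferry.

    Hypothetical helper; uses local time, like the `date` command,
    unless an explicit timestamp is supplied.
    """
    now = now or datetime.now()
    return f"{now.strftime('%Y%m%d-%H%M%S')}-daily-ferry"
```

Passing an explicit `now` makes the name reproducible, which is useful when the same value also has to be recorded in the run issue.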
Follow the babysit-job skill with job_id, cluster, experiment=<ferry script path>.
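The monitor-until-terminal rule can be pictured as a simple polling loop. This is an illustrative sketch only: the babysit-job skill defines the actual procedure, and `fetch_state` is a hypothetical callable standing in for however the job state is read. The terminal states are the three named in this skill.

```python
import time
from typing import Callable

# Terminal states named in this skill.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_terminal(
    fetch_state: Callable[[], str],
    poll_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports a terminal state, then return that state.

    `fetch_state` is a hypothetical hook (an assumption, not an Iris API);
    `poll_seconds` is an arbitrary default, not a documented interval.
    """
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Not terminal yet: wait before checking again; never stop early.
        sleep(poll_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting.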
Post in the ferry issue: final status, key metrics/regressions, Iris job ID and W&B link(s), recommendation for next ferry. Optional: post a manual Discord update for major run state changes.
For daily-log metric fields, extract canonical final keys with:
uv run python scripts/ferries/daily_analysis.py \
  --run <wandb_run_url_or_path> \
  --format markdown
Required terminal issue comment template:
Final status: <SUCCEEDED|FAILED|STOPPED>
Iris job id: <job_id>
W&B link: <url>
Final eval summary: <short summary + key metrics>
Experiment link: <experiment JSON/browser link>
Recommendation / victory decision: <next action>
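To keep terminal comments consistent across runs, the template above can be rendered from structured fields. This is a minimal sketch; `FerryResult` and `render_terminal_comment` are hypothetical names, and only the field labels come from the template.

```python
from dataclasses import dataclass

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "STOPPED")


@dataclass
class FerryResult:
    """Fields of the terminal issue comment (hypothetical container)."""
    status: str
    job_id: str
    wandb_url: str
    eval_summary: str
    experiment_link: str
    recommendation: str


def render_terminal_comment(result: FerryResult) -> str:
    """Render the required terminal comment, rejecting non-terminal statuses."""
    if result.status not in TERMINAL_STATES:
        raise ValueError(f"not a terminal status: {result.status!r}")
    return "\n".join([
        f"Final status: {result.status}",
        f"Iris job id: {result.job_id}",
        f"W&B link: {result.wandb_url}",
        f"Final eval summary: {result.eval_summary}",
        f"Experiment link: {result.experiment_link}",
        f"Recommendation / victory decision: {result.recommendation}",
    ])
```

Rejecting a non-terminal status guards against posting the final comment while the run is still in progress.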
- experiments/ferries/daily.py used for the run).
- docs/experiments/daily-ferry-log.md, following .agents/skills/commit/SKILL.md for description format.
- ferry, ferry-daily, ferry-log-only, ferry-sealed.
Default mode: launch the existing canary script as-is and monitor. Do not run the daily proposal/PR loop unless intentionally changing canary. Even for unchanged runs, ask the requester before launch unless they explicitly waived that requirement.
Launch (TPU):
uv run iris --config=lib/iris/config/marin.yaml \
  job run --memory=16G --disk=16G --cpu=1 --extra=tpu \
  -- python -m experiments.ferries.canary_ferry
Launch (GPU / CoreWeave):
uv run iris --config=lib/iris/config/coreweave.yaml \
  job run --memory=16G --disk=16G --cpu=1 --extra=cpu \
  -e MARIN_PREFIX s3://marin-na/marin \
  -e CANARY_ACCELERATOR gpu \
  -- python -m experiments.ferries.canary_ferry
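The two canary launch commands differ only in the config file, the `--extra` value, and the environment overrides. This is a minimal sketch of that difference, using a hypothetical helper `canary_launch_argv`. It reproduces the flags shown above and introduces none of its own.

```python
from typing import List


def canary_launch_argv(accelerator: str) -> List[str]:
    """Build the canary launch argv for 'tpu' or 'gpu' (hypothetical helper)."""
    if accelerator == "tpu":
        config, extra = "lib/iris/config/marin.yaml", "tpu"
        env: List[str] = []
    elif accelerator == "gpu":
        # GPU runs go to CoreWeave and select the accelerator via env vars.
        config, extra = "lib/iris/config/coreweave.yaml", "cpu"
        env = [
            "-e", "MARIN_PREFIX", "s3://marin-na/marin",
            "-e", "CANARY_ACCELERATOR", "gpu",
        ]
    else:
        raise ValueError(f"unknown accelerator: {accelerator!r}")
    return [
        "uv", "run", "iris", f"--config={config}",
        "job", "run", "--memory=16G", "--disk=16G", "--cpu=1", f"--extra={extra}",
        *env,
        "--", "python", "-m", "experiments.ferries.canary_ferry",
    ]
```

The GPU lane uses `--extra=cpu` together with `CANARY_ACCELERATOR=gpu`, exactly as in the command above; the sketch does not attempt to explain or change that choice.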
If the canary fails: triage and identify the root cause first; open a focused PR only if a canary script/config change is necessary; then relaunch and monitor to terminal state.
- profile_summary.json is in the workflow logs (default params: --warmup-steps 5, --breakdown-mode exclusive_per_track, --hot-op-limit 25). The step summary has pointers to the raw trace artifact and W&B run.
- --run-target; see .agents/skills/profile-training/.
- step_time.all_steps.count before drawing conclusions from steady-state stats.
- exclusive_per_track (the default) can hide device stalls that overlap across tracks. Use exclusive_global when investigating stall-heavy profiles.
If a daily variant is clearly better holistically, promote it as the new default daily recipe/template.
Promotion signals:
When promoting: open a follow-up PR updating experiments/ferries/daily.py and this skill; include a concise before/after metrics table.
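A before/after metrics table for the promotion PR can be generated from two metric dictionaries. This is a sketch under stated assumptions: the helper name `before_after_table` is hypothetical, the column layout is one possible choice, and the metric names and values in the usage note are placeholders, not real ferry results.

```python
from typing import Dict, List


def before_after_table(before: Dict[str, float], after: Dict[str, float]) -> str:
    """Render a pipe-delimited before/after table (hypothetical helper).

    Only metrics present in both runs are listed, sorted by name.
    """
    rows: List[str] = [
        "| metric | before | after | delta |",
        "| --- | --- | --- | --- |",
    ]
    for name in sorted(before.keys() & after.keys()):
        b, a = before[name], after[name]
        # Signed delta makes the direction of the change explicit.
        rows.append(f"| {name} | {b:.4g} | {a:.4g} | {a - b:+.4g} |")
    return "\n".join(rows)
```

For example, `before_after_table({"metric_a": 2.0}, {"metric_a": 1.5})` yields a header, a separator, and the row `| metric_a | 2 | 1.5 | -0.5 |`.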
- docs/experiments/daily-ferry-log.md.
- docs/experiments/daily-ferry-log.md
- .agents/skills/babysit-job/SKILL.md
- .agents/projects/ferry_framework.md
- .agents/skills/run-research/