run-research
// Multi-session research workflow: logbooks, experiment issues, and W&B.
| name | run-research |
| description | Multi-session research workflow: logbooks, experiment issues, and W&B. |
For long-running, exploratory research where an agent iterates on benchmarks, experiments, and hypotheses over multiple sessions. Optimizes for reproducibility, clear decision history, fast iteration, and handoff quality.
run-research is the base workflow for research-like work. Layer domain
skills on top for task-specific constraints, keeping lifecycle process here.
For example, see `.agents/skills/add-pallas-kernel/SKILL.md` as a specialization on top of this skill.

For each research thread, maintain all of:
- A research branch (`research/<topic>`).
- A GitHub experiment issue (labeled `experiment`).
- A research logbook at `.agents/logbooks/<topic>.md`. Use the term research logbook consistently in prose and file naming.
- W&B runs; use the `marin` project for pretraining runs.

See `.agents/skills/reserve-tpu/SKILL.md` for the Iris-backed workflow.

To start a thread, create the experiment issue with the `experiment` label and the logbook at `.agents/logbooks/<topic>.md`.

At kickoff, write: motivation, problem statement, success metrics, initial hypotheses, first experiment matrix, links to relevant code paths, key references (papers/blog posts), stop criteria (what evidence is enough to stop/ship/escalate), and a fixed baseline case for repeated comparison.
Prefer creating the experiment issue sooner rather than later; confirm timing with the human collaborator if scope or visibility is uncertain.
Experiment ID convention: assign a short prefix for the series (e.g. MOE-HC)
and use IDs like MOE-HC-001 in logbook entries, W&B run names, and issue
comments.
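A minimal sketch of the ID convention above, assuming three-digit zero padding as in `MOE-HC-001`. The helper names `make_experiment_id` and `next_experiment_id` are hypothetical and not part of any Marin tooling.

```python
import re


def make_experiment_id(prefix: str, index: int, width: int = 3) -> str:
    """Build an ID such as MOE-HC-001 from a series prefix and a 1-based index."""
    # Assumption: prefixes are upper-case alphanumeric groups joined by hyphens.
    if not re.fullmatch(r"[A-Z0-9]+(?:-[A-Z0-9]+)*", prefix):
        raise ValueError(f"unexpected prefix format: {prefix!r}")
    if index < 1:
        raise ValueError("index must be >= 1")
    return f"{prefix}-{index:0{width}d}"


def next_experiment_id(prefix: str, existing_ids: list[str]) -> str:
    """Return the next unused ID in the series, ignoring IDs from other series."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    used = [int(m.group(1)) for eid in existing_ids if (m := pattern.match(eid))]
    return make_experiment_id(prefix, max(used, default=0) + 1)
```

Using the same string in the logbook entry, the W&B run name, and the issue comment keeps all three searchable by one ID.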
For each non-trivial experiment:
- When linking to `.agents/logbooks/foo.md`, use a commit- or tag-pinned URL (e.g. `https://github.com/marin-community/marin/tree/<commit-or-tag>/.agents/logbooks/foo.md`).

Update cadence: post an issue update on every significant milestone, or every 6 hours, whichever is sooner. A significant milestone means someone is likely to want to find that update later. If none occurred by the 6-hour mark, post a brief heartbeat with current status, blockers, and next ETA.
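The cadence rule can be sketched as a small decision function. This is an illustration only: `update_kind` and `HEARTBEAT_INTERVAL` are hypothetical names, and whether a milestone counts as significant remains a human (or agent) judgment passed in as a flag.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 6-hour limit comes from the cadence rule; the constant name is illustrative.
HEARTBEAT_INTERVAL = timedelta(hours=6)


def update_kind(last_update: datetime, now: datetime, milestone_reached: bool) -> Optional[str]:
    """Return 'milestone', 'heartbeat', or None if no issue update is due yet."""
    if milestone_reached:
        # A significant milestone always warrants an update, whatever the elapsed time.
        return "milestone"
    if now - last_update >= HEARTBEAT_INTERVAL:
        # No milestone within 6 hours: post status, blockers, and next ETA.
        return "heartbeat"
    return None


if __name__ == "__main__":
    last = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
    print(update_kind(last, last + timedelta(hours=7), milestone_reached=False))
```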
Issue comment style:
The issue body is the public summary layer: keep a short TL;DR current, track
scope changes, keep links current, summarize takeaways for non-specialists,
maintain a short decision log (decision, evidence, date, owner), maintain a
negative-results index with links, and keep a Conclusion section current as
evidence solidifies.
Write the body for readers who know Marin/LLM systems generally but not this specific thread. Issue updates and the issue body must stand on their own: include enough framing (goal, assumptions, exact commands, result interpretation) for someone else to reproduce or critique the claim. Logbook entries can be terse and context-local but should still include exact commands and links to supporting artifacts.
Label major claims as one of: exploratory (single run / weak evidence),
replicated (repeated and consistent), stable (held across shape/seed/
hardware variants relevant to scope).
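One possible way to keep these labels consistent is to derive them from simple evidence counts. The sketch below assumes that two or more consistent runs count as "repeated"; that threshold is not stated in this skill, and `label_claim` is a hypothetical helper.

```python
def label_claim(consistent_runs: int, held_across_variants: bool = False) -> str:
    """Map evidence strength to one of: exploratory, replicated, stable.

    consistent_runs: number of runs that agree on the claim.
    held_across_variants: True if the claim also held across the shape/seed/
        hardware variants relevant to the scope.
    """
    if consistent_runs < 1:
        raise ValueError("a claim needs at least one supporting run")
    if consistent_runs == 1:
        # Single run: weak evidence regardless of anything else.
        return "exploratory"
    # Assumption: >= 2 consistent runs is treated as "replicated".
    return "stable" if held_across_variants else "replicated"
```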
When you reach a meaningful milestone:
This creates a stable checkpoint even if the branch continues.
- `Conclusion` (decision/outcome and why).

Use this structure in `.agents/logbooks/<topic>.md`:
# <Topic>: Research Logbook
## Scope
- Goal:
- Primary metric(s):
- Constraints:
## Baseline
- Date:
- Code refs:
- Baseline numbers:
## Experiment Log
### YYYY-MM-DD HH:MM - <short label>
- Hypothesis:
- Command:
- Config:
- Result:
- Interpretation:
- Next action:
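As an illustration, an Experiment Log entry in this structure could be rendered programmatically so every entry carries the same fields in the same order. `format_logbook_entry` is a hypothetical helper, not an existing Marin utility.

```python
from datetime import datetime

# Field names and order follow the Experiment Log entry structure.
ENTRY_FIELDS = ("Hypothesis", "Command", "Config", "Result", "Interpretation", "Next action")


def format_logbook_entry(when: datetime, label: str, fields: dict[str, str]) -> str:
    """Render one Experiment Log entry as Markdown text."""
    missing = [name for name in ENTRY_FIELDS if name not in fields]
    if missing:
        # Refuse partial entries so exact commands and results are never omitted.
        raise ValueError(f"missing logbook fields: {missing}")
    lines = [f"### {when:%Y-%m-%d %H:%M} - {label}"]
    lines += [f"- {name}: {fields[name]}" for name in ENTRY_FIELDS]
    return "\n".join(lines) + "\n"
```

The returned text can be appended to the logbook file under `## Experiment Log`.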
Use this body when filing the GitHub experiment issue at kickoff. Title the
issue Experiment: <topic> and apply the experiment label.
## Description
(Add enough context someone outside could understand what you're trying to do.
Doesn't need to be too long, but enough you could explain it to someone working
on LLMs at another lab.)
## Hypothesis or Goal
(What are you trying to learn or achieve?)
### Links
(Delete any that aren't applicable.)
* WandB Report: (link)
* Data Browser: (link)
* (etc.)
## Results
(What did you find, including relevant evaluation metrics, etc.)
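If the issue is filed from a script, the title and label conventions can be captured in one place. The sketch below assumes the GitHub CLI (`gh`) is the filing tool, which this skill does not specify; it only builds the argument list and does not run anything. `issue_create_argv` is a hypothetical name.

```python
def issue_create_argv(topic: str, body_path: str) -> list[str]:
    """Build a `gh issue create` command for a new experiment issue.

    body_path: path to a file containing the filled-in issue body.
    """
    return [
        "gh", "issue", "create",
        "--title", f"Experiment: {topic}",  # title convention from this skill
        "--label", "experiment",            # label convention from this skill
        "--body-file", body_path,
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(" ".join(issue_create_argv("<topic>", "issue_body.md")))
```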
Use concise updates in issue comments:
Update: <short label>
- Change:
- Result delta:
- Confidence:
- Links:
- Tag:
- Logbook section:
- W&B:
- Next:
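A short sketch of rendering this update format, together with a pinned logbook link for the `Logbook section` field. The URL pattern follows the commit-or-tag example earlier in this skill; `format_update` and `pinned_logbook_url` are hypothetical helpers.

```python
# Field names and order follow the issue-comment update format.
UPDATE_FIELDS = (
    "Change", "Result delta", "Confidence", "Links",
    "Tag", "Logbook section", "W&B", "Next",
)


def pinned_logbook_url(ref: str, topic: str) -> str:
    """Link to a logbook at a fixed commit or tag so the link does not drift."""
    return f"https://github.com/marin-community/marin/tree/{ref}/.agents/logbooks/{topic}.md"


def format_update(label: str, fields: dict[str, str]) -> str:
    """Render an issue comment; fields not supplied are left blank."""
    lines = [f"Update: {label}"]
    lines += [f"- {name}: {fields.get(name, '')}".rstrip() for name in UPDATE_FIELDS]
    return "\n".join(lines)
```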
Keep these sections in the issue body:
- TL;DR
- Scope
- Decision log (append as decisions are made)
- Negative results index (links to comments/logbook entries)
- Current baseline (shape/config + reference numbers)

Ops hygiene checklist (before claiming a regression):
Before posting a result:
Before closing the issue:
- The `Conclusion` section.

Related skills:
- `.agents/skills/organize-experiments/`
- `.agents/skills/add-pallas-kernel/`