
ml-experiment

Name: Ml Experiment
Author: Leeroo-AI

// Use when starting, logging, or reviewing ML experiments — maintains a persistent experiment journal with hypotheses, results, and learnings across sessions


stars: 181

forks: 17

updated: 2026-03-04 19:31

SKILL.md


name	ml-experiment
description	Use when starting, logging, or reviewing ML experiments — maintains a persistent experiment journal with hypotheses, results, and learnings across sessions

Experiment Journal

Externalize your experimental reasoning. Every ML project is a sequence of hypotheses tested — this skill makes that sequence visible, persistent, and learnable.

The Iron Law

NO NEW EXPERIMENT WITHOUT LOGGING THE HYPOTHESIS FIRST

If you're about to change a hyperparameter, swap a dataset, try a new architecture, or modify a training recipe — write down what you expect to happen and why BEFORE running it. This is how you learn from experiments instead of just running them.

File Structure

Maintain these files in the project root (create if they don't exist):

experiments/
├── journal.md    — Running experiment log (append-only)
└── lessons.md    — Distilled patterns and rules (curated)
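
A minimal sketch of bootstrapping this layout, assuming Python and the standard library only. The function name `ensure_experiment_files` is hypothetical and not part of the skill; the sketch creates the two files only when they are missing and never overwrites existing content.

```python
from pathlib import Path


def ensure_experiment_files(project_root="."):
    """Create experiments/journal.md and experiments/lessons.md if they are missing.

    Hypothetical helper: the skill only says the files should exist in the
    project root, not how they get created.
    """
    exp_dir = Path(project_root) / "experiments"
    exp_dir.mkdir(parents=True, exist_ok=True)
    paths = {
        "journal": exp_dir / "journal.md",
        "lessons": exp_dir / "lessons.md",
    }
    for path in paths.values():
        # touch() leaves an existing file's content untouched
        path.touch(exist_ok=True)
    return paths
```

Calling `ensure_experiment_files()` from the project root is safe to repeat at the start of every session.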

Phases

Phase 1: Before Any Experiment — Log the Hypothesis

Before changing anything or running anything new:

1. Read experiments/journal.md (if it exists) to see what's been tried
2. Write a new entry:

### YYYY-MM-DD HH:MM — [Experiment Name]

**Status**: PLANNED

**Hypothesis**: [What you expect to happen and why]
**Change**: [Exactly what's being modified — one variable at a time]
**Config**:
- key_param_1: old_value → new_value
- key_param_2: value (unchanged)
**Expected outcome**: [Specific metric target or qualitative expectation]
**Baseline**: [Current best metric to beat]

Gate: Entry is written before any code runs. No exceptions.
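
The entry template above could be rendered and appended programmatically. The following is a sketch under stated assumptions: `format_planned_entry` and `log_hypothesis` are hypothetical names, and representing the config as a mapping of parameter name to an `(old, new)` pair (with `new=None` meaning unchanged) is an illustrative choice, not something the skill prescribes.

```python
from datetime import datetime
from pathlib import Path


def format_planned_entry(name, hypothesis, change, config, expected, baseline, when=None):
    """Render a PLANNED journal entry in the template's layout.

    `config` maps parameter name -> (old, new); new=None marks it unchanged.
    """
    when = when or datetime.now()
    config_lines = [
        f"- {key}: {old} \u2192 {new}" if new is not None else f"- {key}: {old} (unchanged)"
        for key, (old, new) in config.items()
    ]
    return "\n".join([
        f"### {when:%Y-%m-%d %H:%M} \u2014 {name}",  # \u2014 is the template's dash
        "",
        "**Status**: PLANNED",
        "",
        f"**Hypothesis**: {hypothesis}",
        f"**Change**: {change}",
        "**Config**:",
        *config_lines,
        f"**Expected outcome**: {expected}",
        f"**Baseline**: {baseline}",
        "",
    ])


def log_hypothesis(journal_path, entry):
    """Append the entry to the journal; call this before any experiment code runs."""
    path = Path(journal_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
```

Opening the file in append mode keeps earlier entries intact, which matches the journal's append-only role.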

Phase 2: After the Experiment — Log the Result

Once results are in:

Update the journal entry:

**Status**: COMPLETED
**Actual outcome**: [What actually happened — metrics, behavior]
**Delta**: [How this compared to expectation — better/worse/different than expected]
**Duration**: [Wall time, GPU hours]
**Learning**: [One sentence — what this taught you]
**Next**: [What to try based on this result]

Gate: Result is logged before starting the next experiment.
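
One way to apply this update to the journal text is sketched below. It assumes entries start with a `### ` heading that ends with the experiment name, and it places the result fields at the end of the entry; both are assumptions for illustration, and `complete_entry` is a hypothetical name.

```python
def complete_entry(journal_text, entry_name, actual, delta, duration, learning, next_step):
    """Flip the latest entry named `entry_name` from PLANNED to COMPLETED and add results."""
    lines = journal_text.splitlines()
    starts = [
        i for i, line in enumerate(lines)
        if line.startswith("### ") and line.endswith(entry_name)
    ]
    if not starts:
        raise ValueError(f"no journal entry named {entry_name!r}")
    start = starts[-1]  # most recent entry with that name
    # The entry runs until the next entry heading or the end of the file.
    end = next(
        (j for j in range(start + 1, len(lines)) if lines[j].startswith("### ")),
        len(lines),
    )
    body = lines[start:end]
    if "**Status**: PLANNED" not in body:
        raise ValueError(f"entry {entry_name!r} is not in PLANNED status")
    body[body.index("**Status**: PLANNED")] = "**Status**: COMPLETED"
    while body and not body[-1].strip():
        body.pop()  # drop trailing blank lines before appending the results
    body += [
        f"**Actual outcome**: {actual}",
        f"**Delta**: {delta}",
        f"**Duration**: {duration}",
        f"**Learning**: {learning}",
        f"**Next**: {next_step}",
        "",
    ]
    return "\n".join(lines[:start] + body + lines[end:]).rstrip("\n") + "\n"
```

The function returns the new journal text rather than writing it, so the caller decides when to save.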

Phase 3: Before the Next Iteration — Review History

Before proposing or starting the next experiment:

1. Read experiments/journal.md: scan recent entries
2. Check: Has this exact approach been tried before? What happened?
3. Read experiments/lessons.md: are there rules that apply?
4. Only then propose the next experiment

Gate: You can articulate why this experiment is different from previous attempts.
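
The review step can be partly mechanized. The sketch below assumes the journal follows the entry template from Phase 1 (a `### ` heading per entry, `**Field**: value` lines); `parse_entries` and `find_similar` are hypothetical helpers, and the keyword match is a deliberately crude stand-in for actually reading the entries.

```python
import re


def parse_entries(journal_text):
    """Split the journal into entries and collect their **Field**: value lines."""
    entries = []
    # Zero-width split before each entry heading (needs Python 3.7+).
    for block in re.split(r"(?m)^(?=### )", journal_text):
        if not block.startswith("### "):
            continue
        title = block.splitlines()[0][4:].strip()
        fields = dict(re.findall(r"(?m)^\*\*(.+?)\*\*:[ \t]*(.*)$", block))
        entries.append({"title": title, **fields})
    return entries


def find_similar(entries, keyword):
    """Return entries whose title or Change field mentions the keyword."""
    needle = keyword.lower()
    return [
        entry for entry in entries
        if needle in entry["title"].lower() or needle in entry.get("Change", "").lower()
    ]
```

If `find_similar` returns anything, the gate above still applies: state how the new experiment differs from those earlier attempts.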

Phase 4: Periodically — Distill Lessons

After every 3-5 experiments, or when a pattern emerges:

1. Review recent journal entries for patterns
2. Add rules to experiments/lessons.md:

## Lessons

- [YYYY-MM-DD] [Context]: [Lesson]. Source: [user correction / experiment result / KB finding]
  Example: "2024-03-15 QLoRA: alpha/r ratio matters more than absolute rank for 7B models. Source: experiments showed r=32/alpha=64 outperformed r=64/alpha=64"

## Rules (hard-won)

- NEVER [thing that always fails] because [reason]. Learned: [date]
- ALWAYS [thing that always works] when [condition]. Learned: [date]

Gate: Lessons file has been updated before closing out a series of experiments.
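
Adding a lesson in the format above might look like the following sketch. It assumes the lessons file uses the `## Lessons` and `## Rules (hard-won)` headings shown in the template, and it follows the worked example's date style (no brackets); `format_lesson` and `add_lesson` are hypothetical names.

```python
from datetime import date


def format_lesson(context, lesson, source, learned=None):
    """Render one lesson line: '- DATE CONTEXT: LESSON. Source: SOURCE'."""
    learned = learned or date.today()
    return f"- {learned.isoformat()} {context}: {lesson.rstrip('.')}. Source: {source}"


def add_lesson(lessons_text, lesson_line):
    """Insert a lesson at the end of the '## Lessons' section, creating it if missing."""
    lines = lessons_text.splitlines()
    if "## Lessons" not in lines:
        lines = ["## Lessons", ""] + lines
    first = lines.index("## Lessons") + 1
    # The section ends at the next '## ' heading or at the end of the file.
    end = next(
        (j for j in range(first, len(lines)) if lines[j].startswith("## ")),
        len(lines),
    )
    while end > first and not lines[end - 1].strip():
        end -= 1  # step back over blank lines that close the section
    # Keep a blank line after the heading when the section was empty.
    lines[end:end] = [lesson_line] if end > first else ["", lesson_line]
    return "\n".join(lines).rstrip("\n") + "\n"
```

Because the lessons file is curated rather than append-only, the function returns the full new text so it can be reviewed before saving.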

After This

Starting a new experiment? Loop back to Phase 1.
Need ideas for what to try next? Invoke ml-iterate — it reads your journal and proposes ranked options.
Debugging a failed experiment? Invoke ml-debug — include the journal entry as context.
Want to verify a config before running? Invoke ml-verify — catch mistakes before wasting GPU time.

Anti-Patterns

Mistake	Why it happens	What to do instead
Running without logging	"I'll just try this quick thing"	Even quick experiments get logged — they compound into knowledge
Changing multiple variables	"Let me also bump the LR while I'm at it"	One variable per experiment. Otherwise you can't attribute the result.
Not recording the baseline	"I'll remember what the old score was"	Write the baseline metric in the entry. Memory is unreliable.
Skipping the review step	"I know what I tried before"	Read the journal. You'll find experiments you forgot about.
Never distilling lessons	"The journal has everything"	A 200-entry journal is noise. Lessons are signal. Distill regularly.
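
The "one variable per experiment" rule can be checked before a run by diffing the old and new configs. This is a sketch assuming configs are flat dictionaries; `changed_keys` and `check_single_change` are hypothetical names, and nested configs would need flattening first.

```python
_MISSING = object()  # sentinel so added or removed keys count as changes


def changed_keys(old_config, new_config):
    """Return the sorted parameter names whose values differ between two configs."""
    keys = set(old_config) | set(new_config)
    return sorted(
        key for key in keys
        if old_config.get(key, _MISSING) != new_config.get(key, _MISSING)
    )


def check_single_change(old_config, new_config):
    """Raise unless exactly one parameter differs; return that parameter's name."""
    changed = changed_keys(old_config, new_config)
    if len(changed) != 1:
        raise ValueError(
            f"expected exactly one changed parameter, found {len(changed)}: {changed}"
        )
    return changed[0]
```

For example, going from `{"lora_r": 32, "lr": 2e-4}` to `{"lora_r": 64, "lr": 2e-4}` passes, while also changing `lr` raises a `ValueError`.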

Examples

Starting a fine-tuning experiment:

User: "Let's try QLoRA with rank 64 instead of 32"
Agent: [Reads experiments/journal.md]
Agent: [Writes new entry with hypothesis: "Higher rank captures more task-specific features, expecting +2% accuracy"]
Agent: [Proceeds with implementation]

After getting results:

User: "Training finished, eval accuracy went from 78% to 81%"
Agent: [Updates journal entry with actual outcome, delta (+3% vs expected +2%), learning]
Agent: [Suggests next experiment based on result]
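
The delta in this exchange is simple arithmetic, sketched below. It assumes the "+2%" and "+3%" figures are read as percentage points of accuracy rather than relative improvements; `describe_delta` is a hypothetical helper.

```python
def describe_delta(baseline, actual, expected_gain):
    """Compare the observed gain with the expected gain, all in percentage points."""
    gain = round(actual - baseline, 6)
    surprise = round(gain - expected_gain, 6)
    if surprise > 0:
        verdict = "better than expected"
    elif surprise < 0:
        verdict = "worse than expected"
    else:
        verdict = "as expected"
    return f"{gain:+g} points vs expected {expected_gain:+g} ({verdict})"


# Accuracy went from 78% to 81% against an expected gain of 2 points.
print(describe_delta(78.0, 81.0, 2.0))
# prints: +3 points vs expected +2 (better than expected)
```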

Before next iteration:

User: "What should we try next?"
Agent: [Reads journal — sees rank 64 worked, rank 16 didn't, data augmentation untested]
Agent: [Invokes ml-iterate with full history context]

related-skills.json

Same repository

ml-debug.md

from "Leeroo-AI/superml"

Use when something is failing in ML/AI work — OOM, NaN, divergence, crashes, bad throughput, wrong outputs, dependency conflicts

2026-03-15 · 181

ml-iterate.md

from "Leeroo-AI/superml"

Use when the user is stuck, needs ranked next steps, or wants alternatives after initial experiments — "I tried X and got Y, what next?"

2026-03-15 · 181

ml-research.md

from "Leeroo-AI/superml"

Use when the user wants to understand an ML/AI topic, compare approaches, or survey framework capabilities — "how does X work?", "compare X vs Y"

2026-03-15 · 181

ml-verify.md

from "Leeroo-AI/superml"

Use when the user wants to verify code, config, or math before running — or proactively before any expensive training job or deployment

2026-03-15 · 181

ml-plan.md

from "Leeroo-AI/superml"

Use when the user wants an implementation plan, architecture design, or multi-step ML pipeline — "build X", "implement X", "design X", "set up X"

2026-03-08 · 181

using-superml.md

from "Leeroo-AI/superml"

Use when starting any conversation involving ML/AI — establishes how to use Leeroopedia KB tools and workflow skills

2026-03-07 · 181

package.json

"author": "Leeroo-AI"

"repository": "Leeroo-AI/superml"


