research-recipe

Name: Research Recipe
Author: mybigday

// Run a literature-first crawl before writing ANY ML training/fine-tuning/inference code. Spawns an Explore sub-agent that mines papers, citation graphs, methodology sections, and matched HF datasets to produce a ranked list of training recipes attributed to specific published results. Triggered when the user asks to fine-tune, train, or improve a model, or when the user names a task/benchmark and you need a recipe before coding.

stars:0

forks:0

updated: April 28, 2026, 08:03

SKILL.md


research-recipe — literature-first training recipe

Your knowledge of TRL/Transformers/PEFT APIs is outdated. Your knowledge of which dataset+method combo produces the best result on benchmark Z is even more outdated. Always crawl the literature first.

When to fire

- The user asks to fine-tune / train / improve a model.
- The user names a task or benchmark ("RAG with citations", "code completion on HumanEval", "math reasoning on MATH").
- You are about to write a trainer config from scratch.

Skip only for trivial non-code operations.

How to fire

Spawn a sub-agent so paper text never enters the main context:

Agent(
  description="Literature crawl for <task>",
  subagent_type="Explore",
  prompt="""
Literature crawl for <TASK>. Start from <ANCHOR PAPER OR TOPIC>.

1. Find anchor papers: search arxiv + HF papers for the task. Identify 2-3
   landmark papers (high citations, recent, or both). Use WebSearch and
   `WebFetch https://huggingface.co/papers?q=<task>`.

2. Crawl citation graph DOWNSTREAM: papers that cite the anchors and improved
   on them. Hit Semantic Scholar:
   `WebFetch https://api.semanticscholar.org/graph/v1/paper/arXiv:<id>/citations?fields=title,year,citationCount,abstract&limit=30`.

3. Read methodology sections (sections 3, 4, 5 — Methodology / Experiments /
   Results) of the 3-5 most promising recent papers. Use the arxiv HTML view:
   `WebFetch https://arxiv.org/html/<id>v1` or the PDF via abs page.

4. For each paper, extract:
   - DATASET: name, size, source, HF Hub repo if any, format (messages /
     prompt+completion / prompt+chosen+rejected).
   - METHOD: optimizer, learning rate, schedule, epochs, effective batch size,
     sequence length, key tricks (packing, FlashAttention, curriculum, …).
   - RESULT: exact numbers on a named benchmark.

5. Find ONE working code example using current TRL/Transformers APIs. Use
   `gh search code 'SFTTrainer path:examples extension:py' --limit 10` then
   `gh api repos/<owner>/<repo>/contents/<path>` to read it.

6. Cross-check the top dataset(s) on HF Hub via
   `WebFetch https://huggingface.co/api/datasets/<repo_id>` for downloads,
   recent activity, and (optionally) audit columns by suggesting
   `python scripts/inspect_dataset.py <repo_id>` to the main agent.

OUTPUT (500-1500 words):
- Ranked list of recipes. For each:
  - Paper: title, arxiv_id, date, venue
  - Result: exact numbers
  - Dataset(s): name, size, HF availability, format verified (yes/no)
  - Method: training approach + hyperparameters
  - "What made it work": specific insight
- SOTA landscape (one paragraph).
- Code anchor: one URL/path of a current working example.
- Risks: anything you saw in the literature that commonly breaks training
  (OOM, format mismatch, deprecated args).
"""
)
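Step 2 of the prompt can also be sketched in plain Python, for readers who want to see what the sub-agent is asked to do with the Semantic Scholar response. This is a minimal sketch, not part of the skill: the helper names `citations_url` and `rank_citing_papers` are hypothetical, the `min_year` and `top_k` cut-offs are illustrative choices, and the response shape (`data` rows wrapping a `citingPaper` object) is an assumption that should be checked against the live API.

```python
from urllib.parse import quote

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def citations_url(arxiv_id: str, limit: int = 30) -> str:
    """Build the citations URL from step 2 of the crawl prompt."""
    fields = "title,year,citationCount,abstract"
    return f"{S2_BASE}/arXiv:{quote(arxiv_id)}/citations?fields={fields}&limit={limit}"


def rank_citing_papers(payload: dict, min_year: int, top_k: int = 5) -> list:
    """Keep citing papers from min_year onward, most cited first.

    Assumes payload looks like {"data": [{"citingPaper": {...}}, ...]}.
    Missing years or citation counts are treated as 0.
    """
    papers = [row.get("citingPaper") or {} for row in payload.get("data", [])]
    recent = [p for p in papers if (p.get("year") or 0) >= min_year]
    recent.sort(key=lambda p: p.get("citationCount") or 0, reverse=True)
    return recent[:top_k]
```

The ranking is deliberately crude: citation count favors older papers, so the year filter does most of the work of surfacing recent follow-ups. The sub-agent still has to read the methodology sections to judge which ones actually improved on the anchor.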

After the sub-agent returns

Translate the top recipe into:

- A modified configs/sft_default.yaml (or new configs/<recipe>.yaml).
- A scripts/inspect_dataset.py <dataset> call to confirm format.
- The pre-flight block from AGENTS.md ("Reference implementation: …").

Do not write trainer code from memory. Read the example URL the sub-agent returned, then adapt the matching template under scripts/.
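The translation step can be illustrated with a small sketch that checks a dataset's columns against the three formats named in the crawl prompt and writes the extracted hyperparameters out as YAML. Everything here is an assumption for illustration: the `Recipe` fields, the `detect_format` and `to_yaml` helpers, and the config keys are hypothetical, and the real schema of configs/sft_default.yaml and the behavior of scripts/inspect_dataset.py are not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Required columns for the formats listed in step 4 of the crawl prompt.
# Order matters: the preference format is checked before prompt+completion.
FORMATS = {
    "prompt+chosen+rejected": {"prompt", "chosen", "rejected"},
    "prompt+completion": {"prompt", "completion"},
    "messages": {"messages"},
}


def detect_format(columns: list) -> Optional[str]:
    """Return the first format whose required columns are all present."""
    cols = set(columns)
    for name, required in FORMATS.items():
        if required <= cols:
            return name
    return None


@dataclass
class Recipe:
    """Hypothetical container for one recipe from the sub-agent's report."""
    dataset: str
    learning_rate: float
    epochs: int
    per_device_batch_size: int
    gradient_accumulation_steps: int
    num_devices: int = 1

    @property
    def effective_batch_size(self) -> int:
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)


def to_yaml(recipe: Recipe, code_anchor: str) -> str:
    """Render the recipe as flat YAML; the key names are illustrative only."""
    lines = [
        f"# code_anchor: {code_anchor}",
        f"dataset: {recipe.dataset}",
        # Explicit decimal point so YAML 1.1 parsers read the value as a float.
        f"learning_rate: {recipe.learning_rate:.2e}",
        f"epochs: {recipe.epochs}",
        f"per_device_batch_size: {recipe.per_device_batch_size}",
        f"gradient_accumulation_steps: {recipe.gradient_accumulation_steps}",
        f"# effective batch size: {recipe.effective_batch_size}",
    ]
    return "\n".join(lines) + "\n"
```

Recording the effective batch size next to its factors makes it easier to compare the config with the paper, which usually reports only the product. If `detect_format` returns `None`, the dataset needs conversion before any trainer template will accept it.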

related-skills.json

same repository

rocm-strix-halo.md

from "mybigday/ml-intern-kit"

Set up training and inference on AMD Ryzen AI Max+ 395 / Strix Halo (gfx1151, RDNA 3.5) with TheRock nightly ROCm wheels. Triggered when the host has gfx1151, when `rocminfo` shows Strix Halo, or when the user mentions Strix Halo / Ryzen AI Max / gfx1151 / 128GB unified memory.

2026-04-28

env-bootstrap.md

from "mybigday/ml-intern-kit"

Recreate the ml-intern-kit Python environment on a new machine (laptop, rented GPU box, Docker, fresh checkout). Triggered when the user is on a new host, sees ImportError on a core dep (torch/transformers/trl/peft/accelerate), or wants to install flash-attn / unsloth / bitsandbytes after the fact.

2026-04-28

eval-model.md

from "mybigday/ml-intern-kit"

Evaluate a trained or downloaded language model with `lm-eval-harness` standard tasks (arc, hellaswag, gsm8k, mmlu, truthfulqa, ifeval, ...). Triggered when the user wants to benchmark, eval, or compare a model — pre- or post-training.

2026-04-28

inspect-dataset.md

from "mybigday/ml-intern-kit"

Audit a Hugging Face dataset before training to confirm splits, columns, format, sample rows, distributions, and duplicates. Triggered before any training/fine-tuning script runs, when a user mentions a new dataset, or when you hit a KeyError / format mismatch in a training job.

2026-04-28

launch-hf-job.md

from "mybigday/ml-intern-kit"

Submit a training/inference script to Hugging Face Jobs (`hf jobs run`). Triggered when the user wants to run training in the cloud, scale beyond local hardware, or kick off a multi-hour fine-tune. Enforces pre-flight: hub_model_id required, ≥2h timeout for training, single-job validation before batch.

2026-04-28

train-dpo.md

from "mybigday/ml-intern-kit"

Direct Preference Optimization (DPO) fine-tune with TRL `DPOTrainer`. Triggered when the user wants to align a model on preferences / pairwise comparisons / chosen-vs-rejected data, or improve an existing SFT checkpoint with a preference dataset.

2026-04-28

package.json

"author": "mybigday"

"repository": "mybigday/ml-intern-kit"
