一键在 Manus 中运行任何 Skill

$pwd:

train-mlx

Name: Train Mlx
Author: mybigday

// LoRA fine-tune a causal LM natively on Apple Silicon (M1/M2/M3/M4) via `mlx-lm`. Triggered when the user is on macOS arm64, mentions MLX, or wants better throughput than torch-MPS. Use this instead of `train-sft` when the host is a Mac with no NVIDIA GPU.

在 Manus 中运行

$ git log --oneline --stat

stars:0

forks:0

updated:2026年4月28日 08:03

SKILL.md

readonly

name	train-mlx
description	LoRA fine-tune a causal LM natively on Apple Silicon (M1/M2/M3/M4) via `mlx-lm`. Triggered when the user is on macOS arm64, mentions MLX, or wants better throughput than torch-MPS. Use this instead of `train-sft` when the host is a Mac with no NVIDIA GPU.

train-mlx — Apple MLX LoRA fine-tune

For everything except Apple Silicon, prefer the train-sft skill (torch + TRL). MLX wins on M-series because the unified memory means a 7B model that spills VRAM on a 24GB consumer NVIDIA card fits comfortably on a 36GB M-series.

When to fire

uname -sm returns Darwin arm64.
User mentions MLX, M-series, "Mac GPU", or unified memory.
Torch-MPS keeps OOM-ing or running out of MPS-supported ops.

Sequence

Install the MLX path (idempotent):

make env-mlx        # bootstrap with --accel=mps --mlx

Get the data into MLX's JSONL layout:
```
python scripts/dump_mlx_jsonl.py trl-lib/Capybara data/capybara-mlx
```
Produces train.jsonl + valid.jsonl with the messages (or text) field. You can skip this if you already have HF JSONL on disk.
Pick a 4-bit MLX-community base for fast loading on 16-32GB Macs:
- mlx-community/SmolLM2-1.7B-Instruct-4bit
- mlx-community/Llama-3.2-3B-Instruct-4bit
- mlx-community/Qwen2.5-7B-Instruct-4bit

Smoke run, then scale:

python scripts/mlx_finetune.py --config configs/mlx_default.yaml --iters 50
python scripts/mlx_finetune.py --config configs/mlx_default.yaml --iters 1000 --fuse

--fuse merges the LoRA back into the base weights so you can load the result with plain mlx_lm.generate or convert back to safetensors.

Push to Hub (optional): mlx_lm.fuse --upload-repo you/your-mlx-model.

Memory budget rules of thumb

Model	Min unified memory	Notes
≤1.7B	16 GB	works on M1/M2 base
3B	24 GB	M2 Pro / M3 Pro
7B 4-bit	36 GB	M3 Max / M4 Pro
13B 4-bit	64 GB	M2 Ultra / M3 Max

What MLX does NOT support

DPO/GRPO out of the box (mlx-lm has SFT + LoRA + DoRA — for preference data, generate offline preferences and do SFT-on-chosen, or use torch-MPS via train-dpo skill at slower speeds).
Multi-GPU (single-Mac only — fine, since you have one accelerator anyway).
bitsandbytes-style 8-bit optimizers (MLX has its own quantization).

Eval

scripts/eval_lm.py (lm-eval-harness) loads MLX-fused safetensors fine. For in-process MLX eval: mlx_lm.generate --model outputs/mlx-lora-fused --prompt "...".

related-skills.json

同仓库

rocm-strix-halo.md

from "mybigday/ml-intern-kit"

Set up training and inference on AMD Ryzen AI Max+ 395 / Strix Halo (gfx1151, RDNA 3.5) with TheRock nightly ROCm wheels. Triggered when the host has gfx1151, when `rocminfo` shows Strix Halo, or when the user mentions Strix Halo / Ryzen AI Max / gfx1151 / 128GB unified memory.

2026-04-280

env-bootstrap.md

from "mybigday/ml-intern-kit"

Recreate the ml-intern-kit Python environment on a new machine (laptop, rented GPU box, Docker, fresh checkout). Triggered when the user is on a new host, sees ImportError on a core dep (torch/transformers/trl/peft/accelerate), or wants to install flash-attn / unsloth / bitsandbytes after the fact.

2026-04-280

eval-model.md

from "mybigday/ml-intern-kit"

Evaluate a trained or downloaded language model with `lm-eval-harness` standard tasks (arc, hellaswag, gsm8k, mmlu, truthfulqa, ifeval, ...). Triggered when the user wants to benchmark, eval, or compare a model — pre- or post-training.

2026-04-280

inspect-dataset.md

from "mybigday/ml-intern-kit"

Audit a Hugging Face dataset before training to confirm splits, columns, format, sample rows, distributions, and duplicates. Triggered before any training/fine-tuning script runs, when a user mentions a new dataset, or when you hit a KeyError / format mismatch in a training job.

2026-04-280

launch-hf-job.md

from "mybigday/ml-intern-kit"

Submit a training/inference script to Hugging Face Jobs (`hf jobs run`). Triggered when the user wants to run training in the cloud, scale beyond local hardware, or kick off a multi-hour fine-tune. Enforces pre-flight: hub_model_id required, ≥2h timeout for training, single-job validation before batch.

2026-04-280

research-recipe.md

from "mybigday/ml-intern-kit"

Run a literature-first crawl before writing ANY ML training/fine-tuning/inference code. Spawns an Explore sub-agent that mines papers, citation graphs, methodology sections, and matched HF datasets to produce a ranked list of training recipes attributed to specific published results. Triggered when the user asks to fine-tune, train, or improve a model, or when the user names a task/benchmark and you need a recipe before coding.

2026-04-280

package.json

"author": "mybigday"

"repository": "mybigday/ml-intern-kit"

打开 GitHub 仓库查看创作者相关仓库

$ install --global

$ download --local

在 Manus 中运行

$ useful --forSOC

软件开发工程师计算机与数学类职业15-1252L4

name	train-mlx
description	LoRA fine-tune a causal LM natively on Apple Silicon (M1/M2/M3/M4) via `mlx-lm`. Triggered when the user is on macOS arm64, mentions MLX, or wants better throughput than torch-MPS. Use this instead of `train-sft` when the host is a Mac with no NVIDIA GPU.

train-mlx — Apple MLX LoRA fine-tune

When to fire

uname -sm returns Darwin arm64.
User mentions MLX, M-series, "Mac GPU", or unified memory.
Torch-MPS keeps OOM-ing or running out of MPS-supported ops.

Sequence

Install the MLX path (idempotent):

make env-mlx        # bootstrap with --accel=mps --mlx

Get the data into MLX's JSONL layout:
```
python scripts/dump_mlx_jsonl.py trl-lib/Capybara data/capybara-mlx
```
Produces train.jsonl + valid.jsonl with the messages (or text) field. You can skip this if you already have HF JSONL on disk.
Pick a 4-bit MLX-community base for fast loading on 16-32GB Macs:
- mlx-community/SmolLM2-1.7B-Instruct-4bit
- mlx-community/Llama-3.2-3B-Instruct-4bit
- mlx-community/Qwen2.5-7B-Instruct-4bit

Smoke run, then scale:

python scripts/mlx_finetune.py --config configs/mlx_default.yaml --iters 50
python scripts/mlx_finetune.py --config configs/mlx_default.yaml --iters 1000 --fuse

--fuse merges the LoRA back into the base weights so you can load the result with plain mlx_lm.generate or convert back to safetensors.

Push to Hub (optional): mlx_lm.fuse --upload-repo you/your-mlx-model.

Memory budget rules of thumb

Model	Min unified memory	Notes
≤1.7B	16 GB	works on M1/M2 base
3B	24 GB	M2 Pro / M3 Pro
7B 4-bit	36 GB	M3 Max / M4 Pro
13B 4-bit	64 GB	M2 Ultra / M3 Max

What MLX does NOT support

DPO/GRPO out of the box (mlx-lm has SFT + LoRA + DoRA — for preference data, generate offline preferences and do SFT-on-chosen, or use torch-MPS via train-dpo skill at slower speeds).
Multi-GPU (single-Mac only — fine, since you have one accelerator anyway).
bitsandbytes-style 8-bit optimizers (MLX has its own quantization).

Eval

scripts/eval_lm.py (lm-eval-harness) loads MLX-fused safetensors fine. For in-process MLX eval: mlx_lm.generate --model outputs/mlx-lora-fused --prompt "...".

train-mlx

train-mlx — Apple MLX LoRA fine-tune

When to fire

Sequence

Memory budget rules of thumb

What MLX does NOT support

Eval

同仓库更多 Skills

同仓库更多 Skills

train-mlx — Apple MLX LoRA fine-tune

When to fire

Sequence

Memory budget rules of thumb

What MLX does NOT support

Eval