env-bootstrap

Name: Env Bootstrap
Author: mybigday

Recreate the ml-intern-kit Python environment on a new machine (laptop, rented GPU box, Docker, fresh checkout). Triggered when the user is on a new host, sees ImportError on a core dep (torch/transformers/trl/peft/accelerate), or wants to install flash-attn / unsloth / bitsandbytes after the fact.

stars:0

forks:0

updated: April 28, 2026, 08:03

SKILL.md

env-bootstrap — recreate the env anywhere

Default path

bash bootstrap.sh           # auto-detects accelerator (cuda / rocm / mps / cpu)
bash bootstrap.sh --dev     # + pytest, ruff, jupyter
bash bootstrap.sh --eval    # + lm-eval-harness
bash bootstrap.sh --all     # everything except flash-attn / unsloth / mlx
source .venv/bin/activate
make doctor                 # prints torch backend, GPU info, mlx if installed

Force a specific accelerator

make env-cuda    # NVIDIA CUDA 12.4 wheels
make env-rocm    # AMD ROCm 6.2 wheels (gfx942 / gfx110X dGPUs only)
make env-strix-halo  # AMD Ryzen AI Max+ 395 / gfx1151 (TheRock nightly)
make env-mps     # macOS / Apple Silicon, torch-MPS only
make env-mlx     # macOS arm64: torch-MPS + Apple MLX
make env-cpu     # CPU-only wheels

bootstrap.sh picks the right PyTorch wheel index automatically: cu124 for NVIDIA, rocm6.2 for AMD, default (MPS) for arm64 macOS, cpu everywhere else.
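
The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the logic inside bootstrap.sh: the helper names `pick_wheel_index` and `detect_wheel_index` are hypothetical, and probing for `nvidia-smi` / `rocminfo` on `PATH` is an assumed detection method. The URLs are the public PyTorch wheel indexes for the names listed above. Strix Halo (gfx1151) is not covered here because it uses the separate TheRock nightly path.

```python
import platform
import shutil

# Index names come from the text above; URLs are the public PyTorch wheel indexes.
WHEEL_INDEXES = {
    "cu124": "https://download.pytorch.org/whl/cu124",
    "rocm6.2": "https://download.pytorch.org/whl/rocm6.2",
    "cpu": "https://download.pytorch.org/whl/cpu",
    "default": None,  # arm64 macOS: default PyPI wheels, which include MPS support
}


def pick_wheel_index(has_nvidia: bool, has_rocm: bool, system: str, machine: str) -> str:
    """Return the wheel index name for a host (hypothetical helper)."""
    if has_nvidia:
        return "cu124"
    if has_rocm:
        return "rocm6.2"
    if system == "Darwin" and machine == "arm64":
        return "default"
    return "cpu"


def detect_wheel_index() -> str:
    """Probe the current host. Looking for vendor tools on PATH is an assumption."""
    return pick_wheel_index(
        has_nvidia=shutil.which("nvidia-smi") is not None,
        has_rocm=shutil.which("rocminfo") is not None,
        system=platform.system(),
        machine=platform.machine(),
    )


if __name__ == "__main__":
    name = detect_wheel_index()
    print(name, WHEEL_INDEXES[name])
```

Keeping the decision in a pure function makes it easy to check each branch without the matching hardware.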

When `uv` isn't installed

curl -LsSf https://astral.sh/uv/install.sh | sh

bootstrap.sh will pick it up automatically. If installing uv isn't an option, the script falls back to `python3.11 -m venv` followed by `pip install -r requirements.txt`.
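
A rough Python sketch of that fallback decision follows. The pip branch mirrors the text above. The uv branch (`uv venv` plus `uv pip install -r requirements.txt`) is an assumption for illustration only, since the source does not show which uv commands bootstrap.sh runs. `env_commands` is a hypothetical helper name.

```python
import shutil


def env_commands(uv_path):
    """Return the commands that would create .venv and install deps (illustrative only)."""
    if uv_path is not None:
        # Assumed uv branch; the source does not show what bootstrap.sh runs here.
        return [
            [uv_path, "venv", ".venv"],
            [uv_path, "pip", "install", "-r", "requirements.txt"],
        ]
    # Fallback described in the text: stdlib venv, then pip inside it.
    return [
        ["python3.11", "-m", "venv", ".venv"],
        [".venv/bin/pip", "install", "-r", "requirements.txt"],
    ]


if __name__ == "__main__":
    # Print the plan instead of running it, so the sketch is safe to try anywhere.
    for cmd in env_commands(shutil.which("uv")):
        print(" ".join(cmd))
```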

When you need flash-attn or unsloth

These have CUDA / ABI requirements that vary per box, so they are deliberately not in requirements.txt. Install after the core stack works:

make flash      # flash-attn, requires matching CUDA toolkit
make unsloth    # unsloth, Linux only

If `make flash` fails with a build error, set `attn_implementation="sdpa"` in the training config as a fallback. It is slower than flash-attn-2 but works everywhere PyTorch does.
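
One way to make that fallback automatic is to check whether the `flash_attn` package is importable before choosing the attention backend. The sketch below is an assumption about how such a check could look, not the code used by the training scripts. `pick_attn_implementation` is a hypothetical helper; `"flash_attention_2"` and `"sdpa"` are the values accepted by the `attn_implementation` argument in transformers.

```python
import importlib.util


def pick_attn_implementation(module_name: str = "flash_attn") -> str:
    """Prefer flash-attn-2 when the package is present, otherwise use PyTorch SDPA."""
    if importlib.util.find_spec(module_name) is not None:
        return "flash_attention_2"
    return "sdpa"


if __name__ == "__main__":
    print(pick_attn_implementation())
```

The result can be passed as `attn_implementation=...` when loading a model with `from_pretrained`. Note that a package being present does not guarantee it works: a wheel built against a mismatched CUDA toolkit can still fail at import time, so a real script may want to wrap the actual import in a try/except as well.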

When you need a fully-pinned, identical env (Docker)

docker build -t ml-intern-kit:cu124 .
docker run --gpus all -it --rm \
    -v "$PWD":/workspace \
    -v "$HOME/.cache/huggingface":/root/.cache/huggingface \
    --env-file .env \
    ml-intern-kit:cu124 bash

The Dockerfile pins CUDA 12.4, Python 3.11, and uv-resolved deps from pyproject.toml. The `~/.cache/huggingface` bind mount keeps model and dataset downloads on the host, so they survive across container runs.

Common breakage on fresh boxes

| Symptom | Fix |
| --- | --- |
| `torch.cuda.is_available()` is False on NVIDIA | Wrong wheel. `make env-cuda` re-pulls from the `cu124` index. |
| `torch.cuda.is_available()` is False on AMD | ROCm uses the CUDA API surface, so `torch.cuda.is_available()` should be True with `torch.version.hip` set. If it is False: `make env-rocm`, then `rocminfo`. |
| `bitsandbytes` import error on macOS / ROCm | Expected: it is gated to CUDA Linux. The training scripts auto-substitute `optim="adamw_torch"` on those backends. |
| `bf16` errors / NaN loss on macOS | `train_sft.py` auto-switches to `fp16` on MPS. If you see this on torch ≤ 2.3, upgrade torch. |
| `flash-attn` build fails | Skip it; the scripts fall back to `sdpa` automatically. flash-attn is CUDA-only upstream. |
| MPS op-not-implemented error | Set `PYTORCH_ENABLE_MPS_FALLBACK=1` to fall through to CPU for that op, or use the `train-mlx` path instead. |
| HF Hub auth fails | Run `hf auth login` (or `huggingface-cli login`) and re-run, or set `HF_TOKEN` in `.env`. |
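
Several of the rows above come down to asking which backend the installed torch build actually sees. The snippet below is a small diagnostic sketch along those lines. It is not the implementation of `make doctor`, whose output is not shown here, and `describe_backend` is a hypothetical name. It relies only on `torch.cuda.is_available()`, `torch.version.hip`, and `torch.backends.mps.is_available()`, and it degrades gracefully when torch is missing.

```python
def describe_backend() -> dict:
    """Report which accelerator backend the installed torch build can use."""
    info = {"torch_installed": False, "backend": "none"}
    try:
        import torch
    except ImportError:
        # Matches the "ImportError on a core dep" trigger: nothing more to inspect.
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__

    # ROCm builds expose the CUDA API surface and set torch.version.hip.
    hip = getattr(torch.version, "hip", None)
    mps = getattr(torch.backends, "mps", None)

    if torch.cuda.is_available():
        info["backend"] = "rocm" if hip else "cuda"
        info["device_name"] = torch.cuda.get_device_name(0)
    elif mps is not None and mps.is_available():
        info["backend"] = "mps"
    else:
        info["backend"] = "cpu"
    return info


if __name__ == "__main__":
    for key, value in describe_backend().items():
        print(f"{key}: {value}")
```

If this reports `cpu` on a machine with an NVIDIA or AMD GPU, the first two rows of the table apply: the wrong wheel is installed.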

related-skills.json

same repository

rocm-strix-halo.md

from "mybigday/ml-intern-kit"

Set up training and inference on AMD Ryzen AI Max+ 395 / Strix Halo (gfx1151, RDNA 3.5) with TheRock nightly ROCm wheels. Triggered when the host has gfx1151, when `rocminfo` shows Strix Halo, or when the user mentions Strix Halo / Ryzen AI Max / gfx1151 / 128GB unified memory.

2026-04-28

eval-model.md

from "mybigday/ml-intern-kit"

Evaluate a trained or downloaded language model with `lm-eval-harness` standard tasks (arc, hellaswag, gsm8k, mmlu, truthfulqa, ifeval, ...). Triggered when the user wants to benchmark, eval, or compare a model — pre- or post-training.

2026-04-28

inspect-dataset.md

from "mybigday/ml-intern-kit"

Audit a Hugging Face dataset before training to confirm splits, columns, format, sample rows, distributions, and duplicates. Triggered before any training/fine-tuning script runs, when a user mentions a new dataset, or when you hit a KeyError / format mismatch in a training job.

2026-04-28

launch-hf-job.md

from "mybigday/ml-intern-kit"

Submit a training/inference script to Hugging Face Jobs (`hf jobs run`). Triggered when the user wants to run training in the cloud, scale beyond local hardware, or kick off a multi-hour fine-tune. Enforces pre-flight: hub_model_id required, ≥2h timeout for training, single-job validation before batch.

2026-04-28

research-recipe.md

from "mybigday/ml-intern-kit"

Run a literature-first crawl before writing ANY ML training/fine-tuning/inference code. Spawns an Explore sub-agent that mines papers, citation graphs, methodology sections, and matched HF datasets to produce a ranked list of training recipes attributed to specific published results. Triggered when the user asks to fine-tune, train, or improve a model, or when the user names a task/benchmark and you need a recipe before coding.

2026-04-28

train-dpo.md

from "mybigday/ml-intern-kit"

Direct Preference Optimization (DPO) fine-tune with TRL `DPOTrainer`. Triggered when the user wants to align a model on preferences / pairwise comparisons / chosen-vs-rejected data, or improve an existing SFT checkpoint with a preference dataset.

2026-04-28

package.json

"author": "mybigday"

"repository": "mybigday/ml-intern-kit"

