NVIDIA-NeMo

Fine-tune models on NeMo Platform with `automodel`, `unsloth`, or `rl` (all `submit`-only): HF dataset conversion, filesets, model entities, and job JSON (hyperparameters, batch, schedule, optimizer) + job polling. `automodel`/`unsloth` run SFT/LoRA as Docker GPU jobs; `rl` runs DPO (preference optimization) on a Ray cluster (Kubernetes). Use for train, fine-tune, customize, SFT, LoRA, DPO, preference optimization, learning rate, epochs, or nemo customization.

creating-a-plugin

Creates a new NeMo plugin from scratch. Use when starting plugin development, setting up a plugin package, registering surfaces via entry points, or asking how plugins are discovered by the platform. Trigger keywords: create plugin, new plugin, plugin setup, entry-points, plugin structure, get started, plugin discovered, entry point.

plugin-authz

Declares HTTP authorization on NeMo Platform plugin routes with @path_rule, AuthzScope, PermissionSet, and CallerKind. Use when adding or securing a plugin route, minting permission ids, choosing PRINCIPAL vs SERVICE_PRINCIPAL callers, fixing a hard_fail bundle build from an unruled route, or migrating off get_authz_contribution. Trigger keywords: authz, authorization, permission, path_rule, AuthzScope, PermissionSet, perm, CallerKind, SERVICE_PRINCIPAL, scope, hard_fail, on_invalid_plugin, unruled route, bundle build, get_authz_contribution.

plugin-controller

Creates background reconcile-loop controllers using NemoController. Use when implementing state-machine reconciliation, running periodic background work, managing deployment lifecycle, building service-principal entity clients for background use, or understanding controller startup/shutdown sequence. Trigger keywords: controller, NemoController, reconcile, background loop, reconcile_one, list_objects, on_startup, state machine, deployment lifecycle, service principal, interval_seconds.

plugin-function

Creates in-process NemoFunction surfaces for NeMo Platform plugins. Use when adding a function, declaring spec_schema, mounting function routes with add_function_routes, understanding the two CLI verbs (run / submit), or streaming NDJSON frames. Trigger keywords - function, NemoFunction, spec_schema, add_function_routes, nemo_platform_plugin.functions, two verbs, run, submit, streaming, NDJSON, FunctionContext.

plugin-job

Creates schedulable NemoJob surfaces for NeMo Platform plugins. Use when adding a job, declaring spec_schema / input_spec_schema / to_spec / compile, mounting job routes with add_job_routes, understanding the three CLI verbs (run / submit / explain), or running jobs in containers. Trigger keywords - job, NemoJob, spec_schema, input_spec_schema, to_spec, compile, add_job_routes, nemo_platform_plugin.jobs, three verbs, run, submit, explain, NemoJobScheduler.

plugin-service

Builds HTTP service surfaces for NeMo Platform plugins using NemoService, RouterSpec, NemoListResponse, and NemoFilter. Use when adding REST API routes to a plugin, implementing CRUD endpoints, handling pagination and filtering, or testing FastAPI routes. Trigger keywords: HTTP routes, REST API, FastAPI, CRUD, endpoint, router, NemoService, pagination, filter, list endpoint, NemoListResponse, RouterSpec.

11 skills 1.0k 225 updated 2026-06-25

Showing top 8 of 54 collected skills in this repository.

#002

Gym

8.5% of creator

skill

occupation

description

updated

add-benchmark

Guide for adding a new benchmark or training environment to NeMo-Gym. Use when the user asks to add, create, or integrate a benchmark, evaluation, training environment, or resources server into NeMo-Gym. Also use when wrapping an existing 3rd-party benchmark library. Covers the full workflow: data preparation, resources server implementation, agent wiring, YAML config, testing, and reward profiling (baselining). Triggered by: "add benchmark", "new resources server", "integrate benchmark", "wrap benchmark", "add training environment", "add eval".

nemo-gym-debugging

Use when debugging a NeMo Gym run or reward profiling job. Covers rollout collection failures, empty or partial JSONL outputs, stale materialized inputs, verifier/schema errors, Ray or Slurm issues, vLLM readiness, judge failures, tool/sandbox failures, cache problems, and throughput bottlenecks.

nemo-gym-reward-profiling

Use to help users get started with NeMo Gym reward profiling. Covers the basic gym env start, gym eval run, and gym eval profile workflow, repeated rollouts, materialized inputs, rollout JSONL artifacts, task and rollout identity, output inspection, partial profiling, and rollout_infos. For failed jobs, prefer nemo-gym-debugging.

Manage stacked branches and pull requests with the gh-stack GitHub CLI extension. Use when the user wants to create, push, rebase, sync, navigate, or view stacks of dependent PRs. Triggers on tasks involving stacked diffs, dependent pull requests, branch chains, or incremental code review workflows.

2026-06-23

nemo-gym-blade-analysis

data-scientists-152051

Use when analyzing NeMo Gym benchmark rollouts for BLADE-style reports, writing benchmark methodology notes, checking whether a benchmark is BLADE-ready, comparing model runs, or explaining why a benchmark report passed, failed, or changed. Covers aggregate metrics, rollout evidence, report structure, root-cause taxonomy, judge expectations, and improvement recommendations. For generic reward profiling commands, prefer nemo-gym-reward-profiling; for failed infrastructure jobs, prefer nemo-gym-debugging.

2026-06-22

Showing top 8 of 11 collected skills in this repository.

#003

Megatron-Bridge

10 skills 783405 updated 2026-07-10

7.8% of creator

skill

occupation

description

updated

testing

Testing reference for Megatron Bridge — unit and functional test layout, tier semantics (L0/L1/L2/flaky), script conventions, running tests locally, adding/moving/disabling tests, and pytest conventions.

2026-07-10

review-pr

Structured single-agent code review workflow for PRs, commits, and local diffs. Use when asked to review code, understand a PR, rubber duck a change, prepare GitHub review comments, compare a change against Megatron Bridge conventions, or produce high-signal findings without subagents or tmux.

2026-07-10

adding-model-support

Guide for adding support for new LLM or VLM models in Megatron-Bridge. Covers bridge, provider, recipe, tests, docs, and examples.

2026-07-08

bump-dependency

Bump a pinned dependency (TransformerEngine, Megatron-LM, NRX, etc.), regenerate the lockfile, open a PR, and drive it to green by attaching a watchdog to the "CICD NeMo" workflow and quarantining failing functional tests as flaky until the run is green.

2026-06-22

build-and-dependency

Dev environment setup for Megatron Bridge — container-based development, uv package management, lockfile regeneration, adding dependencies, Slurm container usage, and common build pitfalls.

2026-06-01

cicd

CI/CD reference for Megatron Bridge — pipeline structure, commit and PR workflow, CI failure investigation, and common failure patterns.

2026-06-01

verl-e2e-testing

External verl end-to-end validation workflow for Megatron-Bridge changes. Covers running a small verl Megatron backend job from a Bridge checkout, choosing LoRA/DDP plus optional save/resume and parallelism variants, setting PYTHONPATH so verl imports the local Bridge tree, and reporting pass/fail evidence.

2026-05-31

nemo-rl-e2e-testing

External NeMo-RL end-to-end validation workflow for Megatron-Bridge model/provider changes, including downstream compatibility checks, external RL lifecycle behavior, Megatron policy setup, HF import/export, checkpoint/resume, non-colocated vLLM refit, delta weight transfer, optional LoRA/generation variants, and questions such as "does this model work in NeMo-RL", "run NeMo-RL e2e", or "external RL loop validation". Covers running NeMo-RL Megatron policy jobs from a Bridge checkout, choosing GRPO/SFT/checkpoint/non-colocated refit variants, setting PYTHONPATH so NeMo-RL imports the local Bridge tree, and reporting pass/fail evidence.

2026-05-29

9 skills 1.8k 469 updated 2026-06-25

Showing top 8 of 10 collected skills in this repository.

#004

7.0% of creator

skill

occupation

description

updated

config-conventions

Configuration conventions for NeMo-RL. YAML is the single source of truth for defaults. Covers BaseModel/TypedDict usage, dataclass for internal classes, exemplar YAML updates, and forbidden default patterns.

error-handling

Error handling guidelines for NeMo-RL. Covers exception specificity, minimal try bodies, and else blocks.

linting-and-formatting

Code style guidelines for NeMo-RL (Python and shell). Covers naming, indentation, comments, docstrings, reflection avoidance, and uv usage.

review-pr

Interactive code review for NVIDIA-NeMo/RL pull requests. Checks out PR locally, reads existing comments, applies coding guidelines from skills, previews findings, and posts review comments. Also supports reviewing the current branch locally.

build-and-dependency

Build and dependency management for NeMo-RL. Covers Docker image building and running, uv usage, venv setup, and adding dependencies.

cicd

CI/CD reference for NeMo-RL. Covers GitHub Actions pipeline structure, CI triggering via /ok to test, and CI failure investigation.

contributing

Contribution conventions for NeMo-RL. Covers PR title format, commit sign-off, and CI triggering.

NVIDIA copyright header requirements for NeMo-RL. Covers which files need headers and the exact header text.

9 skills 12520 updated 2026-07-08

Showing top 8 of 9 collected skills in this repository.

#005

Switchyard

7.0% of creator

skill

occupation

description

updated

switchyard-lib-core

Use when adding, modifying, refactoring, renaming, restructuring, deprecating, or reviewing anything under `switchyard/lib/` — profiles, request/response processors, backends, translators, stats collection, intake, telemetry, observability, routing decisions, or CLI wiring that builds a runnable profile. Triggers on phrases like "add a profile", "new processor", "new backend", "wire stats", "add a preset", "track per-tier …", "intake telemetry", "rename random_routing", "refactor construction", or any edit to `switchyard/lib/profiles/`, `processors/`, `backends/`, `translators/`, or route-table code.

2026-07-08

switchyard-coding-agent-launchers

Use when editing `switchyard launch claude` / `switchyard launch codex` / `switchyard launch openclaw` or anything under `switchyard/cli/launchers/`, `switchyard/cli/launch_command.py`, `switchyard/cli/routing/`, `switchyard/cli/model_catalog/`, or `switchyard/cli/config/user_config.py`. Triggers on "modify the claude launcher", "wire codex to a new model", "add an openclaw launcher flag", "saved user defaults", "routing profiles", or "first-run configure".

2026-07-06

switchyard-pr-reviewer

Multi-mode, adversarially-verified PR review for switchyard that drafts inline comments the maintainers' way. Use when asked to "review this PR", "do a live/code review", "walk me through this PR and flag anything", "check correctness / tests / design vs the ticket", "is this serious / blocking?", "review the Rust changes", or "post comments on a PR". Sequences correctness, test-quality, ticket-coherence, simplify, docs, stale-comment, and Rust-craft modes; dispatches Rust to rust-code-reviewer. Every phase has a self-contained manual path; when the host CLI provides them, built-in slash skills are optional accelerators. Drafts first, posts only the approved subset, never auto-posts.

2026-07-06

publish-package

Build and publish Switchyard packages through the current OSS-style GitHub release path. Use when asked to publish, release, ship, tag, cut a version, build wheel artifacts, or prepare a package release.

switchyard-testing-ci

Use when validating a Switchyard change, preparing or reviewing a PR, debugging CI failures (ruff, mypy, SPDX, pytest, slim-install), choosing which local tests to run, or diagnosing dependency, optional-extra, stale-name, CLI, server, translation, routing, stats, or live e2e failures. Triggers on phrases like "is this ready to merge", "CI is failing", "which tests should I run", or "how do I reproduce this locally".

run-pre-merge-checks

Run the live end-to-end production tests against the real NVIDIA Inference Hub backend, then produce a screenshot-ready summary that can be attached to an MR. Use this skill when asked to run pre-merge checks, run the e2e tests, validate before merging, or generate an MR test summary.

rust-code-reviewer

Use this agent for a strict Rust code review. Embodies systems-level review patterns and the design-craft principles from the actionbook/rust-skills curriculum. Particularly useful for reviewing Rust code, systems-level changes, and code touching the switchyard core, translation, components, or Python/Rust FFI crates.

switchyard-codebase-exploration

Use when modifying, debugging, reviewing, refactoring, renaming, restructuring, or planning any Switchyard change — even if the user only names a symptom, test failure, CLI flag, endpoint, model route, or file path. Triggers on phrases like "fix this bug", "why is X failing", "where does Y live", "add a new profile", "rename X to Y", "refactor this", "explain how X works", or any request that will touch `switchyard/`, `tests/`, or `examples/`. Forces a fresh impact map before editing so agents do not edit from stale memory, and routes the agent to the matching workstream skill (`switchyard-lib-core`, `switchyard-coding-agent-launchers`, `switchyard-testing-ci`) before reading source.

8 skills 1.7k 337 updated 2026-07-08

Showing top 8 of 9 collected skills in this repository.

#006

Nemotron

6.2% of creator

skill

occupation

description

updated

nemotron-3-ultra-text2sql-lora

Run the Nemotron-3 Ultra Text2SQL LoRA fine-tuning tutorial (NeMo Megatron-Bridge) end-to-end for the user on their SLURM cluster: data prep, distributed checkpoint conversion, and packed LoRA fine-tuning of the 550B hybrid Mamba-Transformer MoE, ending at a saved adapter. Use when the user wants to run this cookbook, fine-tune Nemotron-3 Ultra with LoRA, or adapt the notebook to their own cluster.

2026-07-08

nemotron-ultra

Reference desk for NVIDIA Nemotron 3 Ultra (550B-A55B) — architecture, NVFP4 pretraining, SFT, MOPD (multi-teacher on-policy distillation), MTP boosting, quantization, inference. Use when the user asks facts about Ultra rather than building a pipeline.

2026-06-04

nemotron-customizer-airgap

network-and-computer-systems-administrators

Prepare, validate, build, and use Nemotron Customizer airgap image bundles for offline clusters. Use when planning airgapped deployments, editing deploy/nemotron-customizer/airgap/airgap.yaml, selecting workflow targets, grouping step execution images, baking repo overlays or wheel additions, resuming airgap runner builds, or submitting `nemotron steps run` jobs inside an airgapped environment.

nemotron-add-model

Onboard a new model family (Nemotron or third-party) into skills/ — paper chunks, recipe summaries, context packs, and model card. Use when a contributor wants downstream skills like /nemotron-customize to be able to route to a new model.

nemotron-add-pattern

Add a cross-cutting decision pattern under src/nemotron/steps/patterns/. Use when a recurring ML decision (tokenizer lock, eval bookends, LoRA-on-small-data, etc.) must be encoded so other skills can fire it during planning.

nemotron-add-step

Add a new step under src/nemotron/steps/<category>/<step_id>/ — manifest (step.toml), runner glue, configs, and per-step README.md. Use when extending the catalog so /nemotron-customize can route to it.

nemotron-nano3

Reference desk for Nemotron 3 Nano / Llama-Nemotron Nano 3 — architecture, training data, recipes, evaluation, quantization, deployment. Use when the user asks facts about the model rather than building a pipeline.

nemotron-super3

Reference desk for NVIDIA Nemotron 3 Super — architecture, training data, recipes (pretrain/SFT/RL/eval/quantization), and deployment notes. Use when the user asks facts about Super3 rather than building a pipeline.

7 skills 2.1k 194 updated 2026-06-15

#007

DataDesigner

5.4% of creator

skill

occupation

description

updated

datadesigner-docs

Maintain the NeMo Data Designer Fern docs site under fern/. Use for any documentation change. Triggered by: "edit docs", "add doc page", "update docs", "rename page", "fix broken link", "add redirect", "preview docs", "publish docs", "regenerate notebooks", "update dev note", any request that touches `fern/`.

2026-06-15

review-code

Perform a thorough code review of the current branch or a GitHub PR by number.

2026-06-15

search-docs

Search local Fern documentation for content related to a topic

2026-06-15

create-pr

Create a GitHub PR with a well-formatted description matching the repository PR template (flat Changes by default; optional Added/Changed/Removed/Fixed grouping)

2026-04-06

commit

Commit current changes with a clear, descriptive message

2026-03-25

search-github

Search GitHub issues, discussions, and PRs for content related to a topic

2026-03-25

update-pr

Update an existing GitHub PR description to reflect current changes after incorporating feedback

2026-03-25

6 skills 699212 updated 2026-07-09

#008

Automodel

4.7% of creator

skill

occupation

description

updated

linting-and-formatting

network-and-computer-systems-administrators

Code style and quality rules for NeMo AutoModel — ruff configuration, naming conventions, type hints, docstrings, copyright headers, and the code review checklist.

2026-07-09

build-and-dependency

Dev environment setup for NeMo AutoModel — container-based development, uv package management, installation options, environment variables, and common build pitfalls.

2026-06-28

fern-docs

Maintain the NeMo AutoModel Fern docs site under docs/ (MDX content) + docs/fern/ (infra) — add, update, move, or remove pages; manage redirects, slugs, navigation, and version aliases; run validation and previews.

2026-06-27

cicd

CI/CD reference for NeMo AutoModel — pipeline structure, commit and PR workflow, CI failure investigation, and common failure patterns.

parity-testing

Verify numerical parity between NeMo AutoModel implementations and reference HuggingFace models, including state dict and forward-pass checks.

testing

Testing reference for NeMo AutoModel — unit and functional test layout, tier semantics (L0/L1/L2), running tests locally, adding or disabling tests, and pytest conventions.

5 skills 17.8k 3.5k updated 2026-05-28

#009

Speech

3.9% of creator

skill

occupation

description

updated

nemo-speech-asr-finetune

Guide NeMo Speech users through ASR fine-tuning with container setup and Lhotse training.

babysit-pr

Get a pull request to green CI. Diagnose and fix CI failures, push fixes, re-trigger CI via the "Run CICD" label, and repeat until all checks pass. Does not post comments — this is a local developer tool.

2026-04-17

fix-issue

Fix a GitHub issue in NeMo Speech (NVIDIA-NeMo/NeMo). Read the issue, reproduce the bug with a failing test, implement the fix, and verify tests pass. Only opens a PR if the user explicitly asks for it.

2026-04-17

verify

Run style checks and tests on changed files to verify code quality before committing.

2026-04-17

debug-training-logs

Debug distributed training failures (NeMo, Megatron, PyTorch) from worker stderr logs and optional AIStore daemon logs. Finds root cause across NCCL timeouts, data loading errors, and storage failures.

2026-04-15

4 skills 336 updated 2026-06-29

#010

Safe-Synthesizer

3.1% of creator

skill

occupation

description

updated

safe-synthesizer

Use NeMo Safe Synthesizer through task-specific routing: running the CLI or SDK, configuring parameters, troubleshooting runtime failures, inspecting artifacts, and interpreting evaluation outputs. Use when the user asks about safe-synthesizer, NeMo Safe Synthesizer, synthetic data pipeline runs, DP settings, generation failures, artifacts, logs, offline/GPU setup, config overrides, or evaluation metrics.

2026-06-29

github-cli

Interact with the Safe-Synthesizer GitHub repository using the gh CLI. Activate when users want to list or create pull requests, check out PRs, work on someone else's PR, check CI status, investigate workflow failures, view job logs, create or triage issues, check review and approval status, manage releases, or inspect repo metadata. Trigger keywords - pull request, PR, issue, workflow, CI, actions, failed job, job log, release, review, approve, CODEOWNERS, labels, milestone, checkout, gh, GitHub.

2026-06-03

uv-build

uv package management, dependency groups, PyTorch index handling, hatch build system, and versioning for this repo. Triggers on: uv, uv sync, uv lock, uv add, uv build, dependency, pyproject.toml, extras, cpu, cu129, hatch, wheel, version, publish.

2026-06-03

git-worktrees

Create, manage, and clean up git worktrees for isolated development, PR review, and A/B testing of agent configurations. Trigger keywords - worktree, worktrees, git worktree, parallel branches, isolated workspace, worktree cleanup, worktree prune, PR review, address PR comments, work on branch, work on PR.

2026-05-14

2 skills 6.7k 765 updated 2026-06-24

#011

Guardrails

1.6% of creator

skill

occupation

description

updated

guardrails-developer-create-guardrails

Helps developers create a NeMo Guardrails configuration for an LLM application. Use when users want to build, scaffold, configure, test, or iterate on input, output, retrieval, dialog, execution, Colang, or catalog-based guardrails. Trigger keywords - create guardrails, build guardrails, scaffold config, write rails, create config.yml, add input rails, add output rails, Colang flow, guardrails config, test guardrails.

2026-06-24

guardrails-developer-guide

Routes NVIDIA NeMo Guardrails library product-usage questions to the canonical documentation. Use when users ask how to install, configure, integrate, evaluate, observe, deploy, troubleshoot, or use the NVIDIA NeMo Guardrails library. Trigger keywords - install guardrails, configure rails, guardrail catalog, Colang, Python API, LangChain, LangGraph, server, evaluate guardrails, tracing, metrics, Docker, troubleshooting.

2026-06-24

1 skill 1.7k 299 updated 2026-07-07

#012

Curator

0.8% of creator

skill

occupation

description

updated

nemo-curator-docs