nemo-speech-asr-finetune
// Guide NeMo Speech users through ASR fine-tuning with container setup and Lhotse training.
| name | nemo-speech-asr-finetune |
| description | Guide NeMo Speech users through ASR fine-tuning with container setup and Lhotse training. |
Use this skill when a user wants to fine-tune a NeMo Speech ASR model, choose a checkpoint, adapt a tokenizer,
configure Lhotse dataloading, train, average checkpoints, or evaluate a fine-tuned ASR .nemo checkpoint.
Also use it for post-run refinement planning after fine-tuning.
Default posture:
- trainer.max_steps, not trainer.max_epochs.
- val_wer as the checkpoint monitor for validation.

Load only the reference file needed for the current stage:
- references/setup-checkpoints.md
- references/data-lhotse.md
- references/architecture-tokenizer-metrics.md
- references/training-evaluation.md and, when reporting WER, references/evaluation-style-contract.md
- references/refinement-iteration.md

If the user explicitly asks for parallel/sub-agent work, split the work by these same stages. Keep each agent scoped to one stage and have the main agent integrate the final command/config.
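As a rough illustration of the default posture listed earlier, the two settings could be expressed as Hydra-style command-line overrides. This is only a sketch: the helper name `default_posture_overrides` is hypothetical, and the `exp_manager.checkpoint_callback_params.monitor` key path is an assumption that should be checked against the current repo docs before use.

```python
def default_posture_overrides(max_steps: int) -> list[str]:
    """Build override strings for the default posture (hypothetical helper).

    Training length is bounded by steps rather than epochs, and val_wer is
    used as the checkpoint monitor.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be a positive integer")
    return [
        # Bound the run with trainer.max_steps, not trainer.max_epochs.
        f"trainer.max_steps={max_steps}",
        # ASSUMPTION: this exp_manager key path is not stated in the skill text.
        "exp_manager.checkpoint_callback_params.monitor=val_wer",
    ]


if __name__ == "__main__":
    print(" ".join(default_posture_overrides(5000)))
```

The returned strings are meant to be appended to a training command; nothing here launches training.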
Generic fine-tuning uses examples/asr/speech_to_text_finetune.py. For architecture-specific recipes, route to:
- examples/asr/asr_ctc/speech_to_text_ctc_bpe.py
- examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py
- examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py
- examples/asr/speech_multitask/speech_to_text_aed.py

Always check the current repo docs before giving version-sensitive claims:
- README.md
- docs/source/asr/fine_tuning.rst
- docs/source/asr/datasets.rst
- docs/source/dataloaders.rst
- docs/source/asr/featured_models.rst
- docs/source/asr/asr_checkpoints.rst
- nemo/collections/common/data/lhotse/dataloader.py

- batch_size=null, batch_duration=null, and quadratic_duration=null when adding bucket_batch_size.
- model.validation_ds.use_lhotse=true, but prefer static validation batch_size with bucketing disabled.
- fused_batch_size for RNNT/TDT fine-tuning guidance from this skill.
- --memory-fraction only after a real training OOM.
- 1e-4.
- min_duration, max_duration, min_tps, and max_tps.
- amp=true for inference/evaluation; use amp=false compute_dtype=bfloat16.
- multitask_metrics_cfg so ASR and translation/task-specific samples are evaluated with the right constrained metrics.
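The script routing and the bucket_batch_size rule above can be sketched as two small helpers. This is a minimal sketch under stated assumptions: the architecture labels (`ctc`, `rnnt`, `hybrid_rnnt_ctc`, `aed`), the helper names, and the `model.train_ds` prefix are illustrative choices that do not come from the skill text, and the rule is read here as "set the three fields to null when bucket_batch_size is added".

```python
# Script paths are taken from the routing list; the dictionary keys are
# hypothetical labels chosen for this sketch.
FINETUNE_SCRIPTS = {
    "generic": "examples/asr/speech_to_text_finetune.py",
    "ctc": "examples/asr/asr_ctc/speech_to_text_ctc_bpe.py",
    "rnnt": "examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py",
    "hybrid_rnnt_ctc": (
        "examples/asr/asr_hybrid_transducer_ctc/"
        "speech_to_text_hybrid_rnnt_ctc_bpe.py"
    ),
    "aed": "examples/asr/speech_multitask/speech_to_text_aed.py",
}


def pick_script(architecture: str = "generic") -> str:
    """Return the example script for an architecture label (hypothetical helper)."""
    try:
        return FINETUNE_SCRIPTS[architecture]
    except KeyError:
        known = ", ".join(sorted(FINETUNE_SCRIPTS))
        raise ValueError(
            f"unknown architecture {architecture!r}; expected one of: {known}"
        ) from None


def bucketing_overrides(
    bucket_batch_size: list[int],
    prefix: str = "model.train_ds",  # ASSUMPTION: prefix not given in the skill text
) -> list[str]:
    """Build overrides that null the competing batch fields when bucketing."""
    if not bucket_batch_size:
        raise ValueError("bucket_batch_size must contain at least one value")
    sizes = ",".join(str(size) for size in bucket_batch_size)
    return [
        f"{prefix}.bucket_batch_size=[{sizes}]",
        f"{prefix}.batch_size=null",
        f"{prefix}.batch_duration=null",
        f"{prefix}.quadratic_duration=null",
    ]


if __name__ == "__main__":
    print(pick_script("rnnt"))
    print(" ".join(bucketing_overrides([64, 32, 16])))
```

Raising on an unknown label, instead of silently falling back to the generic script, keeps a typo from routing a run to the wrong recipe.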