Profile GPU workloads with NVIDIA Nsight Systems (nsys). Use this skill when the user says 'profile', 'nsys', 'capture a trace', 'GPU profiling', 'kernel breakdown', 'where is time spent', 'decode is slow', 'prefill is slow', 'TPOT regression', 'TTFT regression', or wants to understand where GPU time is going. Also trigger when the user pastes nsys output and asks for help interpreting it. This skill covers the full lifecycle: capturing traces with the right flags, analyzing kernel/API summaries, detecting tail latency problems, and diagnosing performance regressions.
Track any non-trivial task through a doc placed in the right domain directory under docs/ (models/<line>/, subsystems/<area>/, playbooks/, lessons/, etc.). Covers the full lifecycle: read docs first, write a plan, wait for approval, log execution, and capture lessons learned. Use this skill whenever the user gives a concrete task, not just Q&A or casual discussion. Even if the task seems straightforward, if it involves doing something (not just answering something), open or reuse a domain doc. Bias toward using this skill. When in doubt, use it.