Triage one tt-forge-models training test failing with a bfloat16 dtype-mismatch RuntimeError (e.g. "mat1 and mat2 must have the same dtype, but got Float and BFloat16", "'<op>' not implemented for 'BFloat16'"). For cross-dtype operands, attempts a minimal loader fix propagating `dtype_override` into the offending tensor constructor, then re-runs CPU + pytest and updates the YAML (passing -> EXPECTED_PASSING; new failure -> KNOWN_FAILURE_XFAIL). For op-not-implemented (no PyTorch kernel), goes straight to KNOWN_FAILURE_XFAIL with the verbatim error. Updates every training entry sharing the affected loader. Never edits inference YAML or `dynamic_loader.py`.
Triage one tt-forge-models training test stuck at FAILED_FE_COMPILATION with reason "tt-forge-models doesn't implement unpack_forward_output for this model." Inspects the model's forward output, registers a handler or writes a per-loader override, and updates the YAML.
Analyze CI benchmark workflow runs from GitHub Actions for the tt-xla project. Produces a markdown report covering failed jobs (with root-cause error extraction via logs and Glean), successful model performance metrics (samples/sec, TTFT, device perf), perf regressions/improvements vs previous nightly, and the full dependency commit chain (tt-xla, tt-mlir, tt-metal). Use this skill whenever the user wants to analyze a CI run, review nightly benchmark results, investigate CI failures, check benchmark performance from a workflow run, or asks about "latest nightly" results. Also trigger when the user pastes a GitHub Actions run URL or mentions a run ID in the context of performance analysis, or asks about perf regressions.
Use when auditing a TTNN model's IR for missed op fusion opportunities: both direct TTNN fusions (a fused ttnn op already exists) and theoretical fusions (the pattern is a single kernel in torch/triton/cuda).
Analyzes, debugs and proposes fixes for graph breaks in PyTorch/XLA model compilation. Use when a model generates more graphs than expected during compilation, the user mentions "graph break", or when debugging excessive graph generation in tt-xla pipelines.
Code review skill specialized for tt-xla (Python + C++ PJRT plugin for Tenstorrent hardware). Covers C++ memory safety, PJRT API patterns, Python test standards, and project-specific conventions.
| Field | Value |
| --- | --- |
| name | analyze-nightly |
| description | Analyze a GitHub Actions run and summarize failures |
| disable-model-invocation | false |
| allowed-tools | `Read, Read(/tmp/**), Write(/tmp/**), Glob, Grep, Bash(git clone *), Bash(gh run view *), Bash(gh run list *), Bash(gh run download *), Bash(gh pr view *), Bash(tee *), Bash(gh api *), Bash(gh api * > /tmp/**), Bash(wc -l /tmp/**), Bash(jq *), Bash(mkdir -p /tmp/**), Bash(rm -rf /tmp/**), Bash(for *)` |
| context | fork |
| argument-hint | run-id [save] |
| model | opus |
Create a summary of test failures or job failures, grouped by ownership area:
Analyze the run with run-id $0 and create a summary of test failures, using the gh CLI tool:
Run `gh run view $0 --json jobs --jq '.jobs[].url'` to fetch all job URLs. They have the following format, from which you can extract {job-id}:

`https://github.com/tenstorrent/tt-xla/actions/runs/{run-id}/job/{job-id}`

Check the status of each job using the gh api subcommand. Discard any job that completed successfully, was canceled, was skipped, or is still in progress.
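A minimal Python sketch of the extraction and filtering step, assuming each job record carries the `status` and `conclusion` fields of the GitHub REST API job resource (`status` is `completed` once a job has finished; `conclusion` is then `success`, `failure`, `cancelled`, `skipped`, and so on). `extract_job_id` and `select_failed_jobs` are hypothetical helper names; in the skill itself the data comes from `gh run view` / `gh api` output, not literal dicts.

```python
import re
from typing import Iterable, List, Optional

# Job URL format from the step above; run-id and job-id are numeric.
JOB_URL_RE = re.compile(
    r"^https://github\.com/tenstorrent/tt-xla/actions/runs/"
    r"(?P<run_id>\d+)/job/(?P<job_id>\d+)$"
)


def extract_job_id(url: str) -> Optional[str]:
    """Return the {job-id} part of a job URL, or None if the URL does not match."""
    match = JOB_URL_RE.match(url.strip())
    return match.group("job_id") if match else None


def select_failed_jobs(jobs: Iterable[dict]) -> List[dict]:
    """Keep only jobs that completed with conclusion 'failure'.

    Successful, cancelled and skipped jobs finish with other conclusions,
    and queued or in-progress jobs are not 'completed' yet, so all of them
    are dropped.
    """
    return [
        job
        for job in jobs
        if job.get("status") == "completed" and job.get("conclusion") == "failure"
    ]


# Usage with made-up job records (not data from any real run).
url = "https://github.com/tenstorrent/tt-xla/actions/runs/123/job/456"
print(extract_job_id(url))  # 456

jobs = [
    {"id": 1, "status": "completed", "conclusion": "success"},
    {"id": 2, "status": "completed", "conclusion": "failure"},
    {"id": 3, "status": "in_progress", "conclusion": None},
    {"id": 4, "status": "completed", "conclusion": "cancelled"},
]
print([job["id"] for job in select_failed_jobs(jobs)])  # [2]
```

Filtering on `status` first matters because unfinished jobs have no conclusion yet.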
Focus only on jobs that failed!

For each failed job, fetch its logs using the gh CLI tool. Analyze the logs in search of error messages, failure messages, timeout messages, or any other text indicating the root cause of the failing step. Keywords to look for are error, assert, assertion, failed, throw, failure, fatal, timeout, timed out, HTTP error codes in the 400 and 500 ranges, Linux exit codes corresponding to signals a process can receive, etc. Don't limit yourself to these keywords; others may also indicate a root cause, and these are just the most common ones.

Output format that you need to follow (raw Markdown text):
# {ownership-area-name}
## {root-cause-1}
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
## {root-cause-2}
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
# {ownership-area-name}
## {root-cause-1}
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
## {root-cause-2}
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
- {test-or-step-name} ({arch-list}) -> [job-link]({url})
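The log scan and the grouping into this format can be sketched in Python as below. This is a sketch under assumptions: the regex is only a coarse prefilter built from the keyword list (it will also flag benign lines such as "0 errors"), the exit-code range 129-159 assumes the common 128 + signal-number convention, and the ownership area is supplied by the caller because the text does not say how it is determined. `candidate_lines`, `Failure` and `render` are hypothetical names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import List

# Keywords from the list above ("fail" covers "failed"/"failure", "assert"
# covers "assertion"). Exit codes 129-159 are assumed to mean 128 + signal
# number (e.g. 137 = SIGKILL, 139 = SIGSEGV).
FAILURE_RE = re.compile(
    r"error|assert|fail|throw|fatal|timeout|timed out"
    r"|HTTP\S*\s+[45]\d\d\b|exit code 1(?:29|[3-5]\d)\b",
    re.IGNORECASE,
)


def candidate_lines(log_text: str) -> List[str]:
    """Return log lines that may point at a root cause, for closer reading."""
    return [line for line in log_text.splitlines() if FAILURE_RE.search(line)]


@dataclass
class Failure:
    owner: str       # ownership area (assigned by the caller)
    root_cause: str  # short summary of the extracted error
    name: str        # test or step name
    arch: str        # architecture the job ran on
    url: str         # job URL


def render(failures: List[Failure]) -> str:
    """Render failures as '# area', '## root cause' and '- entry' lines."""
    # owner -> root cause -> (name, url) -> set of architectures.
    # Entries sharing a name and job URL are merged into one arch list.
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for f in failures:
        tree[f.owner][f.root_cause][(f.name, f.url)].add(f.arch)
    lines = []
    for owner, causes in tree.items():
        lines.append(f"# {owner}")
        for cause, entries in causes.items():
            lines.append(f"## {cause}")
            for (name, url), archs in entries.items():
                arch_list = ", ".join(sorted(archs))
                lines.append(f"- {name} ({arch_list}) -> [job-link]({url})")
    return "\n".join(lines)
```

Grouping order follows first appearance, so areas and root causes are listed in the order the failures were found.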
After producing the Markdown summary:
Analysis written to: /tmp/nightly-analysis-$0.md

Always respect these additional constraints:
Never run gh {subcommand} commands that may modify the state of the GitHub repository (for example, issue creation/deletion, PR closing, or branch manipulation), especially when using the gh api subcommand. Always use only read-only calls!
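Because the allowed-tools pattern `Bash(gh api *)` also matches write requests such as `gh api -X DELETE ...`, a guard like the following is one way to enforce the read-only rule mechanically. It is a hypothetical helper, not part of gh or of this skill: it accepts only the subcommands named in allowed-tools and, for `gh api`, relies on gh's documented behavior of sending GET by default and switching to POST when `-f`/`-F` fields are passed, unless `--method`/`-X` overrides it.

```python
import shlex
from typing import List

# Read-only gh subcommands, mirroring the allowed-tools list of this skill.
READ_ONLY_PREFIXES = [
    ("run", "view"),
    ("run", "list"),
    ("run", "download"),
    ("pr", "view"),
    ("api",),
]

# gh api options that add parameters or a body. Without an explicit method,
# gh api sends GET but switches to POST when fields are passed; --input is
# treated the same way here to stay conservative.
PARAM_FLAGS = ("-f", "--raw-field", "-F", "--field", "--input")
PARAM_PREFIXES = ("--raw-field=", "--field=", "--input=")


def api_method(args: List[str]) -> str:
    """Return the HTTP method a 'gh api' invocation would use."""
    method = None
    has_params = False
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-X", "--method") and i + 1 < len(args):
            method = args[i + 1]
            i += 2
            continue
        if arg.startswith("--method="):
            method = arg.split("=", 1)[1]
        elif arg.startswith("-X") and len(arg) > 2:
            method = arg[2:]  # attached form, e.g. -XPATCH
        elif arg in PARAM_FLAGS or arg.startswith(PARAM_PREFIXES):
            has_params = True
        i += 1
    if method is None:
        method = "POST" if has_params else "GET"
    return method.upper()


def is_read_only(command: str) -> bool:
    """True only for gh commands that cannot modify the repository."""
    args = shlex.split(command)
    if len(args) < 2 or args[0] != "gh":
        return False
    rest = args[1:]
    if not any(tuple(rest[: len(p)]) == p for p in READ_ONLY_PREFIXES):
        return False
    if rest[0] != "api":
        return True
    return api_method(rest[1:]) == "GET"
```

Rejecting anything not explicitly allowed, rather than listing known write commands, keeps unknown subcommands out by default.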