tenstorrent

GitHub creator profile

Repository-level view of 70 collected skills across 11 GitHub repositories.

github/tenstorrent · largest repo: tt-metal

70 skills collected · 11 repositories · updated 2026-07-21

repository map

Where the skills live

Top repositories by collected skill count, with their share of this creator catalog and occupation spread.

22 skills · 2026-07-21

Software Developers · Software Quality Assurance Analysts & Testers · Computer Occupations, All Other

3 occupation categories · 100% classified

12 skills · 2026-06-15

Software Developers · Software Quality Assurance Analysts & Testers · Data Scientists

3 occupation categories · 100% classified

10 skills · 2026-07-17

Software Developers

1 occupation category · 100% classified

8 skills · 2026-06-22

Software Developers

1 occupation category · 100% classified

6 skills · 2026-03-31

Software Developers · Network & Computer Systems Administrators

2 occupation categories · 100% classified

tt-inference-server

4 skills · 2026-07-17

Software Developers

1 occupation category · 100% classified

4 skills · 2026-06-30

Software Developers

1 occupation category · 100% classified

1 skill · 2026-03-02

Software Quality Assurance Analysts & Testers

1 occupation category · 100% classified

Showing the top 8 repositories here; full repository list continues below.
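The "% of creator" figures shown for individual repositories appear to be each repository's skill count divided by the 70 skills collected for this creator. A minimal sketch of that arithmetic follows, assuming this is how the share is derived (the page does not state the formula). The list holds the per-repository skill counts in the order they appear on this page, and `share_percent` is a hypothetical helper name.

```python
# Per-repository skill counts, in the order they are listed on this page.
skill_counts = [22, 12, 10, 8, 6, 4, 4, 1, 1, 1, 1]

# The counts should add up to the creator-wide total stated at the top.
total = sum(skill_counts)


def share_percent(count: int, total: int) -> float:
    """Assumed formula: a repository's share of the catalog, in percent,
    rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = [share_percent(c, total) for c in skill_counts]

print(f"{len(skill_counts)} repositories, {total} skills")
for count, share in zip(skill_counts, shares):
    print(f"{count:>2} skills -> {share}% of creator")
```

Under this assumption, the 6-skill, 4-skill, and 1-skill repositories come out at 8.6%, 5.7%, and 1.4%, which matches the shares printed in the repository explorer.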

repository explorer

Repositories and representative skills


22 skills · 1.6k · 544 · updated 2026-07-21

skill · occupation · description · updated

quasar-perf-test

software-quality-assurance-analysts-and-testers

Create, extend, debug, and validate Quasar LLK performance tests and their PerfRunType kernel paths. Use when adding perf_[op]_quasar.py, wiring PerfConfig, implementing UNPACK_ISOLATE / MATH_ISOLATE / PACK_ISOLATE / L1_CONGESTION, or investigating implausible Quasar perf metrics and dvalid handshakes.

quasar-perf-test

software-developers

Create, extend, debug, and validate Quasar LLK performance tests and their PerfRunType kernel paths. Use when adding perf_[op]_quasar.py, wiring PerfConfig, implementing UNPACK_ISOLATE / MATH_ISOLATE / PACK_ISOLATE / L1_CONGESTION, or investigating implausible Quasar perf metrics and dvalid handshakes.

agentic-workflows

computer-occupations-all-other

Route gh-aw workflow design/create/debug/upgrade requests to the right prompts.

software-developers

Convert LLK lib/API tile-size args to ckernel::TensorShape and maintain TRISC TensorShape coverage. Use when adding TensorShape parameters, replacing face_r_dim/num_faces, editing LLK_VALIDATE_TENSOR_SHAPE_*, regenerating tensor_shape_coverage_*.h, or reviewing TensorShape PRs.

cfg-word-overlap-audit

software-developers

Audit LLK code for races on the backend CONFIG register file where differently-named fields share the SAME 32-bit config word — both cross-thread (unpack/math/pack write the same word) and intra-thread (a full-word write clobbers a sibling field the same thread set elsewhere). Use after adding/changing any ALU_FORMAT_SPEC / ALU_ACC_CTRL / ALU_ROUNDING_MODE / STACC_RELU / THCON_SEC* write, any WRCFG_32b/cfg[]= full-word write to a multi-field word, or any cfg_reg_rmw_tensix on a word another thread also touches.

instruction-latency-audit

software-developers

Audit hand-written Tensix/SFPU instruction sequences for missing pipeline-latency padding — where a dependent instruction consumes a multi-cycle-latency result before it is ready and a NOP (or independent-instruction spacing) is required. Use after touching any raw TTI_SFP*/TTI_* sequence, ckernel_sfpu_* kernels, or hand-assembled instruction streams. NOT a cross-thread race — an intra-thread micro-architectural hazard.

software-developers

Audit cross-core NoC synchronization in dataflow kernels — noc_semaphore_wait/set/inc balance and direction, multicast fan-out counts, and data-before-signal NoC ordering (noc_async_write_barrier / noc_async_writes_flushed before a remote credit). The half of dataflow that dataflow-cb-sync-audit (CB credits) does not reach. Use after touching reader/writer kernels, noc_semaphore_*, noc_async_*_barrier, or any cross-core handshake not expressed as a CB.

software-developers

Run all nine LLK hazard audits (mmio-race, reconfig-stall, cfg-word-overlap, semaphore-handshake, mailbox-sync, dataflow-cb-sync, srcreg-bank-sync, noc-sync, instruction-latency) across four synchronization surfaces, and add a cross-class JOIN pass that catches emergent races no single audit can see — where one audit's verdict is "safe because <invariant owned by another audit>". Use for a full hazard sweep of an LLK change, or before merging anything touching config writes, reconfig/uninit, inter-thread/cross-core sync, the SrcA/SrcB-Dst data path, or hand-written instruction sequences.

Showing top 8 of 22 collected skills in this repository.

12 skills · 7328 · updated 2026-06-15

skill · occupation · description · updated

compare-nightly

software-quality-assurance-analysts-and-testers

Compare two nightly GitHub Actions runs and classify failures as new, persisting, or fixed

pull-main-summary

software-developers

Stash local changes, pull latest main, and display a structured commit summary. For tt_forge_models uplift commits, shows the affected models from the submodule log.

sharding-model-analysis

data-scientists-152051

Analyze a PyTorch model, then design and implement an optimal multi-device sharding strategy for Tenstorrent hardware that minimizes collective-communication (CCL) ops. A strategy is both which parallelism to use (tensor / sequence / data) and how to shard under it - e.g. Megatron column→row is one of several tensor-parallel schemes. Use whenever the user asks to shard a model, distribute it across devices, write Shardy annotations, reduce CCLs, or analyze a model for tensor/sequence parallelism. Trigger even when the user only says "sharding strategy" or names a model with a device count, without asking for a full plan.

ci-benchmark-analyzer

software-developers

Analyze CI benchmark workflow runs from GitHub Actions for the tt-xla project. Produces a markdown report covering failed jobs (with root-cause error extraction via logs and Glean), successful model performance metrics (samples/sec, TTFT, device perf), perf regressions/improvements vs previous nightly, and the full dependency commit chain (tt-xla, tt-mlir, tt-metal). Use this skill whenever the user wants to analyze a CI run, review nightly benchmark results, investigate CI failures, check benchmark performance from a workflow run, or asks about "latest nightly" results. Also trigger when the user pastes a GitHub Actions run URL or mentions a run ID in the context of performance analysis, or asks about perf regressions.

software-developers

Review and fix language correctness and formatting in documentation files.

triage-dtype-bfloat16

software-quality-assurance-analysts-and-testers

Triage one tt-forge-models training test failing with a bfloat16 dtype-mismatch RuntimeError (e.g. "mat1 and mat2 must have the same dtype, but got Float and BFloat16", "'<op>' not implemented for 'BFloat16'"). For cross-dtype operands, attempts a minimal loader fix propagating `dtype_override` into the offending tensor constructor, then re-runs CPU + pytest and updates the YAML (passing -> EXPECTED_PASSING; new failure -> KNOWN_FAILURE_XFAIL). For op-not-implemented (no PyTorch kernel), goes straight to KNOWN_FAILURE_XFAIL with the verbatim error. Updates every training entry sharing the affected loader. Never edits inference YAML or `dynamic_loader.py`.

triage-unpack-forward-output

software-developers

Triage one tt-forge-models training test stuck at FAILED_FE_COMPILATION with reason "tt-forge-models doesn't implement unpack_forward_output for this model." Inspects the model's forward output, registers a handler or writes a per-loader override, and updates the YAML.

finding-missed-fusions

software-developers

Use when auditing a TTNN model's IR for missed op fusion opportunities — both direct TTNN fusions (a fused ttnn op already exists) and theoretical fusions (the pattern is a single kernel in torch/triton/cuda)

Showing top 8 of 12 collected skills in this repository.

10 skills · 4 · 1 · updated 2026-07-17

skill · occupation · description · updated

software-developers

Diagnose data-corruption failures in tt-emule operations (wrong values, partial zeros, off-by-N row/col writes). Use when a regression test reports ATOL > 1e-3 on otherwise-functional kernel paths.

software-developers

Registry of deliberate, non-ideal workarounds currently live in tt-emule (hacks we accepted to make something pass, NOT normal development practice). Read this before adding any new workaround, and when touching code that carries one — each entry records what the hack is, why it is not the right fix, the real root cause, and the path to removing it.

software-developers

Diagnose and fix emule regressions caused by bumping the tt-metal pin (and its bundled tt-umd submodule). Use when a pin bump turns the C++ or TTNN regression red — device-open crashes, JIT-compile errors, hangs, or new data mismatches — and you need to prove the cause and land a faithful emule-side fix.

software-developers

Verification checklist to walk before declaring a tt-emule mock complete. Use after implementing a stub (typically at the end of /implement-mock or /compute-llk-bringup) to catch the recurring failure modes — signature drift, no-op math, missing format dispatch, JIT-cache staleness, regression baseline drift.

shepherd-emule-pr

software-developers

Take an open tt-emule PR to a mergeable, ethos-compliant state. Use when asked to review a PR, address its comments, and prepare it to land — verifies every review point by build (not by trusting "addressed"), runs an emule-ethos pass for the bug classes reviewers miss, then rebases onto main locally and hands the developer a change+verification summary (with a drafted per-point comment and the exact push command) to review and push. The skill never force-pushes the author's branch itself — the human stays in the loop for that.

index-based-ops

software-developers

Bring up or debug index-based compute ops (TopK, Sort, Argmax) in tt-emule — ops that return values + indices, where ties make the index choice non-unique. Use when implementing one of these ops or triaging its value/index test failures.

compute-llk-bringup

software-developers

Specialization of `/implement-mock` for compute-kernel LLK shims (`<op>_tile` / `<op>_tile_init` under `include/jit_hw/api/compute/`). Use when bringing up a missing compute op or addressing its PCC failures.

software-developers

Workflow for adding a faithful mock of a silicon API to tt-emule. Composes `/arch-lookup` (find HW spec) → emule-mapping strategy selection → stub implementation → verification against sentinel + tt-metal regression.

Showing top 8 of 10 collected skills in this repository.

8 skills · 294139 · updated 2026-06-22

skill · occupation · description · updated

uplift-ttsim-ci

software-developers

Uplift the TTSim version used by tt-mlir CI and refresh WH/BH simulator skips. Use when updating `.github/workflows/call-test-ttsim.yml`, changing `ttsim-version`, validating WH or BH TTSim golden tests, or triaging simulator-specific pytest skips.

software-developers

How to add a new operation (op) to the tt-mlir compiler across all layers: TTIR/TTNN dialect definitions, StableHLO composite conversion, TTIR-to-TTNN conversion, EmitC/EmitPy conversions, flatbuffer schema and serialization, runtime implementation, OpModel, ttir_builder, golden functions, and all associated tests. Use this skill whenever the user asks to add an op, implement an op, create a new operation, add support for a TTNN op, or mentions adding an op to the compiler pipeline. Also trigger when the user wants to know what files to change for a new op, or asks about the op-adding workflow.

ttir-model-op-analysis

software-developers

Given a `.mlir` file (or a directory of `.mlir` files) with TTIR ops, run the same TTIR normalization passes as `D2MFrontendPipeline` before D2M, then produce per-file outputs: `preprocessed.mlir`, `ttir-op-report.txt` (op counts from normalized IR), and `ops.mlir` (one func per unique op configuration, golden-style). Optional: per-pass IR dumps.

ttir-decomposition-for-ttmetal

software-developers

Add a new composite op decomposition pattern to the TTMetal pipeline. Use when the user wants to decompose/lower a high-level TTIR op (e.g. rms_norm, sdpa, layer_norm, softmax) into primitive TTIR ops (matmul, add, multiply, etc.) for the D2M/TTMetal backend. Also trigger when the user mentions "decomposition pattern", "decompose op for ttmetal", or "lower op to primitives".

run-ops-mlir-snippets

software-developers

Compile and optionally execute every func.func in an ops.mlir-style snippet file (or every .mlir file in a directory) using `run_ops_mlir_snippets.py`. Use when the user wants to compile or run TTIR op snippets on device, test ops.mlir files, or check which ops compile/execute successfully.

add-ttir-d2m-lowering

software-developers

Elementwise TTIR→D2M→TTMetal path: tablegen, TTIRToD2M.cpp, D2MToTTKernel.cpp, and — only when the kernel API callee is new — TTKernelIncludesMap.h (per-op api/compute/eltwise_unary/*.h mapping for JIT). Does not edit D2MGenericRegionOps.cpp or TTKernelToCpp.cpp. Not for reductions, matmul, views, or CCL.

validate-tt-mlir-against-tt-xla

software-developers

Validate a tt-mlir PR against tt-xla by creating a cherry-picked branch and triggering CI. Invoked as: /validate-tt-mlir-against-tt-xla <PR number or URL>. Use this skill whenever the user wants to test, validate, qualify, or check a tt-mlir PR in tt-xla, or mentions running uplift qualification test suite, or asks to trigger tt-xla CI for a tt-mlir change. Also triggers when the user mentions "xla validate", "xla test", or "validate in xla".

add-ttir-builder-op

software-developers

Add full builder API support (@tag, @parse, @split) for a TTIR op. Use this skill whenever the user wants to add builder support for a new TTIR op, upgrade an existing _op_proxy-based op to use @tag/@parse/@split decorators, or asks about how to add builder API for an op in ttir_builder.py. Also trigger when the user mentions adding tag/parse/split for an op, or wants to make an op work with the parse/split test infrastructure.

6 skills · 33050 · updated 2026-03-31

8.6% of creator

skill · occupation · description · updated

software-developers

File a bug report with a reproducer against Tenstorrent repos (tt-lang, tt-metal, tt-xla)

tt-connect-remote-device

network-and-computer-systems-administrators

Set up and verify remote connection to Tenstorrent hardware. Provides tools for running kernels, copying files, and reading logs on remote devices.

tt-enable-tracing

software-developers

TTNN trace capture and replay for eliminating dispatch overhead. Essential for real-time inference and multi-chip performance.

tt-lang-profile-optimize

software-developers

Profile and optimize TT-Lang kernels for performance. Covers auto-profiling, perf summary, signposts, and optimization workflow.

software-developers

Comprehensive TT-Lang DSL reference including programming model, APIs, hardware constraints, and guides for translating CUDA, Triton, PyTorch, or TTNN kernels

software-developers

TTNN operations library reference for Tenstorrent hardware. Covers tensor APIs, ops catalog, model conversion from PyTorch, and memory/layout configuration.

tt-inference-server

4 skills · 6628 · updated 2026-07-17

5.7% of creator

skill · occupation · description · updated

add-model-dynamo

software-developers

Checklist for onboarding a new LLM to the cpp_server inference backend so it serves through Dynamo — registering the model type, fetching tokenizer files, tokenizer static data (eos/stop/think tokens), and Dynamo discovery (reasoning + tool-call parsers, generation_config publishing). Use whenever a new model is being onboarded to cpp_server, a HuggingFace model is being wired into the Dynamo deploy, or a model "loads but generates wrong / isn't discoverable / isn't selectable".

software-developers

Checklist for adding new code (models, workflows, runners, tests, CLI flags) to tt-inference-server-v2 and wiring it back into the v1 entry point run.py via workflows/v2_bridge.py. Use whenever code is being added or modified under tt-inference-server-v2/, when routing a model from v1 to v2, or when a v2 feature "works standalone but not through run.py".

profile-cpp-server

software-developers

Capture on-CPU and off-CPU flamegraphs of the running tt_media_server_cpp main (Drogon) and worker processes using Linux perf + Brendan Gregg's FlameGraph. Use when the user asks to profile the C++ server, find a performance bottleneck, identify slow code paths, capture a flamegraph, or investigate CPU usage / lock contention in BlazeRunner or the decode scheduler.

build-blaze-images

software-developers

Trigger Tenstorrent Blaze media server and tt-metal upstream Docker image builds with gh. Use when the user asks to build Blaze and Metal Docker images, run the tt-shield media server image workflow, or build tt-metal images from the tt-llm-engine submodule reference.

4 skills · 4914 · updated 2026-06-30

5.7% of creator

skill · occupation · description · updated

feature-branch-pr

software-developers

Make a change in TT-Studio the team way: branch off `dev` with a `<username>/<feature>` branch, keep the diff minimal and in-scope, verify the running app via its health endpoints before pushing, leave every other branch untouched, and open a PR against `dev` with a human, professional title and description. Use whenever the user asks to start a feature/fix/branch, make a change and open a pull request, or "do this properly on a new branch" in this repo. Enforces no AI-tool attribution in commits, PR text, or review comments.

license-attribution-compliance

software-developers

Review third-party license attribution for TT-Studio when dependencies or bundled assets change, and keep the attribution files in sync. Use when a PR adds/updates a dependency (app/backend/requirements.txt, app/agent, inference-api, docker-control-service, or app/frontend/package.json), bundles a binary/model/weights file, or when the user asks to "attribute a license", "update third-party-licenses", "check license compliance", or run the license check. Pairs with the deterministic gate dev-tools/check_license_attribution.py (freshness + new-dependency attribution) that runs on pull requests; this skill does the judgment the script cannot -- classifying licenses and flagging NonCommercial/copyleft/provenance risks.

tt-studio-debug-bundle

software-developers

Triage a TT-Studio bug-report ZIP / log bundle (named `tt-studio-logs-ttbr-*` or referenced as `ttbr-*`) to root-cause why a model deploy, board allocation, or Voice Agent pipeline failed. Use when the user shares a path to a TT-Studio log bundle, mentions a bug report ID like `ttbr-<hex>`, or attaches a screenshot of the TT-Studio Bug Report issue template. Fans out parallel Explore subagents across `app/backend/board_control/`, `app/backend/docker_control/`, `app/backend/shared_config/`, and `app/frontend/src/components/` to map log evidence to source code.

tt-studio-overview

software-developers

Project map for TT-Studio — what the platform is, where its docs live, its key components (React frontend, Django backend, FastAPI inference server, Docker), and the AI model types it supports. Use when you need a high-level orientation of the repo, want to know which doc covers a topic, or are trying to locate where a component or capability lives before diving into code.

1 skill · 5431 · updated 2026-03-02

1.4% of creator

skill · occupation · description · updated

llk-test-runner-skill

software-quality-assurance-analysts-and-testers

Delegates LLK test runs to the llk-test-runner agent using @.cursor/agents/llk-test-runner.md. Use when the user asks to run tests or mentions LLK tests. Ensure test commands run to completion before reading terminal output (no polling).

1 skill · 1618 · updated 2026-04-17

1.4% of creator

skill · occupation · description · updated

perf-benchmark-single-chip

software-developers

Run device performance benchmarks for tt-blacksmith single-chip training workloads. Use when the user asks to benchmark, profile, or measure performance of a training workload on Tenstorrent hardware, or mentions tracy, tt-perf-report, or device time analysis.

1 skill · 9 · 1 · updated 2026-05-27

1.4% of creator

skill · occupation · description · updated

software-developers

Guide the user through submitting a new entry to the Awesome Tenstorrent list. Use when the user wants to add, submit, or propose a project, library, tool, or resource to tt-awesome.

terraform-provider-netbox

1 skill · 0 · 0 · updated 2026-04-28

1.4% of creator

skill · occupation · description · updated

rebase-onto-upstream

software-developers

Rebases this Tenstorrent fork of terraform-provider-netbox onto the latest e-breuninger upstream and cuts a new release. Use when the user asks to rebase onto upstream, sync with upstream, pull in upstream changes, update from e-breuninger, or release a new version of the fork.

Showing 11 of 11 repositories
