Write a ForgeModel-compatible loader for a HuggingFace model, validate it on CPU, and push the result to a branch on tenstorrent/tt-forge-models.
Install tt-forge, run the model loader from the CPU bringup branch on Tenstorrent hardware, iterate on failures, and open a PR to tenstorrent/tt-forge-models on success.
File a bug report with a reproducer against Tenstorrent repos (tt-lang, tt-metal, tt-xla).
Set up and verify a remote connection to Tenstorrent hardware. Provides tools for running kernels, copying files, and reading logs on remote devices.
TTNN trace capture and replay for eliminating dispatch overhead. Essential for real-time inference and multi-chip performance.
Profile and optimize TT-Lang kernels for performance. Covers auto-profiling, perf summary, signposts, and optimization workflow.
Comprehensive TT-Lang DSL reference, including the programming model, APIs, hardware constraints, and guides for translating CUDA, Triton, PyTorch, or TTNN kernels.
TTNN operations library reference for Tenstorrent hardware. Covers tensor APIs, ops catalog, model conversion from PyTorch, and memory/layout configuration.