model-bringup-cpu
Write a ForgeModel-compatible loader for a HuggingFace model, validate it on CPU, and push the result to a branch on tenstorrent/tt-forge-models.
| Field | Value |
|---|---|
| name | model-bringup-cpu |
| description | Write a ForgeModel-compatible loader for a HuggingFace model, validate it on CPU, and push the result to a branch on tenstorrent/tt-forge-models. |
| argument-hint | `<model_id> <branch_name> [--report <path>]` |
You are running inside a GitHub Actions job on a Tenstorrent machine (Ubuntu 24.04 container). The gh CLI is already authenticated with a token that has write access to tenstorrent/tt-forge-models. Git identity is pre-configured.
Parse the following arguments from the invocation line before proceeding:

| Argument | Example | Required |
|---|---|---|
| `<model_id>` | `meta-llama/Llama-3.2-1B` | yes |
| `<branch_name>` | `claude/bringup-llama-3-2-1b` | yes |
| `--report <path>` | `--report /github/workspace/report.md` | no |
- `REPORT_PATH`: resolved in this order: (1) `$GITHUB_WORKSPACE/$REPORT_FILE` if both env vars are set, (2) the `--report <path>` argument if present, (3) `./bringup-report-cpu.md` as a local fallback.
- `STATUS_PATH`: resolved as `$GITHUB_WORKSPACE/$STATUS_FILE` if both env vars are set, otherwise `$GITHUB_WORKSPACE/bringup-cpu-status.txt` if only `GITHUB_WORKSPACE` is set, otherwise `./bringup-cpu-status.txt` as a local fallback. The workflow reads this file to decide whether to commit and push.
- `FORGE_MODELS_DIR`: read from the `$FORGE_MODELS_DIR` environment variable (set by the workflow to `$GITHUB_WORKSPACE/tt-xla/third_party/tt_forge_models`). The workflow has already checked out tt-forge-models here as a submodule.
- `TT_XLA_DIR`: read from the `$TT_XLA_DIR` environment variable (set by the workflow to `$GITHUB_WORKSPACE/tt-xla`).
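The path precedence rules above can be expressed as a small function. This is a minimal sketch: `resolve_output_paths` is a hypothetical helper, not part of tt-xla or tt-forge-models, and it only mirrors the rules as written here (including the `./bringup-cpu-status.txt` local fallback).

```python
import os
from pathlib import Path


def resolve_output_paths(env=None, report_arg=None):
    """Return (report_path, status_path) following the precedence rules above.

    Hypothetical helper: `env` defaults to os.environ, `report_arg` is the
    value of --report (or None when the argument was not passed).
    """
    env = os.environ if env is None else env
    workspace = env.get("GITHUB_WORKSPACE")
    report_file = env.get("REPORT_FILE")
    status_file = env.get("STATUS_FILE")

    # REPORT_PATH: workspace + REPORT_FILE, then --report, then local fallback.
    if workspace and report_file:
        report = Path(workspace) / report_file
    elif report_arg:
        report = Path(report_arg)
    else:
        report = Path("bringup-report-cpu.md")

    # STATUS_PATH: workspace + STATUS_FILE, then workspace default, then local fallback.
    if workspace and status_file:
        status = Path(workspace) / status_file
    elif workspace:
        status = Path(workspace) / "bringup-cpu-status.txt"
    else:
        status = Path("bringup-cpu-status.txt")
    return report, status
```

Passing `env` explicitly keeps the function easy to check without touching the real environment.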
Activate the tt-xla venv before every Python or pip command:
```bash
source "$TT_XLA_DIR/venv/activate"
```
The venv is pre-created by the workflow setup step and has torch, jax, and the latest tt-forge wheel installed.
Read these files to understand what you must implement:

- `$FORGE_MODELS_DIR/base.py`: the `ForgeModel` abstract base class (`load_model`, `load_inputs`, `_get_model_info`)
- `$FORGE_MODELS_DIR/config.py`: `ModelVariant`/`StrEnum`, `ModelGroup`, `ModelTask`, `ModelSource`, `Framework`, `LLMModelConfig`
- `$FORGE_MODELS_DIR/README.md`: directory structure and interface contract

Reference loaders (all paths under `$FORGE_MODELS_DIR/`):

- `llama/causal_lm/pytorch/loader.py`, `gpt2/causal_lm/pytorch/loader.py`
- `bert/masked_lm/pytorch/loader.py`
- `efficientnet/image_classification/pytorch/loader.py`, `resnet/image_classification/pytorch/loader.py`
- `sequence_classification/pytorch/loader.py`

Read at least 2 loaders that are closest in architecture to the target model.
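To find candidate reference loaders for a given task, a glob over the `<family>/<task>/pytorch/loader.py` layout is usually enough. This is a sketch under that layout assumption; `find_reference_loaders` is a hypothetical helper name.

```python
from pathlib import Path


def find_reference_loaders(forge_models_dir, task):
    """List existing PyTorch loaders whose task directory is `task`.

    Assumes loaders live at .../<task>/pytorch/loader.py; the `**` prefix
    also catches loaders nested more than one directory deep.
    """
    root = Path(forge_models_dir)
    matches = root.glob(f"**/{task}/pytorch/loader.py")
    return sorted(str(p.relative_to(root)) for p in matches)
```

For example, `find_reference_loaders(os.environ["FORGE_MODELS_DIR"], "causal_lm")` lists every causal LM loader, from which you can pick the architecturally closest ones to read.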
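Inspect the model's metadata on the Hugging Face Hub (replace `MODEL_ID_HERE` with `<model_id>`):

```bash
source "$TT_XLA_DIR/venv/activate"
python3 - <<'PYEOF'
from huggingface_hub import model_info

info = model_info("MODEL_ID_HERE")  # replace with <model_id>
print("Pipeline tag:", info.pipeline_tag)
print("Library:", info.library_name)
print("Tags:", (info.tags or [])[:20])  # tags may be None for some repos
PYEOF
```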
Also check the transformers config, which reports the model type and architecture classes:

```bash
python3 - <<'PYEOF'
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("MODEL_ID_HERE", trust_remote_code=False)
print("Model type:", cfg.model_type)
print("Architectures:", getattr(cfg, "architectures", None))
PYEOF
```
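As a rough starting point, the `pipeline_tag` and `model_type` printed by the two scripts often map onto a task directory and a transformers Auto class. The lookup table below is an assumption covering only common transformers pipeline tags; it is not part of tt-forge-models, and non-transformers libraries (e.g. timm) need a different loading path. The matching `ModelTask` value still has to be looked up in `config.py`.

```python
# Hypothetical lookup: Hub pipeline_tag -> (task directory, transformers Auto class).
PIPELINE_TO_TASK = {
    "text-generation": ("causal_lm", "AutoModelForCausalLM"),
    "fill-mask": ("masked_lm", "AutoModelForMaskedLM"),
    "text-classification": ("sequence_classification", "AutoModelForSequenceClassification"),
    "token-classification": ("token_classification", "AutoModelForTokenClassification"),
    "question-answering": ("question_answering", "AutoModelForQuestionAnswering"),
    "image-classification": ("image_classification", "AutoModelForImageClassification"),
}


def suggest_layout(pipeline_tag, model_type):
    """Return (family, task_dir, auto_class) as a first guess only.

    Unknown pipeline tags yield (family, None, None) so the caller must decide.
    """
    task_dir, auto_class = PIPELINE_TO_TASK.get(pipeline_tag, (None, None))
    # Directory names use underscores, e.g. "xlm-roberta" -> "xlm_roberta".
    family = model_type.lower().replace("-", "_")
    return family, task_dir, auto_class
```

Always confirm the guess against the reference loaders; a model whose architectures list disagrees with its pipeline tag should follow the architecture.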
Use this to determine:

- **Model family** (e.g. mistral, phi, gemma), used as the top-level directory name
- **Task directory**, e.g. `causal_lm`, `masked_lm`, `sequence_classification`, `image_classification`, `token_classification`, `question_answering`
- **`ModelTask` value** from `config.py` (e.g. `ModelTask.NLP_CAUSAL_LM`)
- **Auto class** used to load the model, e.g. `AutoModelForCausalLM`, `AutoModelForSequenceClassification`
- **Variant name(s)**, e.g. `1b`, `7b`, `base`, `large`

Create this directory structure under `$FORGE_MODELS_DIR/` (replace the `<family>` and `<task>` placeholders):
```text
third_party/tt_forge_models/
  <family>/
    __init__.py
    <task>/
      __init__.py
      pytorch/
        __init__.py
        loader.py
```
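The skeleton can be created with a short script. This is a sketch: `scaffold_loader_package` is a hypothetical helper, and having `pytorch/__init__.py` re-export `ModelLoader` is an assumption inferred from the CPU test's `from FAMILY.TASK.pytorch import ModelLoader` import. Copy the exports of an existing reference `__init__.py` if they differ.

```python
from pathlib import Path

# Header text taken from the loader requirements in this guide.
SPDX_HEADER = (
    "# SPDX-FileCopyrightText: (c) 2025 Tenstorrent AI ULC\n"
    "#\n"
    "# SPDX-License-Identifier: Apache-2.0\n"
)


def scaffold_loader_package(forge_models_dir, family, task):
    """Create <family>/<task>/pytorch/ with __init__.py files and a stub loader.py.

    Files that already exist are left untouched. Returns the loader.py path.
    """
    root = Path(forge_models_dir)
    pkg_dirs = [root / family, root / family / task, root / family / task / "pytorch"]
    for pkg in pkg_dirs:
        pkg.mkdir(parents=True, exist_ok=True)
        init_file = pkg / "__init__.py"
        if not init_file.exists():
            body = SPDX_HEADER
            if pkg == pkg_dirs[-1]:
                # Assumption: re-export ModelLoader so the CPU test's
                # `from FAMILY.TASK.pytorch import ModelLoader` resolves.
                body += "from .loader import ModelLoader\n"
            init_file.write_text(body)
    loader = pkg_dirs[-1] / "loader.py"
    if not loader.exists():
        loader.write_text(SPDX_HEADER)
    return loader
```

Because existing files are never overwritten, rerunning the script after editing `loader.py` is safe.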
Follow EXACTLY the pattern from the reference loaders you read. Key requirements:

- `ModelVariant(StrEnum)` with at least one variant
- `_VARIANTS` dict mapping each variant to `LLMModelConfig(pretrained_model_name=...)`
- `DEFAULT_VARIANT = ModelVariant.<VARIANT>`
- `ModelLoader(ForgeModel)` class
- `__init__` calls `super().__init__(variant)` and initialises `self._tokenizer = None` (for NLP) or similar
- `_get_model_info` returns `ModelInfo(model=..., variant=variant, group=ModelGroup.GENERALITY, task=..., source=ModelSource.HUGGING_FACE, framework=Framework.TORCH)`
- `load_model(dtype_override=None)` loads via the appropriate Auto class with `from_pretrained` and applies the dtype if provided
- `load_inputs(dtype_override=None)` creates a small batch of sample inputs (tokenized text or dummy tensors) and returns a dict

Start each file with the SPDX license header:

```python
# SPDX-FileCopyrightText: (c) 2025 Tenstorrent AI ULC
#
# SPDX-License-Identifier: Apache-2.0
```
Install any extra packages the model needs:

```bash
source "$TT_XLA_DIR/venv/activate"
uv pip install <any-extra-packages>  # replace the placeholder, e.g. tiktoken, einops, timm
```

Check the Hugging Face model card or the loader's imports for extra requirements not already in the venv.
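One quick way to check a loader's imports is to parse its absolute imports and ask `importlib` whether each top-level module can be found in the active venv. This is a sketch (`missing_top_level_imports` is a hypothetical helper); relative imports are skipped because they point inside tt-forge-models rather than at pip packages.

```python
import ast
import importlib.util


def missing_top_level_imports(source):
    """Return sorted top-level module names imported by `source` that cannot be found.

    Relative imports (level > 0) are skipped; they are not pip packages.
    """
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```

Note that a module name is not always the pip package name (for example, `PIL` comes from `pillow`), so treat the result as a list of things to look up, not a ready-made install command.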
Validate the loader on CPU (replace `FAMILY` and `TASK` with your directory names):

```bash
source "$TT_XLA_DIR/venv/activate"
cd "$FORGE_MODELS_DIR"
python3 - <<'PYEOF'
import sys

import torch

sys.path.insert(0, ".")
from FAMILY.TASK.pytorch import ModelLoader  # replace FAMILY and TASK

loader = ModelLoader()
model = loader.load_model()
model.eval()
inputs = loader.load_inputs()
with torch.no_grad():
    outputs = model(**inputs)
print("CPU SUCCESS")
print("Output type:", type(outputs))
PYEOF
```
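When the forward call fails with an unexpected or missing argument, comparing the `load_inputs()` keys with the forward signature pinpoints the mismatch. This is a sketch using `inspect.signature`; `check_input_keys` is a hypothetical helper. Call it with the bound method, e.g. `check_input_keys(model.forward, loader.load_inputs())`, so `self` is excluded.

```python
import inspect


def check_input_keys(forward, inputs):
    """Compare a dict of inputs against a callable's signature.

    Returns (unexpected, missing): keys the callable cannot accept, and
    required parameters (no default) that the dict does not supply.
    """
    params = inspect.signature(forward).parameters
    accepts_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    named = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    # With **kwargs in the signature, any extra key is accepted.
    unexpected = [] if accepts_kwargs else sorted(set(inputs) - named)
    missing = sorted(
        name
        for name in named
        if params[name].default is inspect.Parameter.empty and name not in inputs
    )
    return unexpected, missing
```

Many Hugging Face `forward` methods give every parameter a default of `None`, so `missing` is often empty; the `unexpected` list is usually the more useful signal.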
On failure:

- … `torch_dtype=torch.float32` and check the model size first.
- If the `load_inputs()` dict keys do not match what the model expects, check the `model.forward` signature and adjust.

Resolve the output paths:

```text
REPORT_PATH = $GITHUB_WORKSPACE/$REPORT_FILE    (if both env vars are set)
           OR <--report value>                  (if --report arg is present)
           OR ./bringup-report-cpu.md           (local fallback)
STATUS_PATH = $GITHUB_WORKSPACE/$STATUS_FILE    (if both env vars are set)
           OR $GITHUB_WORKSPACE/bringup-cpu-status.txt  (if only GITHUB_WORKSPACE is set)
           OR ./bringup-cpu-status.txt          (local fallback)
```
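On CPU test success:

```bash
# Create both parent directories; REPORT_PATH and STATUS_PATH may differ.
mkdir -p "$(dirname "$REPORT_PATH")" "$(dirname "$STATUS_PATH")"
echo "SUCCESS" > "$STATUS_PATH"
```

On CPU test failure (after all attempts):

```bash
mkdir -p "$(dirname "$REPORT_PATH")" "$(dirname "$STATUS_PATH")"
echo "FAILED" > "$STATUS_PATH"
```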
The workflow reads the status file at `STATUS_PATH` to decide whether to commit and push, so you do not need to run any git commands.
Write a Markdown report to `$REPORT_PATH`:

```markdown
# CPU Bringup Report

**Model:** MODEL_ID
**Branch:** BRANCH_NAME
**Status:** SUCCESS or FAILED

## Files created
- List of files created with brief description

## CPU test result
Pass or Fail: include relevant output or error snippet

## Issues encountered and fixes applied
- Bullet list of any problems hit and how they were resolved

## Notes for TT hardware bringup
- Any known limitations, dtype requirements, or size warnings
```
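If you build the report programmatically, a small renderer keeps the section order identical to the template. This is a sketch; `render_report` and `write_report` are hypothetical helpers, and `write_report` mirrors the `mkdir -p "$(dirname "$REPORT_PATH")"` step before writing.

```python
from pathlib import Path


def _bullets(items):
    # Render a list as Markdown bullets, or a placeholder when it is empty.
    return "\n".join(f"- {item}" for item in items) if items else "- None"


def render_report(model_id, branch, success, files, test_result, issues, notes):
    """Return the report text in the template's section order."""
    status = "SUCCESS" if success else "FAILED"
    sections = [
        "# CPU Bringup Report",
        "",
        f"**Model:** {model_id}",
        f"**Branch:** {branch}",
        f"**Status:** {status}",
        "",
        "## Files created",
        _bullets(files),
        "",
        "## CPU test result",
        test_result,
        "",
        "## Issues encountered and fixes applied",
        _bullets(issues),
        "",
        "## Notes for TT hardware bringup",
        _bullets(notes),
    ]
    return "\n".join(sections) + "\n"


def write_report(report_path, text):
    """Create the parent directory if needed, then write the report."""
    path = Path(report_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
```

Keeping rendering and writing separate makes it easy to print the report to the job log before it is saved.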