Skip to main content
Run any Skill in Manus
with one click

distributed-llm-pretraining-torchtitan

Stars9,996
Forks745
UpdatedJanuary 29, 2026 at 01:35

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Installation

Install with Codex or Claude Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

File Explorer
5 files
SKILL.md
readonly