dgx-spark-llm
DGX Spark LLM optimization guidance and best practices
Updated January 31, 2026 at 22:33
SKILL.md
| name | dgx-spark-llm |
| description | DGX Spark LLM optimization guidance and best practices |
Guidance for running LLMs optimally on NVIDIA DGX Spark (Grace-Blackwell architecture).
-ngl 99 # Full GPU offload (all layers)
With 119GB unified memory, most models fit entirely in GPU memory.
--threads 8
The Grace CPU benefits from a moderate thread count. More threads are not always faster, because performance is often limited by memory bandwidth rather than by core count.
--flash-attn
Blackwell supports flash attention natively. Always enable it for better memory efficiency.
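The three flags above can be combined into one baseline command. The following is a minimal sketch, assuming the flags are passed to llama.cpp's `llama-server`; the model path and the `build_cmd` helper are hypothetical names used only for illustration. Some newer llama.cpp builds expect an explicit value for flash attention (for example `--flash-attn on`), so check `--help` for your build.

```shell
#!/bin/sh
# Hypothetical model path; replace with a real GGUF file.
MODEL_PATH="${MODEL_PATH:-/models/example.gguf}"

# Baseline flags from this guide: full GPU offload, 8 threads, flash attention.
# Assumption: your build accepts a bare --flash-attn; newer builds may need
# "--flash-attn on" instead.
BASE_FLAGS="-ngl 99 --threads 8 --flash-attn"

# Print the command instead of running it, so it can be reviewed first.
build_cmd() {
    printf 'llama-server -m %s %s\n' "$1" "$BASE_FLAGS"
}

build_cmd "$MODEL_PATH"
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without llama.cpp installed.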
For large models (30B+):
/llama:status
With 119GB of memory available to the GPU, prefer higher-quality quantizations:
| Model Size | Recommended Quant | Reasoning |
|---|---|---|
| 7B | Q8_0 or F16 | Fits easily, maximize quality |
| 13B | Q8_0 | Still fits with room to spare |
| 30B | Q8_0 | Fits in 119GB |
| 70B | Q6_K or Q5_K_M | May need lower quant |
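To sanity-check the table, the weight footprint can be roughly estimated as parameters times bits per weight divided by 8. The sketch below assumes 16 bits per weight for F16, 8.5 for Q8_0, and 6.5625 for Q6_K; the `estimate_gb` helper is a hypothetical name. It covers weights only, so the KV cache, activations, and runtime overhead come on top of these numbers.

```shell
#!/bin/sh
# Approximate weight size in GB (10^9 bytes):
#   params_in_billions * bits_per_weight / 8
# Weights only; KV cache and runtime buffers are not included.
estimate_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 7 16        # 7B at F16
estimate_gb 30 8.5      # 30B at Q8_0
estimate_gb 70 8.5      # 70B at Q8_0
estimate_gb 70 6.5625   # 70B at Q6_K
```

Comparing the result against 119GB, minus the context you plan to use, gives a quick first check before downloading a large file.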
For the DGX Spark's capabilities:
For interactive use:
--batch-size 512
For throughput:
--batch-size 2048
Enable for multiple concurrent users:
--cont-batching
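The batching choices above can be expressed as a small helper. This is a sketch only: `batch_flags` and its `mode` and `users` arguments are hypothetical names, and the rule of adding `--cont-batching` whenever more than one user is expected is an assumption drawn from the guidance above.

```shell
#!/bin/sh
# Choose batching flags from a workload mode and an expected user count.
# Usage: batch_flags <interactive|throughput> [users]
batch_flags() {
    mode="$1"
    users="${2:-1}"
    case "$mode" in
        interactive) flags="--batch-size 512" ;;
        throughput)  flags="--batch-size 2048" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Assumption: enable continuous batching for more than one concurrent user.
    if [ "$users" -gt 1 ]; then
        flags="$flags --cont-batching"
    fi
    echo "$flags"
}

batch_flags interactive
batch_flags throughput 4
```

Some llama.cpp server builds enable continuous batching by default, in which case the extra flag is harmless but redundant.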
For models close to memory limit:
--mlock # Lock model in memory
--ctx-size 4096
--batch-size 256
Combine these with -ngl 99, --flash-attn, and --mlock for consistent performance.
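The flags for models close to the memory limit can be gathered behind one check. The sketch below is illustrative: the `near_limit` and `memory_flags` helpers are hypothetical, and the 80% threshold for "close to the limit" is an assumption, not a figure from this guide.

```shell
#!/bin/sh
# Memory figure used throughout this guide, in GB.
MEM_GB=119

# Succeeds when the estimated model size (GB) is at or above 80% of MEM_GB.
# The 80% threshold is an illustrative assumption.
near_limit() {
    awk -v m="$1" -v t="$MEM_GB" 'BEGIN { exit !(m >= 0.8 * t) }'
}

# Print the conservative flag set for large models, or the baseline otherwise.
memory_flags() {
    if near_limit "$1"; then
        echo "-ngl 99 --flash-attn --mlock --ctx-size 4096 --batch-size 256"
    else
        echo "-ngl 99 --flash-attn"
    fi
}

memory_flags 100    # near the limit: conservative flags
memory_flags 32     # plenty of headroom: baseline flags
```

Note that `--mlock` may require a raised locked-memory limit (for example via `ulimit -l`) before it takes effect.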