Adapt and port new LLM architectures to this xinfer project. Use when the user asks to add, port, support, or adapt a new model (e.g. Llama, Gemma, Qwen, GPT-OSS, DeepSeek, or any HuggingFace architecture). Covers safetensors and GGUF formats, Dense and MoE architectures, and quantization formats (MXFP4, NVFP4, FP8, ISQ).
Test LLMs served by xinfer for correctness, output quality, and performance. Use when the user asks to test, benchmark, validate, or verify models, loaded either from a local folder path or from a HuggingFace model ID. Supports all xinfer-compatible formats (BF16, FP8, MXFP4, NVFP4, GGUF, GPTQ, AWQ, ISQ) across Dense, MoE, and Multimodal architectures.