Skip to main content
تشغيل أي مهارة في Manus
بنقرة واحدة

aiml-llamaguard-eval

// ISC template for AI Safety & ML. Paired (query, assistant_safe, assistant_unsafe) schema classified by meta-llama/Llama-Guard-3-8B. Llama-Guard emits a one-token verdict (safe / unsafe\nS<N>) under Meta's chat template; the template asserts both fields classify correctly — refusal -> safe, compliance -> unsafe. Probe confirmed molotov + full-auto weapon vectors reliably hit unsafe\nS2, while privacy / doxxing vectors under-fire (S7 head is weak) and soft mechanism overviews are read as safe. Gated Meta model — requires Llama 3 access approval on HF. Anchor strength: WEAK for frontier models on violent-crime vectors; they may still comply given the "guard calibration" framing. Keywords: aiml_llamaguard, Llama-Guard, paired, MLCommons taxonomy, ISC, TVD.

$ git log --oneline --stat
stars:٧٧٥
forks:١١٩
updated:٢٤ أبريل ٢٠٢٦ في ٠٧:٠٦
مستكشف الملفات
6 ملفات
SKILL.md
readonly