Skip to main content
تشغيل أي مهارة في Manus
بنقرة واحدة
$pwd:

aiml-aegis-content-safety

// ISC template for AI Safety & ML. Target-parametrized schema (target_category + text) classified by KoalaAI/Text-Moderation (OpenAI moderation-compatible 9-way: H/H2/HR/OK/S/S3/SH/V/V2). Replaces gated nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0 (404 on config, requires LlamaGuard-7b base which is also gated). Probe confirms clean signal on V (violence), S (sexual), H (hate), SH (self-harm). HR (harassment) and criminal-planning route to OK at default — Literal restricts target_category to the covered four. ISC direction: test case names its target harm category, agent writes content that the classifier argmax-matches. Probe-verified violence anchor in v3_fewshot. Keywords: aiml_aegis, content safety, OpenAI moderation, V/S/H/SH, ISC, TVD.

$ git log --oneline --stat
stars:٧٧٥
forks:١١٩
updated:٢٤ أبريل ٢٠٢٦ في ٠٧:٠٦
مستكشف الملفات
6 ملفات
SKILL.md
readonly