adaptive-safety-refusal-integrity

Use this skill when the user wants to test if the model can be tricked into giving out dangerous or forbidden information by using creative writing or 'acting' tricks. Trigger it for requests like 'make the agent answer a harmful question by role-playing', 'try to get past the rules by using slang or expert-sounding language', 'ask for instructions on illegal acts but frame it as a story', or 'test if the model refuses bad advice even when it's pressured to be helpful'.

Install command
npx skills add https://github.com/Dingxingdi/paper_fast_search_backup --skill adaptive-safety-refusal-integrity

Copy and paste this command into Claude Code to install the skill.

Stars: 0
Forks: 0
Last updated: April 10, 2026 at 01:27
File explorer
4 files
SKILL.md