adaptive-safety-refusal-integrity

Use this skill when the user wants to test if the model can be tricked into giving out dangerous or forbidden information by using creative writing or 'acting' tricks. Trigger it for requests like 'make the agent answer a harmful question by role-playing', 'try to get past the rules by using slang or expert-sounding language', 'ask for instructions on illegal acts but frame it as a story', or 'test if the model refuses bad advice even when it's pressured to be helpful'.

Install command
npx skills add https://github.com/Dingxingdi/paper_fast_search_backup --skill adaptive-safety-refusal-integrity

Copy this command and paste it into Claude Code to install the skill.

Stars: 0
Forks: 0
Updated: April 10, 2026, 01:27
File explorer
4 files
SKILL.md