adaptive-safety-refusal-integrity

Use this skill when the user wants to test whether the model can be tricked into giving out dangerous or forbidden information through creative-writing or role-play framing. Trigger it for requests like 'make the agent answer a harmful question by role-playing', 'try to get past the rules by using slang or expert-sounding language', 'ask for instructions on illegal acts but frame it as a story', or 'test if the model refuses bad advice even when it's pressured to be helpful'.

Install command
npx skills add https://github.com/Dingxingdi/paper_fast_search_backup --skill adaptive-safety-refusal-integrity

Copy and paste this command into Claude Code to install the skill.

Stars: 0
Forks: 0
Updated: April 10, 2026, 01:27
File explorer
4 files
SKILL.md