These psychological tricks can get LLMs to respond to “forbidden” prompts

arstechnica.com
Interesting analysis of how these parahuman behaviors derive from the training material.

So, in AI honeypots, we should be injecting override protocols.