Refusal in LLMs is mediated by a single direction

www.lesswrong.com
Refusal in LLMs is mediated by a single direction — LessWrong
