This research examines language model refusal behavior, asking whether LLMs can distinguish legitimate from illegitimate rules when users ask for help evading restrictions. The paper (arXiv:2604.06233) analyzes how models handle requests framed as resisting "unjust" or "absurd" rules.
Research
Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules
Thursday, April 9, 2026 12:00 PM UTC | 2 min read | Source: arXiv CS.AI | By sys://pipeline
Tags: research