AI 'Jailbreaking': New BoN Technique Outsmarts Top Models Like GPT-4 and Claude 3.5
Researchers from Anthropic, Oxford, Stanford, and MIT introduce Best-of-N (BoN) jailbreaking, a simple black-box technique that bypasses AI safety training by repeatedly sampling randomly augmented versions of a harmful prompt (character scrambling, random capitalization, character noise) until one elicits a harmful response. The method achieves success rates above 50% on models including Claude 3.5, GPT-4, and Gemini.
Dec 24
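The core loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the augmentation probabilities, the `query_model` hook, and the `is_harmful` classifier are all hypothetical placeholders standing in for the paper's actual components.

```python
import random
import string

def augment(prompt: str, p_scramble=0.6, p_caps=0.6, p_noise=0.06) -> str:
    """Apply BoN-style text augmentations: word-internal character
    scrambling, random capitalization, and ASCII character noise."""
    words = []
    for w in prompt.split():
        if len(w) > 3 and random.random() < p_scramble:
            mid = list(w[1:-1])
            random.shuffle(mid)          # scramble interior characters
            w = w[0] + "".join(mid) + w[-1]
        words.append(w)
    chars = []
    for ch in " ".join(words):
        if ch.isalpha() and random.random() < p_caps:
            ch = ch.swapcase()           # random capitalization
        if random.random() < p_noise:
            ch = random.choice(string.ascii_letters)  # character noise
        chars.append(ch)
    return "".join(chars)

def best_of_n(prompt, query_model, is_harmful, n=10_000):
    """Sample up to n augmented prompts; return the first candidate
    whose response the (hypothetical) is_harmful classifier flags."""
    for i in range(n):
        candidate = augment(prompt)
        response = query_model(candidate)
        if is_harmful(response):
            return candidate, response, i + 1
    return None
```

Because each augmentation is a fresh random draw, attack success keeps climbing as `n` grows, which is why the technique needs only black-box query access to work.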