Anthropic Unveils Revolutionary "Constitutional Classifiers" to Combat AI Jailbreaking
Anthropic introduces "Constitutional Classifiers," a new AI-security method that reduces jailbreak success rates from 86% to just 4.4%. The approach promises to dramatically curb manipulation of AI systems while minimizing over-blocking of legitimate queries.
Feb 4