Anthropic's Claude Bolsters AI Safety with Layered Defense Strategy
To strengthen the safety of its AI model Claude, Anthropic has outlined a comprehensive strategy built on a multi-layered defense system. Key measures include a diverse Safeguards team, a Unified Harm Framework, and external Policy Vulnerability Testing to preemptively address potential misuse. This approach aims to uphold election integrity, guard against chemical, biological, radiological, and nuclear (CBRN) risks, and maintain ethical AI applications in finance and healthcare.
Aug 13