wojciech zaremba

1+ articles

AGI AI Research AI Safety AI Security Deliberative Alignment

OpenAI's Bold New Strategy: 'Deliberative Alignment' Takes AI Safety to Next Level

OpenAI's latest innovation, 'deliberative alignment,' aims to teach AI to think through safety protocols like never before. This three-stage process promises enhanced reasoning in AI models, with the new o1 model setting benchmarks in safety. But the journey to flawless AI safekeeping hits a bump as a security researcher exposes vulnerabilities, pointing to the persistent challenges in AI control.

Dec 31

OpenAI's Bold New Strategy: 'Deliberative Alignment' Takes AI Safety to Next Level