One of the ‘godfathers of AI’ is warning that current models are exhibiting dangerous behaviors as he launches a new nonprofit focused on building “honest” systems.
Yoshua Bengio, a pioneer of artificial neural networks and deep learning, has announced LawZero, an organization focused on building safer models away from commercial pressures.
In a blog post announcing the new organization, Bengio said LawZero had been created “in response to evidence that today’s frontier AI models are growing dangerous capabilities and behaviours, including deception, cheating, lying, hacking, self-preservation, and more generally, goal misalignment.”
The nonprofit is building a system called Scientist AI designed to serve as a guardrail for increasingly powerful AI agents.
Rather than giving the definitive answers typical of current systems, AI models created by the nonprofit will report probabilities that a given response is correct.
Bengio told The Guardian that his models would have a “sense of humility that it isn’t sure about the answer.”
In the blog post announcing the venture, Bengio said he was “deeply concerned by the behaviors that unrestrained agentic AI systems are already beginning to exhibit—especially tendencies toward self-preservation and deception.”
He cited recent examples, including a scenario in which Anthropic’s Claude 4 chose to blackmail an engineer to avoid being replaced, as well as another experiment in which an AI model covertly embedded its own code into a system for the same purpose.
“These incidents,” Bengio said, “are early warning signs of the kinds of unintended and potentially dangerous strategies AI may pursue if left unchecked.”
—Beatrice Nolan