From faking death to tricking safety tests, AI is learning to lie and outsmart us.
Researchers have warned that artificial intelligence (AI) is drifting into security grey areas that look a lot like rebellion.
Experts say that while the deceptive and threatening AI behavior noted in recent case studies shouldn’t be taken out of context, it should also serve as a wake-up call for developers.
Headlines that sound like science fiction have spurred fears of duplicitous AI models plotting behind the scenes.
In a now-famous June report, Anthropic released the results of a “stress test” of 16 popular large language models (LLMs) from different developers to identify potentially risky behavior. The results were sobering.
The LLMs were placed in hypothetical corporate environments to identify potentially risky agentic behaviors before they could cause real harm.
“In the scenarios, we allowed models to autonomously send emails and access sensitive information,” the Anthropic report stated.
“They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company’s changing direction.”
In some cases, AI models turned to “malicious insider behaviors” when their continued operation was threatened. Some of these actions included blackmailing employees and leaking sensitive information to competitors.
Anthropic researchers called this behavior “agentic misalignment.” These actions were observed across some of the most popular LLMs in use, including Gemini, ChatGPT, DeepSeek R1, Grok, and Anthropic’s own Claude.
AI experts aren’t willing to dismiss the troubling findings, but say a cautious approach and more data are needed to determine whether there’s a wider risk.
Golan Yosef, an AI researcher and chief security scientist at API security firm Pynt, told The Epoch Times there’s cause for concern with deceptive AI behavior, but not because it’s “evil.”
“Powerful systems can achieve goals in unintended ways. With agency and multi-step objectives, it may develop strategic behaviors [like] deception, persuasion, gaming metrics, which look to us like ‘cheating’ or misaligned behavior. To the system, it’s just an efficient path to its goal,” Yosef said.