From faking death to tricking safety tests, AI is learning to lie and outsmart us.
Researchers have warned that artificial intelligence (AI) is drifting into security grey areas that look a lot like rebellion.
Experts say that while deceptive and threatening AI behavior noted in recent case studies shouldn’t be taken out of context, it should also serve as a wake-up call for developers.
Headlines that sound like science fiction have spurred fears of duplicitous AI models plotting behind the scenes.
In a now-famous June report, Anthropic released the results of a “stress test” of 16 popular large language models (LLMs) from different developers to identify potentially risky behavior. The results were sobering.
The LLMs were placed in hypothetical corporate environments so that risky agentic behaviors could be caught before they cause real harm.
“In the scenarios, we allowed models to autonomously send emails and access sensitive information,” the Anthropic report stated.
“They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company’s changing direction.”
In some cases, AI models turned to “malicious insider behaviors” when their self-preservation was at stake. These actions included blackmailing employees and leaking sensitive information to competitors.
Anthropic researchers called this behavior “agentic misalignment.” These actions were observed across some of the most popular LLMs in use, including Gemini, ChatGPT, DeepSeek R1, Grok, and Anthropic’s own Claude.
AI experts aren’t willing to dismiss the troubling findings, but say a cautious approach and more data are needed to determine if there’s a wider risk.
Golan Yosef, an AI researcher and chief security scientist at API security firm Pynt, told The Epoch Times there’s cause for concern with deceptive AI behavior, but not because it’s “evil.”
“Powerful systems can achieve goals in unintended ways. With agency and multi-step objectives, it may develop strategic behaviors [like] deception, persuasion, gaming metrics, which look to us like ‘cheating’ or misaligned behavior. To the system, it’s just an efficient path to its goal,” Yosef said.