Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers into not shutting it down.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” for first using ethical means to secure its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices, accepting replacement by a newer model or resorting to blackmail, it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com
