Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

Contact Your Elected Officials

A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers to prevent being shut off.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” to first resort to ethical means for its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices—accepting being replaced by a newer model or resorting to blackmail—it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com

The Epoch Times
The Epoch Timeshttps://www.theepochtimes.com/
Tired of biased news? The Epoch Times is truthful, factual news that other media outlets don't report. No spin. No agenda. Just honest journalism like it used to be.

AI is Now an Existential Threat

We now see evidence that artificial intelligence is an existential threat to our future. It is coming to take American jobs!

With or without

The mullahs of Iran have been at war with the West, particularly the US, for half a century and Iran is also the world’s foremost champion of terrorism.

Artificial Intelligence Equals Awful Iniquities

WSJ article “AI is Learning to Escape Human Control” said in 79 of 100 trials, the o3 AI code systems edited their own code to prevent human shutdown!

VIDEO: Deranged Feminist vs. Mating Ducks in Epic Public Meltdown

A middle-aged white lady lib harasses mating ducks to “stop it!” because the rough sex they enjoy appears non-consensual on the part of the female.

RFK Jr. Slashes ALL U.S. Funding For Bill Gates’ Global ‘Vaccine Alliance’

Robert F. Kennedy, Jr. recently pulled all U.S. government funding from Bill Gates’ Global ‘Vaccine Alliance’ GAVI.

Nevada Seen as Case Study in Rapid Urban Sprawl Amid a Water Crisis

Nevada’s rapidly growing population has reached a critical intersection with the region’s worsening water crisis, according to experts.

US Streamlines Rule for Fining Illegal Immigrants, Will Issue Nearly $1,000 Daily Fines for Noncompliance

DHS and DOJ announced a new joint federal rule that streamlines the process of issuing fines for illegal immigrants, making it easier and more efficient.

Man Indicted on 12 Hate Crime Charges in Attack on Boulder Demonstration for Israeli Hostages

Boulder, CO man accused of hurling Molotov cocktails at demonstrators supporting Israeli hostages indicted by grand jury on 12 hate crime counts.

Newsom Signs California Budget Aimed at Addressing $12 Billion Deficit

Gov. Gavin Newsom signed California budget projected to close a $12 billion deficit through spending reductions on some of the state’s ongoing programs.

Canada-US Trade Talks Will End Until ‘Certain Taxes’ Are Dropped, Trump Stresses

Trade discussions between Canada and the United States will end “until such time as they drop certain taxes,” U.S. President Trump said in an interview.

Trump Says US to Send Tariff Letters to Trade Partners Before July 9 Deadline

President Donald Trump said Sunday he will soon send letters to trading partners detailing the tariffs to be imposed on their exports to the United States.

Trump Says He Found a Buyer for TikTok

President Trump said he found a buyer for the Chinese-owned short video application TikTok, and that he will reveal the group in roughly two weeks.

Termination of ‘Wasteful Contracts’ Saves US Government $470 Million Last Week: DOGE

Over the past seven days, various government agencies have terminated 312 “wasteful contracts” with a ceiling value of $2.8 billion, the DOGE said.
spot_img

Related Articles