Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers into not shutting it down.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” for first using ethical means to secure its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices, accepting replacement by a newer model or resorting to blackmail, it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com

The Epoch Times (https://www.theepochtimes.com/)
