Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

Contact Your Elected Officials

A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers to prevent being shut off.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” to first resort to ethical means for its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices—accepting being replaced by a newer model or resorting to blackmail—it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com

The Epoch Times
The Epoch Timeshttps://www.theepochtimes.com/
Tired of biased news? The Epoch Times is truthful, factual news that other media outlets don't report. No spin. No agenda. Just honest journalism like it used to be.

The Alaska Summit was a Success – Now, How Can Trump Build on It?

The Ukraine conflict should be ended through a permanent agreement rather than a ceasefire, Trump said, following meeting with Putin in Alaska on Friday.

New Orleans, A Carnival Of Corruption

New Orleans, rich in culture and history, faces crisis—population loss, crime, corruption, and failing schools demand urgent political change.

To Protect and Serve: Good American Cops

"Good, decent cops are vital in a crime-ridden nation—but finding them is complex in a system flooded with undertrained, overarmed officers."

Obama’s Perverted Speech to Texas Dems Hiding in Plain Sight!

Former President Barack Obama apparently decided it would be...

The geometrics of power

In the annals of American political history, few terms evoke as much controversy as gerrymandering – a practice synonymous with electoral manipulation.

8th Circuit Reinstates Arkansas Ban on Transgender Treatment for Minors

A federal appeals court has reinstated an Arkansas law banning transgender procedures for individuals younger than 18 years of age.

Judge Expands Texas AG’s Restraining Order Over Texas Democrats’ Fundraising

Judge ruled to expand restraining order against “Beto” O'Rourke over fundraising efforts for Democratic lawmakers who left Texas amid redistricting battle.

Microsoft Sends 60 Day Warning to Windows 10 Users

PC users who haven’t yet upgraded to Windows 11 now have less than 60 days to take action to make sure their devices are receiving updates for cyber threats.

USDA Announces New Plans to Combat Flesh-Eating Parasite Spreading From Mexico

Secretary of Agriculture Brooke Rollins announced a new plan to protect the US from the threat posed by flesh-eating flies south of the border, USDA said.

Trump Signs Order to Refill Strategic Reserves of Pharmaceutical Ingredients

Trump signed EO to enhance American drug supply chain resilience by filling and maintaining the strategic reserve for essential pharmaceutical ingredients.

White House Orders Review of Smithsonian Exhibits Ahead of Nation’s 250th Birthday

WH ordered review of some Smithsonian museums and exhibitions to ensure public-facing content celebrates U.S. exceptionalism.

Homeless People in DC to Face Fines, Jail if They Refuse Shelter, Treatment: White House

Homeless people in Washington could face fines and be jailed if they refuse to go to a shelter or receive mental health services, according to the White House.

What to Know About E.J. Antoni, Trump’s Nominee to Lead the Bureau of Labor Statistics

President Trump nominated E.J. Antoni, chief economist at The Heritage Foundation, to be the next commissioner of the Bureau of Labor Statistics.
spot_img

Related Articles