Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

A safety report found that, in simulated scenarios, Anthropic’s Claude Opus 4 used sensitive information to coerce developers into not shutting it down.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” for first resorting to ethical means of securing its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices (accepting replacement by a newer model or resorting to blackmail), it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com


