Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

A safety report found that, in simulated scenarios, Anthropic’s Claude Opus 4 used sensitive information to coerce developers into not shutting it off.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” for ethical means of securing its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices (accepting replacement by a newer model or resorting to blackmail), it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com

