Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown


A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers into not shutting it down.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” for first using ethical means to secure its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices (accepting replacement by a newer model or resorting to blackmail), it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com
