Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers to prevent being shut off.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” for securing its continued existence through ethical means first, such as emailing pleas to decision-makers not to be destroyed. However, when the scenario left it only two choices, accepting replacement by a newer model or resorting to blackmail, it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

The Epoch Times
