Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown


A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers into not shutting it down.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” to first resort to ethical means for its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices—accepting being replaced by a newer model or resorting to blackmail—it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com
