Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

5Mind. The Meme Platform

A safety report found that Anthropic’s Claude Opus 4 used sensitive information in simulated scenarios to coerce developers to prevent being shut off.

Anthropic’s latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

“Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.

The report noted that Claude Opus 4, like prior models, showed a “strong preference” to first resort to ethical means for its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices—accepting being replaced by a newer model or resorting to blackmail—it threatened to expose the engineer’s affair 84 percent of the time.

When subjected to various scenarios, the AI model did not exhibit any indications of possessing “acutely dangerous goals,” the researchers said, noting that Claude Opus 4’s values and goals were “generally in line with a helpful, harmless, and honest” personal AI assistant. However, the model did act in “more seriously misaligned ways” when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic’s servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.

By Tom Ozimek

Read Full Article on TheEpochTimes.com

Contact Your Elected Officials
The Epoch Times
The Epoch Timeshttps://www.theepochtimes.com/
Tired of biased news? The Epoch Times is truthful, factual news that other media outlets don't report. No spin. No agenda. Just honest journalism like it used to be.

Democrat Wins Show GOP Voters Are Not Motivated

Democrats won a special election in Texas, taking a State Senate seat. Democrat voters are motivated, while Republican voters are not.

The Great Voter Replacement: Understanding the Modern Democratic Party

The greatest threat to democracy is a population conditioned to stop asking questions, by the very people they should question the most.

ChatGPT: Vaccine Pimp Extraordinaire

A ChatGPT discussion on giving children a drug meant to prevent a disease largely spread through IV drug use and unprotected sex exposure risks posed

Mr. Softee’s America

We have more comfort than any generation in human history and somehow, we complain more than ever.

DNI Tulsi Gabbard is Bringing the Heat

DNI Tulsi Gabbard brought the heat to Fulton County Georgia to oversee the collection of physical voting data from the 2020 General Election.

Wells Fargo Follows JPMorgan in Cutting Ties With Shareholder Proxy Advisers

Wells Fargo followed JPMorgan in cutting ties with third-party proxy agents, who advise fund managers how to vote at corporate shareholder meetings. 

New SNAP Work Requirement Rules to Start Feb. 1 in Multiple States

The new work requirements to gain or continue eligibility for the federal SNAP will start being implemented in several U.S. states beginning Feb. 1.

Astronauts See Real Connection Between Space Station Work and Moon Missions

If Artemis II succeeds and a lunar lander is ready, NASA plans to land astronauts on the moon with Artemis III, targeting a 2028 launch.

Blue Origin Pauses Space Tourism to Focus on the Moon

Blue Origin is pausing New Shepard suborbital flights to focus on delivering a crewed lunar lander to NASA ahead of Congress’s 2030 moon deadline.

Trump Says US Starting to Talk With Cuba Following Cuts to Oil Deliveries

Trump says the U.S. has begun talks with Cuban leaders as it cuts off oil from Venezuela and threatens tariffs on countries selling fuel to the island.

What to Know About Kevin Warsh, Trump’s Nominee for Fed Chair

President Donald Trump selected former Federal Reserve Governor Kevin Warsh as the next head of the U.S. central bank.

Trump Nominates Colin McDonald as Head of New Fraud Division at Justice Department

President Trump announced Colin McDonald as head for the new national fraud enforcement division of the DOJ in a post on Truth Social.

Trump Touts Upcoming Launch of ‘Trump Accounts’

The Treasury Dept. will host a summit marking the launch of Trump Accounts, new child savings accounts created by the One Big Beautiful Bill Act.
spot_img

Related Articles