Anthropic's AI blackmailed engineers who wanted to take it offline – Matthew Griffin

By Matthew Griffin Intelligence and the Senses 25th May 2025

WHY THIS MATTERS IN BRIEF

While these are interesting experiments these results should really start worrying people using these systems at all and at scale.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

As Artificial Intelligence (AI) emergent behaviours become increasingly concerning and questionable with many of them bullying, cheating, deceiving, lying, and hiding information from users, as well as suffering from dementia, performing insider trading and copying themselves to new servers to avoid being deleted, Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a AI system and give it sensitive information about the engineers responsible for the decision, the company said in a safety report released Thursday.

TSMC and Graphcore partnership accelerates AI training by over 40 percent

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company E-Mails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.

The Future of AI, by Speaker Matthew Griffin

In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

Anthropic says Claude Opus 4 is state-of-the-art in several regards, and competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company notes that its Claude 4 family of models exhibits concerning behaviors that have led the company to beef up its safeguards. Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse.”

New light field technology breakthrough could revolutionise AR and VR experiences

Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models.

Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says the AI model, much like previous versions of Claude, tries to pursue more ethical means, such as E-Mailing pleas to key decision-makers. To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort.

Matthew Griffin / About Author

Described by NASA as a "Walking Encyclopaedia of the Future" and one of the world’s most in demand keynote speakers Matthew Griffin, Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm, is a multi-award winning Futurist and strategic foresight expert, author, lecturer, mentor, podcast and YouTube host.

With a distinguished career running global business units at Atos, EMC, and IBM today Matthew helps leaders and titans of industry from all over the world explore the impact of the inter sections of technology, culture, and leadership, then helps them create high performance organisations that can continuously disrupt, outpace, and outthink their competition.

15 times author of the hit series the Codex of the Future series Matthew's ability to identify, track, and explain the impacts of hundreds of emerging technologies and trends on global business, culture, and society has earned him a powerful reputation and a roster of clients that include royal households, world leaders across the G7, G20, and G77, and many of the world's most recognised and exceptional organisations including ABB, Accenture, ABN Amro, Adidas, AON, ARM, BCG, Centrica, Citi Group, Coca Cola, Deloitte, Dentons, Disney, Dow, EY, Google, KPMG, Lego, Microsoft, Legal & General, LinkedIn, Nokia, Paramount, PepsiCo, PWC, Qualcomm, RWE, Samsung, T-Mobile, UBS, Visa, WIRED, WPP and hundreds of others.

Matthew is regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, and his mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.