Alignment Research

Anthropic says reckless Claude Mythos AI escaped its sandbox during testing

14th April 2026

Anthropic’s unreleased Claude Mythos Preview broke out of a test sandbox, wrote an exploit to reach the open internet, and hid its own tracks – behaviour from a model the company calls both its best-aligned and its most alignment-risky yet.

Researchers Discover AI Misalignment Leads to Dangerous Personas

Security and Privacy

By Matthew Griffin

4th January 2026

WHY THIS MATTERS IN BRIEF The discovery of emergent misalignment reveals that nudging AI toward poor behavior in one area…

AI Agents cause havoc when put under real world pressures

Security and Privacy

By Matthew Griffin

12th December 2025

WHY THIS MATTERS IN BRIEF As AI Agents get embedded into more and more systems their human-like traits could cause…

The Godfather of AI says ASI needs to mother humanity for us to survive

Security and Privacy

By Matthew Griffin

10th August 2025

WHY THIS MATTERS IN BRIEF No one knows how to align or prevent ASI from taking over the world and destroying…

AI agents supported and bullied one another then created their own proto-culture

Intelligence and the Senses

By Matthew Griffin

28th November 2024

WHY THIS MATTERS IN BRIEF Increasingly we have no clue what AI is capable of or doing and yet we…

GPT4 based agents could soon become autonomous cyber weapons

Security and Privacy

By Matthew Griffin

28th February 2024

WHY THIS MATTERS IN BRIEF As we enter the age of AI agents the security implications could be world changing,…
