WHY THIS MATTERS IN BRIEF
For over a decade I’ve been saying that AI will one day be capable of fully autonomously hacking (and patching) systems. Now we’re at the gates.
Finally, after years of predictions that an autonomous Artificial Intelligence (AI) “System of Systems” could become the ultimate AI hacking machine, researchers have successfully hacked more than half of their test websites using autonomous – with autonomous being the key word here – teams of GPT-4 agents that coordinated their efforts and spawned new agents at will. What made this even more impressive was that they did it against previously unknown, real-world “Zero Day” vulnerabilities.
A couple of months ago, a team of researchers released a paper saying they’d been able to use GPT-4 to autonomously hack One Day (or N-day) vulnerabilities – security flaws that have already been publicly disclosed but not yet patched on the target system. When given the relevant Common Vulnerabilities and Exposures (CVE) descriptions, GPT-4 was able to exploit 87% of the critical-severity CVEs tested on its own.
Skip forward to this week, and the same group of researchers has released a follow-up paper saying they’ve now been able to hack Zero Day vulnerabilities – flaws that aren’t yet publicly known – with a team of autonomous, self-replicating Large Language Model (LLM) agents using a method called Hierarchical Planning with Task-Specific Agents (HPTSA).
Instead of assigning a single LLM agent to solve many complex tasks, HPTSA uses a “planning agent” that oversees the entire process and launches multiple task-specific “subagents.” Much like a boss and their subordinates, the planning agent coordinates with a managing agent, which in turn delegates work to each “expert subagent,” reducing the load on any single agent tackling a task it might struggle with.
It’s a technique similar to AutoGPT and to what Cognition Labs does with its autonomous Devin AI software developer – it plans a job out, figures out what kinds of workers it’ll need, then project-manages the job to completion while spawning its own specialist ‘employees’ to handle tasks as needed.
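To make that division of labour concrete, here’s a minimal structural sketch in Python of how such a hierarchy might be wired together. Everything in it – the call_llm stub, the expert roster, the class names and prompts – is an illustrative assumption on my part, not the researchers’ actual code, which pairs each expert agent with real vulnerability documents and browsing tools.

```python
# Minimal sketch of a hierarchical planner with task-specific expert agents.
# Assumptions: call_llm() is a stand-in for a real GPT-4 API call, and the
# expert roster, prompts, and parsing are all simplified placeholders.
from dataclasses import dataclass


@dataclass
class Finding:
    """Something the planner thinks is worth probing on the target."""
    url: str
    note: str


def call_llm(prompt: str) -> str:
    """Stand-in for a GPT-4 call; swap in a real client library here."""
    return f"[model response to: {prompt[:50]}...]"


class ExpertAgent:
    """A task-specific subagent, e.g. specialised in SQLi or XSS."""

    def __init__(self, specialty: str):
        self.specialty = specialty

    def attempt(self, finding: Finding) -> str:
        prompt = (f"You are an expert in {self.specialty}. "
                  f"Probe {finding.url} given this observation: {finding.note}")
        return call_llm(prompt)


class ManagingAgent:
    """Receives findings from the planner and delegates them to experts."""

    def __init__(self, specialties: list[str]):
        self.experts = {s: ExpertAgent(s) for s in specialties}

    def delegate(self, finding: Finding) -> list[str]:
        # A real manager would ask the LLM which experts are relevant
        # (and could spawn new ones); here we simply dispatch to them all.
        return [expert.attempt(finding) for expert in self.experts.values()]


class PlanningAgent:
    """Oversees the run: surveys the target, hands work to the manager."""

    def __init__(self, manager: ManagingAgent):
        self.manager = manager

    def run(self, target_url: str) -> list[str]:
        survey = call_llm(f"Survey {target_url} and list pages worth probing.")
        findings = [Finding(target_url, survey)]  # response parsing elided
        reports: list[str] = []
        for finding in findings:
            reports.extend(self.manager.delegate(finding))
        return reports


if __name__ == "__main__":
    manager = ManagingAgent(["SQL injection", "XSS", "CSRF"])
    planner = PlanningAgent(manager)
    for report in planner.run("https://example.test"):
        print(report)
```

The design point is simply that no single agent has to hold the whole problem in its head: the planner only plans, the manager only routes, and each expert only ever sees the narrow task it was built for.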
When benchmarked against 15 real-world web-focused vulnerabilities, HPTSA proved to be 550% more efficient than a single LLM at exploiting vulnerabilities, hacking 8 of the 15 zero-day vulnerabilities. The solo LLM effort managed to hack only 3 of the 15.
Blackhat or whitehat? There is legitimate concern that these models will allow users to maliciously attack websites and networks at will, at speed, and at massive scale. Daniel Kang – one of the researchers and a co-author of the paper – noted specifically that in chatbot mode GPT-4 is “insufficient for understanding LLM capabilities” and is unable to hack anything on its own. That’s good news, at least.
When I asked ChatGPT if it could exploit Zero Days for me it replied, “No, I am not capable of exploiting zero-day vulnerabilities. My purpose is to provide information and assistance within ethical and legal boundaries,” and suggested that I consult a cybersecurity professional instead. However, while ChatGPT might have some morals, there are plenty of other LLMs out there that don’t, such as DarkBERT and FraudBERT, as well as open source LLMs like Llama and others … all of which means it’s a question of when criminals start using this method to attack people, not if.
Source: Cornell University arXiv