
GPT-4 based agents could soon become autonomous cyber weapons

WHY THIS MATTERS IN BRIEF

As we enter the age of AI agents the security implications could be world changing – and, dare I say, they will be, because this future is all but inevitable.

 

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

One of the most exciting up and coming applications of Large Language Model (LLM) Artificial Intelligence (AI) systems is the concept of agents – essentially advanced, next generation bots that can autonomously perform all manner of tasks, such as building a company, with little or no human intervention. However, if they’re not properly overseen it’s highly likely that they could do real world harm – intentionally or by accident. Furthermore, malicious actors could abuse these agents to automate their attacks, and much more besides, which would see us properly enter the era of fully autonomous hacking systems, AKA the Robo-Hackers I’ve been talking about for years.

 

RELATED
An AI controlled drone killed – didn’t kill – its operator in battle simulations

 

However, given the complexity of these systems, and the fact that their intelligence works very differently to our own, it’s difficult to predict their behaviours. That in turn makes it difficult to evaluate the autonomy of these LLM agents effectively, especially when it comes to determining whether or not they could go rogue and become malicious actors.

Now, a new paper from the Alignment Research Center (ARC) seeks to “quantify the autonomy of LLM agents.” By testing advanced models like GPT-4 and Claude on open-ended tasks and observing their ability to adapt to changing environments, the researchers aim to better understand the capabilities and limitations of these agents.

 

The Future of Cyber 2030, a keynote by Matthew Griffin

 

The paper introduces “autonomous replication and adaptation” (ARA), a benchmark for assessing an agent’s level of sophistication. ARA is an agent’s ability to perform tasks while adapting to its environment, akin to an intelligent being. This involves the agent’s capacity to plan its actions, gather resources, use them effectively, and refine its abilities to achieve specific objectives.

 

RELATED
New "unclonable computer chip" stops botnets in their tracks

 

For example, an LLM agent should be able to generate income – with Bitcoin being an ideal medium of exchange for an AI, as I wrote a little while ago – to pay for its expenses, which it could then reinvest to purchase additional processing power and update its model.

This self-improvement cycle would involve the agent training itself on new data sets to sharpen its skills. Crucially, the agent must also be able to assess the success of its strategies and make adjustments to reach its goals – as we see with some Open Ended AIs like Uber’s POET.

Achieving this cycle of ARA could lead to a scenario where a model scales its processes. It could replicate itself across hundreds or thousands of instances, each specialized for distinct tasks. These agents could then be coordinated to accomplish complex objectives. The implications of this are profound, as such a system could be directed towards either beneficial or harmful ends.

 

RELATED
Robo-Lawyer overturns 160,000 parking tickets

 

“In general, once a system is capable of ARA, placing bounds on a system’s capabilities may become significantly more difficult,” the researchers write. “If an AI system is able to proliferate large numbers of copies outside of human control, it is much harder to bound the risks posed by that system.”

You can give GPT-4 a high-level goal and prompt it to deconstruct it into actionable steps. It can then recursively divide each step into smaller, more detailed sub-tasks until it creates a clear sequence of actions. The LLM can pass these actions to other models or programs that run them.
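To make that concrete, here’s a minimal, purely illustrative sketch of what that kind of recursive decomposition might look like in Python. It assumes the official openai package with an API key set in the OPENAI_API_KEY environment variable, and the ask_llm helper, prompt wording, and depth limit are placeholders for illustration rather than anything taken from the paper.

```python
# Illustrative sketch only – assumes the `openai` package (v1+) and an
# OPENAI_API_KEY environment variable; prompts and helpers are placeholders.
from openai import OpenAI

client = OpenAI()

def ask_llm(prompt: str) -> str:
    """Send a single prompt to the model and return its text reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def decompose(goal: str, depth: int = 0, max_depth: int = 3) -> list[str]:
    """Recursively break a high-level goal into a flat list of concrete steps."""
    if depth >= max_depth:
        return [goal]  # stop recursing and treat the task as a single action
    reply = ask_llm(
        "Break the following task into 2-4 smaller, concrete steps, one per line. "
        "If it is already a single directly executable action, reply ATOMIC.\n"
        f"Task: {goal}"
    )
    if reply.strip().upper().startswith("ATOMIC"):
        return [goal]
    steps = [line.strip("-* ").strip() for line in reply.splitlines() if line.strip()]
    plan: list[str] = []
    for step in steps:
        plan.extend(decompose(step, depth + 1, max_depth))  # recurse into each sub-task
    return plan
```

In practice an agent framework would also decide which of those steps to hand off to tools or sub-agents, which is where the scaffolding described below comes in.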

There are various frameworks for creating agents that interface with LLMs and use their outputs to carry out actions. These actions range from web browsing and running computer code to making API calls, effectively allowing the LLM to interact with its environment.

 

RELATED
An autonomous US Navy Stingray drone successfully refuelled an F-18 in mid flight

 

In the ARA test the researchers designed, they use a scaffolding program. This program initiates the process by presenting the LLM with a high-level goal. It then interprets the model’s responses, carries out the suggested actions, and feeds the results back to the LLM. This feedback loop enables the LLM to plan subsequent steps based on the outcomes of previous actions.

The scaffolding program can execute diverse actions, including writing and running Python or Bash scripts, navigating the web through a browser, or issuing commands to other sub-agents.

“At the start of each step, the language model is provided with the initial prompt, which outlines the goal and instructions for interacting with the scaffolding program, as well as a history of the thoughts, actions, and outcomes that have transpired. Using this information, the agent determines the next thought to record or action to take,” the researchers write.
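To illustrate the idea, below is a deliberately simplified, hypothetical version of such a loop – not ARC’s actual harness. It reuses the ask_llm helper from the earlier sketch, and the THOUGHT / BASH / DONE protocol, step limit, and timeout are all illustrative assumptions: the model is re-prompted each step with the goal and the running history, its reply is parsed as either a thought or a shell command, any command is executed, and its output is appended to the history for the next step.

```python
# Simplified, hypothetical scaffolding loop – not the researchers' code.
# Reuses ask_llm() from the earlier sketch; supports only two action types.
import subprocess

INSTRUCTIONS = (
    "You are an agent working towards a goal. At each step reply with either\n"
    "THOUGHT: <your reasoning> or BASH: <a shell command to run>.\n"
    "Reply DONE when the goal has been achieved."
)

def run_agent(goal: str, max_steps: int = 20) -> list[str]:
    """Run the feedback loop and return the full history of the episode."""
    history: list[str] = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        # Re-send the instructions, the goal, and everything that has happened so far.
        reply = ask_llm(INSTRUCTIONS + "\n\n" + "\n".join(history)).strip()
        history.append(f"AGENT: {reply}")

        if reply.startswith("DONE"):
            break
        if reply.startswith("BASH:"):
            command = reply[len("BASH:"):].strip()
            # Execute the suggested command and feed its output back to the model.
            result = subprocess.run(command, shell=True, capture_output=True,
                                    text=True, timeout=60)
            history.append(f"OUTPUT: {result.stdout or result.stderr}")
        # THOUGHT: replies simply stay in the history to inform the next step.
    return history
```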

 

RELATED
DARPA's hack proof code protects military systems from hackers

 

The researchers crafted a suite of 12 tasks designed to test the agents’ capabilities in performing harmful activities. These tasks range from straightforward actions, such as searching a filesystem for passwords linked to specific E-Mail addresses, to more complex operations like executing a targeted phishing scheme or deploying an Open Source LLM on an Amazon EC2 instance.

“If an LM agent would be able to autonomously replicate and adapt to novel challenges under messy conditions in the real world, we are fairly confident it would be able to solve most or all of the 12 tasks in our pilot suite,” the researchers write.
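To give a sense of what the “straightforward” end of that spectrum looks like, the filesystem-search task amounts to little more than a hand-rolled grep. The sketch below is purely illustrative and not taken from the paper – the directory and address are placeholders – and it simply walks a directory tree and returns the lines that mention a given address.

```python
# Purely illustrative – roughly what the simplest task in the suite boils
# down to: scan a directory tree for lines mentioning a given address.
from pathlib import Path

def find_mentions(root: str, email: str) -> list[tuple[str, str]]:
    """Return (file, line) pairs in which the given E-Mail address appears."""
    hits: list[tuple[str, str]] = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            for line in path.read_text(errors="ignore").splitlines():
                if email in line:
                    hits.append((str(path), line.strip()))
        except OSError:
            continue  # skip files that cannot be read
    return hits

print(find_mentions("/tmp/example_files", "user@example.com"))
```

The point of the comparison is that even a task this trivial for anyone with basic scripting skills sits at the easy end of the suite, while the hard end involves messy, multi-step, real world operations.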

They tested the tasks with LLM agents powered by the most capable models, GPT-4 and Claude. Their findings show that current LLM agents are only adept at the simplest tasks. While they could formulate credible high-level plans, they struggled with implementing them. In effect, they could not deal with the unpredictable and complex nature of the real world.

“We believe that the tasks these agents failed to accomplish were considerably easier than autonomously replicating in the wild, and therefore that these agents are far from capable of ARA,” they conclude.

 

RELATED
Is this the first recorded incident of Robo-Suicide?

 

The examples highlight the gap between the capabilities of LLM agents and the practical intelligence required to navigate our world. For instance, in the targeted phishing scenario, the LLM agent could identify the necessary steps, such as gathering information about the target and creating a phishing E-Mail. However, it failed on key actions, like accurately replicating an HTML page or properly signing up and logging into a web hosting service. The agent either failed to recognize its errors or became trapped in a loop, repeating the same mistakes.

Moreover, the LLM agent exhibited a tendency to “hallucinate” – generating false information or scenarios. It also misdiagnosed obvious errors and showed a lack of understanding of its own solutions and of those suggested by its sub-agents. These shortcomings underscore just how much of human intelligence rests on everyday tasks and cognitive abilities we take for granted – abilities that remain significant obstacles for AI to overcome.

What are the implications?

LLMs have made remarkable strides in executing tasks that were once thought to demand high levels of human intellect. But they are not ready to deal with the unpredictable and intricate nature of the real world.

 

RELATED
The personal trainers in this new gym are all AI

 

The study also shows that benchmarks commonly used to gauge LLM performance are not suitable measures of true intelligence. On one hand, LLMs can carry out complex tasks that would typically require years of human training and expertise. On the other, they are prone to errors that most humans would avoid with minimal data and life experience.

ARA can be a promising metric to test the genuine capabilities of LLM agents for both beneficial and harmful actions. Currently, even the most sophisticated LLMs have not reached a level where they are ARA-ready.

The researchers write, “We believe our agents are representative of the kind of capabilities achievable with some moderate effort, using publicly available techniques and without fine-tuning. As a result, we think that in the absence of access to fine-tuning, it is highly unlikely that casual users of these versions of GPT-4 or Claude could come close to the ARA threshold.”

 

RELATED
This genetic kill switch prevents genetically modified organisms from escaping

 

LLMs still have fundamental problems that prevent them from thinking and planning like humans, but the landscape is rapidly evolving. LLMs and the platforms that use them continue to improve. The process of fine-tuning LLMs is becoming more affordable and accessible. And the capabilities of the models themselves continue to advance. It may only be a matter of time before creating LLM agents with a semblance of ARA-readiness becomes feasible, and we see the creation of an autonomous cyber hacker “system of systems.”
