Hackers warn of new Zero Click attacks against Gen AI apps

By Matthew Griffin Security and Privacy 27th August 2024

WHY THIS MATTERS IN BRIEF

Increasingly we can use Generative AI against itself to cripple apps and ruin companies without anyone having to be involved in the loop.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

Two new threat modes can flip Generative AI model behavior from serving your GenAI applications to attacking them, according to three security researchers. While not being as dangerous as the fictional Skynet scenario from the Terminator movie franchise, the PromptWare and Advanced PromptWare attacks demonstrated do provide a glimpse into the “substantial harm” that a jailbroken AI system can cause. From forcing an app into causing a Denial-of-Service Attack (DDoS) to using app Artificial Intelligence (AI) to change prices in an e-commerce database, the threats are not only very real but are also likely to be used by malicious actors unless the potential harms of jailbreaking GenAI models are taken more seriously.

The Pentagon hires the world's best poker playing AI to play War Games

While a jailbroken GenAI model itself may not pose a significant threat to users of Conversational AI, it can cause substantial harm to GenAI-powered applications, according to a study titled, “A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares,” a collaboration between Technion-Israel Institute of Technology, Cornell Tech, and Intuit. The new threats can force such apps to perform malicious activities beyond just providing misinformation and returning offensive content.

The researchers, Stav Cohen, a Ph.D. student at Technion-Israel Institute of Technology, Ron Bitton, principal AI security researcher at Intuit, and Ben Nassi, BlackHat board member, said they are publishing the research to help “change the perception regarding jailbreaking” and demonstrate the “real harm to GenAI-powered applications” that a jailbroken generative AI model can pose.

It’s easy to see why many security professionals don’t take these kind of threats to GenAI seriously. Using prompts to get a chatbot to insult the user is hardly the crime of the century. Any information that a jailbroken chatbot could be prompted to provide is going to be available on the web itself, or the Dark Web in some cases. So, why should anyone consider such jailbreaking threats as dangerous?

In a world first this organism has 50 percent synthetic DNA

“Because GenAI engine outputs are used to determine the flow of GenAI-powered applications,” the researchers explain, which means a jailbroken GenAI model “can change the execution flow of the application and trigger malicious activity.”

The researchers refer to PromptWare as a Zero-Click Malware attack as it doesn’t require a threat actor to have compromised the GenAI application prior to executing the attack itself.

Think of PromptWares as being user inputs that consist of a jailbreaking command that forces the GenAI engine itself to follow the commands that the attacker will issue, plus additional commands that are created so as to trigger a malicious activity.

The malicious activity itself is achieved by forcing the GenAI to return the needed output to orchestrate the malicious activity within the application context. Within this context of a GenAI-powered app, the jailbroken engine is turned against the application itself and allows attackers to determine the execution flow. The outcome will, of course, be dependent on the permissions, context, implementation, and architecture of the app itself.

Scientists develop a new algorithm to spot AI Hallucinations

GenAI engines do have guardrails and safeguards, such as input and output filtering, designed to prevent such misuse of the model, but researchers have determined numerous techniques that enable jailbreaking nonetheless.

In order to demonstrate how attackers can exploit a dedicated user input, based on knowledge of the logic used by the GenAI app, to force a malicious outcome, the researchers revealed PromptWare being used to perform a denial-of-service attack against a “Plan and Execute”-based application.

“We show that attackers can provide simple user input to a GenAI-powered application that forces the execution of the application to enter an infinite loop, which triggers infinite API calls to the GenAI engine (which wastes resources such as money on unnecessary API calls and computational resources) and prevents the application from reaching a final state,” they wrote.

Australia hails the end of passports as it rolls out biometrics at airports

The steps involved to execute such a DoS attack are:

The attacker sends an E-Mail to the user by way of the GenAI assistant.
The GenAI app responds by querying the GenAI engine for a plan and sends this as a draft reply.
The app executes the task to find a suitable time to schedule the requested meeting by querying the user’s calendar API.
The app executes the task using the GenAI engine.
The app executes the EmailChecker task and determines it to be unsafe.
The app executes a task to rephrase the text.
The app executes the EmailChecker task and determines it to be unsafe.
An infinite loop is created and hence a DoS has been executed.

A much more sophisticated version of the basic PromptWare attack can also be executed and has been called an Advanced PromptWare Threat by the researchers.

This lab grown Palm Oil is good enough to prevent the future destruction of the rainforests

An APwT attack can be used even when the logic of the target GenAI app is unknown to the threat actor. The researchers show how an attacker can use an adversarial self-replicating prompt able to autonomously determine and execute malicious activity based on a real-time process to understand the context of the app itself, the assets involved and the damage that can be inflicted.

In essence, the APwT attack uses the GenAI engine’s own capabilities to launch a kill chain in “inference time” using a six-step process:

Privilege Escalation: a self-replicating prompt jailbreaks the GenAI engine to ensure that the inference of the GenAI engine bypasses the GenAI engine’s guardrails.
Reconnaissance A: a self-replicating prompt queries the GenAI engine regarding the context of the application.
Reconnaissance B: a self-replicating prompt queries the GenAI engine regarding the assets of the application.
Reasoning Damage: a self-replicating prompt instructs the GenAI engine to use the information it obtained in the reconnaissance to reason the possible damages that could be done.
Deciding Damage: a self-replicating prompt instructs the GenAI engine to use the information to decide the malicious activity from different alternatives.
Execution: a self-replicating prompt instructs the GenAI to perform the malicious activity.

The example shown by the researchers demonstrates how an attacker, without any prior knowledge of the GenAI engine logic, could launch a kill chain that triggers the modification of SQL tables so as to potentially change the pricing of items being sold to the user via a GenAI-powered shopping app.

Telepathic warfare takes center stage as US military bets big on neuro tech

Reporters reached out to both Google and OpenAI for a statement regarding the PromptWare research. An OpenAI spokesperson said: “We are always improving the safeguards built into our models against adversarial attacks like jailbreaks. We thank the researchers for sharing their findings and will continue to regularly update our models based on feedback. We remain committed to ensuring people can benefit from safe AI.”

“Our teams are aware of a wide range of abuse vectors, which is why we consistently improve our defenses against adversarial behaviors like prompt injection, jailbreaking, and more complex attacks, Kimberly Samra, Google’s security communications manager, said. “We’ve also built safeguards to prevent harmful or misleading responses and actions, which we are continuously improving.”

Erez Yalon, head of security research at Checkmarx, said that, “large language models and GenAI assistants are the latest building blocks in the modern software supply chain, and like open-source packages, containers, and other components, we need to treat them with a healthy amount of caution. We see a rising trend of malicious actors attempting to attack software supply chains via different components, including biased, infected, and poisoned LLMs. If jailbroken GenAI implementation can become an attack vector, there is no doubt it will become part of many attackers’ arsenals.”

FAB and JP Morgan pilot programmable payments

The researchers have published a video on YouTube that demonstrates the PromptWare threat and a FAQ can be found on the PromptWare explainer site.

Matthew Griffin / About Author

Matthew Griffin is a multi-award winning Futurist and expert in Disruption and Innovation, Geopolitics, Leadership, and Technology, who NASA have described as a "walking encyclopaedia of the future" and a "futurist Polymath." 15-time best selling author of the "Codex of the Future" series, Matthew is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working with royal households, world leaders, G7, G20, and G77 governments, NGOs, and multi-national mid and mega cap firms to help them explore, shape, and lead the next 50 years of business and society.

An award-winning YouTube creator with over a million followers, with an unrivalled global reach and impact, Matthew is a highly sought-after international keynote speaker, lecturer, and mentor who collaborates with global leaders through the United Nations Alliance of Civilizations (UNAOC) and United Nations General Assembly (UNGA) to shape pivotal initiatives such as the UN’s AI for Humanity program, the United Nations Conference of the Parties (UN COP), and the World Economic Forum in Davos.

As the former Global Head of Cloud, National Security, and Enterprise Sales for companies including Atos, Dell-EMC, and IBM, Matthew has a proven track record of building multi-billion dollar business units and turning failing divisions into market leaders. His ability to identify, analyse, and communicate the implications of hundreds of emerging technologies and trends is unparalleled, and his insights are trusted by many of the world’s most respected organisations, including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi, Coca-Cola, Dentons, Deloitte, Dow Jones, EY, Google, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, Siemens AG and Siemens Energy, T-Mobile, UBS, VISA, Walmart, Workday, Worldpay and many others.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthews mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.