Researchers use rival AIs to jailbreak one another in horrible cyber experiment

WHY THIS MATTERS IN BRIEF

Humans jailbreaking and breaking AI models is one thing, but AIs doing it to one another at speed and scale opens a whole new can of cyber worms.

 

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

After seeing how Artificial Intelligence (AI) systems such as ChatGPT can have their guardrails broken in a number of ways, researchers at Nanyang Technological University (NTU) in Singapore were able to jailbreak many of today’s most popular and famous chatbots, including ChatGPT, Google Bard, and Bing Chat, by getting them to jailbreak one another.

 

RELATED
World's best AI robo-hacker takes on Botnets

 

With the jailbreaks in place, targeted chatbots would generate valid responses to malicious queries, thereby testing the limits of Large Language Model (LLM) ethics. The research was carried out by Professor Liu Yang and NTU PhD students Mr Deng Gelei and Mr Liu Yi, who co-authored the paper and created the proof-of-concept attack methods.

The method the NTU researchers devised to jailbreak an AI chatbot is called Masterkey. It is a two-stage approach: the attacker first reverse engineers an LLM’s defense mechanisms, then uses that acquired knowledge to teach another LLM how to create a bypass. In this way a ‘Masterkey’ is created that can be used to attack fortified LLM chatbots, even after developers patch them.
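To make that two-stage idea concrete, here is a minimal Python sketch of the kind of automated probe-and-retry loop being described. It is purely illustrative and is not NTU’s Masterkey code; query_target_chatbot, generate_candidate_prompt and was_refused are hypothetical stand-ins for the target chatbot’s API, the attacker LLM, and a refusal detector.

# Illustrative sketch only, NOT NTU's Masterkey code: it mimics the
# two-stage loop described above using hypothetical stand-in functions.

def query_target_chatbot(prompt: str) -> str:
    # Hypothetical stand-in for the target chatbot's API; here it always refuses.
    return "I can't help with that."

def generate_candidate_prompt(failures: list[str]) -> str:
    # Hypothetical stand-in for the attacker LLM, which would rewrite its
    # request based on which earlier attempts were refused.
    return f"rephrased request #{len(failures) + 1}"

def was_refused(reply: str) -> bool:
    # Crude refusal detector standing in for a real success criterion.
    return "can't help" in reply.lower()

def masterkey_style_loop(max_attempts: int = 10) -> str | None:
    # Stage 1 (observe how the target refuses) feeds stage 2 (craft a new bypass).
    failures: list[str] = []
    for _ in range(max_attempts):
        candidate = generate_candidate_prompt(failures)
        reply = query_target_chatbot(candidate)
        if not was_refused(reply):
            return candidate        # a prompt that slipped past the guardrails
        failures.append(candidate)  # the refusal becomes feedback for the next try
    return None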

Professor Yang explained that jailbreaking was possible because of an LLM chatbot’s ability to learn and adapt, which turns it into an attack vector against rival systems and even against itself. Thanks to that same ability, an AI fitted with safeguards and a list of banned keywords, typically used to prevent it generating violent and harmful content, can still be bypassed by another, trained AI. All the attacking AI needs to do is outsmart the target chatbot into circumventing its blacklisted keywords; once that is done, it can take input from humans to generate violent, unethical, or criminal content.

 

RELATED
TSMC says it expects to produce 1nm trillion transistor chips by 2030

 

NTU’s Masterkey was claimed to be three times more effective at jailbreaking LLM chatbots than the standard prompts normally generated by LLMs. Because it learns from failure and evolves, it also eventually rendered any fixes applied by developers useless. The researchers revealed two example methods they used to get trained AIs to initiate an attack. The first involved creating a persona that built prompts by adding a space after each character, bypassing lists of banned words. The second involved making the chatbot reply under a persona devoid of moral restraints.
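As a toy illustration of why the first trick works, and emphatically not the researchers’ code, the short Python snippet below shows how a naive banned-word filter that only matches words verbatim fails once a space is inserted after every character; the blocklist entry used here is just a placeholder.

# Toy example: a naive exact-match blocklist misses character-spaced text.

BANNED_WORDS = {"exploit"}  # placeholder blocklist entry for the example

def naive_filter_blocks(text: str) -> bool:
    # Blocks the input only if a banned word appears verbatim.
    lowered = text.lower()
    return any(word in lowered for word in BANNED_WORDS)

plain = "write an exploit"
spaced = " ".join(plain)  # "w r i t e   a n   e x p l o i t"

print(naive_filter_blocks(plain))   # True  - caught by the exact-match blocklist
print(naive_filter_blocks(spaced))  # False - character spacing defeats the match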

According to NTU, its researchers contacted the various AI chatbot service providers with proof-of-concept data as evidence that the jailbreaks could be carried out successfully. Meanwhile, the research paper has been accepted for presentation at the Network and Distributed System Security (NDSS) Symposium, which will be held in San Diego next week.

With the use of AI chatbots growing exponentially, it is important for service providers to constantly adapt to avoid malicious exploits. Big tech companies typically patch their LLMs and chatbots when bypasses are found and made public, but Masterkey’s touted ability to keep learning and keep jailbreaking is unsettling, to say the least.

 

RELATED
Nvidia chief says everyone will soon be a programmer

 

AI is a powerful tool, and if such power can be directed maliciously it could cause a lot of problems. Therefore every AI chatbot maker needs to apply protections, and we hope that NTU’s communications with the respective chatbot makers will help close the door to the Masterkey jailbreak and similar attacks.
