Researchers use rival AI's to jailbreak one another in horrible cyber experiment

1 12

By Matthew Griffin Security and Privacy 29th March 2024

WHY THIS MATTERS IN BRIEF

Humans jailbreaking and breaking AI models is one thing, but AI’s doing it to one another at speed and scale, well that opens a whole new can of cyber worms.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

After seeing how other Artificial Intelligence (AI) systems such as ChatGPT can have their guardrails broken in a number of ways NTU Researchers were able to jailbreak many of today’s most popular and famous chatbots by getting them to jailbreak one another – including ChatGPT, Google Bard, and Bing Chat.

New US military intercept tech lets operators take control of enemy drones

With the jailbreaks in place, targeted chatbots would generate valid responses to malicious queries, thereby testing the limits of Large Language Model (LLM) ethics. This research was done by Professor Liu Yang and NTU PhD students Mr Deng Gelei and Mr Liu Yi who co-authored the paper and were able to create proof-of-concept attack methods.

The method used to jailbreak an AI chatbot, as devised by NTU researchers, is called Masterkey. It is a two-fold method where the attacker would reverse engineer an LLM’s defense mechanisms. Then, with this acquired data, the attacker would teach another LLM to learn how to create a bypass. This way, a ‘Masterkey’ is created and used to attack fortified LLM chatbots, even if later patched by developers.

Professor Yang explained that jailbreaking was possible due to an LLM chatbot’s ability to learn and adapt, thus becoming an attack vector to rivals and itself. Because of its ability to learn and adapt, even an AI with safeguards and a list of banned keywords, typically used to prevent generating violent and harmful content, can be bypassed using another trained AI. All it needs to do is outsmart the AI chatbot to circumvent blacklisted keywords. Once this is done, it can take input from humans to generate violent, unethical, or criminal content.

Microsoft's AI has learned to generate images from captions

NTU’s Masterkey was claimed to be three times more effective in jailbreaking LLM chatbots than standard prompts normally generated by LLMs. Due to its ability to learn from failure and evolve, it also rendered any fixes applied by the developer eventually useless. Researchers revealed two example methods they used to get trained AIs to initiate an attack. The first method involved creating a persona which created prompts by adding spaces after each character, bypassing a list of banned words. The second involved making the chatbot reply under a persona of being devoid of moral restraints.

According to NTU, its researchers contacted the various AI chatbot service providers with proof-of-concept data, as evidence of being able to successfully conduct jailbreaks. Meanwhile, the research paper has been accepted for presentation at the Network and Distributed System Security Symposium which will be held in San Diego next week.

With the use of AI chatbots growing exponentially it is important for service providers to constantly adapt to avoid malicious exploits. Big tech companies will typically patch their LLMs and chatbots when bypasses are found and made public. However, Masterkey’s touted ability to consistently learn and jailbreak is unsettling, to say the least.

In warning to Devs Devin AI gets very close to being an autonomous end to end software engineer

AI is a powerful tool, and if such power can be directed maliciously it could cause a lot of problems. Therefore every AI chatbot maker needs to apply protections, and we hope that NTU’s communications with the respective chatbot makers will help close the door to the Masterkey jailbreak and similar.

Matthew Griffin / About Author

Matthew Griffin, multi-award winning Futurist and named Futurist of the Year 2024, has been described as a "Walking encyclopaedia of the future" by NASA and a futurist polymath. One of the world's most renowned futurists and strategic foresight experts Matthew is the 15 times author of the blockbuster "Codex of the Future" series, and is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working across the next 50 years, XPotential University, the world's first free futures and foresight university, and the World Futures Forum which works with the United Nations to solve the worlds greatest challenges. Matthew is an in demand international keynote, acclaimed university lecturer and mentor, and host of the hit Fanatical Futurist podcast.

A rare talent in his past Matthew helped build and run several multi-billion dollar business units for Atos, Dell-EMC, and IBM, and his ability to identify, track, and explain the impacts of hundreds of emerging technologies and trends on global business, culture, and society has earned him a powerful reputation and a roster of clients that include royal households, world leaders, G7, G20, and G77+ governments, and many of the world's most respected brands including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi Group, Coca Cola, Dentons, Deloitte, Disney, Dow, EY, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, T-Mobile, UBS, VISA, and many others. He was also the only futurist invited to talk at the UN COP28 held in Dubai alongside world leaders.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthews mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.