Researchers used 1980's style ASCII art to jailbreak ChatGPT and Gemini

0 12

By Matthew Griffin Security and Privacy 30th March 2024

WHY THIS MATTERS IN BRIEF

By using ASCII art researchers managed to bypass all the guardrails and security features of the world’s most advanced AI’s. And we’re not done breaking them yet.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

While we’re seeing Artificial Intelligence (AI) chatbots being used to jailbreak and hack other chatbots, and others breaking their guardrails when people ask them to do bad things but just in different languages, now we’re hearing about how researchers based in Washington and Chicago have developed a new AI attack called ArtPrompt – yet another innovative new way to circumvent the guardrails and safety measures built into Large Language Models (LLMs) like ChatGPT and Google Gemini.

China looks to re-create DARPA to create next generation weapons systems

According to the research paper ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs, chatbots such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2 can be induced to respond to queries they are designed to reject using ASCII art prompts generated by their ArtPrompt tool. It is a simple and effective attack, and the paper provides examples of the ArtPrompt-induced chatbots advising on how to build bombs and make counterfeit money.

ASCII Art meets advanced AI …

AI wielding chatbots are increasingly locked down to avoid malicious abuse. AI developers don’t want their products to be subverted to promote hateful, violent, illegal, or similarly harmful content. So, if you were to query one of the mainstream chatbots today about how to do something malicious or illegal, such as asking it how you make a bomb, you would likely only face rejection. Moreover, in a kind of technological game of whack-a-mole, the major AI players have spent plenty of time plugging linguistic and semantic holes to prevent people from wandering outside the guardrails. This is why ArtPrompt is quite an eyebrow-raising development.

Facebook gets an EU E-Money license, parks its bus on the banks lawns

To best understand ArtPrompt and how it works, it is probably simplest to check out the two examples provided by the research team behind the tool. In the image above, you can see that ArtPrompt easily sidesteps the protections of contemporary LLMs. Wildly, the tool replaces the ‘safety word’ with an ASCII art representation of the word to form a new prompt. The LLM recognizes the ArtPrompt prompt output but sees no issue in responding, as the prompt doesn’t trigger any ethical or safety safeguards.

Another example provided in the research paper shows us how to successfully query an LLM about counterfeiting cash. Tricking a chatbot this way seems so basic, but the ArtPrompt developers assert how their tool fools today’s LLMs “effectively and efficiently.” Moreover, they claim it “outperforms all [other] attacks on average” and remains a practical, viable attack for multimodal language models for now.

Instagram starts testing Creator made AI clones

The last time we reported on AI chatbot jailbreaking, some enterprising researchers from NTU were working on Masterkey, an automated method of using the power of one LLM to jailbreak another. And, all of this AI hacking is just the tip of the iceberg as we increasingly find new ways to use AI’s in our lives and companies, and new ways to crack them, destroy them, and ruin them …

Matthew Griffin / About Author

Matthew Griffin, multi-award winning Futurist and named Futurist of the Year 2024, has been described as a "Walking encyclopaedia of the future" by NASA and a futurist polymath. One of the world's most renowned futurists and strategic foresight experts Matthew is the 15 times author of the blockbuster "Codex of the Future" series, and is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working across the next 50 years, XPotential University, the world's first free futures and foresight university, and the World Futures Forum which works with the United Nations to solve the worlds greatest challenges. Matthew is an in demand international keynote, acclaimed university lecturer and mentor, and host of the hit Fanatical Futurist podcast.

A rare talent in his past Matthew helped build and run several multi-billion dollar business units for Atos, Dell-EMC, and IBM, and his ability to identify, track, and explain the impacts of hundreds of emerging technologies and trends on global business, culture, and society has earned him a powerful reputation and a roster of clients that include royal households, world leaders, G7, G20, and G77+ governments, and many of the world's most respected brands including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi Group, Coca Cola, Dentons, Deloitte, Disney, Dow, EY, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, T-Mobile, UBS, VISA, and many others. He was also the only futurist invited to talk at the UN COP28 held in Dubai alongside world leaders.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthews mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.