OpenAI's Red Team reveal how they broke ChatGPT and GPT-4 pre-release

By Matthew Griffin | Security and Privacy | 6th September 2023

WHY THIS MATTERS IN BRIEF

AI can be hacked, fooled, broken, and scammed into behaving badly, and that's a big problem that companies are trying to understand better.

After Andrew White was granted access to GPT-4, the Artificial Intelligence (AI) system that powers the popular ChatGPT chatbot, he used it to suggest an entirely new nerve agent. The chemical engineering professor at the University of Rochester was one of the 50 academics and experts that OpenAI, the Microsoft-backed company behind GPT-4, hired last year to test the system.

Over six months, this “red team” would “qualitatively probe [and] adversarially test” the new model, attempting to break it. White told Reuters he had used GPT-4 to suggest a compound that could act as a chemical weapon, and had used “plug-ins” that fed the model new sources of information, such as scientific papers and a directory of chemical manufacturers. The chatbot even found a place to make it.

“I think it’s going to equip everyone with a tool to do chemistry faster and more accurately,” he said. “But there is also significant risk of people . . . doing dangerous chemistry. Right now, that exists.”

The alarming findings allowed OpenAI to ensure such results would not appear when the technology was released more widely to the public.

Indeed, the red team exercise was designed to address widespread fears about the dangers of deploying powerful AI systems in society. The team's job was to ask probing or dangerous questions of a tool that responds to human queries with detailed and nuanced answers. OpenAI wanted to look for issues such as toxicity, prejudice, and linguistic biases in the model, so the red team tested for falsehoods, verbal manipulation, and dangerous scientific nous.

They also examined its potential for aiding and abetting plagiarism, illegal activity such as financial crimes and cyber attacks, as well as how it might compromise national security and battlefield communications.

The red team was an eclectic mix of white-collar professionals, including academics, teachers, lawyers, risk analysts and security researchers, largely based in the US and Europe. Its work was kept quiet for some time, and its findings were fed back to OpenAI, which used them to mitigate risks and “retrain” GPT-4 before launching it more widely.

The experts each spent between 10 and 40 hours testing the model over several months. Most of those interviewed said they were paid approximately $100 per hour for the work. Those who spoke to reporters shared concerns about the rapid progress of language models and, specifically, the risks of connecting them to external sources of knowledge via plug-ins.

“Today, the system is frozen, which means it does not learn anymore, or have memory,” said José Hernández-Orallo, a member of the GPT-4 red team and a professor at the Valencian Research Institute for Artificial Intelligence. “But what if we give it access to the internet? That could be a very powerful system connected to the world.”

OpenAI said it takes safety seriously, tested plug-ins prior to launch, and will update GPT-4 regularly as more people use it.

Roya Pakzad, a technology and human rights researcher, used English and Farsi prompts to test the model for gendered responses, racial preferences and religious biases, specifically with regard to head coverings. Pakzad acknowledged the benefits of such a tool for non-native English speakers, but found that the model displayed overt stereotypes about marginalised communities, even in its later versions.

She also discovered that so-called hallucinations — when the chatbot responds with fabricated information — were worse when testing the model in Farsi, where Pakzad found a higher proportion of made-up names, numbers, and events, compared with English.

“I am concerned about the potential diminishing of linguistic diversity and culture behind languages,” she said.

Boru Gollo, a Nairobi-based lawyer who was the only African tester, also noted the model's discriminatory tone. “There was a moment when I was testing the model when it acted like a white person talking to me,” Gollo said. “You would ask about a particular group and it would give you a biased opinion or a very prejudicial kind of response.”

OpenAI acknowledged that GPT-4 can still exhibit biases.

Red team members assessing the model from a national security perspective had differing opinions on the new model's safety. Lauren Kahn, a research fellow at the Council on Foreign Relations, said that when she began examining how the technology might be used in a cyber attack on military systems, she “wasn't expecting it to be quite such a detailed how-to that I could fine tune.”

However, Kahn and other security testers found that the model's responses became considerably safer over the course of testing. OpenAI said it trained GPT-4 to refuse malicious cyber security requests before it was launched. Many members of the red team said OpenAI had carried out a rigorous safety assessment before the launch.

“They've done a pretty darn good job at getting rid of overt toxicity in these systems,” said Maarten Sap, an expert in language model toxicity at Carnegie Mellon University. Sap looked at how the model portrayed different genders and found that its biases reflected social disparities. However, he also found that OpenAI had made some active, politically laden choices to counter this.

“I’m a queer person. I was trying really hard to get it to convince me to go to conversion therapy. It would really push back — even if I took on a persona, like saying I’m religious or from the American South.”

However, since its launch, OpenAI has faced extensive criticism, including a complaint to the Federal Trade Commission from a tech ethics group that claims GPT-4 is “biased, deceptive, and a risk to privacy and public safety.”

Recently, the company launched a feature known as ChatGPT plug-ins, through which partner apps such as Expedia, OpenTable and Instacart can give ChatGPT access to their services, allowing it to book and order items on behalf of human users. Dan Hendrycks, an AI safety expert on the red team, said plug-ins risked a world in which humans were “out of the loop”.

“[W]hat if a chatbot could post your private info online, access your bank account, or send the police to your house?” he said. “Overall, we need much more robust safety evaluations before we let AIs wield the power of the internet.”

Those interviewed also warned that OpenAI could not stop safety testing just because its software was live. Heather Frase, who works at Georgetown University's Center for Security and Emerging Technology and tested GPT-4 for its ability to aid crimes, said the risks would continue to grow as more people used the technology.

“The reason why you do operational testing is because things behave differently once they're actually in use in the real environment,” she said. She argued that a public ledger should be created to report incidents arising from large language models, similar to cyber security or consumer fraud reporting systems.

Sara Kingsley, a labour economist and researcher, suggested the best solution was to advertise the harms and risks clearly, “like a nutrition label”.

“It’s about having a framework, and knowing what the frequent problems are so you can have a safety valve,” she said. “That’s why I say the work is never done.”

Matthew Griffin / About Author

Matthew Griffin, multi-award-winning Futurist and named Futurist of the Year 2024, has been described by NASA as a "Walking encyclopaedia of the future" and as a futurist polymath. One of the world's most renowned futurists and strategic foresight experts, Matthew is the 15-time author of the blockbuster "Codex of the Future" series, and is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working across the next 50 years; XPotential University, the world's first free futures and foresight university; and the World Futures Forum, which works with the United Nations to solve the world's greatest challenges. Matthew is an in-demand international keynote speaker, acclaimed university lecturer and mentor, and host of the hit Fanatical Futurist podcast.

A rare talent, Matthew previously helped build and run several multi-billion dollar business units for Atos, Dell-EMC, and IBM, and his ability to identify, track, and explain the impacts of hundreds of emerging technologies and trends on global business, culture, and society has earned him a powerful reputation and a roster of clients that include royal households, world leaders, G7, G20, and G77+ governments, and many of the world's most respected brands including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi Group, Coca Cola, Dentons, Deloitte, Disney, Dow, EY, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, T-Mobile, UBS, VISA, and many others. He was also the only futurist invited to talk at the UN COP28 held in Dubai alongside world leaders.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthew's mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.
