Research shows GPT4 becomes 30 percent better when it critiques itself

By Matthew Griffin Intelligence and the Senses 9th April 2023

WHY THIS MATTERS IN BRIEF

By getting AI to critique itself, researchers have not only given it a vital human skill but also found new ways to improve AIs without having to recode or redevelop them.

Even if the unlikely six-month moratorium on Artificial Intelligence (AI) development called for by Elon Musk and the Future of Life Institute goes ahead, GPT4 appears capable of huge leaps forward simply by taking a good hard look at itself. When researchers asked it to critique its own work, they saw a performance boost of around 30%.

“It’s not everyday that humans develop novel techniques to achieve state-of-the-art standards using decision-making processes once thought to be unique to human intelligence,” wrote researchers Noah Shinn and Ashwin Gopinath. “But, that’s exactly what we did.”

The “Reflexion” technique takes GPT4’s already impressive ability to perform various tests, and introduces “a framework that allows AI agents to emulate human-like self-reflection and evaluate its performance.” Effectively, it introduces extra steps in which GPT4 designs tests to critique its own answers, looking for errors and missteps, then rewrites its solutions based on what it’s found.
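The generate, test, reflect, and rewrite cycle described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: `reflexion_loop` and the three `toy_*` functions are hypothetical names, and the toy functions are hard-coded stand-ins for what would really be calls to a language model.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    task: str,
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    reflect: Callable[[str, str, str], str],
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Generate an answer, check it, reflect on failures, and retry.

    Returns the last answer and the list of reflections gathered on the way.
    """
    reflections: List[str] = []
    answer = ""
    for _ in range(max_trials):
        # Each new attempt can see the notes written after earlier failures.
        answer = generate(task, reflections)
        passed, feedback = evaluate(task, answer)
        if passed:
            break
        reflections.append(reflect(task, answer, feedback))
    return answer, reflections


# Hypothetical stand-ins for the model (assumptions for illustration only).
def toy_generate(task: str, reflections: List[str]) -> str:
    # The first attempt contains a bug; later attempts "use" the reflection.
    if reflections:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


def toy_evaluate(task: str, answer: str) -> Tuple[bool, str]:
    # Stand-in for self-written tests: run the candidate and check one case.
    scope: dict = {}
    exec(answer, scope)
    result = scope["add"](2, 3)
    if result == 5:
        return True, "all tests passed"
    return False, f"add(2, 3) returned {result}, expected 5"


def toy_reflect(task: str, answer: str, feedback: str) -> str:
    # Stand-in for the model's written self-critique.
    return f"Previous attempt failed: {feedback}. Check the operator."


final_answer, notes = reflexion_loop(
    "Write add(a, b).", toy_generate, toy_evaluate, toy_reflect
)
print(final_answer)
print(notes)
```

In this toy run the first attempt fails its check, one reflection is recorded, and the second attempt passes, so the loop stops early. In a real system the same model would play all three roles.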

The team used its technique against a few different performance tests. In the HumanEval test, which consists of 164 Python programming problems the model has never seen, GPT4 scored a record 67%, but with the Reflexion technique, its score jumped to a very impressive 88%.

In the ALFWorld test, which challenges an AI’s ability to make decisions and solve multi-step tasks by executing several different allowable actions in a variety of interactive environments, the Reflexion technique boosted GPT4’s performance from around 73% to a near-perfect 97%, failing on only 4 out of 134 tasks.

In another test called HotPotQA, the language model was given access to Wikipedia, and then given 100 out of a possible 13,000 question/answer pairs that “challenge agents to parse content and reason over several supporting documents.” In this test, GPT4 scored just 34% accuracy, but GPT4 with Reflexion managed to do significantly better with 54%.
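To put the three results side by side, the short calculation below works out each gain both in percentage points and as a relative improvement over the baseline. It uses only the scores quoted above. The observation that the headline figure of roughly 30% sits closest to the relative HumanEval gain is a reading of the numbers, not something the researchers state here.

```python
# Scores quoted in the article: (baseline %, with Reflexion %).
scores = {
    "HumanEval": (67, 88),
    "ALFWorld": (73, 97),
    "HotPotQA": (34, 54),
}


def gains(before: float, after: float) -> tuple:
    """Return (gain in percentage points, relative gain in percent)."""
    points = after - before
    relative = 100 * points / before
    return points, relative


for name, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points, {relative:.0f}% relative improvement")
# HumanEval: +21 points, 31% relative improvement
# ALFWorld: +24 points, 33% relative improvement
# HotPotQA: +20 points, 59% relative improvement

# Cross-check of the ALFWorld figure: 4 failures out of 134 tasks.
alfworld_success = 100 * (134 - 4) / 134
print(f"ALFWorld success rate: {alfworld_success:.1f}%")
# ALFWorld success rate: 97.0%
```

The distinction matters when comparing benchmarks: HotPotQA shows the smallest jump in percentage points but the largest relative improvement, because it starts from the lowest baseline.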

More and more often, the solution to AI problems appears to be more AI. In some ways, this feels a little like a Generative Adversarial Network (GAN), in which two AIs hone each other’s skills, one trying to generate images, for example, that can’t be distinguished from “real” images, and the other trying to tell the fake ones from the real ones. But in this case, GPT is both the writer and the editor, working to improve its own output.

The paper is available on arXiv.

Source: Nano Thoughts via AI Explained
