
Leading AI Models Now Surpass Average People in Linguistic Creativity

WHY THIS MATTERS IN BRIEF

AI now exceeds average human creativity in linguistic tests, signaling a paradigm shift in content generation and cognitive labor.

 

Creativity is a trait that Artificial Intelligence (AI) critics say is likely to remain the preserve of humans for the foreseeable future, even after a study a couple of years ago suggested that AI was becoming more creative than many humans, especially in writing. But a new large-scale study finds that leading generative language models can now exceed average human performance on linguistic creativity tests.

The question of whether machines can be creative has gained new salience in recent years thanks to the rise of AI tools that can generate text and images with both fluency and style. While many experts say true creativity is impossible without lived experience of the world – which is one reason why Meta is doubling down on embodied AI for robots – the increasingly sophisticated outputs of these models challenge that idea.

 

 

In an effort to take a more objective look at the issue, researchers at the Université de Montréal, including AI pioneer Yoshua Bengio, conducted what they describe as the largest comparative evaluation of machine and human creativity to date. The team compared outputs from leading AI models against responses from 100,000 human participants using a standardized psychological test for creativity and found that the best models now outperform the average human, though they still trail top performers by a significant margin.

 


 

“This result may be surprising – even unsettling – but our study also highlights an equally important observation: even the best AI systems still fall short of the levels reached by the most creative humans,” Karim Jerbi, who led the study, said in a press release.

The test at the heart of the study, published in Scientific Reports, is known as the Divergent Association Task and involves participants generating 10 words with meanings as distinct from one another as possible. The higher the average semantic distance between the words, the higher the score.

Human performance on this test correlates with other well-established creativity measures that focus on idea generation, writing, and creative problem solving. Crucially, it is also quick to complete, which allowed the researchers to test a much larger cohort of humans over the internet.
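To make the scoring concrete, here is a minimal sketch of a DAT-style scorer, assuming a pre-loaded word-embedding lookup; the published task uses GloVe vectors and cosine distance, and the `embed` dictionary below is a hypothetical stand-in rather than the study’s actual code.

```python
from itertools import combinations

import numpy as np

def dat_score(words, embed):
    """Average pairwise cosine distance between the words' embeddings.

    Higher values mean the words are more semantically spread out.
    """
    vectors = [embed[w.lower()] for w in words]  # assumes every word is in the vocabulary
    distances = []
    for a, b in combinations(vectors, 2):
        cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        distances.append(1.0 - cosine_similarity)  # cosine distance
    return float(np.mean(distances))

# Example with hypothetical embeddings:
# dat_score(["cat", "algebra", "umbrella", "justice"], embed)
```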

 


 

What they found was striking. OpenAI’s GPT-4, Google’s Gemini Pro 1.5, and Meta’s Llama 3 and Llama 4 all outperformed the average human. However, the average performance of the top 50 percent of human participants exceeded that of every tested model, and the gap widened further when the researchers took the average of the top 25 percent and top 10 percent of humans.

The researchers wanted to see if these scores would translate to more complex creative tasks, so they also got the models to generate haikus, movie plot synopses, and flash fiction. They analyzed the outputs using a measure called Divergent Semantic Integration, which estimates the diversity of ideas integrated into a narrative. While the models did relatively well, the team found that human-written samples were still significantly more creative than AI-written ones.
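As a rough illustration of the idea only: the sketch below averages pairwise cosine distances between sentence embeddings of a short narrative, which captures the spirit of the measure. The published DSI metric actually operates on contextual word embeddings (e.g. from BERT), and `embed_sentence` is a hypothetical encoder, not the researchers’ implementation.

```python
from itertools import combinations

import numpy as np

def dsi_like_score(text, embed_sentence):
    """Average pairwise cosine distance between sentence embeddings of a narrative."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    vectors = [embed_sentence(s) for s in sentences]
    distances = [
        1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        for a, b in combinations(vectors, 2)
    ]
    return float(np.mean(distances)) if distances else 0.0
```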

However, the team also discovered they could boost the AI’s creativity with some simple tweaks. The first involved adjusting a model setting called temperature, which controls the randomness of the model’s output. When this was turned all the way up on GPT-4, the model exceeded the creativity scores of 72 percent of human participants.
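For readers unfamiliar with the setting, the toy function below shows how temperature reshapes the next-token distribution during sampling: dividing the logits by a larger temperature flattens the distribution, so less likely, more surprising tokens are chosen more often. It is an illustrative sketch, not the models’ actual decoding code.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from raw logits; higher temperature means more randomness."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```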

The researchers also found that carefully tuning the prompt given to the model helped too. When explicitly instructed to use “a strategy that relies on varying etymology,” both GPT-3.5 and GPT-4 did better than when given the original, less-specific task prompt.
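Both tweaks are easy to reproduce through a standard chat API. The sketch below uses the OpenAI Python client, raises the temperature to the API’s documented maximum of 2.0, and adds the etymology instruction to the prompt; the exact wording is illustrative rather than the prompt used in the study.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    temperature=2.0,  # "turned all the way up"
    messages=[{
        "role": "user",
        "content": (
            "Give me 10 single words whose meanings are as different from one "
            "another as possible. Use a strategy that relies on varying etymology."
        ),
    }],
)
print(response.choices[0].message.content)
```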

 


 

For creative professionals, Jerbi says the persistent gap between top human performers and even the most advanced models should provide some reassurance. But he also thinks the results suggest people should take these models seriously as potential creative collaborators.

“Generative AI has above all become an extremely powerful tool in the service of human creativity,” he says. “It will not replace creators, but profoundly transform how they imagine, explore, and create – for those who choose to use it.”

Either way, the study adds to a growing body of research that is raising uncomfortable questions about what it means to be creative and whether it is a uniquely human trait. Given the strength of feeling around the issue, the study is unlikely to settle the matter, but the findings do mark one of the more concrete attempts to measure the question objectively.

 


 

How does the creative performance of leading AI models compare to human participants in recent large-scale studies? While GPT-4, Gemini Pro 1.5, and Llama 4 outperform the average human in divergent association tasks, they still trail significantly behind the top 10 percent of creative humans and struggle with complex narrative creativity.
