Google's new Gemini AI beats OpenAI's GPT-4 and humans at 57 subjects

By Matthew Griffin / Intelligence and the Senses / 12th December 2023

WHY THIS MATTERS IN BRIEF

We are seeing rapid improvements in AI capability, with many AI models now starting to outperform human experts in numerous fields.

Google has unveiled its awesome next-gen Gemini Artificial Intelligence (AI), claiming that its most capable version outperforms OpenAI’s GPT-4 on nearly all major benchmarks, and even beats human experts on a key knowledge test. It understands images, video and audio as well as text and code, and will gain other senses over time.

With a score of 90.0% on the MMLU (Massive Multitask Language Understanding) benchmark, Gemini Ultra is the first model to outperform human experts (89.8%), as well as GPT-4 (86.4%), on a range of knowledge and problem-solving tasks spanning 57 subjects, including math, physics, history, law, medicine and ethics. That’s experts, not the average human.
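A quick calculation puts those headline numbers in perspective. It computes the margins in percentage points, using only the three MMLU scores quoted above. The 90.0% figure belongs to Gemini Ultra, the largest Gemini model. The dictionary and the helper name `margin` are just illustrative choices.

```python
# MMLU scores as quoted in the article, in percent.
scores = {
    "Gemini Ultra": 90.0,
    "Human experts": 89.8,
    "GPT-4": 86.4,
}


def margin(a: str, b: str) -> float:
    """Lead of entry a over entry b in percentage points, rounded to one decimal."""
    return round(scores[a] - scores[b], 1)


print(margin("Gemini Ultra", "Human experts"))
print(margin("Gemini Ultra", "GPT-4"))
```

The lead over the human-expert estimate works out to just 0.2 points, while the gap to GPT-4 is 3.6 points.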

Gemini is multimodal from the ground up – meaning that its original training data set contained a ton of other media in addition to text. Thus, you could say it’s as fluent in visual and auditory “understanding” as it is with text. Where other language models have tended to “think” in textual terms when looking at video and images, Gemini retains all the tone and nuance of the original video, audio and image sources.

While the video below is a slick product demo, and thus should be taken with a large grain of salt, it’s worth watching to give you a sense of what this multimodality really means.

What’s the upshot here? Well, AIs are being trained with wider and wider sensory datasets, to mimic the processes by which humans learn to interact with the world. With next-level visual and auditory understanding, Gemini’s perception and reasoning take a step forward. Once this thing lands in Google devices, beginning with the Pixel 8 Pro, it’ll be able to help with all sorts of daily tasks.

And as Google DeepMind CEO Demis Hassabis told Wired, this will soon extend into the next logical sensory realm: touch and tactile feedback. Google is already a major player in AI robotics through projects such as Everyday Robots, but giving a super-knowledgeable model like Gemini the ability to understand the world through touch will take robotics, humanoid and otherwise, into uncharted territory.

Multimodality is far from the only banner feature here, but as with GPT-4, Gemini is such an anything machine that it’s hard to know where to start. Perhaps with the contributions it could make to science? In the video below, DeepMind scientists demonstrate how Gemini is able to generate its own code to read and interpret 200,000 scientific studies, filter them for relevance using its own reasoning capabilities, and then collate the data to effectively create new meta-knowledge. The team says it did all this over a lunch break, and that the approach will be relevant to other fields, such as law, in which huge datasets need to be examined.

Speaking of coding, Gemini can understand, explain and write code in popular programming languages such as Python, Java, C++ and Go. Indeed, Google is already showing off how it can create websites that dynamically code themselves as you use them, in response to what you seem to want from them. This feels like a whole new approach to the internet: you go to a single page that grows into what you need as soon as it figures out what that is.

The demo video here uses a pretty lightweight use case: planning a kid’s birthday party. But you can see the extraordinary power it encapsulates, and imagine how it might create graphical user interfaces, what I’ll call here Generative User Interfaces, for nearly any task. This is the sort of thing only AI can do; it’s like having a web app programmer sitting right next to you, but working hundreds of times faster to create and adapt the UIs you’re using in real time according to your actions and needs.

And as with any AI tool, it’s super interactive; if it’s not giving you exactly what you want, you can just tell it, and it’ll adjust itself to fit your desires, or engage in a conversation about the best way to proceed. Stunning stuff, and a glimpse into how our interactions with technology are fundamentally shifting.

On the topic of coding, DeepMind has done some other interesting work with Gemini in a project called AlphaCode 2, which takes several different Gemini models and fine-tunes them for different parts of the programming process.

In essence, AlphaCode 2 creates a swarm of programming agents and gets them to generate up to a million different chunks of code to solve a problem. It then filters out any samples that fail to compile or don’t produce the expected output on the example tests included in the problem description, discarding around 95% of the samples created.

Then, another Gemini model generates new test inputs. The remaining code samples are run on these inputs and grouped according to how they behave, and a separate scoring model ranks the candidates on estimated “correctness” to find the top pieces of code. Effectively, DeepMind has split Gemini into a multifunctional software team: a giant army of coders backed by specialist AIs for testing and review.
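The generate-then-select idea behind AlphaCode 2 can be illustrated with a toy sketch. It is loosely modelled on the filtering, clustering and scoring stages DeepMind describes for the system, and it is not DeepMind’s implementation. Everything here is hypothetical: the function names, the toy squaring problem, the "generated" extra test inputs, and the fixed score table standing in for a learned scoring model. The candidates are ordinary Python functions rather than model-generated source files.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: each candidate "program" is a Python function int -> int.
Program = Callable[[int], int]


def run(prog: Program, x: int):
    """Run a candidate on one input, returning None if it raises (a stand-in for a crash)."""
    try:
        return prog(x)
    except Exception:
        return None


def filter_samples(samples: List[Program], examples: List[Tuple[int, int]]) -> List[Program]:
    """Stage 1: drop candidates that crash or miss any example output from the problem statement."""
    return [p for p in samples if all(run(p, x) == y for x, y in examples)]


def cluster_by_behaviour(samples: List[Program], extra_inputs: List[int]) -> List[List[Program]]:
    """Stage 2: group survivors by their outputs on extra test inputs, largest group first."""
    clusters: Dict[tuple, List[Program]] = defaultdict(list)
    for p in samples:
        signature = tuple(run(p, x) for x in extra_inputs)
        clusters[signature].append(p)
    return sorted(clusters.values(), key=len, reverse=True)


def select_candidates(samples: List[Program], examples: List[Tuple[int, int]],
                      extra_inputs: List[int], score: Callable[[Program], float],
                      max_submissions: int = 10) -> List[Program]:
    """Stage 3: from each of the largest clusters, keep the candidate the scorer rates highest."""
    survivors = filter_samples(samples, examples)
    clusters = cluster_by_behaviour(survivors, extra_inputs)[:max_submissions]
    return [max(cluster, key=score) for cluster in clusters]


# Toy problem: "return the square of x", with two worked examples.
def square_mul(x): return x * x
def square_pow(x): return x ** 2
def doubled(x): return x + x                        # wrong: fails the (3, 9) example
def crashes(x): return 1 // (x - x)                 # always raises ZeroDivisionError
def square_small(x): return x * x if x < 10 else 0  # passes the examples, breaks on big inputs


examples = [(2, 4), (3, 9)]
extra_inputs = [5, 12]  # hypothetical "generated" test inputs
# Hypothetical correctness scores standing in for a learned scoring model.
hypothetical_scores = {square_mul: 0.9, square_pow: 0.7, square_small: 0.4}

picks = select_candidates(
    [square_mul, square_pow, doubled, crashes, square_small],
    examples, extra_inputs, score=lambda p: hypothetical_scores.get(p, 0.0),
)
print([p.__name__ for p in picks])
```

Clustering by behaviour keeps near-identical solutions from wasting a limited submission budget. In this toy run, the two correct squaring functions collapse into one cluster, so the second slot goes to a behaviourally different candidate.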

How does it perform? Well, in a coding competition against humans, it beat 87% of other entrants, ranking it “just between the ‘Expert’ and ‘Candidate Master’ categories on Codeforces.”

As DeepMind scientists explain in the video below, these kinds of contests require a lot more than just coding skills: they demand extraordinary degrees of rational understanding and creative use of the available software tools.

Mind you, AlphaCode 2 isn’t going to be available to the public immediately, or indeed ever in its current form. Generating a million code snippets, as you might imagine, burns a ton of computing power and is way too expensive for general release. But what’s interesting here is that the success rate doesn’t appear to have tapered off at a million snippets – indeed, it seems that AlphaCode would continue to improve its results if it went well into the billions, or trillions. That’s an incredibly inefficient way to do things, but with the blinding speed of progress in this area, a smarter way is sure to come along very soon.
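A simplified model helps explain why piling on more samples keeps helping. Assume, unrealistically, that each generated sample is independently correct with some small probability p. The chance that at least one of k samples is correct is then 1 - (1 - p)^k. The sketch below evaluates that formula. The value of p is made up for illustration. Real samples are neither independent nor equally likely to be right, and a correct sample still has to survive filtering and ranking before it is submitted.

```python
import math


def chance_any_correct(p: float, k: int) -> float:
    """P(at least one of k independent samples is correct), each correct with probability p."""
    # 1 - (1 - p)**k, computed via log1p/expm1 to keep precision when p is tiny and k is huge.
    return -math.expm1(k * math.log1p(-p))


p = 1e-6  # hypothetical per-sample success probability
for k in (10**3, 10**6, 10**9):
    print(f"{k:>13,} samples -> {chance_any_correct(p, k):.3f}")
```

With p = 10^-6, a thousand samples give roughly a 0.1% chance of including a correct one, and a million give about 63%. A billion make it essentially certain. The takeaway is that rare correct solutions become likely once k is comparable to 1/p.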

DeepMind says it’s looking at how a streamlined version can be brought into the public models.

And there’s more, a ton more. But this should give you a sense of what Google is promising here. Google is planning to release it in three model sizes: Gemini Nano, built to run directly on mobile devices; Gemini Pro, a rough equivalent of GPT-3.5, which will be the main workhorse model for most tasks; and Gemini Ultra, the largest model, which Google says beats GPT-4 handily across a broad swathe of benchmark tests, with an even wider lead on multimodal tests than on text-based ones.

Gemini Ultra is scheduled for public launch next year, once it’s been more thoroughly vetted for safety and alignment issues. That’s when we’ll start getting a proper sense of where it outshines GPT-4 and where it’s just not up to snuff. Gemini Nano, meanwhile, is already available on the Pixel 8 Pro smartphone, and will begin rolling out on other devices.

Gemini Pro, though, is available right now, for free, to anyone with a Google account through the Google Bard service. It’s a slimmed-down version, unfortunately, with only the ability to upload images rather than documents, audio or video, but Google says it’ll gain new capabilities soon. It’s already got access, with your permission, to operate on your Gmail, Google Drive and Google Docs, as well as flight and hotel bookings, Google Maps, and YouTube, where it allows you to interact and ask questions about videos.

And yep, Google is working to integrate the Gemini model into pretty much every product it makes. Buckle up, y’all, this roller coaster only knows how to accelerate.

Matthew Griffin / About Author

Matthew Griffin, multi-award winning Futurist and named Futurist of the Year 2024, has been described by NASA as a "Walking encyclopaedia of the future" and as a futurist polymath. One of the world's most renowned futurists and strategic foresight experts, Matthew is the 15-time author of the blockbuster "Codex of the Future" series. He is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working across the next 50 years; XPotential University, the world's first free futures and foresight university; and the World Futures Forum, which works with the United Nations to solve the world's greatest challenges. Matthew is an in-demand international keynote speaker, acclaimed university lecturer and mentor, and host of the hit Fanatical Futurist podcast.

A rare talent, Matthew previously helped build and run several multi-billion dollar business units for Atos, Dell-EMC, and IBM. His ability to identify, track, and explain the impacts of hundreds of emerging technologies and trends on global business, culture, and society has earned him a powerful reputation and a roster of clients that includes royal households, world leaders, G7, G20, and G77+ governments, and many of the world's most respected brands including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi Group, Coca Cola, Dentons, Deloitte, Disney, Dow, EY, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, T-Mobile, UBS, VISA, and many others. He was also the only futurist invited to talk at the UN COP28 held in Dubai alongside world leaders.

Matthew is regularly featured in the global media, including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, and his mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone, irrespective of their ability, background, or circumstances.