
Google’s new Gemini AI beats OpenAI’s GPT-4 and humans at 57 subjects

WHY THIS MATTERS IN BRIEF

We are seeing rapid improvements in AI capability, and many AI systems are now starting to outperform human experts in numerous fields.

 


Google has unveiled its awesome next-gen Gemini Artificial Intelligence (AI), claiming it outperforms OpenAI’s GPT-4 – as well as human experts – on nearly all major tests. It understands images, video and audio as well as text and code, and will gain other senses over time.

 


 

With a score of 90.0% on the MMLU (Massive Multitask Language Understanding) benchmark, Gemini Ultra, the largest version of the model, is the first model to outperform human experts (89.8%) as well as GPT-4 (86.4%) on a set of knowledge and problem-solving tasks spanning 57 subjects, including math, physics, history, law, medicine and ethics. That’s experts, not the average human.
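Because MMLU results are reported as accuracy percentages (the share of multiple-choice questions answered correctly), the size of each lead comes down to simple subtraction. A minimal sketch using only the three figures quoted above:

```python
# MMLU scores quoted above: percentage of multiple-choice questions answered correctly
scores = {
    "Gemini": 90.0,
    "Human experts": 89.8,
    "GPT-4": 86.4,
}


def margin(a: str, b: str) -> float:
    """Percentage-point lead of entry a over entry b, rounded to one decimal place."""
    return round(scores[a] - scores[b], 1)


print(margin("Gemini", "Human experts"))  # 0.2
print(margin("Gemini", "GPT-4"))  # 3.6
```

In other words, the lead over human experts is just 0.2 percentage points, while the lead over GPT-4 is 3.6 points.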

 


 

Gemini is multimodal from the ground up, meaning that its original training dataset contained a ton of other media in addition to text. Thus, you could say it’s as fluent in visual and auditory “understanding” as it is with text. Where other language models have tended to “think” in textual terms when looking at video and images, Google says Gemini retains the tone and nuance of the original video, audio and image sources.

 


 

While the video below is a slick product demo, and thus should be taken with a large grain of salt, it’s worth watching to give you a sense of what this multimodality really means.

 

 

What’s the upshot here? Well, AIs are being trained with wider and wider sensory datasets, to mimic the processes by which humans learn to interact with the world. With next-level visual and auditory understanding, Gemini’s perception and reasoning take a step forward. Once this thing lands in Google devices – beginning with the next Pixel phones – it’ll be able to help with all sorts of daily tasks.

 


 

And as Google DeepMind CEO Demis Hassabis told Wired, this will soon extend into the next logical sensory realm: touch and tactile feedback. Google is already a major player in AI robotics through projects such as Everyday Robots, but giving a super-knowledgeable model like Gemini the ability to understand the world through touch would take robotics, humanoid and otherwise, into uncharted territory.

Multimodality is far from the only banner feature here, but as with GPT-4, Gemini is such a general-purpose machine that it’s hard to know where to start. Perhaps with the contributions it could make to science? In the video below, DeepMind scientists demonstrate how Gemini is able to generate its own code to read and interpret 200,000 scientific studies, filtering them for relevance using its own reasoning capabilities, and then collate data and effectively create new meta-knowledge. The team says it did all this over a lunch break, and that the approach could be useful in other fields, such as law, where huge datasets need to be examined.

 

 

Speaking of coding, Gemini is fluent in Python, Java, C++ and Go. Indeed, Google is already showing off how it can create websites that dynamically code themselves as you use them, in response to what you seem to want from them. This feels like a whole new approach to the internet: you go to a single page that grows into what you need as soon as it figures out what that is.

The demo video here uses a pretty lightweight use case: planning a kid’s birthday party. But you can see the extraordinary power it encapsulates, and imagine how it might create graphical user interfaces (what I’ll call a Generative User Interface) for nearly any task you could imagine. This is the sort of thing only AI can do; it’s like having a web app programmer sitting right next to you, but one capable of working hundreds of times faster to create and adapt the UIs you’re using in real time according to your actions and needs.

 


 

And as with other conversational AI tools, it’s highly interactive: if it’s not giving you exactly what you want, you can just tell it, and it’ll adjust to fit your needs or discuss the best way to proceed. Stunning stuff, and a glimpse into how our interactions with technology are fundamentally shifting.

On the topic of coding, DeepMind has done some other interesting work with Gemini in a project called AlphaCode 2, which takes several different Gemini models and trains each one specifically for a different part of the programming process.

In essence, AlphaCode 2 creates a swarm of programming agents, and gets them to generate up to a million different chunks of code to solve a problem. It then uses a separate Gemini model to examine these code samples, check if they compile, and rank them on how well they do their portion of the overall coding work, discarding around 95% of the samples created.

 


 

Then, another Gemini model develops a code-testing regime and sample test data, and runs a thorough testing process on all the remaining code samples, ranking them on “correctness” to find the top pieces of code. Effectively, DeepMind has split Gemini into a multifunctional software team, with specialist AIs working on requirements analysis, system design, testing, deployment and maintenance as well as a giant army of coders.
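The generate, filter and rank process described above can be illustrated with a toy sketch. This is not DeepMind’s code: the candidate programs are hard-coded strings standing in for model-generated samples, the problem (summing a list) and both test sets are hypothetical, and plain Python compile checks and test counts stand in for the Gemini models that do the judging in AlphaCode 2.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for model-generated samples for a toy problem:
# "write solve(xs) that returns the sum of a list of integers".
CANDIDATES = [
    "def solve(xs):\n    return sum(xs)",
    "def solve(xs):\n    return len(xs)",  # compiles, but gives wrong answers
    "def solve(xs)\n    return sum(xs)",  # missing colon: fails to compile
    "def solve(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total",
]

# Example tests that come with the problem: (input, expected output).
EXAMPLE_TESTS = [([1, 2, 3], 6), ([], 0)]

# Extra tests that a separate test-writing stage might produce (hypothetical).
GENERATED_TESTS = [([5, -5], 0), ([10], 10), ([1, 1, 1, 1], 4)]


def try_compile(src: str) -> Optional[Callable]:
    """Return the candidate's solve() function, or None if it does not compile."""
    namespace: dict = {}
    try:
        # exec runs untrusted code here; a real system would sandbox this step
        exec(compile(src, "<candidate>", "exec"), namespace)
    except SyntaxError:
        return None
    return namespace.get("solve")


def count_passed(fn: Callable, tests) -> int:
    """Count how many (input, expected) pairs fn gets right; a crash is a failure."""
    passed = 0
    for arg, expected in tests:
        try:
            if fn(list(arg)) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def pipeline(candidates):
    # Stage 1: discard samples that fail to compile.
    compiled = []
    for src in candidates:
        fn = try_compile(src)
        if fn is not None:
            compiled.append((src, fn))
    # Stage 2: discard samples that fail any of the problem's example tests.
    survivors = [(src, fn) for src, fn in compiled
                 if count_passed(fn, EXAMPLE_TESTS) == len(EXAMPLE_TESTS)]
    # Stage 3: rank the survivors by how many generated tests they pass.
    return sorted(survivors, key=lambda pair: count_passed(pair[1], GENERATED_TESTS),
                  reverse=True)


ranked = pipeline(CANDIDATES)
print(f"{len(ranked)} of {len(CANDIDATES)} candidates survived filtering")
```

Here two of the four candidates survive: one fails to compile, and one compiles but returns wrong answers on the example tests. At AlphaCode 2’s scale, the same kind of filtering runs over up to a million samples per problem, which is a big part of why it is so costly.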

How does it perform? Well, in a coding competition against humans, it beat 87% of other entrants, ranking it “just between the ‘Expert’ and ‘Candidate Master’ categories on Codeforces.”

As DeepMind scientists explain in the video below, these kinds of contests require a lot more than just coding skills: they demand extraordinary degrees of rational understanding and creative use of the available software tools.

 

 

Mind you, AlphaCode 2 isn’t going to be available to the public immediately, or indeed ever in its current form. Generating a million code snippets, as you might imagine, burns a ton of computing power and is way too expensive for general release. But what’s interesting here is that the success rate doesn’t appear to have tapered off at a million snippets; indeed, it seems that AlphaCode 2 would continue to improve its results if it went well into the billions, or trillions. That’s an incredibly inefficient way to do things, but with the blinding speed of progress in this area, a smarter way is sure to come along very soon.
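One simple way to see why drawing more samples keeps helping: if each sample independently has a small probability p of being correct, the chance that at least one of k samples is correct is 1 - (1 - p)^k. Real samples are not independent, and the value of p below is made up purely for illustration, so treat this as intuition rather than DeepMind’s own analysis.

```python
def p_at_least_one(p: float, k: int) -> float:
    """Chance that at least one of k independent samples is correct,
    if each sample is correct with probability p (a simplifying assumption)."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical per-sample success rate on a hard problem (made up for illustration)
p = 1e-6
for k in (1_000, 100_000, 1_000_000, 10_000_000):
    print(f"k={k:>10,}: {p_at_least_one(p, k):.3f}")
```

With p set to one in a million, the chance of at least one correct sample rises from about 0.1% at a thousand samples to about 63% at a million, and keeps climbing beyond that. Each extra order of magnitude of samples buys a sizeable gain, but it also costs ten times the compute.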

 


 

DeepMind says it’s looking at how a streamlined version can be brought into the public models.

And there’s more, a ton more. But this should give you a sense of what Google is promising here. Google is planning to release it in three model sizes: Gemini Nano, built to run directly on mobile devices; Gemini Pro, a rough equivalent of GPT-3.5 that will be the main workhorse model for most tasks; and Gemini Ultra, the largest model, which Google says beats GPT-4 handily across a broad swathe of benchmark tests, with an even wider lead on multimodal tests than on text-based ones.

 

 

Gemini Ultra is scheduled for public launch next year, once it’s been more thoroughly vetted for safety and alignment issues. That’s when we’ll start getting a proper sense of where it outshines GPT-4 and where it’s just not up to snuff. Gemini Nano, meanwhile, is already available on the Pixel 8 Pro smartphone, and will begin rolling out to other devices.

 


 

Gemini Pro, though, is available right now, for free, to anyone with a Google account through the Google Bard service. Unfortunately, it’s a slimmed-down version that can only accept uploaded images, not documents, audio or video, but Google says it’ll gain new capabilities soon. With your permission, it can already work with your Gmail, Google Drive and Google Docs, as well as flight and hotel bookings, Google Maps and YouTube, where it lets you ask questions about videos.

And yep, Google is working to integrate the Gemini model into pretty much every product it makes. Buckle up, y’all, this roller coaster only knows how to accelerate.
