
OpenAI thrashes DeepMind using an AI from the 1980s

WHY THIS MATTERS IN BRIEF

Decades-old AI algorithms are increasingly demonstrating that, with some fine-tuning, they can thrash today's best systems – and people are taking notice.

 

Artificial intelligence (AI) researchers have a long history of going back in time to explore old ideas, and now researchers at OpenAI, which is backed by Elon Musk, have revisited “Neuroevolution,” a field that has been around since the 1980s – and they’ve achieved state-of-the-art results.

 

The group, which was led by OpenAI’s research director Ilya Sutskever, explored the use of a set of algorithms called “Evolution strategies,” which are aimed at solving “optimisation” problems. Optimisation problems are just what they sound like: take something that needs optimising, such as your route to work, a flight plan, or even a healthcare treatment, and find the best possible version of it.

On an abstract level, the technique the team used works by letting successful algorithms pass their characteristics on to future generations – in short, each successive generation gets better and better at whatever tasks it has been assigned. Coming back to the present day, the researchers took these algorithms and reworked them so they’d work better with today’s deep neural networks and run efficiently on large scale distributed computing systems.

To validate the new system’s effectiveness they set the algorithms to work on a series of challenges that are seen as benchmarks for reinforcement learning – the technique behind many of Google DeepMind’s most impressive feats, which range from teaching their AIs to learn as fast as humans and giving them human-like memory, through to creating new Artificial General Intelligence (AGI) architectures, teaching them to dream, and annihilating online Go players – to name but a few.

 

One of the challenges was to train the algorithm to play a variety of Atari computer games, and the other was to get it to learn how to control a virtual humanoid walker in a physics engine.

First the algorithm started with a random policy – the set of rules that governs how the system should behave to achieve a high score. It then created several hundred copies of that policy, each with some random variation, and tested them on the game. These policies were then mixed back together, with greater weight given to the ones that got the highest scores. The team repeated the process until it arrived at a policy that played the game well.
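
To make that loop concrete, here is a minimal sketch of an evolution-strategies update of the kind described above, applied to a toy problem rather than an Atari game – the reward function, population size and other hyperparameters are purely illustrative assumptions, not OpenAI’s actual setup.

```python
# A minimal sketch of the evolution-strategies loop described above,
# applied to a toy problem: the "policy" is just three numbers and the
# "score" is an invented reward function, both illustrative assumptions.
import numpy as np

def score(policy):
    # Stand-in for "play the game and return the score"; higher is better.
    target = np.array([0.5, -1.3, 2.0])
    return -np.sum((policy - target) ** 2)

rng = np.random.default_rng(0)
policy = rng.standard_normal(3)   # start with a random policy
population = 200                  # several hundred perturbed copies
sigma = 0.1                       # size of the random variation
learning_rate = 0.02

for generation in range(300):
    # Create randomly varied copies of the current policy and score each one.
    noise = rng.standard_normal((population, policy.size))
    rewards = np.array([score(policy + sigma * n) for n in noise])

    # Mix the copies back together, giving more weight to the perturbations
    # that scored highest (rewards are normalised before weighting).
    weights = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    policy += learning_rate / (population * sigma) * noise.T @ weights

print("final policy:", policy, "final score:", score(policy))
```

The same pattern scales up by swapping the invented score function for a full game episode and the three-number policy for the weights of a deep neural network.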

In just an hour of training on the Atari challenge the algorithm achieved a level of mastery that took DeepMind’s reinforcement learning system a whole day to reach, and on the walking problem it took just 10 minutes, compared to DeepMind’s 10 hours.

 

One of the keys to this dramatic performance improvement was the fact that the new system was superb at processing workloads in parallel. To solve the walking simulation, for example, the system spread its computations over 1,440 CPU cores, while in the Atari challenge it used 720.

This was possible because the system only required limited communication between the various “worker” algorithms testing the candidate policies – scalable reinforcement learning algorithms like DeepMind’s have to communicate a lot more. Additionally, the new system didn’t need to use “backpropagation,” a common neural network learning technique that effectively compares the network’s output with the desired output and then feeds the resulting error back through the network to help optimise it.
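
Below is a hedged sketch of how such limited communication between workers can look: each worker recreates its own random variation from a small seed and sends back only a single score. The use of Python’s multiprocessing, the toy reward function and the seed scheme are illustrative assumptions rather than OpenAI’s actual distributed implementation.

```python
# A hedged sketch of the low-communication idea: each "worker" rebuilds its
# perturbation locally from a small integer seed and returns only one score,
# so very little data moves between processes. The reward function, seed
# scheme and use of multiprocessing are assumptions for illustration.
import numpy as np
from multiprocessing import Pool

SIGMA = 0.1
BASE_POLICY = np.zeros(3)  # the current policy, known to every worker

def score(policy):
    # Stand-in reward; the real system would run a game episode here.
    return -np.sum((policy - np.array([0.5, -1.3, 2.0])) ** 2)

def evaluate(seed):
    # Regenerate the random variation from the seed instead of shipping it,
    # then send back nothing but a single floating-point score.
    noise = np.random.default_rng(seed).standard_normal(BASE_POLICY.size)
    return score(BASE_POLICY + SIGMA * noise)

if __name__ == "__main__":
    seeds = list(range(200))          # one candidate policy per seed
    with Pool(processes=8) as pool:   # 8 local cores standing in for 1,440
        rewards = pool.map(evaluate, seeds)

    # The coordinator can rebuild each perturbation from its seed and apply
    # the same weighted update shown in the earlier sketch.
    print("best score this generation:", max(rewards))
```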

Combined, this helped make the new system’s code shorter and the algorithm three to four times faster. But the approach has its limitations. These kinds of algorithms are usually compared on their data efficiency – the number of iterations required to achieve a specific score in a game – and on this metric the OpenAI approach did worse than traditional reinforcement learning approaches, even though it carried out those iterations much more quickly.

 

On supervised learning problems, for example, such as image classification and speech recognition, it was up to 1,000 times slower than approaches that use backpropagation. And that’s bad.

Nevertheless, the work demonstrated promising new applications for what were once thought to be obsolete evolutionary approaches, and OpenAI isn’t the only group investigating them – Google has also been experimenting with older algorithms – so while I don’t know about dogs, it certainly looks like you can teach old algorithms new tricks.
