OpenAI thrashes DeepMind using an AI from the 1980s



  • Decades-old AI algorithms are increasingly demonstrating that, with some fine-tuning, they can thrash today’s best systems, and people are taking notice


Artificial intelligence (AI) researchers have a long history of revisiting old ideas, and now researchers at OpenAI, which is backed by Elon Musk, have gone back to “neuroevolution,” a field that has been around since the 1980s, and achieved state-of-the-art results.




The group, which was led by OpenAI’s research director Ilya Sutskever, explored the use of a set of algorithms called “evolution strategies,” which are aimed at solving “optimisation” problems. Optimisation problems are just what they sound like: take something that needs optimising, such as your route to work, a flight plan, or even a healthcare treatment, and find the best version of it.

On an abstract level, the technique the team used works by letting successful algorithms pass their characteristics on to future generations, so that each successive generation gets better at whatever task it has been assigned. Coming back to the present day, the researchers reworked these algorithms so they’d work better with today’s deep neural networks and run better on large-scale distributed computing systems.

To validate the new system’s effectiveness they set the algorithms to work on a series of challenges that are seen as benchmarks for reinforcement learning, the technique behind many of Google DeepMind’s most impressive feats. Those feats range from teaching its AIs to learn as fast as humans and giving them human-like memory, through to creating new Artificial General Intelligence (AGI) architectures, teaching them to dream, and annihilating online Go players, to name but a few.




One of the challenges was to train the algorithm to play a variety of Atari computer games, and the other was to get it to learn how to control a virtual humanoid walker in a physics engine.

First the algorithm started with a random policy, the set of rules that govern how the system should behave to achieve a high score. It then created several hundred copies of the policy, each with some random variation, and tested them on the game. These policies were then mixed back together again, but with greater weight given to the ones that got the highest scores. The team repeated the process until it came up with a policy that played the game well.
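The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not OpenAI’s code: the policy is a plain list of numbers rather than a neural network, `toy_score` is a made-up stand-in for a game score, and the function name, parameter names, default values and the score normalisation step are all choices made for this example.

```python
import random

# Hypothetical stand-in for "the score in the game": higher is better,
# and the best possible policy is TARGET itself (score 0).
TARGET = [0.5, -0.3, 0.8]

def toy_score(policy):
    return -sum((p - t) ** 2 for p, t in zip(policy, TARGET))

def evolution_strategy(score, n_params, population=50, sigma=0.1,
                       learning_rate=0.01, generations=300, seed=0):
    rng = random.Random(seed)
    # Start with a random policy.
    policy = [rng.gauss(0.0, 1.0) for _ in range(n_params)]
    for _ in range(generations):
        # Create many copies of the policy, each with random variation,
        # and score every copy.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(n_params)]
                  for _ in range(population)]
        scores = [score([p + sigma * e for p, e in zip(policy, noise)])
                  for noise in noises]
        # Assumption: normalise the scores so that better-than-average
        # copies get positive weight and worse ones get negative weight.
        mean = sum(scores) / population
        std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5 or 1.0
        weights = [(s - mean) / std for s in scores]
        # Mix the variations back into the policy, weighted by score.
        for i in range(n_params):
            step = sum(w * noise[i] for w, noise in zip(weights, noises))
            policy[i] += learning_rate / (population * sigma) * step
    return policy

if __name__ == "__main__":
    best = evolution_strategy(toy_score, n_params=len(TARGET))
    print("final policy:", best, "score:", toy_score(best))
```

Nothing in this loop needs a gradient of the score: the only information it uses about each candidate is the single number the game hands back.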

In just an hour of training on the Atari challenge the algorithm achieved a level of mastery that took DeepMind’s reinforcement learning system a whole day to learn, and on the walking problem it took just 10 minutes, compared to DeepMind’s 10 hours.




One of the keys to this dramatic performance improvement was the fact that the new system was superb at processing workloads in parallel. To solve the walking simulation, for example, the system spread its computations over 1,440 CPU cores, while in the Atari challenge it used 720.

This was possible because the system only required limited communication between the various “worker” algorithms testing the candidate policies, whereas scaling up reinforcement learning algorithms like DeepMind’s requires a lot more communication. Additionally, the new system didn’t need to use “backpropagation,” a common neural network learning technique that effectively compares the network’s output with the desired output and then feeds the resulting error back through the network to help optimise it.
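To make the point about limited communication concrete, here is one way such a scheme could be arranged. The article does not describe the actual mechanism, so treat this as an assumption for illustration: every worker rebuilds its random variation from a small integer seed, so the only message it sends back is that seed plus one score. The function names and the `learning_rate` and `sigma` parameters are hypothetical.

```python
import random

def noise_from_seed(seed, n_params):
    # Anyone who knows the seed can regenerate exactly the same variation.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_params)]

def worker(policy, seed, sigma, score):
    # Each worker tests one varied copy of the policy.
    noise = noise_from_seed(seed, len(policy))
    candidate = [p + sigma * e for p, e in zip(policy, noise)]
    # The whole message sent back: an integer seed and a single score.
    return seed, score(candidate)

def combine(policy, messages, sigma, learning_rate):
    # Rebuild every variation from its seed and mix it into the policy,
    # weighted by how far its score was above or below the average.
    scores = [s for _, s in messages]
    mean = sum(scores) / len(scores)
    new_policy = list(policy)
    for seed, s in messages:
        noise = noise_from_seed(seed, len(policy))
        for i, e in enumerate(noise):
            new_policy[i] += learning_rate / (len(messages) * sigma) * (s - mean) * e
    return new_policy

if __name__ == "__main__":
    policy = [0.0, 0.0]
    messages = [worker(policy, seed, 0.1, sum) for seed in range(100)]
    print(combine(policy, messages, sigma=0.1, learning_rate=0.05))
```

In this sketch the workers are ordinary function calls, but because each message is just two numbers, the same idea carries over to workers spread across many machines.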

When combined, this helped make the new system’s code shorter, and the algorithm three to four times faster. But the approach has its limitations. These kinds of algorithms are usually compared based on their data efficiency, the number of iterations required to achieve a specific score in a game. Using this metric, the OpenAI approach did worse than the traditional reinforcement learning approaches, even though it carried out those iterations much quicker.




For supervised learning problems such as image classification and speech recognition, for example, it was up to 1,000 times slower than approaches that use backpropagation. And that’s bad.

Nevertheless, the work demonstrated promising new applications for evolutionary approaches that were once thought obsolete, and OpenAI isn’t the only group investigating them. Google has also been experimenting with older algorithms, so while I don’t know about dogs, it certainly looks like you can teach old algorithms new tricks.

About author

Matthew Griffin

Matthew Griffin, Futurist and Founder of the 311 Institute, is described as “The Adviser behind the Advisers.” Among other things Matthew keeps busy helping the world’s largest smartphone manufacturers ideate the next five generations of smartphones, and what comes beyond, helping the world’s largest chip makers envision the next twenty years of intelligent machines, and helping Europe’s largest energy companies re-invent energy generation, transmission and retail.

Recognised in 2013, 2015 and 2016 as one of Europe’s foremost futurists, innovation and strategy experts Matthew is an award winning author, entrepreneur and international speaker who has been featured on the BBC, Discovery and other outlets. Working hand in hand with accelerators, investors, governments, multi-nationals and regulators around the world Matthew helps them envision the future and helps them transform their industries, products and go to market strategies, and shows them how the combination of new, democratised, powerful emerging technologies are helping accelerate cultural, industrial and societal change.

Matthew’s clients include Accenture, Bain & Co, Bank of America, Blackrock, Booz Allen Hamilton, Boston Consulting Group, Dell EMC, Dentons, Deutsche Bank, Deloitte, Du Pont, E&Y, Fidelity, Goldman Sachs, HPE, Huawei, JP Morgan Chase, KPMG, Lloyds Banking Group, McKinsey & Co, PWC, Qualcomm, Rolls Royce, SAP, Samsung, Schroeder’s, Sequoia Capital, Sopra Steria, UBS, the UK’s HM Treasury, the USAF and many others.

