WHY THIS MATTERS IN BRIEF
Everyone has weaknesses, and now researchers have developed AIs that can identify and exploit them. One day these AIs will be able to identify and exploit any human flaw, as well as flaws in other AIs, for better and worse.
Over the past couple of years, Artificial Intelligence (AI) has been getting better and better at beating the world’s best gamers, whether they’re masters of chess, Dota 2, Go, poker, StarCraft or many other games. These AIs are now so good that the world’s top Go champion, who was beaten by AI a while back, recently retired, announcing that “AI is invincible.”
Normally, AI beats its adversaries by running millions, even tens of millions, of game simulations and learning the best strategies as it goes along. Now, in a twist, researchers at Google’s DeepMind, arguably one of the global leaders in AI development and the company behind the AIs that have been taking most gamers to town, have turned that idea on its head. They have developed a new AI designed to find and exploit weaknesses in players’ strategies. That trait could also prove very useful in cyber security and warfare, especially when coupled with Google’s latest self-evolving AI system, and the US Department of Defense’s new war game AI and Skynet wannabe Robo-Hacking system.
In a paper published on the preprint server Arxiv.org, the researchers describe a new framework that learns an approximate best response to players in games of many kinds. They claim it consistently performs well as a “worst-case opponent”, that is, an opponent that plays the strongest possible counter-strategy against a given player, in a number of games including chess, Go, and Texas Hold’em.
DeepMind CEO Demis Hassabis often asserts that games are a convenient proving ground for developing algorithms that can be translated into the real world to work on challenging problems. Innovations like this new framework could therefore lay the groundwork for Artificial General Intelligence (AGI), the holy grail of AI: a decision-making AI system that not only automatically completes mundane, repetitive enterprise tasks like data entry, but also reasons about its environment. That’s also the long-term goal of other research institutions, like OpenAI, which recently received $1 billion in funding from Microsoft to help develop the world’s first true AGI.
A player’s exploitability measures how much a worst-case opponent, one that plays the best possible response to that player’s strategy, can win against it. Computing that exploitability is often computationally intensive because the number of actions players might take is so large. For example, one variant of Texas Hold’em, Heads-Up Limit Texas Hold’em, has roughly 10 to the power of 14 (that’s a 1 followed by 14 zeros) decision points, while Go has approximately 10 to the power of 170!
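As a toy illustration, not taken from the paper, exploitability in rock-paper-scissors can be worked out directly. Because the game’s equilibrium value is 0, a strategy’s exploitability is simply the best expected payoff an opponent can get against it. The Python sketch below assumes win = +1, draw = 0 and loss = -1, and its function names are hypothetical. It works only because the game is tiny enough to check every possible reply, and that is exactly what becomes impractical at 10 to the power of 14 decision points.

```python
from fractions import Fraction

ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[i][j]: payoff to a player choosing ACTIONS[i] against ACTIONS[j]
# (assumed scoring: +1 win, 0 draw, -1 loss).
PAYOFF = [
    [0, -1, 1],   # rock vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def best_response_value(policy):
    """Best expected payoff an opponent can get against a fixed mixed policy.

    policy[j] is the probability that the evaluated player picks ACTIONS[j].
    Checking pure replies is enough, because some pure action always
    achieves the best-response value.
    """
    return max(
        sum(p * PAYOFF[i][j] for j, p in enumerate(policy))
        for i in range(len(ACTIONS))
    )


def exploitability(policy, game_value=0):
    # In symmetric zero-sum rock-paper-scissors the equilibrium value is 0,
    # so exploitability is how far a best response gets above that value.
    return best_response_value(policy) - game_value


uniform = [Fraction(1, 3)] * 3
rock_heavy = [Fraction(6, 10), Fraction(2, 10), Fraction(2, 10)]
print(exploitability(uniform))     # 0
print(exploitability(rock_heavy))  # 2/5
```

A player who mixes uniformly cannot be exploited. One who over-plays rock loses 0.4 points per game on average to an opponent who always plays paper.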
One way around this is to use reinforcement learning, an AI training technique that spurs software agents to complete goals via a system of rewards. It trains a policy that exploits the player being evaluated, approximating the best response instead of computing it exactly.
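Here is a minimal sketch of that idea, assuming a toy setting rather than anything from the paper. The learner plays single rounds of rock-paper-scissors against a fixed opponent that over-plays rock, using epsilon-greedy value estimates. That is one of the simplest reinforcement learning methods; DeepMind’s framework instead combines reinforcement learning with tree search. The learner never sees the opponent’s probabilities, only its rewards, yet it should settle on paper as the best response. The function name, episode count and exploration rate are hypothetical choices.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
# Reward to the learner for playing row action against column action.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def learn_best_response(opponent_policy, episodes=20000, epsilon=0.1, seed=0):
    """Epsilon-greedy value learning against a fixed opponent policy.

    Each episode is one round; the reward is the game payoff.
    Returns the running-average reward estimate for each action.
    """
    rng = random.Random(seed)
    values = [0.0] * len(ACTIONS)  # running mean reward per action
    counts = [0] * len(ACTIONS)
    for _ in range(episodes):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: values[i])
        # The opponent samples its move from its fixed (exploitable) policy.
        b = rng.choices(range(len(ACTIONS)), weights=opponent_policy)[0]
        reward = PAYOFF[a][b]
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean
    return values


values = learn_best_response([0.6, 0.2, 0.2])
best = ACTIONS[max(range(len(ACTIONS)), key=lambda i: values[i])]
print(best, [round(v, 2) for v in values])
```

The estimate for paper should approach its true value of 0.4 against this opponent. Keeping a little exploration lets the learner keep checking the other moves instead of locking onto whichever one looked best first.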
The framework the DeepMind researchers propose, which they call Approximate Best Response Information State Monte Carlo Tree Search (ABR IS-MCTS), approximates an exact best response on an information-state basis, meaning it works only from what the exploiting player can actually observe at each decision point. Actors within the framework follow an algorithm to play the game, while a learner uses the outcomes of those games to train a policy. Intuitively, ABR IS-MCTS tries to learn the valid, exploiting counter-strategy that an exploiter would come up with if it were given unlimited access to the opponent’s strategy. In effect, it simulates what would happen if someone trained for years to exploit that opponent.
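To make the actor/learner split concrete, here is a toy Python sketch. It is not ABR IS-MCTS: there is no tree search and no neural network, just a table of average returns keyed by information state. The game is Kuhn poker. There are three cards (Jack, Queen and King), and each player antes one chip, receives one card, and may pass or bet one chip. Several actors play it as player 0 against a fixed, uniformly random opponent, and a learner averages the results. The names `Learner` and `actor_episode`, the 8 actors and the exploration rate are hypothetical illustration choices. The actors also run one after another here rather than on separate machines.

```python
import random

CARDS = [0, 1, 2]  # Jack, Queen, King
PASS, BET = "p", "b"


def payoff_p0(cards, history):
    """Player 0's payoff if `history` is terminal, otherwise None."""
    if history in ("pp", "pbb", "bb"):  # showdown: higher card wins the pot
        stake = 1 if history == "pp" else 2
        return stake if cards[0] > cards[1] else -stake
    if history == "pbp":  # player 0 folds to player 1's bet
        return -1
    if history == "bp":  # player 1 folds to player 0's bet
        return 1
    return None


class Learner:
    """Tabular Monte Carlo learner keyed by player 0's information state."""

    def __init__(self):
        self.totals, self.counts = {}, {}

    def q(self, info_state, action):
        # Average return seen after `action` in `info_state` (0 if untried).
        key = (info_state, action)
        return self.totals.get(key, 0.0) / max(self.counts.get(key, 0), 1)

    def greedy(self, info_state):
        return max((PASS, BET), key=lambda a: self.q(info_state, a))

    def update(self, trajectory, reward):
        # Credit the hand's final payoff to every (information state, action) visited.
        for key in trajectory:
            self.totals[key] = self.totals.get(key, 0.0) + reward
            self.counts[key] = self.counts.get(key, 0) + 1


def actor_episode(policy, rng, epsilon=0.2):
    """One actor plays one hand as player 0 against a uniformly random player 1."""
    cards = rng.sample(CARDS, 2)  # cards[0] is player 0's, cards[1] is player 1's
    history, trajectory = "", []
    while (reward := payoff_p0(cards, history)) is None:
        if len(history) % 2 == 0:  # player 0 acts at "" and "pb"
            info_state = (cards[0], history)  # own card plus public actions only
            explore = rng.random() < epsilon
            action = rng.choice((PASS, BET)) if explore else policy(info_state)
            trajectory.append((info_state, action))
        else:  # the fixed opponent passes or bets with equal probability
            action = rng.choice((PASS, BET))
        history += action
    return trajectory, reward


learner = Learner()
actors = [random.Random(seed) for seed in range(8)]  # 8 hypothetical actors
for _ in range(5000):
    # All actors in a batch play with the same current policy,
    # then the learner trains on the whole batch.
    batch = [actor_episode(learner.greedy, rng) for rng in actors]
    for trajectory, reward in batch:
        learner.update(trajectory, reward)

for card, name in zip(CARDS, "JQK"):
    print(name, "opens with", learner.greedy((card, "")),
          "| facing a bet:", learner.greedy((card, "pb")))
```

Against this opponent the greedy policy should learn to open with a bet when holding a Jack, since the bluff gets a fold half the time, and to fold the Jack when facing a bet. Those are the same choices an exact best response makes.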
The researchers report that in experiments involving 200 actors (trained on a PC with 4 processors and 8GB of RAM) and a learner, ABR IS-MCTS achieved a win rate above 50 percent in every game it played. It also won more than 70 percent of the time in games other than Hex and Go, such as Connect Four and Breakthrough. In backgammon, meanwhile, it won 80 percent of the time after training for 1 million episodes.
The co-authors say they see evidence of “substantial learning”: when the actors’ learning steps are restricted, the actors tend to perform worse, even after 100,000 episodes of training. They also note, however, that ABR IS-MCTS is quite slow in certain contexts. On average it takes 150 seconds to calculate the exploitability of a particular strategy, such as UniformRandom (which picks each legal move with equal probability), in Kuhn poker, a simplified form of two-player poker.
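For a game as small as Kuhn poker, exploitability can even be computed exactly by brute force, which gives a useful reference point. The hedged Python sketch below assumes the standard rules: three cards (Jack, Queen, King), one-chip antes and one-chip bets. It also assumes a UniformRandom opponent that passes or bets with equal probability. For each player it tries every pure strategy, that is, every mapping from information state to action (2 to the power of 6 = 64 of them), and keeps the best expected payoff. The sum of the two best-response values is often called NashConv, and some authors report exploitability as half of it. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

CARDS = (0, 1, 2)  # Jack, Queen, King
HALF = Fraction(1, 2)
# Histories at which each player acts; adding the player's own card
# turns each one into an information state.
DECISION_HISTORIES = {0: ("", "pb"), 1: ("p", "b")}


def terminal_utility(cards, history):
    """Player 0's payoff if `history` is terminal, otherwise None."""
    if history in ("pp", "pbb", "bb"):  # showdown
        stake = 1 if history == "pp" else 2
        return stake if cards[0] > cards[1] else -stake
    if history == "pbp":  # player 0 folds
        return -1
    if history == "bp":  # player 1 folds
        return 1
    return None


def value(cards, history, br_player, strategy):
    """Expected payoff to br_player from `history` onwards."""
    u = terminal_utility(cards, history)
    if u is not None:
        return u if br_player == 0 else -u
    player = len(history) % 2
    if player == br_player:
        # The best-responder acts on its information state: own card + history.
        action = strategy[(cards[player], history)]
        return value(cards, history + action, br_player, strategy)
    # UniformRandom opponent: pass and bet are equally likely.
    return HALF * (value(cards, history + "p", br_player, strategy)
                   + value(cards, history + "b", br_player, strategy))


def best_response_value(br_player):
    """Best expected payoff against UniformRandom over all 2**6 pure strategies."""
    info_states = [(c, h) for c in CARDS for h in DECISION_HISTORIES[br_player]]
    deals = list(permutations(CARDS, 2))  # 6 equally likely deals
    best = None
    for actions in product("pb", repeat=len(info_states)):
        strategy = dict(zip(info_states, actions))
        ev = sum(value(d, "", br_player, strategy) for d in deals) / len(deals)
        best = ev if best is None else max(best, ev)
    return best


br0, br1 = best_response_value(0), best_response_value(1)
nash_conv = br0 + br1
print(br0, br1)                  # 1/2 5/12
print(nash_conv, nash_conv / 2)  # 11/12 11/24
```

The number of pure strategies doubles with every extra information state. That is why this brute-force approach is hopeless for games with anything like 10 to the power of 14 decision points, and why approximate methods are needed there.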
Now that they have demonstrated the basic approach, the team say they plan to extend the method and train it on even more complex games. All of which means that, sooner rather than later, there will be AIs that not only know your weaknesses but also know how to exploit them, which increasingly sounds like the script of the Hollywood movie Ex Machina…