Everyone has their weaknesses, and researchers have now developed AIs that can identify and exploit them. One day these AIs will be able to identify and exploit any human flaw, as well as flaws in other AIs, for better and worse.
Over the past couple of years Artificial Intelligence (AI) has been getting better and better at beating the world’s best gamers, whether they’re masters of chess, Dota 2, Go, poker, StarCraft or many other games. These AIs are now so good that the world’s top Go champion, who was beaten by AI a while back, recently retired, announcing that “AI is invincible.”
Normally, though, AI beats its adversaries by running millions, even tens of millions, of game simulations and learning the best strategies as it goes along. Now, in a twist, researchers at Google’s DeepMind outfit, arguably one of the global leaders in AI development and the company behind the AIs that are taking most gamers to town, have turned that construct on its head and developed a new AI that’s designed to find and exploit weaknesses in players’ strategies. That trait could also prove very useful in the fields of cyber security and warfare, especially when coupled with Google’s latest self-evolving AI system, the US Department of Defense’s new war game AI, and its Skynet wannabe Robo-Hacking system.
In a paper published on the preprint server arXiv.org the researchers describe a new framework that learns an approximate best response to players within games of many kinds. They claim that it achieves consistently high performance against “worst-case opponents” (that is, players who aren’t good, yet at least play by the rules and actually complete the game) in a number of games including chess, Go, and Texas Hold’em.
DeepMind CEO Demis Hassabis often asserts that games are a convenient proving ground for developing algorithms that can be translated into the real world to work on challenging problems. Innovations like this new framework, then, could lay the groundwork for Artificial General Intelligence (AGI), which is the holy grail of AI: a decision-making AI system that not only automatically completes mundane, repetitive enterprise tasks like data entry, but also reasons about its environment. That’s the long-term goal of other research institutions too, like OpenAI, which recently received $1 billion in funding from Microsoft to help develop the world’s first true AGIs.
How much a best-responding opponent can win against a given strategy is known as that strategy’s exploitability. Computing exploitability is often computationally intensive because the number of actions players might take is so large. For example, one variant of Texas Hold’em, Heads-Up Limit Texas Hold’em, has roughly 10 to the power of 14 (that’s a 1 followed by 14 zeros) decision points, while Go has approximately 10 to the power of 170!
One way to get around this is to train a policy that exploits the player being evaluated, using reinforcement learning (an AI training technique that spurs software agents to complete goals via a system of rewards) to compute the best response.
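To make the idea of exploitability concrete, here is a minimal sketch in a game small enough to solve by hand: rock-paper-scissors. It is not taken from the DeepMind paper. The payoff table, the function name `best_response` and the example strategies are all illustrative assumptions. It simply computes the best reply to a fixed mixed strategy and how much that reply wins on average.

```python
# Rock-paper-scissors payoffs for the exploiter (outer key) against the opponent (inner key).
ACTIONS = ("rock", "paper", "scissors")
PAYOFF = {
    "rock": {"rock": 0, "paper": -1, "scissors": 1},
    "paper": {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}


def best_response(opponent_mix):
    """Return (action, expected payoff) of the best reply to a fixed mixed strategy."""
    values = {
        action: sum(prob * PAYOFF[action][theirs] for theirs, prob in opponent_mix.items())
        for action in ACTIONS
    }
    best = max(values, key=values.get)
    return best, values[best]


# Hypothetical strategies: one that overplays rock, and one that is perfectly balanced.
biased = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}
uniform = {action: 1 / 3 for action in ACTIONS}

print(best_response(biased))   # the rock-heavy player can be exploited by playing paper
print(best_response(uniform))  # the balanced player gives the exploiter nothing to gain
```

In a game this small the best reply can be read straight off a table. In games with 10 to the power of 14 decision points or more it cannot, which is why the best response has to be approximated.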
The framework the DeepMind researchers propose, which they call Approximate Best Response Information State Monte Carlo Tree Search (ABR IS-MCTS), approximates an exact best response on an information-state basis. Actors within the framework follow an algorithm to play a game while a learner derives information from various game outcomes to train a policy. Intuitively, ABR IS-MCTS tries to learn a strategy that, when the exploiter is given unlimited access to the strategy of the opponent, can create a valid and exploiting counterstrategy; it simulates what would happen if someone trained for years to exploit the opponent.
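The sketch below shows the general flavour of learning a best response from sampled games rather than computing it exactly. It is emphatically not ABR IS-MCTS: there is no tree search, no neural network and no separate learner, just a Monte Carlo estimate of each action's average payoff against a fixed opponent in rock-paper-scissors. The opponent `rock_heavy`, the function `sampled_best_response` and the episode count are all hypothetical choices made for illustration.

```python
import random

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, 0 for a draw, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def sampled_best_response(opponent_policy, episodes_per_action=3000, seed=0):
    """Estimate each action's value by playing sampled games against the opponent."""
    rng = random.Random(seed)
    estimates = {}
    for action in ACTIONS:
        total = sum(payoff(action, opponent_policy(rng)) for _ in range(episodes_per_action))
        estimates[action] = total / episodes_per_action
    best = max(estimates, key=estimates.get)
    return best, estimates


def rock_heavy(rng):
    """Hypothetical opponent to be exploited: it plays rock half of the time."""
    return rng.choices(ACTIONS, weights=(0.5, 0.3, 0.2))[0]


best, estimates = sampled_best_response(rock_heavy)
print(best, estimates)
```

The exploiter only ever sees the opponent's moves, never its probabilities, yet with enough episodes its estimates settle on the counterstrategy that an exact calculation would pick.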
The researchers report that in experiments involving 200 actors, which were trained on a PC with 4 processors and 8GB of RAM, and a learner, ABR IS-MCTS achieved a win rate above 50 percent in every game it played and a rate above 70 percent in games other than Hex or Go, such as Connect Four and Breakthrough. In backgammon, meanwhile, it won 80 percent of the time after training for 1 million episodes.
The co-authors say they see evidence of “substantial learning” in that when the actors’ learning steps are restricted, they tend to perform worse even after 100,000 episodes of training. They also note, however, that ABR IS-MCTS is quite slow in certain contexts, taking on average 150 seconds to calculate the exploitability of a particular kind of strategy, such as UniformRandom, in Kuhn poker, which is a simplified form of two-player poker.
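Kuhn poker is small enough that the exploitability of a uniformly random strategy can be checked exactly by brute force, which gives a feel for what the approximate method is trying to estimate. The sketch below is not the paper's code. It assumes the standard three-card rules (each player antes 1 chip, bets are 1 chip), uses made-up function names, and adopts one common convention in which exploitability is the average of the two players' best-response values.

```python
from itertools import permutations, product

CARDS = (0, 1, 2)  # Jack, Queen, King
TERMINAL = {"pp", "bp", "bb", "pbp", "pbb"}  # 'p' = pass or fold, 'b' = bet or call


def payoff_first(c1, c2, history):
    """Chips won by the first player at a terminal history (each player antes 1)."""
    if history == "bp":   # second player folds to a bet
        return 1
    if history == "pbp":  # first player folds to a bet
        return -1
    showdown = 1 if c1 > c2 else -1
    return showdown if history == "pp" else 2 * showdown


def expected_value(policies, c1, c2, history=""):
    """Expected chips for the first player; policies[i](card, history) gives P(bet)."""
    if history in TERMINAL:
        return payoff_first(c1, c2, history)
    player = len(history) % 2
    p_bet = policies[player]((c1, c2)[player], history)
    return (p_bet * expected_value(policies, c1, c2, history + "b")
            + (1 - p_bet) * expected_value(policies, c1, c2, history + "p"))


def best_response_value(opponent, responder):
    """Exact best-response value, found by trying every pure strategy of the responder."""
    histories = ("", "pb") if responder == 0 else ("p", "b")
    infosets = [(card, h) for card in CARDS for h in histories]
    deals = list(permutations(CARDS, 2))
    sign = 1 if responder == 0 else -1
    best = float("-inf")
    for choice in product((0.0, 1.0), repeat=len(infosets)):
        table = dict(zip(infosets, choice))
        pure = lambda card, h, t=table: t[(card, h)]
        policies = (pure, opponent) if responder == 0 else (opponent, pure)
        value = sum(sign * expected_value(policies, c1, c2) for c1, c2 in deals) / len(deals)
        best = max(best, value)
    return best


def uniform_random(card, history):
    """The strategy being evaluated: bet or pass with equal probability everywhere."""
    return 0.5


br_first = best_response_value(uniform_random, 0)
br_second = best_response_value(uniform_random, 1)
exploitability = (br_first + br_second) / 2  # assumed convention: average over both seats
print(br_first, br_second, exploitability)
```

Under these assumptions the exploiter wins 1/2 a chip per hand when acting first and 5/12 when acting second, for an exploitability of 11/24. Enumerating all 64 pure strategies per seat only works because the game is tiny, so this approach does not scale to the larger games in the article.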
Now, having proven the basic theory, the team say that they are going to extend this new method and train it on even more complex games. All of which means that sooner rather than later there’ll be AIs that not only know your weaknesses but also know how to exploit them, which increasingly sounds like the script from the Hollywood movie Ex Machina…
Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the World Futures Forum and the 311 Institute, a global Futures and Deep Futures consultancy working on the period between 2020 and 2070, and is an award-winning futurist and author of the “Codex of the Future” series.
Regularly featured in the global media, including AP, BBC, Bloomberg, CNBC, Discovery, RT, Viacom, and WIRED, Matthew’s ability to identify, track, and explain the impacts of hundreds of revolutionary emerging technologies on global culture, industry and society is unparalleled. Recognised for the past six years as one of the world’s foremost futurists, innovation and strategy experts, Matthew is an international speaker who helps governments, investors, multi-nationals and regulators around the world envision, build and lead an inclusive, sustainable future.
A rare talent, Matthew’s recent work includes mentoring Lunar XPrize teams, re-envisioning global education and training with the G20, and helping the world’s largest organisations envision and ideate the future of their products and services, industries, and countries.
Matthew's clients include three Prime Ministers and several governments, including the G7, Accenture, Aon, Bain & Co, BCG, Credit Suisse, Dell EMC, Dentons, Deloitte, E&Y, GEMS, Huawei, JPMorgan Chase, KPMG, Lego, McKinsey, PWC, Qualcomm, SAP, Samsung, Sopra Steria, T-Mobile, and many more.