DeepMind’s newest AI studies humans to exploit weaknesses in their strategies

WHY THIS MATTERS IN BRIEF

Everyone has weaknesses, and now researchers have developed AIs that can identify and exploit the weaknesses in how people play. One day these AIs will be able to identify and exploit any human flaw, as well as flaws in other AIs, for better and for worse.

Over the past couple of years Artificial Intelligence (AI) has been getting better and better at beating the world's best players of chess, Dota 2, Go, poker, StarCraft, and many other games. These AIs are now so good that the world's top Go champion, who was beaten by AI a while back, recently retired, announcing that "AI is invincible."

Normally AI beats its adversaries by running millions, even tens of millions, of game simulations and learning the best strategies as it goes along. Now, in a twist, researchers at Google's DeepMind, arguably one of the global leaders in AI development and the company behind the AIs that have been taking most gamers to town, have turned that approach on its head and developed a new AI designed to find and exploit weaknesses in players' strategies – a trait that could also prove very useful in cyber security and warfare, especially when coupled with Google's latest self-evolving AI system and the US Department of Defense's new war game AI and Skynet wannabe robo-hacking system.

In a paper published on the preprint server arXiv.org the researchers describe a new framework that learns an approximate best response to players within games of many kinds. They claim that it achieves consistently high performance against "worst-case opponents" — that is, opponents who play the best possible counter-strategy against the agent, making them the toughest opposition it can face — in a number of games including chess, Go, and Texas Hold'em.

DeepMind CEO Demis Hassabis often asserts that games are a convenient proving ground for developing algorithms that can be translated into the real world to work on challenging problems. Innovations like this new framework, then, could lay the groundwork for Artificial General Intelligence (AGI), the holy grail of AI — a decision-making AI system that not only automates mundane, repetitive enterprise tasks like data entry, but reasons about its environment. That's also the long-term goal of other research institutions, such as OpenAI, which recently received $1 billion in funding from Microsoft to help develop the world's first true AGI.

How badly a strategy can be beaten by such a worst-case opponent is known as its exploitability. Computing exploitability exactly is usually computationally intractable because the number of decisions players might face is so large. For example, one variant of Texas Hold'em — Heads-Up Limit Texas Hold'em — has roughly 10 to the power of 14 (a 1 followed by 14 zeros) decision points, while Go has approximately 10 to the power of 170!
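
In a game small enough to enumerate, though, exploitability can be computed by brute force, which makes the idea concrete. The sketch below (an illustration for this article, not DeepMind's code) measures how much a best-responding opponent wins per game against a given mixed strategy in rock-paper-scissors; the same enumeration becomes hopeless at 10 to the power of 14 decision points, which is why an approximate, learned approach is needed:

```python
# A minimal sketch (an illustration, not DeepMind's code): computing the
# exploitability of a mixed strategy in rock-paper-scissors by brute force.
# Exploitability here = the expected payoff a best-responding opponent can
# earn against the strategy; 0 means the strategy cannot be exploited.

# Payoff matrix for the row player: PAYOFF[my_move][opponent_move]
PAYOFF = {
    "rock":     {"rock": 0, "paper": -1, "scissors": 1},
    "paper":    {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}

def exploitability(strategy):
    """Best expected payoff an opponent can earn against `strategy`."""
    # Enumerate every pure counter-move and keep the most profitable one.
    return max(
        sum(prob * -PAYOFF[mine][opp] for mine, prob in strategy.items())
        for opp in PAYOFF
    )

# The uniform strategy is unexploitable: every counter-move earns 0.
print(exploitability({"rock": 1/3, "paper": 1/3, "scissors": 1/3}))  # ~0.0
# A rock-heavy strategy leaks 0.7 per game to an opponent who spams paper.
print(exploitability({"rock": 0.8, "paper": 0.1, "scissors": 0.1}))  # 0.7
```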

One way to get around this is to learn a policy that exploits the player being evaluated, using reinforcement learning — an AI training technique that spurs software agents towards goals via a system of rewards — to approximate the best response rather than compute it exactly.
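
To make that concrete, here's a toy sketch of the idea (entirely an illustration, with a made-up rock-heavy opponent rather than anything from the paper) in which a simple reinforcement learning loop discovers the exploiting response just by playing against a fixed opponent:

```python
# A toy sketch of learning a best response with reinforcement learning
# (an illustration; the rock-heavy opponent is made up, not from the paper).
# Rather than enumerating every counter-strategy, the agent simply plays
# against the fixed opponent and reinforces whatever earns reward.
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def fixed_opponent():
    # The strategy we want to exploit: it plays rock 80% of the time.
    return random.choices(ACTIONS, weights=[0.8, 0.1, 0.1])[0]

def reward(mine, theirs):
    if BEATS[mine] == theirs:
        return 1.0   # we win
    if BEATS[theirs] == mine:
        return -1.0  # we lose
    return 0.0       # draw

# Running-average value estimate per action, with epsilon-greedy exploration.
q = {a: 0.0 for a in ACTIONS}
n = {a: 0 for a in ACTIONS}
for episode in range(10_000):
    a = random.choice(ACTIONS) if random.random() < 0.1 else max(q, key=q.get)
    r = reward(a, fixed_opponent())
    n[a] += 1
    q[a] += (r - q[a]) / n[a]  # incremental average update

print(max(q, key=q.get))  # converges on "paper", the exploiting response
```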

The framework the DeepMind researchers propose, which they call Approximate Best Response Information State Monte Carlo Tree Search (ABR IS-MCTS), approximates an exact best response on an information-state basis. Actors within the framework follow an algorithm to play the game, while a learner distils the outcomes of those games into a policy. Intuitively, ABR IS-MCTS tries to learn a counter-strategy that, given unlimited access to the opponent's strategy, validly exploits it; in effect, it simulates what would happen if someone trained for years to exploit that one opponent.
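
The tree search half of that name is, at its core, standard Monte Carlo tree search. The sketch below is textbook UCT on a toy problem, not DeepMind's implementation (the paper's variant searches over information states, i.e. what a player can actually observe mid-game, and is driven by the actor and learner roles described above), but it shows the select, expand, simulate, and backpropagate cycle the actors run:

```python
# A textbook UCT Monte Carlo tree search sketch, shown only to make the
# "tree search" half of ABR IS-MCTS concrete. This is NOT DeepMind's code:
# their variant searches over information states and sits inside the
# actor/learner training loop described in the article.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}               # action -> child Node
        self.visits, self.value = 0, 0.0

def uct_select(node, c=1.4):
    """Pick the child balancing average value against under-exploration."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )

def mcts(root, legal_actions, step, rollout, n_sims=2_000):
    for _ in range(n_sims):
        node = root
        # 1. Selection: walk down while every action has been tried.
        while node.children and len(node.children) == len(legal_actions(node.state)):
            node = uct_select(node)
        # 2. Expansion: add one untried action as a new child.
        untried = [a for a in legal_actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(step(node.state, a), parent=node)
            node = node.children[a]
        # 3. Simulation: play out randomly and score the result.
        result = rollout(node.state)
        # 4. Backpropagation: credit the result up the path to the root.
        while node:
            node.visits += 1
            node.value += result
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

# Tiny demo: choose five bits, one per move, to maximise their sum.
legal = lambda s: [0, 1] if len(s) < 5 else []
step = lambda s, a: s + (a,)
rollout = lambda s: sum(s) + sum(random.choice([0, 1]) for _ in range(5 - len(s)))
print(mcts(Node(()), legal, step, rollout))  # -> 1 (always worth playing a 1)
```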

The researchers report that in experiments involving 200 actors and a learner, trained on a PC with 4 processors and 8GB of RAM, ABR IS-MCTS achieved a win rate above 50 percent in every game it played, and above 70 percent in games other than Hex and Go, such as Connect Four and Breakthrough. In backgammon, meanwhile, it won 80 percent of the time after training for 1 million episodes.

The co-authors say they see evidence of “substantial learning” in that when the actors’ learning steps are restricted, they tend to perform worse even after 100,000 episodes of training. They also note, however, that ABR IS-MCTS is quite slow in certain contexts, taking on average 150 seconds to calculate the exploitability of a particular kind of strategy, such as UniformRandom, in Kuhn poker, which is a simplified form of two-player poker.

Now, having proven the basic theory, the team say they are going to extend the method and train it on even more complex games. All of which means that, sooner rather than later, there will be AIs that not only know your weaknesses but know how to exploit them, which increasingly sounds like the script of the Hollywood movie Ex Machina…
