DeepMind's newest AI thrashes human gamers at Stratego


By Matthew Griffin Intelligence and the Senses 23rd December 2022

WHY THIS MATTERS IN BRIEF

We all make decisions even in the absence of data, and for AI to “take on the real world” it has to as well, but so far that’s been tricky.


Artificial Intelligence (AI) hates uncertainty, yet to navigate our unpredictable world it needs to learn to make choices with imperfect information – as we do every single day. This is one reason why, when AIs have been able to cope adequately with uncertainty, the US military has quite often conscripted them to take part in war game simulations …


This week though DeepMind just took a stab at solving this issue. The trick was to interweave game theory into an algorithmic strategy loosely based on the human brain called deep reinforcement learning. The result, DeepNash, toppled human experts in a highly strategic board game called Stratego. A notoriously difficult game for AI, Stratego requires multiple strengths of human wit: long-term thinking, bluffing, and strategizing, all without knowing your opponent’s pieces on the board.


“Unlike chess and Go, Stratego is a game of imperfect information: players cannot directly observe the identities of their opponent’s pieces,” DeepMind wrote in a blog post. With DeepNash, “game-playing AI systems have advanced to a new frontier.”

It’s not all fun and games though. AI systems that can easily manoeuvre the randomness of our world and adjust their “behaviour” accordingly could one day handle real-world problems with limited information, such as optimizing traffic flow to reduce travel time and, hopefully, quenching road rage as self-driving cars become ever more present.


“If you’re making a self-driving car, you don’t want to assume that all the other drivers on the road are perfectly rational, and going to behave optimally,” said Dr. Noam Brown at Meta AI, who wasn’t involved in the research.

DeepNash’s triumph comes hot on the heels of another AI advance this month, where an algorithm learned to play Diplomacy – a game that requires negotiation and cooperation to win. As AI gains more flexible reasoning, becomes more generalized, and learns to navigate social situations, it may also spark insights into our own brains’ neural processes and cognition.

In terms of complexity, Stratego is a completely different beast compared to chess, Go, or poker – all games that AI has previously mastered. The game is essentially capture the flag. Each side has 40 pieces, which players arrange as they choose on their own side of the board. Each piece has a different name and numerical rank, such as “marshal,” “general,” “scout,” or “spy.” Higher ranking pieces can capture lower ones. The goal is to eliminate the opposition and capture their flag.

Stratego is especially challenging for AI because players can’t see the identities of their opponent’s pieces, either during initial setup or throughout gameplay. Unlike chess or Go, in which every piece and move is in full view, Stratego is a game of limited information. Players must “balance all possible outcomes” any time they make a decision, the authors explained.


This level of uncertainty is partly why Stratego has stumped AI for ages. Even the most successful game-playing algorithms, such as AlphaGo and AlphaZero, rely on complete information. Stratego, in contrast, has a touch of Texas Hold ’em, a poker game DeepMind previously conquered with an algorithm. But that strategy faltered for Stratego, largely because of the game’s length: unlike a hand of poker, a game of Stratego normally runs to hundreds of moves.

The number of potential game plays is mind-blowing. Chess has one starting position. Stratego has over 10⁶⁶ possible starting positions – far more than all the stars in the universe. Stratego’s game tree, which covers every way a game could unfold, has a staggering 10⁵³⁵ possible states.
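As a rough sanity check on the starting-position figure, the count can be reproduced with a few lines of arithmetic. The sketch below assumes the classic 40-piece set (the piece counts are not given in this article and are an assumption here) and treats pieces of the same rank as interchangeable, so each player's setups form a multinomial coefficient.

```python
from math import factorial

# Assumed piece counts per player (classic 40-piece set; not stated in the article).
PIECE_COUNTS = {
    "flag": 1, "bomb": 6, "spy": 1, "scout": 8, "miner": 5,
    "sergeant": 4, "lieutenant": 4, "captain": 4, "major": 3,
    "colonel": 2, "general": 1, "marshal": 1,
}


def setups_per_player(piece_counts):
    """Distinct ways to arrange the pieces on as many squares as there are pieces.

    Pieces of the same rank are indistinguishable, so this is the
    multinomial coefficient total! / (count_1! * count_2! * ...).
    """
    total = sum(piece_counts.values())
    arrangements = factorial(total)
    for count in piece_counts.values():
        arrangements //= factorial(count)
    return arrangements


per_player = setups_per_player(PIECE_COUNTS)
# The two players choose their setups independently, so the counts multiply.
both_players = per_player ** 2

print(f"{per_player:.2e} setups per player, {both_players:.2e} combined")
```

Under those assumptions each player has on the order of 10³³ possible setups, and the two sides together have more than 10⁶⁶, which is in line with the figure quoted above.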

“The sheer complexity of the number of possible outcomes in Stratego means algorithms that perform well on perfect-information games, and even those that work for poker, don’t work,” said study author Dr. Julien Perolat at DeepMind. The challenge is “what excited us,” he said.


Stratego’s complexity means that the usual strategy for searching gameplay moves is out of the question. Dubbed the Monte Carlo tree search, a “stalwart approach to AI-based gaming,” the technique plots out potential routes – like branches on a tree – that could result in victory.

Instead, the magic touch for DeepNash came from the mathematician John Nash, portrayed in the film A Beautiful Mind. A pioneer in game theory, Nash won the Nobel Prize for his work on the Nash equilibrium. Put simply, a game is in equilibrium when every player follows a strategy such that no single player gains anything by changing their own. Stratego is also a zero-sum game: any gain a player makes results in a loss for their opponent.
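The equilibrium idea is easier to see on a much smaller zero-sum game. The sketch below uses rock-paper-scissors, which is chosen here purely for illustration and has nothing to do with DeepMind's code; the function names are hypothetical. It checks whether either player could gain by switching to any single move, which is enough because a mixed strategy can never beat the best pure move it is built from.

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors. The game is zero-sum,
# so the column player's payoff is the negative of each entry.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
MOVES = range(3)


def expected_payoff(row_strategy, col_strategy):
    """Row player's expected payoff when both sides play mixed strategies."""
    return sum(
        row_strategy[i] * col_strategy[j] * PAYOFF[i][j]
        for i in MOVES
        for j in MOVES
    )


def is_nash_equilibrium(row_strategy, col_strategy):
    """True if neither player can gain by unilaterally switching to a pure move."""
    value = expected_payoff(row_strategy, col_strategy)
    for move in MOVES:
        pure = [Fraction(int(m == move)) for m in MOVES]
        if expected_payoff(pure, col_strategy) > value:
            return False  # the row player would profit from deviating
        if expected_payoff(row_strategy, pure) < value:
            return False  # the column player would profit from deviating
    return True


uniform = [Fraction(1, 3)] * 3
always_rock = [Fraction(1), Fraction(0), Fraction(0)]

print(is_nash_equilibrium(uniform, uniform))      # both players randomise evenly
print(is_nash_equilibrium(always_rock, uniform))  # a predictable player can be exploited
```

When both players randomise evenly, neither can improve by changing strategy, so that pair is an equilibrium. A player who always picks rock is not part of one, because the opponent gains by switching to paper.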

Because of Stratego’s complexity, the team took a model-free approach with DeepNash. Here, the AI isn’t trying to precisely model its opponent’s behaviour. Like a baby, it has a blank slate, of sorts, to learn. This set-up is particularly useful in early stages of gameplay, “when DeepNash knows little about its opponent’s pieces,” making predictions “difficult, if not impossible,” the authors said.


The team then used deep reinforcement learning to power DeepNash, with the goal of finding the game’s Nash equilibrium. It’s a match made in heaven: reinforcement learning helps decide the best next move at every step of the game, while DeepNash provides an overall learning strategy. To evaluate the system, the team also engineered a “tutor” using knowledge from the game to filter out obvious mistakes that likely wouldn’t make real-world sense.

As a first learning step, DeepNash played against itself in 5.5 billion games, a popular approach in AI training dubbed self-play. When one side wins, the AI gets rewarded, and its current artificial neural network parameters are strengthened. The other side – the same AI – receives a penalty to dampen its neural network strength. It’s like rehearsing a speech to yourself in front of a mirror. Over time, you figure out mistakes and perform better. In DeepNash’s case, it drifts towards a Nash equilibrium for best gameplay.
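How self-play can drift towards an equilibrium can be shown at toy scale. The sketch below is not DeepNash's training method: it swaps in regret matching, a far simpler classical technique with no neural network, and applies it to rock-paper-scissors. The starting bias towards rock, the iteration count and all names are assumptions made for illustration.

```python
# Row player's payoff in rock-paper-scissors; the game is zero-sum,
# so the column player's payoff is the negative of each entry.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
N = 3


def strategy_from_regrets(regrets):
    """Play each move in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / N] * N


def self_play(iterations=20000):
    """Two copies of the same learner play each other; return their average strategies."""
    # Hypothetical starting bias: the row player begins by favouring rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        row = strategy_from_regrets(regrets[0])
        col = strategy_from_regrets(regrets[1])
        # Expected payoff of each pure move against the opponent's current mix.
        row_values = [sum(col[j] * PAYOFF[i][j] for j in range(N)) for i in range(N)]
        col_values = [sum(row[i] * -PAYOFF[i][j] for i in range(N)) for j in range(N)]
        row_expected = sum(row[i] * row_values[i] for i in range(N))
        col_expected = sum(col[j] * col_values[j] for j in range(N))
        for a in range(N):
            # Regret: how much better move `a` would have done than the mix actually played.
            regrets[0][a] += row_values[a] - row_expected
            regrets[1][a] += col_values[a] - col_expected
            strategy_sums[0][a] += row[a]
            strategy_sums[1][a] += col[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_row, average_col = self_play()
print([round(p, 3) for p in average_row])
print([round(p, 3) for p in average_col])
```

The moves played in any single round keep cycling as each side chases the other, but the average strategy of both players settles close to an even one-third split, which is the equilibrium of this small game.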

The team tested the algorithm against other elite Stratego bots, some of which won the Computer Stratego World Championship. DeepNash squashed its opponents with a win rate of roughly 97 percent. When unleashed against Gravon – an online platform for human players – DeepNash trounced its human opponents. After over two weeks of matches against Gravon’s players in April this year, DeepNash rose to third place in all ranked matches since 2002.


It shows that DeepNash doesn’t need to be bootstrapped with human play data to reach human-level performance, and then beat it.

The AI also exhibited some intriguing behaviour with the initial setup and during gameplay. For example, rather than settling on a particular “optimized” starting position, DeepNash constantly shifted the pieces around to prevent its opponent from spotting patterns over time. During gameplay, the AI bounced between seemingly senseless moves – such as sacrificing high-ranking pieces – to locate the opponent’s even higher-ranking pieces upon counterattack.

DeepNash can also bluff. In one play, the AI moved a low-ranking piece as if it were a high-ranking one, luring the human opponent into chasing after it with their high-ranking colonel. The AI sacrificed the low-ranking piece, but in turn lured the opponent’s valuable spy into an ambush.


Although DeepNash was developed for Stratego, it’s generalizable to the real world. The core method can potentially instruct AI to better tackle our unpredictable future using limited information – from crowd and traffic control to analyzing market turmoil, perhaps even war games.

“In creating a generalizable AI system that’s robust in the face of uncertainty, we hope to bring the problem-solving capabilities of AI further into our inherently unpredictable world,” the team said.

