We’re always being told that Artificial Intelligence (AI) needs massive amounts of data to learn from before it can “do things.” Increasingly, that’s no longer true: in some cases, all an AI needs is a rule book to outcompete even the best human experts in games like chess.
Unlike numerous prior Minecraft algorithms, which operated in much simpler “sandbox” versions of the game, the new AI plays in the same environment as humans, using standard keyboard and mouse commands.
In a blog post and preprint detailing the work, the OpenAI team say that, out of the box, the algorithm learned basic skills, like chopping down trees, making planks, and building crafting tables. They also observed it swimming, hunting, cooking, and “pillar jumping.”
“To the best of our knowledge, there is no published work that operates in the full, unmodified human action space, which includes drag-and-drop inventory management and item crafting,” the authors wrote in their paper.
With fine tuning (that is, training the model on a more focused data set), they found the algorithm performed all of these tasks more reliably. It also began to advance its technological prowess, fabricating wooden and stone tools, building basic shelters, exploring villages, and raiding chests.
After further fine tuning with reinforcement learning, it learned to build a diamond pickaxe, a skill that takes human players some 20 minutes and 24,000 actions to accomplish.
This is a notable result. AI has long struggled with Minecraft’s wide-open gameplay. Games like chess and Go, which AI has already mastered, have clear objectives, and progress toward those objectives can be measured. To conquer Go, researchers used reinforcement learning, where an algorithm is given a goal and rewarded for progress toward that goal. Minecraft, on the other hand, has any number of possible objectives, progress is less linear, and deep reinforcement learning algorithms are usually left spinning their wheels.
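The reward-driven loop described above can be sketched on a toy problem with a clear, measurable objective. Everything here is illustrative (a five-cell corridor, tabular Q-learning); it is not OpenAI's training code, just a minimal picture of why a clean reward signal makes learning tractable:

```python
import random

# A minimal sketch of reinforcement learning on a game with a clear,
# measurable objective: a 5-cell corridor where the agent is rewarded
# only for reaching the rightmost cell. All names are illustrative.

N_STATES = 5          # cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: mostly exploit, occasionally explore
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0   # clear reward signal
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

def greedy_path(q):
    """Follow the learned policy from the start cell to the goal."""
    s, path = 0, [0]
    while s != N_STATES - 1 and len(path) < 20:
        a = max(ACTIONS, key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), N_STATES - 1)
        path.append(s)
    return path
```

In Minecraft there is no single line like `r = 1.0` at a goal cell to write down, which is exactly why the plain version of this loop stalls there.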
It’s worth noting that to reward creativity and show that throwing computing power at a problem isn’t always the answer, the MineRL organizers placed strict limits on participants: they were allowed one NVIDIA GPU and 1,000 hours of recorded gameplay. Though the contestants performed admirably, the OpenAI result, achieved with more data and 720 NVIDIA GPUs, seems to show computing power still has its benefits.
With its Video Pre-Training (VPT) algorithm for Minecraft, OpenAI returned to the approach it used with GPT-3 and DALL-E: pre-training an algorithm on a towering data set of human-created content. But the algorithm’s success wasn’t enabled by computing power or data alone; training a Minecraft AI on that much video simply wasn’t practical before.
Raw video footage isn’t as useful for behavioral AIs as it is for content generators like GPT-3 and DALL-E. It shows what people are doing, but it doesn’t explain how they’re doing it. For the algorithm to link video to actions, it needs labels. A video frame showing a player’s collection of objects, for example, would need to be labelled “inventory” alongside the command key “E,” which is used to open the inventory.
Labelling every frame in 70,000 hours of video by hand would be impractical. So the team paid Upwork contractors to record and label basic Minecraft skills. They used 2,000 hours of this video to teach a second algorithm, an inverse dynamics model (IDM), how to label Minecraft videos, and that model then annotated all 70,000 hours of YouTube footage. The team says the IDM was over 90 percent accurate when labelling keyboard and mouse commands.
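The two-stage pipeline can be sketched as follows. The toy nearest-neighbour “IDM” and the one-number frame features are stand-ins for illustration only; OpenAI’s actual inverse dynamics model is a large neural network trained on video:

```python
# Stage 1: a small hand-labelled set trains a labeller.
# Stage 2: that labeller annotates a much larger unlabelled corpus.
# Every name and value here is a hypothetical placeholder.

def train_idm(labelled_frames):
    """'Train' by memorising (feature, key) pairs, 1-nearest-neighbour style."""
    return list(labelled_frames)

def idm_label(model, feature):
    """Predict the key for a frame from its nearest labelled neighbour."""
    return min(model, key=lambda ex: abs(ex[0] - feature))[1]

# Stage 1: contractors hand-label a small set (frame feature -> key pressed).
hand_labelled = [(0.1, "W"), (0.9, "E"), (0.5, "SPACE")]
idm = train_idm(hand_labelled)

# Stage 2: the trained IDM pseudo-labels the large unlabelled corpus.
unlabelled = [0.12, 0.88, 0.52, 0.95]
pseudo_labels = [(f, idm_label(idm, f)) for f in unlabelled]
```

The economics are the point: 2,000 hours of expensive human labels buy automatic labels for 70,000 hours, after which the pseudo-labelled corpus can be used for ordinary pre-training.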
This approach of humans training a data-labelling algorithm to unlock behavioural data sets online may help AI learn other skills too.
“VPT paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet,” the researchers wrote. Beyond Minecraft, OpenAI thinks VPT can bring new real-world applications, like algorithms that operate computers at a prompt. Imagine, for instance, asking your laptop to find a document and email it to your boss.
Much to the chagrin of the MineRL competition organizers perhaps, the results do seem to show that computing power and resources still move the needle on the most advanced AI.
Never mind the cost of computing: OpenAI said the Upwork contractors alone cost $160,000. To be fair, manually labelling the whole data set would have run into the millions and taken considerable time. And while the computing power wasn’t negligible, the model itself was actually quite small: VPT’s hundreds of millions of parameters are orders of magnitude fewer than GPT-3’s hundreds of billions.
Still, the drive to find clever new approaches that use less data and computing is valid. A kid can learn Minecraft basics by watching one or two videos. Today’s AI requires far more to learn even simple skills. Making AI more efficient is a big, worthy challenge.
In any case, OpenAI is in a sharing mood this time. The researchers say VPT isn’t without risk, and they’ve strictly controlled access to algorithms like GPT-3 and DALL-E partly to limit misuse, but they believe the risk is minimal for now. They’ve open sourced the data, environment, and algorithm and are partnering with MineRL: this year’s contestants are free to use, modify, and fine-tune the latest in Minecraft AI.
Chances are good they’ll make it well past mining diamonds this time around.
Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the World Futures Forum and the 311 Institute, a global Futures and Deep Futures consultancy working between the dates of 2020 to 2070, and is an award-winning futurist and author of the “Codex of the Future” series.
Regularly featured in the global media, including AP, BBC, Bloomberg, CNBC, Discovery, RT, Viacom, and WIRED, Matthew’s ability to identify, track, and explain the impacts of hundreds of revolutionary emerging technologies on global culture, industry, and society is unparalleled. Recognised for the past six years as one of the world’s foremost futurists and innovation and strategy experts, Matthew is an international speaker who helps governments, investors, multinationals, and regulators around the world envision, build, and lead an inclusive, sustainable future.
A rare talent, Matthew’s recent work includes mentoring Lunar XPrize teams, re-envisioning global education and training with the G20, and helping the world’s largest organisations envision and ideate the future of their products and services, industries, and countries.
Matthew's clients include three Prime Ministers and several governments, including the G7, Accenture, Aon, Bain & Co, BCG, Credit Suisse, Dell EMC, Dentons, Deloitte, E&Y, GEMS, Huawei, JPMorgan Chase, KPMG, Lego, McKinsey, PWC, Qualcomm, SAP, Samsung, Sopra Steria, T-Mobile, and many more.