We’re always being told that Artificial Intelligence (AI) needs massive amounts of data to learn from before it can “do things.” Increasingly, that’s no longer true: in some cases, all an AI needs is a rule book to outcompete even the best human experts in games like chess.
Unlike numerous prior Minecraft algorithms, which operated in much simpler “sandbox” versions of the game, the new AI plays in the same environment as humans, using standard keyboard and mouse commands.
In a blog post and preprint detailing the work, the OpenAI team say that, out of the box, the algorithm learned basic skills, like chopping down trees, making planks, and building crafting tables. They also observed it swimming, hunting, cooking, and “pillar jumping.”
“To the best of our knowledge, there is no published work that operates in the full, unmodified human action space, which includes drag-and-drop inventory management and item crafting,” the authors wrote in their paper.
With fine tuning (that is, training the model on a more focused data set), they found the algorithm performed all of these tasks more reliably. It also began to advance its technological prowess, fabricating wooden and stone tools, building basic shelters, exploring villages, and raiding chests.
After further fine tuning with reinforcement learning, it learned to build a diamond pickaxe, a skill that takes human players some 20 minutes and 24,000 actions to accomplish.
This is a notable result. AI has long struggled with Minecraft’s wide-open gameplay. Games like chess and Go, which AI has already mastered, have clear objectives, and progress toward those objectives can be measured. To conquer Go, researchers used reinforcement learning, where an algorithm is given a goal and rewarded for progress toward that goal. Minecraft, on the other hand, has any number of possible objectives, progress is less linear, and deep reinforcement learning algorithms are usually left spinning their wheels.
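The reward-driven loop described above can be sketched on a toy problem with a clear, measurable objective. Everything here is illustrative (a five-cell corridor, tabular Q-learning); it is not OpenAI's training code, just a minimal picture of why a clean reward signal makes learning tractable:

```python
import random

# A minimal sketch of reinforcement learning on a game with a clear,
# measurable objective: a 5-cell corridor where the agent is rewarded
# only for reaching the rightmost cell. All names are illustrative.

N_STATES = 5          # cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: mostly exploit, occasionally explore
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0   # clear reward signal
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

def greedy_path(q):
    """Follow the learned policy from the start cell to the goal."""
    s, path = 0, [0]
    while s != N_STATES - 1 and len(path) < 20:
        a = max(ACTIONS, key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), N_STATES - 1)
        path.append(s)
    return path
```

In Minecraft there is no single line like `r = 1.0` at a goal cell to write down, which is exactly why the plain version of this loop stalls there.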
It’s worth noting that to reward creativity and show that throwing computing power at a problem isn’t always the answer, the MineRL organizers placed strict limits on participants: they were allowed one NVIDIA GPU and 1,000 hours of recorded gameplay. Though the contestants performed admirably, the OpenAI result, achieved with more data and 720 NVIDIA GPUs, seems to show computing power still has its benefits.
With its Video Pre-Training (VPT) algorithm for Minecraft, OpenAI returned to the approach it used with GPT-3 and DALL-E: pre-training an algorithm on a towering data set of human-created content. But the algorithm’s success wasn’t enabled by computing power or data alone; training a Minecraft AI on that much video simply wasn’t practical before.
Raw video footage isn’t as useful for behavioral AIs as it is for content generators like GPT-3 and DALL-E. It shows what people are doing, but it doesn’t explain how they’re doing it. For the algorithm to link video to actions, it needs labels. A video frame showing a player’s collection of objects, for example, would need to be labelled “inventory” alongside the command key “E,” which is used to open the inventory.
Labelling every frame in 70,000 hours of video by hand would be impractical. So the team paid Upwork contractors to record and label basic Minecraft skills. They used 2,000 hours of this video to teach a second algorithm, an inverse dynamics model (IDM), how to label Minecraft videos, and that model then annotated all 70,000 hours of YouTube footage. The team says the IDM was over 90 percent accurate when labelling keyboard and mouse commands.
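The two-stage pipeline can be sketched as follows. The toy nearest-neighbour “IDM” and the one-number frame features are stand-ins for illustration only; OpenAI’s actual inverse dynamics model is a large neural network trained on video:

```python
# Stage 1: a small hand-labelled set trains a labeller.
# Stage 2: that labeller annotates a much larger unlabelled corpus.
# Every name and value here is a hypothetical placeholder.

def train_idm(labelled_frames):
    """'Train' by memorising (feature, key) pairs, 1-nearest-neighbour style."""
    return list(labelled_frames)

def idm_label(model, feature):
    """Predict the key for a frame from its nearest labelled neighbour."""
    return min(model, key=lambda ex: abs(ex[0] - feature))[1]

# Stage 1: contractors hand-label a small set (frame feature -> key pressed).
hand_labelled = [(0.1, "W"), (0.9, "E"), (0.5, "SPACE")]
idm = train_idm(hand_labelled)

# Stage 2: the trained IDM pseudo-labels the large unlabelled corpus.
unlabelled = [0.12, 0.88, 0.52, 0.95]
pseudo_labels = [(f, idm_label(idm, f)) for f in unlabelled]
```

The economics are the point: 2,000 hours of expensive human labels buy automatic labels for 70,000 hours, after which the pseudo-labelled corpus can be used for ordinary pre-training.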
This approach of humans training a data-labelling algorithm to unlock behavioural data sets online may help AI learn other skills too.
“VPT paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet,” the researchers wrote. Beyond Minecraft, OpenAI thinks VPT can bring new real-world applications, like algorithms that operate computers at a prompt. Imagine, for instance, asking your laptop to find a document and email it to your boss.
Much to the chagrin of the MineRL competition organizers perhaps, the results do seem to show that computing power and resources still move the needle on the most advanced AI.
Never mind the cost of computing: OpenAI said the Upwork contractors alone cost $160,000. To be fair, manually labelling the whole data set would have run into the millions and taken considerable time. And while the computing power wasn’t negligible, the model itself was actually quite small: VPT’s hundreds of millions of parameters are orders of magnitude fewer than GPT-3’s hundreds of billions.
Still, the drive to find clever new approaches that use less data and computing is valid. A kid can learn Minecraft basics by watching one or two videos. Today’s AI requires far more to learn even simple skills. Making AI more efficient is a big, worthy challenge.
In any case, OpenAI is in a sharing mood this time. The researchers say VPT isn’t without risk, and they’ve strictly controlled access to algorithms like GPT-3 and DALL-E partly to limit misuse, but they believe the risk is minimal for now. They’ve open sourced the data, environment, and algorithm and are partnering with MineRL: this year’s contestants are free to use, modify, and fine-tune the latest in Minecraft AI.
Chances are good they’ll make it well past mining diamonds this time around.
Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the World Futures Forum and the 311 Institute, a global Futures and Deep Futures consultancy working between the dates of 2020 to 2070, and is an award-winning futurist and author of the “Codex of the Future” series.
Regularly featured in the global media, including AP, BBC, Bloomberg, CNBC, Discovery, RT, Viacom, and WIRED, Matthew’s ability to identify, track, and explain the impacts of hundreds of revolutionary emerging technologies on global culture, industry, and society is unparalleled. Recognised for the past six years as one of the world’s foremost futurists and innovation and strategy experts, Matthew is an international speaker who helps governments, investors, multinationals, and regulators around the world envision, build, and lead an inclusive, sustainable future.
A rare talent, Matthew’s recent work includes mentoring Lunar XPrize teams, re-envisioning global education and training with the G20, and helping the world’s largest organisations envision and ideate the future of their products and services, industries, and countries.
Matthew's clients include three Prime Ministers and several governments, including the G7, Accenture, Aon, Bain & Co, BCG, Credit Suisse, Dell EMC, Dentons, Deloitte, E&Y, GEMS, Huawei, JPMorgan Chase, KPMG, Lego, McKinsey, PWC, Qualcomm, SAP, Samsung, Sopra Steria, T-Mobile, and many more.