It’s common knowledge that students learn best when their curricula are pitched at just the right level of difficulty, and the same is true for AI. Now a team of computer scientists in the US has created an AI that can design its own curricula, letting it figure out the best way to teach itself and learn new tasks faster. The work could speed learning in self-driving cars and household robots, and it might even help crack previously unsolvable math problems.
In one of the new experiments, an AI program tries to quickly reach a destination by navigating a 2D grid populated with solid blocks. The “agent” improves its abilities through a process called reinforcement learning, a kind of trial and error.
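That trial-and-error loop can be sketched in a few lines. The tiny corridor world, the learning rate, and the tabular Q-learning update below are our own minimal illustration of reinforcement learning, not the researchers’ actual agent or gridworld:

```python
import random

# Toy sketch of reinforcement learning by trial and error on a 1-D
# corridor (states 0..3, goal at state 3). Purely illustrative.
random.seed(0)
ACTIONS = (-1, 1)                       # step left or step right
q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}

for episode in range(500):
    s = 0
    for _ in range(30):
        # Half the time explore at random; otherwise act greedily.
        if random.random() < 0.5:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        nxt = min(3, max(0, s + a))
        reward = 1.0 if nxt == 3 else 0.0
        # One-step Q-learning update: nudge the value toward the
        # reward plus the discounted value of the next state.
        q[(s, a)] += 0.5 * (reward + 0.9 * max(q[(nxt, b)] for b in ACTIONS) - q[(s, a)])
        s = nxt
        if s == 3:
            break

# The learned greedy policy should step right (+1) from every state.
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(3)}
print(policy)
```

After enough trials, the agent’s value estimates make “step toward the goal” the greedy choice everywhere, which is the essence of learning by trial and error.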
To help it navigate increasingly complex worlds, the researchers, led by University of California, Berkeley graduate student Michael Dennis and Google research scientist Natasha Jaques, considered two ways to draw the maps. One method scattered blocks at random, but the AI didn’t learn much. Another remembered what the AI had struggled with in the past and maximized difficulty accordingly. But that made the worlds too hard, and sometimes even impossible, to complete.
So the team created a setting that was just right, using a new approach they call PAIRED (Protagonist Antagonist Induced Regret Environment Design). First, they coupled their AI with a nearly identical one, albeit with a slightly different set of strengths, which they called the antagonist. Then they had a third AI design worlds that were easy for the antagonist but hard for the original protagonist.
That kept the tasks just at the edge of the protagonist’s ability to solve. The designer, like the two agents, used a neural network, a program inspired by the brain’s architecture, to learn its task over many trials.
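The core of that design loop can be sketched as a “regret” calculation: the designer scores each candidate world by the gap between what the antagonist earns and what the protagonist earns, then proposes the world where that gap is largest. The integer skill-and-difficulty model below is an illustrative assumption of ours, not the paper’s code:

```python
# Minimal sketch of the PAIRED regret objective. Skills and
# difficulties are toy integers on a 0..10 scale.

def estimated_return(skill, difficulty):
    # Toy model: full return (10) when the world is at or below the
    # agent's skill level, dropping off as difficulty exceeds skill.
    return max(0, 10 - max(0, difficulty - skill))

def regret(difficulty, protagonist_skill, antagonist_skill):
    # Regret = antagonist's return minus protagonist's return.
    return (estimated_return(antagonist_skill, difficulty)
            - estimated_return(protagonist_skill, difficulty))

def propose_world(candidates, protagonist_skill, antagonist_skill):
    # The designer proposes the world the antagonist handles well but
    # the protagonist struggles with -- the edge of its ability.
    return max(candidates, key=lambda d: regret(d, protagonist_skill, antagonist_skill))

# Difficulties 0..10; protagonist weaker (skill 3) than antagonist (skill 6).
chosen = propose_world(range(11), protagonist_skill=3, antagonist_skill=6)
print(chosen)
```

In this toy model the regret gap first peaks at the antagonist’s skill level: worlds just beyond the protagonist’s reach, but still demonstrably solvable.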
After training, the protagonist attempted a set of difficult mazes. If it trained using the two older methods, it solved none of the new mazes. But after training with PAIRED, it solved one in five, the team reported last month at the Conference on Neural Information Processing Systems (NeurIPS).
“We were excited by how PAIRED started working pretty much out of the gate,” Dennis says.
In another study, presented at a NeurIPS workshop, Jaques and colleagues at Google used a version of PAIRED to teach an AI agent to fill out web forms and book a flight. Whereas a simpler teaching method led it to fail nearly every time, an AI trained with the PAIRED method succeeded about 50% of the time.
Bart Selman, a computer scientist at Cornell University, and his colleagues presented another approach to these so-called “auto-curricula” at the meeting. Their task was a game called Sokoban, in which an AI agent must push blocks to target locations. But blocks can get stuck in dead ends, so success often requires planning hundreds of steps ahead – imagine rearranging large furniture in a small apartment.
Their system creates a collection of simpler puzzles to train on, with fewer blocks and targets. Then, based on the recent performance of their AI, it selects puzzles that the agent only occasionally solves, effectively ratcheting the lesson plan to the right level. Sometimes, the right puzzles are hard to predict, Selman says.
“The notion of what is a simpler task is not always obvious.”
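The selection rule itself can be sketched simply: prefer puzzles the agent solves only some of the time, since ones it always or never solves teach little. The 50% target and the solve-rate records below are illustrative assumptions, not the authors’ actual criterion:

```python
# Toy sketch of solve-rate-based curriculum selection over a pool of
# simpler Sokoban-style training puzzles.

def pick_training_puzzles(solve_rates, target=0.5, batch=2):
    # Rank puzzles by how close their recent solve rate sits to the
    # target, and take the closest few for the next round of training.
    ranked = sorted(solve_rates, key=lambda pid: abs(solve_rates[pid] - target))
    return ranked[:batch]

# Hypothetical recent solve rates for puzzles with varying block counts.
recent = {"2-box": 0.95, "3-box": 0.55, "4-box": 0.40, "5-box": 0.05}
print(pick_training_puzzles(recent))  # ['3-box', '4-box']
```

Here the nearly-mastered 2-box puzzles and the far-too-hard 5-box puzzles are skipped in favor of the ones at the edge of the agent’s ability, ratcheting the lesson plan upward as its solve rates change.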
The researchers tested their trained agent on 225 problems that no computer had ever solved. It cracked 80% of them, with about one-third of its success coming strictly from the novel training method.
“That was just fun to see,” Selman says. He says he now receives astounded messages from AI researchers who’ve been working on the problems for decades. He hopes to apply the method next to unsolved math proofs.
Pieter Abbeel, a computer scientist at UC Berkeley, also showed at the meeting that auto-curricula can help robots learn to manipulate objects. He says the approach could even be used for human students.
“As an instructor, I think, ‘Hey, not every student needs the same homework exercise,’” Abbeel says, noting that AI could help tailor harder or easier material to a student’s needs. As for AI auto-curricula, he says, “I think it’s going to be at the core of pretty much all reinforcement learning.”
Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the World Futures Forum and the 311 Institute, a global Futures and Deep Futures consultancy working between 2020 and 2070, and is an award-winning futurist and author of the “Codex of the Future” series.
Regularly featured in the global media, including AP, BBC, Bloomberg, CNBC, Discovery, RT, Viacom, and WIRED, Matthew’s ability to identify, track, and explain the impacts of hundreds of revolutionary emerging technologies on global culture, industry, and society is unparalleled. Recognised for the past six years as one of the world’s foremost futurists and innovation and strategy experts, Matthew is an international speaker who helps governments, investors, multinationals, and regulators around the world envision, build, and lead an inclusive, sustainable future.
A rare talent, Matthew’s recent work includes mentoring Lunar XPrize teams, re-envisioning global education and training with the G20, and helping the world’s largest organisations envision and ideate the future of their products and services, industries, and countries.
Matthew's clients include three Prime Ministers and several governments, including the G7, Accenture, Aon, Bain & Co, BCG, Credit Suisse, Dell EMC, Dentons, Deloitte, E&Y, GEMS, Huawei, JPMorgan Chase, KPMG, Lego, McKinsey, PWC, Qualcomm, SAP, Samsung, Sopra Steria, T-Mobile, and many more.