Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the 311 Institute, a global futures and deep futures consultancy whose work spans the period from 2020 to 2070. He is also an award-winning futurist and the author of “Codex of the Future.” Regularly featured in the global media, including AP, BBC, CNBC, Discovery, RT, and Viacom, Matthew has an unparalleled ability to identify, track, and explain the impacts of hundreds of revolutionary emerging technologies on global culture, industry and society. Recognised for the past six years as one of the world’s foremost futurists, innovation and strategy experts, Matthew is an international speaker who helps governments, investors, multinationals and regulators around the world envision, build and lead an inclusive, sustainable future. A rare talent, Matthew has recently mentored Lunar XPrize teams, re-envisioned global education and training with the G20, and helped the world’s largest organisations envision and ideate the future of their products and services, industries, and countries. Matthew's clients include three Prime Ministers and several governments, including the G7, as well as Accenture, Bain & Co, BCG, BOA, Blackrock, Bentley, Credit Suisse, Dell EMC, Dentons, Deloitte, Du Pont, E&Y, GEMS, HPE, Huawei, JPMorgan Chase, KPMG, McKinsey, PWC, Qualcomm, SAP, Samsung, Sopra Steria, UBS, and many more.
WHY THIS MATTERS IN BRIEF
After more than forty years of research, speech recognition systems are now as good as humans.
Microsoft have announced a major breakthrough in speech recognition: a technology that, finally, recognises the words in conversational speech as well as humans do, or at least as well as professional human transcriptionists, who are better at the task than most people.
In a report, the team from Microsoft’s Artificial Intelligence and Research group announced that their speech recognition system makes the same number of errors as professional transcriptionists, and in some cases fewer, and that the system’s word error rate (WER) is now just 5.9 percent, down from the 6.3 percent the team reported just last month.
The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversations, and it is the lowest ever recorded on the imaginatively titled industry-standard “Switchboard Speech Recognition Task”.
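For readers wondering what a figure like 5.9 percent actually measures: WER is conventionally computed by aligning the system’s transcript against a reference transcript and counting the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn one into the other, then dividing by the number of words in the reference (N), so WER = (S + D + I) / N. The Python below is a minimal sketch of that calculation using a word-level edit distance. The function name and example sentences are hypothetical, and official Switchboard scoring applies extra text normalisation that this sketch leaves out. It also shows the arithmetic behind the drop from 6.3 to 5.9 percent, roughly a 6 percent relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat today"  # one deletion, one insertion
    print(f"WER: {word_error_rate(ref, hyp):.3f}")  # prints "WER: 0.333"

    # Relative improvement from 6.3% to 5.9% WER
    old, new = 6.3, 5.9
    print(f"Relative reduction: {(old - new) / old:.1%}")  # prints "Relative reduction: 6.3%"
```

Because insertions count as errors, WER can exceed 100 percent when a system outputs many extra words, which is why it is reported as an error rate rather than an accuracy.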
“We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist, “and this is an historic achievement.”
And it is. To put it into context, the announcement means that, for the first time ever, a machine is as good as a human at recognising the words spoken in a fluid conversation, and by reaching this milestone the team has beaten a goal it set less than a year ago. The announcement also shows just how fast the company’s speech recognition technology is progressing. That technology is based on Microsoft’s Computational Network Toolkit (CNTK), a homegrown deep learning system that the research team has since published on GitHub under an open source licence.
Huang said CNTK’s ability to quickly run deep learning algorithms across multiple computers equipped with specialised chips called Graphics Processing Units (GPUs) vastly improved the speed of their research and, ultimately, helped them reach parity.
“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group.
The research milestone comes after decades of work on speech recognition, beginning more than forty years ago, in the early 1970s, with DARPA, the US agency tasked with making technology breakthroughs in the interest of national security. Over the decades, more and more technology companies and research organisations have joined the pursuit.
“This accomplishment is the culmination of over twenty years of effort,” said Geoffrey Zweig, who manages Microsoft’s Speech & Dialog Research Group.
The announcement has broad implications for the consumer and business worlds, which can now use the technology to augment their products and apps with state-of-the-art speech recognition. That includes consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription, and personal digital assistants such as Cortana.
“This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said, “and it’s a dream come true for me.”
Moving forward, Zweig said the researchers are working on ways to make sure that speech recognition works well in more real-life settings. That includes places where there is a lot of background noise, such as at a party or while driving on the highway. They’ll also focus on better ways to help the technology assign names to individual speakers when multiple people are talking, and on making sure that it works well with a wide variety of voices, regardless of age, accent or ability.
In the longer term, researchers will focus on ways to teach computers not just to transcribe the acoustic signals that come out of people’s mouths, but also to understand the words they are saying. That would give the technology the ability to answer questions or take action based on what it is told.
“The next frontier is to move from recognition to understanding,” Zweig said.
Shum noted that we are moving away from a world where people must understand computers to a world in which computers must understand us. Still, he cautioned, true artificial intelligence remains on the distant horizon.
“It will be much longer, much further down the road until computers can understand the real meaning of what’s being said or shown,” he said.