Baidu's AI can clone anyone's voice in under a minute


By Matthew Griffin Intelligence and the Senses 27th February 2018

WHY THIS MATTERS IN BRIEF

Being able to clone people’s voices, and mimic them, opens up a range of well-meaning opportunities, but it also paves the way for fake news chaos and everything in between…

Fake news is already a problem today and looks set to become a bigger one in the near future, so it’s hard not to be concerned about the kind of mimicry that today’s Artificial Intelligence (AI) technology is making possible. First, researchers developed deep learning AI that can superimpose one person’s face onto another person’s body. Now, researchers at Chinese search giant Baidu have created an AI they claim can learn to accurately mimic your voice, based on less than a minute’s worth of listening to it.


“From a technical perspective, this is an important breakthrough showing that a complicated generative modelling problem, namely speech synthesis, can be adapted to new cases by efficiently learning only from a few examples,” said Leo Zou, a member of Baidu’s communications team. “Previously it would take numerous examples for a model to learn. Now, it takes a fraction of what it used to.”

The company isn’t alone in its ability to mimic real voices by listening to just a minute’s worth of audio. Last year I covered Adobe’s Voco product, essentially “Photoshop but for voice,” and a company called Lyrebird, which used neural networks to replicate voices, including those of President Donald Trump and former President Barack Obama, from a relatively small number of samples. Like Lyrebird’s work, Baidu’s speech synthesis technology doesn’t sound completely convincing, but it’s an impressive step forward, and way ahead of a lot of the robotic AI voice assistants that existed just a few years ago.


The work is based on Baidu’s text-to-speech synthesis system Deep Voice, which was trained on upwards of 800 hours of audio from a total of 2,400 speakers. It needs just 100 five-second samples of vocal training data to sound its best, but a version trained on only 10 five-second samples was able to trick a voice recognition system more than 95 percent of the time.
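To put those sample counts in terms of total listening time, here is a minimal Python sketch of the arithmetic. The function name is purely illustrative and is not part of Baidu's system; the only inputs are the sample counts and the five-second sample length quoted above.

```python
def total_audio_seconds(num_samples: int, seconds_per_sample: float) -> float:
    """Total audio duration for a set of equal-length voice samples."""
    return num_samples * seconds_per_sample


# Figures quoted in the article: 100 samples for best quality, 10 for the minimal version.
best_quality = total_audio_seconds(100, 5)
minimal = total_audio_seconds(10, 5)

print(f"Best quality: {best_quality:.0f} seconds (about {best_quality / 60:.1f} minutes)")
print(f"Minimal version: {minimal:.0f} seconds")
```

The 10-sample version works out to 50 seconds of audio, which is where the "under a minute" figure comes from, while the best-sounding version needs a little over eight minutes.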

“We see many great use cases or applications for this technology, for example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalised human-machine interfaces. For example, a mom can easily configure an audiobook reader to read her back stories using her own voice,” (errr, freaky…), “the method [additionally] allows creation of original digital content. Hundreds of characters in a video game would be able to have unique voices because of this technology. Another interesting application is speech-to-speech language translation, as the synthesizer can learn to mimic the speaker identity in another language,” added Zou.

For a deeper dive into this subject, you can listen to a sample of the voices or read a paper describing the work.
