VALL-E, which was trained on 60,000 hours of English speech, is capable of mimicking a voice in “zero-shot scenarios,” meaning it can make a voice say words it has never heard that voice say before, according to the paper in which the developers introduced the tool, published on the Cornell University-hosted arXiv.
VALL-E uses Text-to-Speech technology to convert written words into “high-quality personalized” spoken speech, according to the 16-page paper.
It used recordings of more than 7,000 real speakers from LibriLight, an audiobook dataset made up of public-domain texts read by volunteers, to conduct its sampling. The tech giant released samples of how VALL-E would work, showcasing how the voice of a speaker is cloned.
The AI tool is not currently available for public use, and so far Microsoft hasn’t made its intended purpose clear. Adobe, which created a similar tool called VoCo a while ago, canned that project fearing it would unleash the equivalent of “Photoshop for voice content.” The researchers also said the results so far showed that VALL-E “significantly outperforms” the most advanced systems of its kind “in terms of speech naturalness and speaker similarity.”
But they also pointed out the lack of diversity of accents among speakers, and noted that some words in the synthesized speech were “unclear, missed, or duplicated.”
They also included an ethical warning about VALL-E and its risks, saying the tool could be misused, for example in “spoofing voice identification or impersonating a specific speaker” – the latter of which a while ago saw a company transfer $243,000 after its CFO, whose voice had been cloned, “told” them to.
“To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E,” the developers wrote in the paper. They didn’t give details of how this could be done.
They added that “if the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves the use of their voice.”
Meanwhile, Microsoft announced Monday that it will make OpenAI’s ChatGPT available through its own services, after announcing its interest in investing $10 billion in the AI writing tool.
Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the World Futures Forum and the 311 Institute, a global Futures and Deep Futures consultancy working between 2020 and 2070, and is an award-winning futurist and author of the “Codex of the Future” series.
Regularly featured in the global media, including AP, BBC, Bloomberg, CNBC, Discovery, RT, Viacom, and WIRED, Matthew’s ability to identify, track, and explain the impacts of hundreds of revolutionary emerging technologies on global culture, industry, and society is unparalleled. Recognised for the past six years as one of the world’s foremost futurists and innovation and strategy experts, Matthew is an international speaker who helps governments, investors, multinationals, and regulators around the world envision, build, and lead an inclusive, sustainable future.
A rare talent, Matthew’s recent work includes mentoring Lunar XPrize teams, re-envisioning global education and training with the G20, and helping the world’s largest organisations envision and ideate the future of their products and services, industries, and countries.
Matthew's clients include three Prime Ministers and several governments, including the G7, as well as Accenture, Aon, Bain & Co, BCG, Credit Suisse, Dell EMC, Dentons, Deloitte, E&Y, GEMS, Huawei, JPMorgan Chase, KPMG, Lego, McKinsey, PWC, Qualcomm, SAP, Samsung, Sopra Steria, T-Mobile, and many more.