ElevenLabs introduces real-time AI dubbing in 20 languages

By Matthew Griffin | Intelligence and the Senses | 14th October 2023

WHY THIS MATTERS IN BRIEF

Being able to translate what people are saying into other languages in real time, while keeping their accents and emotions, is revolutionary for the industry.

ElevenLabs, a year-old voice cloning and voice synthesis startup founded by former Google and Palantir employees, today announced the launch of AI Dubbing, a dedicated product that can translate any speech, including long-form content, into more than 20 different languages.

The offering, available to all platform users, provides a new way to dub audio and video content and could transform an area that has been largely manual for years.

More importantly, it can break language barriers for smaller content creators who don’t have the resources to hire manual translators to convert their content and take it global.

“We have tested and iterated this feature in collaboration with hundreds of content creators to dub their content and make it more accessible to wider audiences,” said Mati Staniszewski, CEO and co-founder of ElevenLabs. “We see huge potential for independent creatives – such as those creating video content and podcasts – all the way through to film and TV studios.”

ElevenLabs claims the feature can deliver high-quality translated audio in minutes – depending on the length of the content – while retaining the original voice of the speaker, complete with their emotions and intonation.

However, in this age of Artificial Intelligence (AI), when almost every enterprise is looking to language models to drive efficiencies, ElevenLabs is not the only company exploring speech-to-speech translation.

While AI-driven translation involves multiple layers of work, ranging from noise removal to speech translation, users at the front end don't have to go through any of those steps. They simply select the AI Dubbing tool on ElevenLabs, create a new project, choose the source and target languages, and upload the content file.

Once the content is uploaded, the tool automatically detects the number of speakers and gets to work, displaying a progress bar on screen, much like any other online conversion tool. Once processing is complete, the dubbed file can be downloaded and used.

Behind the scenes, the tool uses ElevenLabs' proprietary method to remove background noise, separating music and noise from the speakers' actual dialogue. It then identifies which speaker is talking when (a step known as speaker diarization), keeping their voices distinct, and transcribes what each one says in the original language using a speech-to-text model. This text is then translated, adapted so its length matches the original, and voiced in the target language, producing speech that retains each speaker's original voice characteristics.
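ElevenLabs has not published how its length-adaptation step works. As a minimal sketch of the underlying timing problem, the Python below estimates whether a translated line will fit into the time slot of the original utterance. The `Segment` type, the 15 characters-per-second speaking rate and the 1.25x speed-up cap are all hypothetical values chosen for illustration, not ElevenLabs parameters.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # label assigned by diarization, e.g. "SPEAKER_1"
    start: float  # start time of the original utterance, in seconds
    end: float    # end time of the original utterance, in seconds
    text: str     # translated text to be voiced in this slot


# Hypothetical assumptions for this sketch only, not ElevenLabs settings.
CHARS_PER_SECOND = 15.0  # rough speaking rate of the synthesized voice
MAX_SPEEDUP = 1.25       # beyond this, sped-up speech is assumed to sound unnatural


def estimated_duration(text: str, chars_per_second: float = CHARS_PER_SECOND) -> float:
    """Crude estimate of how long the synthesized text will take to speak."""
    return len(text) / chars_per_second


def fit_factor(seg: Segment) -> float:
    """Playback-rate factor needed to fit the translated text into the original slot.

    1.0 means no change is needed; values above 1.0 mean the speech must be sped up.
    """
    slot = seg.end - seg.start
    if slot <= 0:
        raise ValueError("segment must have a positive duration")
    return max(1.0, estimated_duration(seg.text) / slot)


def needs_rewrite(seg: Segment) -> bool:
    """True if speeding up alone would exceed the cap, so the translated text
    itself should be shortened before synthesis."""
    return fit_factor(seg) > MAX_SPEEDUP
```

Under these assumptions, a 45-character line placed in a 2-second slot would need a 1.5x speed-up, which exceeds the cap, so the translation itself would have to be shortened: the kind of adaptation the article describes.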

Finally, the translated speech is synced back with the music and background noise that were originally removed, producing the finished dubbed output. ElevenLabs claims this work is the culmination of its research into voice cloning, text and audio processing, and multilingual speech synthesis.
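ElevenLabs has not described how it performs this final remix. The sketch below shows only the simplest possible approach, additive mixing with hard clipping, and assumes both tracks are mono floating-point samples in the range [-1.0, 1.0], at the same sample rate and already time-aligned. The `remix` function and its default gains are hypothetical.

```python
from typing import List, Sequence


def remix(speech: Sequence[float], background: Sequence[float],
          speech_gain: float = 1.0, background_gain: float = 0.8) -> List[float]:
    """Mix a dubbed speech track with the separated music/noise stem.

    Hypothetical sketch: both inputs are assumed to be time-aligned mono samples
    in [-1.0, 1.0]; the shorter track is padded with silence (0.0).
    """
    n = max(len(speech), len(background))
    out: List[float] = []
    for i in range(n):
        s = speech[i] if i < len(speech) else 0.0
        b = background[i] if i < len(background) else 0.0
        mixed = speech_gain * s + background_gain * b
        # Hard-clip so the result stays within the valid sample range.
        out.append(max(-1.0, min(1.0, mixed)))
    return out
```

A production system would more likely apply smoother gain control (for example, lowering the music under dialogue) rather than hard clipping; the sketch only illustrates that the separated stems are recombined sample by sample.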

For producing the final speech from translated text, the company taps its latest Multilingual v2 model. It currently supports more than 20 languages, including Hindi, Portuguese, Spanish, Japanese, Ukrainian, Polish and Arabic, giving users a wide range of options to globalize their content.

Prior to this end-to-end interface, ElevenLabs offered separate tools for voice cloning and text-to-speech synthesis. This meant that anyone who wanted to translate audio content, such as a podcast, into a different language first had to create a clone of their voice on the platform and separately transcribe and translate the audio. Then, using the translated text and their cloned voice, they could produce audio with the text-to-speech model. Moreover, this approach only worked for speech without significant background music or noise.
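In that older workflow, the final step was a plain text-to-speech call using a cloned voice. The sketch below builds, but does not send, such a request using only Python's standard library. The endpoint path, the `xi-api-key` header and the `eleven_multilingual_v2` model ID reflect our understanding of ElevenLabs' public REST API rather than details from this article, so treat them as assumptions and check the current API reference. `voice_id` is a placeholder for a voice clone created beforehand, and transcription and translation are assumed to have already happened.

```python
import json
import urllib.request

# Assumption: base URL of ElevenLabs' public REST API; verify against current docs.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key: str, voice_id: str, translated_text: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for an existing cloned voice.

    The endpoint, header name and model ID are assumptions, not taken from the article.
    """
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = json.dumps({"text": translated_text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "xi-api-key": api_key,             # assumed authentication header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",            # ask for MP3 audio back
        },
        method="POST",
    )
```

If these assumptions hold, passing the request to `urllib.request.urlopen` would return the synthesized audio bytes, which the user would then have had to line up with the original recording by hand.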

Staniszewski confirmed that the new dubbing feature will be available to all users of the platform, but will have some character limits, as has been the case with text-to-speech generation. Around one minute of AI Dubbing would typically equate to 3,000 characters, he said.
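Taking Staniszewski's rule of thumb at face value, a creator can roughly estimate how much of a character quota a piece of content would use. The helper names below are hypothetical, and ElevenLabs' actual plan limits and billing granularity are not stated in the article.

```python
import math

# Rough figure quoted by Staniszewski: about 3,000 characters per minute of AI Dubbing.
CHARS_PER_DUBBED_MINUTE = 3_000


def estimated_characters(duration_seconds: float) -> int:
    """Approximate character cost of dubbing content of the given length."""
    minutes = duration_seconds / 60
    return math.ceil(minutes * CHARS_PER_DUBBED_MINUTE)


def max_dub_minutes(character_quota: int) -> float:
    """Approximate minutes of dubbing a (hypothetical) character quota allows."""
    return character_quota / CHARS_PER_DUBBED_MINUTE
```

By this estimate, a 45-minute podcast episode would use roughly 135,000 characters, and a hypothetical 30,000-character quota would cover about 10 minutes of dubbing.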

While ElevenLabs is making headlines with back-to-back developments, it isn't the only one exploring AI-based voicing. A few weeks back, Microsoft-backed OpenAI made ChatGPT multimodal, giving it the ability to hold conversations in response to voice prompts, much like Amazon's Alexa.

Here too, the company uses speech-to-text and text-to-speech models to convert audio, but the technology is not available to everyone.

OpenAI said it is working with select partners to prevent misuse of the capabilities. One of these is Spotify, which is using the technology to help its podcasters translate their content into different languages while retaining their own voices.

For his part, Staniszewski said ElevenLabs' AI Dubbing tool stands out by translating video or audio of any length, containing any number of speakers, while preserving their voices and emotions across up to 20 languages and delivering the highest-quality results.

Other players are also active in the AI-powered voice and speech synthesis space, including MURF.AI, Play.ht and WellSaid Labs.

Meta also recently launched SeamlessM4T, an open-source multilingual foundation model that can understand nearly 100 languages from speech or text and generate translations into either or both in real time.

According to Market.us, the global market for such tools stood at $1.2 billion in 2022 and is expected to reach nearly $5 billion by 2032, a compound annual growth rate (CAGR) of just over 15.4%.
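The forecast can be sanity-checked with the standard CAGR formula, (end / start)^(1 / years) - 1. The sketch below plugs in the rounded figures quoted above ($1.2 billion in 2022 and $5 billion in 2032, treated as exact for illustration).

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Rounded endpoints from the forecast, in billions of US dollars.
rate = cagr(1.2, 5.0, 2032 - 2022)
print(f"{rate:.2%}")
```

With these rounded endpoints the formula gives roughly 15.3% a year, close to the reported figure; the exact rate depends on the unrounded market values and the base year used in the forecast, which the article does not give.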

Matthew Griffin / About Author

Matthew Griffin is a multi-award winning Futurist and expert in Disruption and Innovation, Geopolitics, Leadership, and Technology, who NASA have described as a "walking encyclopaedia of the future" and a "futurist Polymath." 15-time bestselling author of the "Codex of the Future" series, Matthew is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working with royal households, world leaders, G7, G20, and G77 governments, NGOs, and multi-national mid and mega cap firms to help them explore, shape, and lead the next 50 years of business and society.

An award-winning YouTube creator with over a million followers, with an unrivalled global reach and impact, Matthew is a highly sought-after international keynote speaker, lecturer, and mentor who collaborates with global leaders through the United Nations Alliance of Civilizations (UNAOC) and United Nations General Assembly (UNGA) to shape pivotal initiatives such as the UN’s AI for Humanity program, the United Nations Conference of the Parties (UN COP), and the World Economic Forum in Davos.

As the former Global Head of Cloud, National Security, and Enterprise Sales for companies including Atos, Dell-EMC, and IBM, Matthew has a proven track record of building multi-billion dollar business units and turning failing divisions into market leaders. His ability to identify, analyse, and communicate the implications of hundreds of emerging technologies and trends is unparalleled, and his insights are trusted by many of the world’s most respected organisations, including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi, Coca-Cola, Dentons, Deloitte, Dow Jones, EY, Google, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, Siemens AG and Siemens Energy, T-Mobile, UBS, VISA, Walmart, Workday, Worldpay and many others.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthew's mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.