WHY THIS MATTERS IN BRIEF
Being able to create short videos from just one data source using just a single click will make video creation a whole lot faster and easier for everyone.
What if you could create a YouTube video, or even a full-length movie, by just writing the script, or having an Artificial Intelligence (AI) write the script for you, and then getting another AI to automatically turn it into a video right in front of your eyes? Or better yet, what if you could just think about your movie and have an AI read your brain waves and create it for you?
All of these scenarios, and many more, are already a reality today, albeit at a basic level, thanks to the development of advanced new AIs that are being used to create increasingly sophisticated so-called synthetic content, from books, course content, and games to art, images, and, you guessed it, even fake news.
Now, stepping it up a notch, Baidu in China has created yet another new form of AI that can create short videos from text-based content using nothing more than a click of a mouse, like this one, which now has millions of views around the world.
This news video was created by the new AI, click to watch.
Near the end of 2019, when Baidu’s AI, named ERNIE, beat Google’s AI, named BERT, in its understanding of human natural language, a team at Baidu Research was already prepping ERNIE for a new tool. They envisioned a program that could analyze the text from a URL, synthesize a pithy narrative, and align it with machine-selected clips to churn out a two-minute video complete with voice-over, all in less time than it would take to play a song.
Last month, a prototype version of such a program, called VidPress, debuted. The AI’s goal is not only to save human video editors’ time but also to eventually outperform them in quality.
In a test performed by the team within Baidu’s video platform, Haokan, it took up to 9 minutes for VidPress to generate a video from scratch. When it comes to viewers’ video completion rate, a rough proxy for quality, viewers stayed with 65 percent of VidPress’s videos from the beginning to the end, whereas the rate for videos produced by human editors was 50 percent, says Xi Chen, a research engineer at Baidu.
Chen and his team of engineers at Baidu Research in the San Francisco Bay Area are not alone in testing AI for the booming short-video market. For example, GliaStudio, a Taiwan-based startup, has been creating video summaries of articles since 2015, but few startups have the resources and advantages Baidu has, Chen says.
With access to ERNIE and other Baidu proprietary technologies, including computer-vision programs, the VidPress team is “standing on giant’s shoulder,” says Julia Li, director of Baidu Research USA.
To understand how VidPress works, Li explains, consider someone feeding a web page about the death of NBA basketball star Kobe Bryant, who was killed in a helicopter accident in January 2020, to the tool.
On one level of a parallel process, VidPress generates a lightweight version of the story, making sure that important sentences, which can be crafted by the AI or pulled directly from the web page, appear early in the script. Such sentences might include keywords like “helicopter” and “Kobe.” During this step, the program also ensures that the logical structure of the summary is coherent and clear, and it can also fix human writers’ bad habits, such as using vague pronouns, Li says.
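To make the "important sentences first" idea concrete, here is a minimal sketch in Python. It is not Baidu's method: VidPress relies on ERNIE, whereas this toy version simply counts keyword matches. The function name build_script, the max_sentences parameter, and the sample sentences are all hypothetical.

```python
import re


def build_script(sentences, keywords, max_sentences=3):
    """Return up to max_sentences sentences, most keyword-rich first.

    Toy stand-in for a real summarizer: importance is just the number
    of distinct keywords a sentence mentions.
    """
    lowered = [k.lower() for k in keywords]

    def score(sentence):
        # Split into lowercase words so "Kobe" matches the keyword "kobe".
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        return sum(1 for k in lowered if k in words)

    # sorted() is stable, so sentences with equal scores keep page order.
    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:max_sentences]


page = [
    "The weather was foggy that morning.",
    "Kobe Bryant died in a helicopter crash.",
    "Fans gathered outside the arena.",
    "The helicopter left Orange County.",
]
script = build_script(page, ["helicopter", "Kobe"])
```

In this example the sentence naming both "Kobe" and "helicopter" moves to the top of the script, followed by the one that mentions only the helicopter.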
After text-to-speech services convert the script into synthesized speech, VidPress sets “anchors” in this audio track to suggest time points where viewers are most interested in seeing new visuals. Chen and colleagues wrote a decision-tree model to choose these anchor points based on how well the content around them correlates with the theme of the story. The system also pays attention to phrases people are normally curious about, such as the names of brands and locations.
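The anchor-picking step can be pictured with a small sketch. The article says Baidu uses a decision-tree model; the version below is only a hand-written, two-rule stand-in, not the team's trained model. The Segment class, the 0.5 threshold, and the list of "curious" terms are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds into the synthesized audio track
    text: str     # the script text spoken from that point


def theme_overlap(text, theme_words):
    """Fraction of theme words that appear in the text (0.0 to 1.0)."""
    if not theme_words:
        return 0.0
    words = set(re.findall(r"[a-z']+", text.lower()))
    return len(words & theme_words) / len(theme_words)


def is_anchor(segment, theme_words, curious_terms, threshold=0.5):
    # Rule 1: the segment correlates strongly with the story's theme.
    if theme_overlap(segment.text, theme_words) >= threshold:
        return True
    # Rule 2: it mentions something viewers tend to be curious about,
    # such as a brand or a location.
    lowered = segment.text.lower()
    return any(term in lowered for term in curious_terms)


def pick_anchors(segments, theme_words, curious_terms, threshold=0.5):
    """Return the start times of segments that qualify as anchors."""
    theme = {w.lower() for w in theme_words}
    terms = [t.lower() for t in curious_terms]
    return [s.start for s in segments if is_anchor(s, theme, terms, threshold)]


track = [
    Segment(0.0, "Kobe Bryant died in a helicopter crash."),
    Segment(4.5, "Officials gave few details."),
    Segment(8.0, "He played twenty seasons in Los Angeles."),
]
anchors = pick_anchors(track, ["Kobe", "helicopter"], ["Los Angeles"])
```

Here the first segment qualifies on theme, the third qualifies because it names a location, and the second is skipped, so new visuals would be suggested at 0.0 and 8.0 seconds.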
On the other parallel level, VidPress finds and scores relevant media captured from the Internet, starting from the given web page and through other relevant pages on Baidu’s newsfeed network Baijiahao. The algorithms are written in such a way that only higher-ranking videos or images are aligned to those anchor points in the timeline. Chen says the team is working on accessing general web pages, and developing capabilities to use commercial clients’ copyrighted databases.
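As a rough illustration of aligning only higher-ranking media to the anchor points, consider the sketch below. The article does not describe how VidPress scores or matches media, so everything here is assumed: the align_media function, the min_score cutoff of 0.6, the file names, and the simple rule of giving the best-scoring item to the earliest anchor.

```python
def align_media(anchors, media_scores, min_score=0.6):
    """Map anchor times to media names, using only high-scoring media.

    anchors: anchor times in seconds, in any order.
    media_scores: dict of media name -> relevance score (0.0 to 1.0).
    Anchors left over when qualifying media run out get no entry.
    """
    # Drop anything below the cutoff, then rank best-first.
    ranked = sorted(
        (item for item in media_scores.items() if item[1] >= min_score),
        key=lambda item: item[1],
        reverse=True,
    )
    # Assumption: the best-scoring item goes to the earliest anchor.
    return {
        anchor: name
        for anchor, (name, _score) in zip(sorted(anchors), ranked)
    }


timeline = align_media(
    [8.0, 0.0],
    {"crash_site.jpg": 0.9, "interview.mp4": 0.8, "stock_crowd.jpg": 0.3},
)
```

With these made-up scores, the crash-site photo lands on the first anchor and the interview clip on the second, while the low-scoring stock image is never used.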
Baidu’s machine vision technologies are also involved. So, after a crash-site photo in the video about Bryant, Li says, VidPress can add post-match interview footage of Bryant and not of another NBA player when recapping Bryant’s career.
This ability to mine materials in multiple formats including text and visuals from a vast database of websites, as well as the ability to create a timeline dotted with anchor points to hook people’s attention, allows VidPress to improve viewers’ satisfaction, Li explains. That’s probably why VidPress had a better video completion rate than human editors, she says.
An observer of China’s technology industry, Hefei Zhang, notes in an online post that the value of VidPress lies in how it uses algorithms to reduce the time costs of material organisation, video editing and video compilation. Like most AI products in the market, although VidPress saves time, it can’t yet replace or outperform humans in creativity, he says.
As Baidu’s Li points out, becoming more creative, and even providing customized video content based on viewers’ tastes, is a direction the team would like to take VidPress, but she acknowledges it’s not there yet. Still, let’s face it, this is a very fast-moving market, and it won’t be long before VidPress not only matches human creators and editors but surpasses them too.