
Making AI videos with voice cues will soon be possible

WHY THIS MATTERS IN BRIEF

One day we will regularly replace text inputs to generative AI with voice, and it will change how we interact with the AIs around us.

 


A little while ago Artificial Intelligence (AI) leader OpenAI quietly introduced a new AI model called Sora, which creates “realistic” and “imaginative” videos from short text prompts. The company has now announced that Sora can generate videos up to 60 seconds long from text instructions, serving up scenes with multiple characters, specific types of motion, and detailed backgrounds.

 


 

“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” the blog post said.

OpenAI said it intends to train AI models like this so they can “help people solve problems that require real-world interaction.”

 

The Future of Synthetic Content, a keynote by Matthew Griffin

 

This is the latest effort from the company behind the viral chatbot ChatGPT, which continues to push the generative AI movement forward. Although “multi-modal models” are not new and text-to-video models already exist, what sets Sora apart is the length and accuracy OpenAI claims for it, according to Reece Hayden, a senior analyst at market research firm ABI Research.

 


 

Hayden said these types of AI models could have a big impact on digital entertainment markets, with new personalized content being streamed across channels.

“One obvious use case is within TV; creating short scenes to support narratives,” Hayden said. “The model is still limited, but it shows the direction of the market.”

At the same time, OpenAI said Sora is still a work in progress with clear “weaknesses,” particularly around the spatial details of a prompt – mixing up left and right – and cause and effect. It gave the example of generating a video of someone taking a bite out of a cookie, with the cookie showing no bite mark afterwards.

 


 

For now, OpenAI’s messaging remains focused on safety. The company said it plans to work with a team of experts to test the latest model and look closely at various areas including misinformation, hateful content and bias. The company said it is also building tools to help detect misleading information.

Sora will first be made available to cybersecurity professionals, called “red teamers,” who I’ve shared details on before, who can assess the product for harms or risks. OpenAI is also granting access to a number of visual artists, designers and filmmakers to collect feedback on how creative professionals could use it.

The latest update comes as OpenAI continues to advance ChatGPT.

 


 

Earlier this week, the company said it is testing a feature that lets users control ChatGPT’s memory, allowing them to ask the platform to remember chats so future conversations are more personalized, or to tell it to forget what was previously discussed.
