Making AI videos with voice cues will emerge soon

By Matthew Griffin Intelligence and the Senses 18th March 2024

WHY THIS MATTERS IN BRIEF

One day we will replace text inputs for Generative AI with voice on a more regular basis, and it will change how we interact with the AI’s around us.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

A little while ago Artificial Intelligence (AI) leader OpenAI quietly introduced a new AI model called Sora which can create “realistic” and “imaginative” 60-second videos from quick text prompts and now the company’s announced that Sora can now generate videos up to 60 seconds in length from text instructions, with the ability to serve up scenes with multiple characters, specific types of motion, and detailed background details.

Overwatch starts using AI to identify the world's best players

“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” the blog post said.

OpenAI said it intends to train the AI models so it can “help people solve problems that require real-world interaction.”

The Future of Synthetic Content, by keynote Matthew Griffin

This is the latest effort from the company behind the viral chatbot ChatGPT, which continues to push the generative AI movement forward. Although “multi-modal models” are not new and text-to-video models already exist, what sets this apart is the length and accuracy that OpenAI claims Sora to have, according to Reece Hayden, a senior analyst at market research firm ABI Research.

MIT researchers have taught their AI to recognise sounds

Hayden said these types of AI models could have a big impact on digital entertainment markets with new personalized content being streamed across channels.

“One obvious use case is within TV; creating short scenes to support narratives,” Hayden said. “The model is still limited though, but it shows the direction of the market.”

At the same time, OpenAI said Sora is still a work in progress with clear “weaknesses,” particularly when it comes to spatial details of a prompt – mixing up left and right – and cause and effect. It gave the example of creating a video of someone taking a bite out of a cookie but it not having a bite mark right after.

Amazon's AI reportedly tracks, monitors and automatically fires hundreds of employees

For now, OpenAI’s messaging remains focused on safety. The company said it plans to work with a team of experts to test the latest model and look closely at various areas including misinformation, hateful content and bias. The company said it is also building tools to help detect misleading information.

Sora will first be made available to cybersecurity professors, called “red teamers,” who’ve I’ve shared details on before, who can assess the product for harms or risks. It is also granting access to a number of visual artists, designers and filmmakers to collect feedback on how creative professionals could use it.

The latest update comes as OpenAI continues to advance ChatGPT.

Disney just unveiled its high def face swapping tech

Earlier this week, the company said it is testing a feature in which users can control ChatGPT’s memory, allowing them to ask the platform to remember chats to make future conversations more personalized or tell it to forget what was previously discussed.

Matthew Griffin / About Author

Matthew Griffin is a multi-award winning Futurist and expert in Disruption and Innovation, Geopolitics, Leadership, and Technology, who NASA have described as a "walking encyclopaedia of the future" and a "futurist Polymath." 15-time best selling author of the "Codex of the Future" series, Matthew is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working with royal households, world leaders, G7, G20, and G77 governments, NGOs, and multi-national mid and mega cap firms to help them explore, shape, and lead the next 50 years of business and society.

An award-winning YouTube creator with over a million followers, with an unrivalled global reach and impact, Matthew is a highly sought-after international keynote speaker, lecturer, and mentor who collaborates with global leaders through the United Nations Alliance of Civilizations (UNAOC) and United Nations General Assembly (UNGA) to shape pivotal initiatives such as the UN’s AI for Humanity program, the United Nations Conference of the Parties (UN COP), and the World Economic Forum in Davos.

As the former Global Head of Cloud, National Security, and Enterprise Sales for companies including Atos, Dell-EMC, and IBM, Matthew has a proven track record of building multi-billion dollar business units and turning failing divisions into market leaders. His ability to identify, analyse, and communicate the implications of hundreds of emerging technologies and trends is unparalleled, and his insights are trusted by many of the world’s most respected organisations, including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi, Coca-Cola, Dentons, Deloitte, Dow Jones, EY, Google, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, Siemens AG and Siemens Energy, T-Mobile, UBS, VISA, Walmart, Workday, Worldpay and many others.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthews mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.