Google advances AI vision with launch of Project Astra

WHY THIS MATTERS IN BRIEF

There is an AI arms race and Google got caught napping, so now they’re trying to research their way out of trouble.

 

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

Google owner Alphabet has unveiled an Artificial Intelligence (AI) agent that can answer real-time queries across video, audio and text, as part of a number of initiatives designed to showcase its prowess in AI and quell criticism that it has fallen behind rivals.

 

RELATED
US military and DARPA team up to develop tech to uncover fake news

 

Chief executive Sundar Pichai demonstrated the Silicon Valley giant’s new “multi-modal” AI assistant, called Project Astra and powered by an upgraded version of its Gemini model, during the company’s annual developer conference on Tuesday. Astra was part of a series of announcements showcasing a new AI-centric vision for Google, and it follows product launches and upgraded AI models from Big Tech rivals including Meta, Microsoft and its partner OpenAI.

In a video demonstration, Google’s prototype AI assistant responded to voice commands based on an analysis of what it sees through a phone camera or a pair of smart glasses. It successfully identified sequences of code, suggested improvements to electrical circuit diagrams, recognised the King’s Cross area of London through the camera lens, and reminded the user where they had left their glasses.

 

See it in action.

 

Google plans to start adding Astra’s capabilities to its Gemini app and across its products this year, Pichai said. However, he said that while the ultimate “goal is to make Astra seamlessly available” across the company’s software, it would be rolled out cautiously and “the path to productisation will be quality driven.”
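Astra itself is not yet publicly available, but the Gemini models that underpin it can already be queried multimodally through Google’s public google-generativeai Python SDK. Below is a minimal sketch of the kind of image-plus-text query shown in the demo; the API key and image file name are placeholders, and this approximates Astra’s behaviour rather than calling Astra itself.

```python
# Minimal sketch: send a camera frame plus a question to Gemini,
# approximating the image-and-voice queries shown in the Astra demo.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-pro")

frame = Image.open("camera_frame.jpg")  # hypothetical example image
response = model.generate_content(
    [frame, "What neighbourhood of London is shown in this picture?"]
)
print(response.text)
```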

 

RELATED
China wants to shape and lead global AI standards

 

“Getting response time down to something conversational is a difficult engineering challenge,” said Sir Demis Hassabis, head of its AI research arm Google DeepMind. “It is amazing to see how far AI has come, especially when it comes to spatial understanding, video processing and memory.”

At the conference, Google also set out big changes to its core search engine. From this week, all US users will see an “AI Overview” – a brief AI-generated summary answer to the query – at the top of many common search results, followed by clickable links interspersed with advertisements lower down. The company said the search system would be able to answer complex questions with multi-step reasoning – meaning the AI agent can make several independent decisions in order to complete a task – and help customers generate search queries using voice and video.

Liz Reid, head of Google search, said the aim was to “remove some of the legwork in search” and that AI Overviews would be expanded to users in other parts of the world later this year.

The changes come as OpenAI threatens Google’s search business. The San Francisco-based start-up’s ChatGPT chatbot provides quick and complete answers to many questions, threatening to render obsolete search results that provide a traditional list of links alongside advertising. OpenAI has also signed deals with media organisations to include up-to-date information in its responses.

 

RELATED
Google has taught its DeepMind AI to dream

On Monday – in a move seen as an attempt to upstage Google’s announcements – OpenAI demonstrated a faster and cheaper version of the model that powers ChatGPT, which can similarly interpret voice, video, images and code in a single interface.

Google also revealed new or improved AI products including Veo, which generates video from text prompts; Imagen 3, which creates pictures; and Lyria, a model for AI music generation. Subscribers to Gemini Advanced will be able to create personalised chatbots called “Gems” to help with specific tasks. The company’s flagship Gemini 1.5 Pro model has also been upgraded. It now has a much larger context window of 2 million tokens – the amount of data, such as code or images, that it can draw on when generating a response – making it better at following nuanced instructions and referring back to earlier conversations.
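To put that window in perspective, the same public SDK exposes a count_tokens call that reports a prompt’s token cost before anything is sent. A minimal sketch, assuming the google-generativeai package as above; the input file name and API key are hypothetical placeholders:

```python
# Minimal sketch: check how much of Gemini 1.5 Pro's enlarged
# context window a large prompt would consume before sending it.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")

with open("large_codebase_dump.txt") as f:  # hypothetical large input
    prompt = f.read()

# count_tokens reports the token cost without making a generation call;
# the upgraded model accepts prompts of up to roughly 2 million tokens.
print(model.count_tokens(prompt).total_tokens)
```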
