Google Gemini can now click, scroll, and type in your browser

By Matthew Griffin Robo Revolution 1st October 2025

WHY THIS MATTERS IN BRIEF

Automation is coming to everyone’s desktop, and that’s both good and bad.

In news that’s wonderful for anyone who wants to automate their desktop work, and for hackers who are finding new ways to take over your computer and destroy your life, Google is previewing a new Gemini Artificial Intelligence (AI) model designed to navigate and interact with the web through a browser, letting AI agents work inside interfaces designed for people rather than robots. The model, called Gemini 2.5 Computer Use, uses “visual understanding and reasoning capabilities” to analyze a user’s request and carry out a task, such as filling out and submitting a form.

It can be used for UI testing, or for navigating interfaces that were made for people and don’t have an API or other direct connection available. Other versions of this model have been used for agentic features in AI Mode and in Project Mariner, a research prototype that uses AI agents to carry out tasks on their own in a browser, like adding items to your cart based on a list of ingredients.

See it in action

Google’s announcement comes just one day after OpenAI revealed new apps for ChatGPT as part of its annual Dev Day, as OpenAI continues to focus its attention on its ChatGPT Agent feature, which can complete complex tasks on your behalf. Meanwhile, Anthropic had already released a version of its Claude AI model with “computer use” last year.

Google posted some demo videos showing its computer use tool in action, and notes that they are sped up 3x.

Google says its computer use model “outperforms leading alternatives on multiple web and mobile benchmarks.” Unlike ChatGPT Agent and Anthropic’s computer use tool, Google’s new AI model only has access to a browser, not an entire computer environment. Google notes that “it is not yet optimized for desktop OS-level control” and that it currently supports 13 actions, including opening a web browser, typing text, and dragging and dropping elements.
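
To make the idea of a small, fixed action set more concrete, here is a minimal Python sketch of how a client might execute actions proposed by a model like this. Everything in it is an assumption for illustration: the action names (`open_web_browser`, `type_text`, `drag_and_drop`), their argument fields, and the `FakeBrowser` class are hypothetical and are not Google’s API. A real client would pass these calls to an actual browser automation layer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser; it only records requests."""
    log: List[str] = field(default_factory=list)

    def open(self) -> None:
        self.log.append("open")

    def type_text(self, text: str) -> None:
        self.log.append(f"type:{text}")

    def drag_and_drop(self, src: str, dst: str) -> None:
        self.log.append(f"drag:{src}->{dst}")


def build_dispatcher(browser: FakeBrowser) -> Dict[str, Callable[[dict], None]]:
    # Hypothetical action names, loosely modelled on the three actions
    # the article mentions. The real set and its names may differ.
    return {
        "open_web_browser": lambda args: browser.open(),
        "type_text": lambda args: browser.type_text(args["text"]),
        "drag_and_drop": lambda args: browser.drag_and_drop(args["src"], args["dst"]),
    }


def run_actions(browser: FakeBrowser, actions: List[dict]) -> List[str]:
    """Execute each proposed action, rejecting anything outside the allowed set."""
    dispatch = build_dispatcher(browser)
    for action in actions:
        handler = dispatch.get(action["name"])
        if handler is None:
            raise ValueError(f"unsupported action: {action['name']}")
        handler(action.get("args", {}))
    return browser.log


if __name__ == "__main__":
    proposed = [
        {"name": "open_web_browser"},
        {"name": "type_text", "args": {"text": "hello"}},
        {"name": "drag_and_drop", "args": {"src": "item", "dst": "cart"}},
    ]
    print(run_actions(FakeBrowser(), proposed))
```

Refusing any action that is not on the list is one simple way a client could keep an agent confined to the browser, which matters given the security worries raised at the top of this article.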

Gemini 2.5 Computer Use is available to developers through Google AI Studio and Vertex AI, but there’s also a demo on Browserbase, where you can watch as it completes tasks like “Play a game of 2048” or “Browse Hacker News for trending debates.”
