Google Launches Gemini 2.5 Computer Use Model to Power Smarter, UI-Navigating AI Agents

Early testers, including Google’s own payments team, Autotab, and Poke.com, report significant speed and reliability gains

Google has unveiled the Gemini 2.5 Computer Use model, a new AI system built on Gemini 2.5 Pro and designed to power agents that can interact directly with user interfaces.

Now available in preview through the Gemini API in Google AI Studio and Vertex AI, the model lets AI agents perform on-screen actions like clicking, typing, and form-filling — essentially using apps and websites the way humans do.
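For developers, the capability is exposed as a tool on an ordinary Gemini API request. The snippet below is a minimal sketch rather than official sample code: it assumes the google-genai Python SDK and the preview model and enum names published at launch (gemini-2.5-computer-use-preview-10-2025, ENVIRONMENT_BROWSER), any of which may change during the preview.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Enable the Computer Use tool, scoped to a browser environment.
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER))])

# Each turn pairs the task with a screenshot of the current screen.
contents = [types.Content(role="user", parts=[
    types.Part(text="Find the contact form and submit my details."),
    types.Part.from_bytes(data=open("screenshot.png", "rb").read(),
                          mime_type="image/png"),
])]

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",
    contents=contents,
    config=config)
```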

Unlike traditional agents that reach software through structured APIs, Gemini 2.5 Computer Use tackles digital tasks that require hands-on interaction with an interface, such as navigating login flows or completing web forms.

The model’s new computer_use tool takes in the user’s request, a screenshot of the current screen, and a history of recent actions, then decides the next step: clicking a button, typing into a field, or, for sensitive actions, asking the user for confirmation before executing.
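Developers close that loop themselves: they execute the proposed action against a real browser, capture a fresh screenshot, and send it back as the function response. The sketch below continues the setup above and is equally speculative; it assumes Playwright as the browser executor, a click_at action name, and the 0-999 normalized coordinate grid described at launch.

```python
# Continues from the setup above (client, config, contents).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    page.goto("https://example.com")

    while True:
        response = client.models.generate_content(
            model="gemini-2.5-computer-use-preview-10-2025",
            contents=contents, config=config)
        candidate = response.candidates[0]
        calls = [pt.function_call for pt in candidate.content.parts
                 if pt.function_call]
        if not calls:
            break  # no action proposed; the model considers the task done
        fc = calls[0]
        if fc.name == "click_at":
            # Coordinates arrive on a 0-999 grid; scale to the real viewport.
            page.mouse.click(
                fc.args["x"] / 1000 * page.viewport_size["width"],
                fc.args["y"] / 1000 * page.viewport_size["height"])
        # ... handle type_text_at, scrolling, navigation, and the rest
        contents.append(candidate.content)  # keep the model's turn in history
        contents.append(types.Content(role="user", parts=[
            types.Part.from_function_response(name=fc.name,
                                              response={"url": page.url}),
            types.Part.from_bytes(data=page.screenshot(),
                                  mime_type="image/png"),
        ]))
```

Whether the browser is driven by Playwright, a cloud browser service, or something else is up to the developer; the model only proposes actions, it never executes them itself.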

Google says the model outperforms leading alternatives on both accuracy and latency, citing benchmarks from Browserbase alongside internal evaluations. While it is currently optimized for web environments, it also shows strong promise on mobile UIs.

To ensure safety, Google has embedded guardrails to prevent misuse, prompt injections, and unauthorized actions. Developers can also enforce manual confirmations for high-risk tasks.
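One possible shape for that developer-side check, again only a sketch: it assumes the launch behavior in which a proposed call carries a safety decision when the model wants explicit sign-off, and the field names here are illustrative, not confirmed API surface.

```python
def approved(fc) -> bool:
    """Ask a human before running any action the model flags as sensitive."""
    decision = (fc.args or {}).get("safety_decision")
    if decision and decision.get("decision") == "require_confirmation":
        reply = input(f"The agent wants to run '{fc.name}': "
                      f"{decision.get('explanation', 'no details')}. "
                      f"Allow? [y/N] ")
        return reply.strip().lower() == "y"
    return True  # unflagged actions proceed automatically
```

In the loop above, the executor would consult approved(fc) before acting, and report a declined action back to the model rather than executing it.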

Early testers, including Google’s own payments team, Autotab, and Poke.com, report significant speed and reliability gains, with workflows completing up to 50% faster.