Google DeepMind’s Gemini Robotics 1.5 Brings AI Robots Closer to the Real World

Gemini Robotics 1.5 translates high-level instructions and visual inputs into robot motor commands.

DeepMind today introduced Gemini Robotics 1.5, an advanced vision-language-action model that helps robots “think before acting,” along with Gemini Robotics-ER 1.5, a reasoning engine that plans multi-step tasks. The new models expand DeepMind’s efforts to bring AI agents into the physical world.

Gemini Robotics 1.5 translates high-level instructions and visual inputs into robot motor commands while generating a chain of reasoning in natural language. This transparency lets robots assess complex tasks, such as sorting items into the correct bins based on context, before executing them. Gemini Robotics-ER 1.5, meanwhile, handles environment reasoning, planning, and tool use, and can natively call digital tools to support its decisions.
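
To make that pipeline concrete, here is a purely illustrative sketch of the interface such a vision-language-action model exposes: camera frames and an instruction go in, and a natural-language reasoning trace plus low-level motor commands come out. Every name below (MotorCommand, RobotAction, act) is hypothetical and not part of DeepMind’s API.

```python
# Illustrative only: hypothetical types sketching what a vision-language-action
# model consumes and produces. None of these names come from DeepMind.
from dataclasses import dataclass


@dataclass
class MotorCommand:
    joint: str       # e.g. "left_gripper"
    position: float  # target joint position, in radians


@dataclass
class RobotAction:
    reasoning: list[str]          # the model's step-by-step plan in natural language
    commands: list[MotorCommand]  # motor commands that carry out the next step


def act(instruction: str, camera_frames: list[bytes]) -> RobotAction:
    """Hypothetical wrapper: pass an instruction plus camera frames to the model
    and receive both its reasoning trace and the next motor commands."""
    raise NotImplementedError  # placeholder for a call to the actual model
```
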

"This model thinks before taking action and shows its process, helping robots assess and complete complex tasks more transparently. It also learns across embodiments, accelerating skill learning," Google DeepMind said in a blog post.

A key innovation is that these models support learning across embodiments: behaviors learned on one robot can transfer to another without retraining for each specific device. DeepMind claims this accelerates the deployment of robotics capabilities across diverse platforms.

Gemini Robotics-ER 1.5 is now available via the Gemini API in Google AI Studio, while Gemini Robotics 1.5, the action model, is being rolled out to select partners. DeepMind is also releasing an upgraded ASIMOV benchmark for evaluating the safety and semantic reasoning of robot behavior.
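
For developers who want to experiment with the reasoning model, a minimal sketch of a call through the Gemini API is shown below. It assumes the google-genai Python SDK and an AI Studio API key; the model identifier is a placeholder and should be checked against the model list in AI Studio.

```python
# A minimal sketch of querying Gemini Robotics-ER 1.5 through the Gemini API,
# assuming the google-genai Python SDK. The model name is a placeholder and
# depends on what your API access exposes.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # assumes an AI Studio API key

# Ask the embodied-reasoning model to plan a multi-step task from a scene image.
with open("workbench.jpg", "rb") as f:
    scene = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # placeholder; verify in AI Studio
    contents=[
        scene,
        "Plan the steps needed to sort the items on this bench into the labeled bins.",
    ],
)
print(response.text)  # natural-language plan a robot controller could act on
```
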

Together, these models mark an important step toward AI agents that reason, plan, and act in the physical world, setting the stage for more capable robots integrated into real-world environments.