Gemini Robotics is Google DeepMind's VLA family built on Gemini. It adds physical action as an output modality, and is paired with Gemini Robotics-ER, an embodied-reasoning model for spatial understanding and multi-step planning.
Gemini Robotics brings Gemini’s multimodal reasoning into the physical world. It comprises two complementary models. Gemini Robotics is the vision-language-action (VLA) model: it is built on Gemini 2.0, with physical actions added as a new output modality so that it can control robots directly. Gemini Robotics-ER (Embodied Reasoning) is an advanced vision-language model that supplies spatial understanding (pointing, 3D object detection, grasp and trajectory intuition) and can plan multi-step tasks, call digital tools such as Google Search, and connect to a roboticist’s existing low-level controllers. In an end-to-end setting, Gemini Robotics-ER achieves a success rate two to three times that of Gemini 2.0 alone.

The family has since advanced to Gemini Robotics 1.5 (available to select partners) and Gemini Robotics-ER 1.6 (in preview via the Gemini API), alongside a Gemini Robotics On-Device variant that is optimised to run locally on a robot and can be adapted with 50-100 demonstrations. A core strength is generality: the model is designed to work on robots of many shapes and to solve tasks it was not explicitly trained for.

Limitations: the full action model is closed and available only to trusted testers and partners, so only the ER reasoning model is exposed through the Gemini API; parameter counts are undisclosed; and the physical safety of generative models on real robots remains an active research area. Expect rapid version churn: this directory entry may date quickly.
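As a rough illustration of how the pointing capability described above might be consumed downstream, the sketch below converts a pointing response into pixel coordinates. It assumes the model returns a JSON list of objects with a `point` field holding `[y, x]` values normalised to a 0-1000 range and a `label` field; that layout, the function name `points_to_pixels`, and the sample response are illustrative assumptions and should be checked against the current Gemini API documentation. No API call is made here.

```python
import json


def points_to_pixels(response_text, width, height, scale=1000):
    """Convert an assumed pointing response into pixel coordinates.

    Assumption (not confirmed by this entry): the response is a JSON list
    of objects such as {"point": [y, x], "label": "mug"}, with y and x
    normalised to the range 0..scale.
    """
    items = json.loads(response_text)
    pixels = []
    for item in items:
        y_norm, x_norm = item["point"]  # assumed order: y first, then x
        x_px = round(x_norm / scale * width)
        y_px = round(y_norm / scale * height)
        pixels.append((item.get("label", ""), x_px, y_px))
    return pixels


# Hypothetical response text for a 640x480 camera frame.
sample = '[{"point": [500, 250], "label": "mug"}]'
print(points_to_pixels(sample, width=640, height=480))
```

Keeping this conversion in a small, separate function means a robot's existing low-level controller only ever receives pixel targets, so a change in the response format would be confined to one place.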
