Isaac GR00T is NVIDIA's open foundation-model family for generalist humanoid robots. It uses a dual-system vision-language-action design and is openly released for developers to post-train onto their own humanoid platforms.
GR00T (Generalist Robot 00 Technology) is NVIDIA’s open vision-language-action (VLA) model family for humanoid robots. NVIDIA announced Project GR00T at GTC 2024 and released the first open model, GR00T N1, at GTC 2025. The design is a dual-system architecture inspired by human cognition: System 2 is a vision-language module (built on NVIDIA’s Eagle VLM) that interprets the scene and the language instruction, while System 1 is a diffusion-transformer action module that uses flow matching to generate fluid motor commands in real time. The two systems are tightly coupled and trained end-to-end on a heterogeneous mix of real robot trajectories, egocentric human videos and synthetically generated data (the GR00T-Dreams / Cosmos pipeline).

GR00T N1 was a 2.2B-parameter model. It was followed by GR00T N1.5 and by GR00T N1.6-3B, the current documented release; N1.7 is in early access, with a new VLM backbone and 20,000 hours of human-video pre-training. The model is cross-embodiment and can be post-trained for a specific humanoid with as few as 20-40 demonstrations.

Limitations: the weights are released under NVIDIA’s own licence (largely non-commercial for the model checkpoints; review the terms carefully); the model is optimised for the NVIDIA stack (Isaac Lab, Isaac Sim, Jetson); and real-time humanoid control needs substantial on-board compute. GR00T is also a fast-moving family, so any version number listed here will date quickly.
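To make the flow-matching step in System 1 more concrete, the following is a minimal, illustrative sketch of how a flow-matching sampler turns noise into an action chunk by integrating a velocity field over a few Euler steps. It is not GR00T's code or API. The names `toy_velocity` and `sample_action_chunk` are hypothetical, and the velocity function is a hand-written stand-in for the learned action module: a real model would predict the velocity from the noisy actions, the time step and the vision-language features, without ever being given the target actions.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For straight-line paths x_t = (1 - t) * noise + t * target, the ideal
    velocity at x_t is (target - x_t) / (1 - t), which equals target - noise.
    A trained model would approximate this from its conditioning inputs;
    here the target is passed in directly purely for illustration.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action_chunk(target, num_steps=4, seed=0):
    """Integrate the velocity field from t=0 (noise) to t=1 (actions)."""
    rng = random.Random(seed)
    # Start from Gaussian noise with the same shape as the action chunk.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        # One explicit Euler step along the predicted velocity.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # A flattened, made-up action chunk (e.g. three joint targets).
    print(sample_action_chunk([0.1, -0.2, 0.3], num_steps=4))
```

Because the toy velocity field is exact for straight-line paths, a handful of Euler steps recovers the target up to floating-point error. This illustrates why flow-matching samplers can run with few integration steps, which matters for real-time control.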
