Helix is a generalist vision-language-action (VLA) model developed by Figure AI to control its humanoid robots. It uses a decoupled dual-system architecture. System 2 (S2) is built on an open-weight, internet-pre-trained vision-language model (VLM) of roughly 7B parameters and handles scene understanding and language comprehension at about 7-9 Hz. System 1 (S1) is a fast visuomotor policy of roughly 80M parameters that converts S2’s latent representation into smooth real-time actions at around 200 Hz. This split lets the robot ‘think slow’ about goals while ‘acting fast’ to execute them. Helix is, by Figure’s account, the first VLA to output high-rate continuous control of the full humanoid upper body (wrists, torso, head and individual fingers, around 35 degrees of freedom) and the first to run on two robots at once so that they can cooperate on a shared, long-horizon task involving objects neither has seen before. It uses a single set of network weights for all behaviours, with no task-specific fine-tuning, and was trained on roughly 500 hours of teleoperated multi-robot demonstrations with VLM-generated natural-language labels. Helix runs entirely on board the robot’s embedded GPUs. Limitations: Helix itself is fully closed (no released weights, code, paper or API) and is tightly bound to Figure’s own hardware; the published evidence consists of company demonstrations rather than independent benchmarks.
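The dual-rate split described above can be illustrated with a toy simulation. This is a minimal sketch and not Figure's implementation, which is unpublished: the function names (slow_system, fast_system, run), the latent size, the arithmetic inside each function and the choice of 8 Hz within the quoted 7-9 Hz band are all assumptions made for illustration. It also steps both systems from one synchronous loop, whereas a real robot would presumably run them as separate processes.

```python
from typing import List, Tuple

S1_HZ = 200      # fast control rate quoted in the text
S2_HZ = 8        # assumed value inside the quoted 7-9 Hz band
ACTION_DIM = 35  # approximate upper-body degrees of freedom quoted in the text
LATENT_DIM = 4   # arbitrary toy size, not from the source


def slow_system(observation: float) -> List[float]:
    """Hypothetical stand-in for S2: turn an observation into a latent vector."""
    return [observation * (i + 1) for i in range(LATENT_DIM)]


def fast_system(latent: List[float], observation: float) -> List[float]:
    """Hypothetical stand-in for S1: combine the latest latent with a fresh
    observation and emit one continuous action vector."""
    gain = sum(latent) / len(latent)
    return [gain + observation] * ACTION_DIM


def run(duration_s: float = 1.0) -> Tuple[List[List[float]], int]:
    """Step the fast loop at S1_HZ and refresh the latent at S2_HZ."""
    ticks = int(duration_s * S1_HZ)
    ticks_per_s2_update = S1_HZ // S2_HZ  # 25 fast ticks per slow update
    latent = [0.0] * LATENT_DIM
    s2_updates = 0
    actions = []
    for tick in range(ticks):
        observation = tick / S1_HZ  # toy observation: elapsed seconds
        if tick % ticks_per_s2_update == 0:
            # Slow path: recompute the latent only every 25th tick.
            latent = slow_system(observation)
            s2_updates += 1
        # Fast path: every tick acts on whatever latent is newest.
        actions.append(fast_system(latent, observation))
    return actions, s2_updates


if __name__ == "__main__":
    actions, s2_updates = run(1.0)
    print(len(actions), "actions from", s2_updates, "latent updates")
```

The point of the sketch is only that the fast loop never waits for the slow one: it reuses the most recent latent until a new one arrives, so the action rate is set by S1 alone.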
Helix is Figure AI's in-house vision-language-action (VLA) model for humanoid robots. Figure describes it as the first VLA to drive high-rate continuous control of an entire humanoid upper body, and it can run on two robots at once so that they collaborate on one task.
