Askdroid

Octo

Octo is a small, fully open-source generalist robot policy built as a transformer-based diffusion model. Its modular design lets it be quickly fine-tuned to new robots, sensors and action spaces on a modest compute budget.

Octo is a transformer-based diffusion policy pre-trained on 800,000 robot episodes from the Open X-Embodiment dataset. It is deliberately lightweight and is released in two sizes: Octo-Small (27M parameters) and Octo-Base (93M parameters, sized like a ViT-B).

Images are passed through a lightweight convolutional tokeniser and split into patches; language instructions are encoded with a T5-Base text encoder. A modular block-wise attention structure lets the model accept different inputs (one or more RGB cameras, wrist cameras, goal images or language instructions) simply by changing which tokens are fed in. Actions are produced by a small conditional diffusion head that predicts continuous, multi-modal action distributions. Each action prediction needs only one forward pass of the transformer; the iterative denoising steps run inside the small head.

Octo can be adapted to new sensory inputs (such as force-torque feedback), new action spaces (such as joint-position control) and new robot morphologies by adding lightweight adapters, for example a new input encoder or output head, and fine-tuning on a small dataset with an accessible compute budget. Out of the box it outperforms RT-1-X and performs comparably to the far larger 55B-parameter RT-2-X on language-conditioned tasks.

Limitations: as a compact model it has weaker language grounding than 7B-class VLAs; it targets table-top manipulation; and it is a research project rather than a supported product. Octo is a popular efficient baseline for labs that cannot run models of 7B parameters or more.
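The split between the transformer and the diffusion head can be illustrated with a toy sampling loop. This is a minimal sketch, not Octo's code: `backbone`, `denoise_head`, the step count of 20, the 7-dimensional action and the update constants are all assumptions chosen for illustration. The only point it makes is structural: the expensive backbone runs once per predicted action, and the cheap head runs once per denoising step.

```python
import random

NUM_DENOISE_STEPS = 20  # assumed value for this sketch
calls = {"backbone": 0, "head": 0}  # counts how often each part runs


def backbone(observation_tokens):
    """Hypothetical stand-in for the transformer: returns a readout embedding."""
    calls["backbone"] += 1
    return [sum(observation_tokens) / len(observation_tokens)]


def denoise_head(noisy_action, readout, step):
    """Hypothetical stand-in for the small head: predicts the noise to remove."""
    calls["head"] += 1
    # Toy rule: everything that differs from the readout value counts as noise.
    return [a - readout[0] for a in noisy_action]


def sample_action(observation_tokens, action_dim=7, seed=0):
    rng = random.Random(seed)
    # One backbone pass per action prediction.
    readout = backbone(observation_tokens)
    # Start from Gaussian noise and refine it inside the head only.
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for step in reversed(range(NUM_DENOISE_STEPS)):
        predicted_noise = denoise_head(action, readout, step)
        noise_scale = 0.05 if step > 0 else 0.0  # no fresh noise on the last step
        action = [
            a - 0.5 * e + noise_scale * rng.gauss(0.0, 1.0)
            for a, e in zip(action, predicted_noise)
        ]
    return action


action = sample_action([0.2, 0.4, 0.6])
print(calls)  # the backbone count stays at 1 while the head count equals the step count
```

In a real policy the head would be a learned network and the update would follow a proper noise schedule; the toy constants here only keep the loop stable enough to run.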

Tags: Embodied AI, Foundation Model, Manipulation, Open-Source, Vision-Language-Action
Company Name: UC Berkeley, Stanford, Carnegie Mellon University, Google DeepMind
Category: Vision-Language-Action Models
Country: United States
License: Open source (MIT)
Stage: Research prototype
Model Size: 27M (Octo-Small) / 93M (Octo-Base)
Hardware Requirement: On-device possible (lightweight; ~8 GB GPU VRAM or less; fine-tunes on a single mid-range GPU)
API: Python SDK / Hugging Face
Documentation URL
GitHub URL
Paper / Publication: Octo Model Team, "Octo: An Open-Source Generalist Robot Policy", RSS 2024, arXiv:2405.12213
Robots Using: WidowX (BridgeData V2) and Franka-based table-top setups; UC Berkeley's bimanual and peg-insertion rigs used in fine-tuning experiments. Academic research and demonstrations only.
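As a rough sanity check on the listed model sizes, the memory taken by the weights alone can be worked out from the parameter counts. This is a back-of-the-envelope sketch: it covers only the 27M and 93M parameters quoted in this listing and assumes 4 bytes per parameter for float32 and 2 for bfloat16. Activations, optimizer state during fine-tuning, the text encoder and framework overhead are not included, so it is not a measurement of total VRAM use.

```python
# Bytes needed to store one parameter in each assumed precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

# Parameter counts as quoted in this listing.
MODEL_PARAMS = {"Octo-Small": 27_000_000, "Octo-Base": 93_000_000}


def weight_megabytes(num_params, dtype):
    """Size of the weights alone, in decimal megabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


for name, num_params in MODEL_PARAMS.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name} {dtype}: {weight_megabytes(num_params, dtype):.0f} MB")
```

Even in float32 the Octo-Base weights come to a few hundred megabytes, which is consistent with the listing's description of the model as lightweight.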
Copyright © 2026, Askdroid. All Rights Reserved