Askdroid

AI Category: Speech & Dialogue for Droids
AI Tags: ASR, Multilingual, on-device, Open-Source, Speech-to-Text, Transformer
  • Profile
  • Title
  • Website URL
  • Short Description
  • Description
  • Tags
  • Company Name
  • Category
  • Country
  • License
  • Stage
  • Model Size
  • Hardware Requirement
  • API
  • Documentation
  • GitHub
  • Paper / Publication
  • Robots Using

Title: OpenAI Whisper (local)
Website URL

Short Description: Open-source automatic speech recognition model from OpenAI. The original checkpoints were trained on 680,000 hours of multilingual audio (about 5 million hours for large-v3); six model sizes range from tiny (~39M parameters) to large (1.55B). Runs fully offline via whisper.cpp, faster-whisper, or WhisperKit, and is robust to accents and background noise across nearly 100 languages.

Description: Whisper is an open-source automatic speech recognition (ASR) and speech translation model released by OpenAI in September 2022 under the MIT licence. The original checkpoints were trained on 680,000 hours of multilingual audio; the largest checkpoint, large-v3, was trained on roughly 1 million hours of weakly labelled audio plus 4 million hours of audio pseudo-labelled with large-v2, about 5 million hours in total. The architecture is a straightforward Transformer encoder-decoder: input audio is split into 30-second chunks, converted into a log-Mel spectrogram (80 bins for most checkpoints, 128 for large-v3 and turbo), and decoded as a token sequence in which special tokens for language identification, task, and timestamps are interleaved with the transcribed text. Six model sizes are available: tiny (~39M parameters), base, small, medium, large (1.55B), and turbo (an optimised large-v3), with English-only variants for the four smaller sizes. The pre-trained checkpoints generalise zero-shot to nearly 100 languages and can translate speech from many of them into English; OpenAI reports roughly 50% fewer errors than models specialised on a single benchmark when evaluated zero-shot across diverse datasets. Multiple optimised runtimes (whisper.cpp, faster-whisper, Whisper JAX, WhisperKit, and AMD’s Ryzen NPU implementation) let it run fully offline on Raspberry Pi, mobile, and embedded devices, making it a common choice for privacy-sensitive on-robot voice input.
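
The input framing described above can be sketched in a few lines of plain Python. This is an illustration only, not the reference implementation: the 16 kHz sample rate and 160-sample hop are taken from the openai-whisper reference code rather than from this listing, the helper names (split_into_chunks, spectrogram_shape, decoder_prompt) are hypothetical, and the real transcription loop advances its window using predicted timestamps instead of a fixed split.

```python
# Constants as used by the reference openai-whisper code (assumptions here;
# the listing itself only states the 30-second window).
SAMPLE_RATE = 16_000                      # samples per second
CHUNK_SECONDS = 30                        # window length fed to the encoder
HOP_LENGTH = 160                          # samples between spectrogram frames
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per chunk
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 frames per chunk


def split_into_chunks(samples):
    """Split a mono sample sequence into zero-padded 30-second chunks."""
    chunks = []
    for start in range(0, max(len(samples), 1), N_SAMPLES):
        chunk = list(samples[start:start + N_SAMPLES])
        chunk.extend([0.0] * (N_SAMPLES - len(chunk)))  # pad the last chunk
        chunks.append(chunk)
    return chunks


def spectrogram_shape(model_name):
    """Return (n_mels, n_frames) for one chunk of the given checkpoint."""
    n_mels = 128 if model_name in {"large-v3", "turbo"} else 80
    return n_mels, N_FRAMES


def decoder_prompt(language="en", task="transcribe", timestamps=True):
    """Build the special-token prefix that tells the decoder what to do."""
    if task not in {"transcribe", "translate"}:
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 45)    # 45 seconds of silence
    print(len(split_into_chunks(audio)))  # number of 30-second chunks
    print(spectrogram_shape("large-v3"))
    print(decoder_prompt("fr", "translate", timestamps=False))
```

The prompt tokens show how a single model handles both tasks: the language token and the task token are placed ahead of the text, so switching from transcription to translation changes only the prefix.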

Tags: ASR, Multilingual, on-device, Open-Source, Speech-to-Text, Transformer
Company Name: OpenAI
Category: Speech & Dialogue for Droids
Country: United States
License: Open source (MIT)
Stage: Production-ready
Model Size: 39M – 1.55B parameters (tiny / base / small / medium / large / turbo)
Hardware Requirement: On-device possible (tiny/base on CPU/Raspberry Pi; large needs GPU or optimised runtime)
API: Python SDK; many third-party runtimes (whisper.cpp; faster-whisper; WhisperKit)
Documentation URL
GitHub URL
Paper / Publication: Radford et al., 'Robust Speech Recognition via Large-Scale Weak Supervision' (2022), arXiv:2212.04356
Robots Using: Used in many DIY and research humanoid voice stacks; standard component in offline robot voice pipelines
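
To make the model-size and hardware notes above more concrete, here is a rough sketch that estimates the memory needed just to hold each checkpoint's weights. The parameter counts for base, small, medium, and turbo come from the openai-whisper README rather than from this listing, the function names are hypothetical, and the estimate ignores activations, decoding caches, and runtime overhead, so real memory use will be higher.

```python
# Parameter counts in millions. tiny and large appear in this listing; the
# others are taken from the openai-whisper README (an assumption here).
PARAMS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "turbo": 809,
    "large": 1550,
}


def weight_megabytes(model, bytes_per_param=2):
    """Approximate weight size in MB (2 bytes per parameter = 16-bit floats)."""
    return PARAMS_MILLIONS[model] * bytes_per_param


def largest_model_within(budget_mb, bytes_per_param=2):
    """Return the largest checkpoint whose weights fit the budget, or None."""
    fitting = [
        name for name in PARAMS_MILLIONS
        if weight_megabytes(name, bytes_per_param) <= budget_mb
    ]
    if not fitting:
        return None
    return max(fitting, key=PARAMS_MILLIONS.get)


if __name__ == "__main__":
    for name in PARAMS_MILLIONS:
        print(f"{name:>6}: ~{weight_megabytes(name)} MB of 16-bit weights")
    print(largest_model_within(200))
```

Passing bytes_per_param=1 approximates 8-bit quantised weights, which is one reason quantising runtimes can fit larger checkpoints on small boards.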

Copyright © 2026, Askdroid. All Rights Reserved