Qwen-VL is the open-weights multimodal large language model family from Alibaba’s Qwen team, with several successive generations released since 2023. The current Qwen3-VL line ships in both dense and Mixture-of-Experts architectures, scaling from edge (~2-3B) to cloud (235B-A22B), with Instruct and reasoning-enhanced variants. The widely adopted Qwen2.5-VL series, first released in January 2025, comes in 3B, 7B, 32B, and 72B parameter sizes (the 32B followed in March 2025). All sizes have open weights: the 7B and 32B are licensed under Apache 2.0, the 3B under the Qwen Research licence, and the 72B under the Qwen licence. According to Qwen, the flagship Qwen2.5-VL-72B-Instruct performs comparably to GPT-4o and Claude 3.5 Sonnet on multimodal benchmarks. The family supports visual comprehension of charts, diagrams, layouts, and forms; structured output (e.g. JSON extracted from invoices); understanding of videos longer than an hour, with second-level localisation of segments; and end-to-end bilingual Chinese/English text recognition. Together with native bounding-box input and output, these capabilities make Qwen-VL a leading open-weights alternative to GPT-4o and Gemini for droids that need vision-language reasoning without a cloud dependency. AWQ-quantised checkpoints of the 3B, 7B, and 72B Qwen2.5-VL variants are available for on-device deployment.
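A droid consuming Qwen2.5-VL's bounding-box output has to turn the model's text answer into usable coordinates. Grounding answers from Qwen2.5-VL are commonly JSON objects with a `bbox_2d` list `[x1, y1, x2, y2]` and a `label`, given in pixel coordinates of the image the processor actually fed to the model, which may have been resized. The sketch below is a minimal, stdlib-only illustration that assumes that schema (other Qwen-VL generations use different coordinate conventions, so check the relevant model card). It strips an optional Markdown `json` code fence, parses the boxes, and maps them back to the original image size. `Box`, `parse_grounding_output`, and `rescale` are hypothetical helpers, not part of any Qwen or Hugging Face library.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Box:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_grounding_output(text: str) -> list[Box]:
    """Hypothetical helper: extract boxes from an answer such as
    [{"bbox_2d": [x1, y1, x2, y2], "label": "cup"}].

    Assumption: Qwen2.5-VL-style schema. The model may wrap the JSON in a
    Markdown code fence, so the fence is stripped before parsing.
    """
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    payload = fenced.group(1) if fenced else text
    items = json.loads(payload)
    if isinstance(items, dict):  # a single object instead of a list
        items = [items]
    boxes = []
    for item in items:
        x1, y1, x2, y2 = item["bbox_2d"]
        boxes.append(Box(item.get("label", ""), x1, y1, x2, y2))
    return boxes


def rescale(box: Box, input_wh: tuple[int, int], original_wh: tuple[int, int]) -> Box:
    """Map a box from the (resized) model-input image back to the original image.

    Assumption: coordinates are absolute pixels relative to input_wh.
    """
    sx = original_wh[0] / input_wh[0]
    sy = original_wh[1] / input_wh[1]
    return Box(box.label, box.x1 * sx, box.y1 * sy, box.x2 * sx, box.y2 * sy)


if __name__ == "__main__":
    # Illustrative reply text, not real model output.
    reply = '```json\n[{"bbox_2d": [28, 56, 280, 224], "label": "mug"}]\n```'
    boxes = parse_grounding_output(reply)
    print(rescale(boxes[0], input_wh=(560, 448), original_wh=(1120, 896)))
    # Box(label='mug', x1=56.0, y1=112.0, x2=560.0, y2=448.0)
```

Keeping rescaling separate from parsing means a different coordinate convention, such as a normalised 0-1000 grid, can be handled by swapping only the `rescale` step.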
Open-weights multimodal LLM family from Alibaba. The latest generation, Qwen3-VL, ships in dense and Mixture-of-Experts architectures from edge to cloud; Qwen2.5-VL covers 3B / 7B / 32B / 72B sizes. Strong image and video understanding (videos over 1 hour) and bilingual Chinese/English text recognition.
