Alibaba Cloud New Model for Enhanced Visual Reasoning

Published on Dec. 30, 2024

Alibaba Cloud has recently introduced QVQ-72B-Preview (“QVQ”), an open-sourced, experimental research model designed to advance visual reasoning capabilities.

QVQ is an open-weight model for multimodal reasoning that has delivered exceptional performance across various benchmarks. Notably, it achieved an impressive score of 70.3% on the Multimodal Massive Multi-task Understanding (MMMU) benchmark, underscoring its strong multidisciplinary understanding and reasoning abilities. In addition, QVQ demonstrated significant advancements in MathVision—a multimodal mathematical reasoning test set—achieving results that surpass its predecessor, the Qwen2-VL-72B model. Its exceptional performance on the OlympiadBench benchmark, an Olympic competition-level bilingual multimodal science benchmark test set, further highlights QVQ’s ability to tackle complex and challenging problems effectively.

Through step-by-step reasoning, QVQ showcases enhanced capabilities in visual reasoning tasks, excelling particularly in scenarios that demand advanced analytical thinking. However, despite its promising performance, QVQ does have certain limitations. For instance, during multi-step visual reasoning, the model may gradually lose focus on the image content, which can lead to hallucinations.

QVQ has been open-sourced and can be experimented on Hugging Face, Github, and Alibaba’s open-source community Model Studio.

数学题 — *example of QVQ in solving math problem*

demo video of QVQ in responding to math questions

Last month, Alibaba Cloud released its reasoning AI model QwQ (Qwen with Questions). The released version QwQ-32B-Preview, an open-source experimental research model with 32 billion parameters, showcases impressive analytical capabilities and excels in solving complex problems in mathematics and programming.

Currently, more than 78,000 derivative models have been developed on Hugging Face based on the Qwen family of models since it was first open-sourced in 2023, demonstrating its position as one of the most widely adopted open-source models globally.

Learn more for more comments from users and developers

Alibaba Cloud Unveils New Research Model for Enhanced Visual Reasoning

Alibaba Cloud Unveils New Research Model for Enhanced Visual Reasoning

Popular Stories

Alibaba’s New AI Training Method Cuts Search Costs by Nearly 90%

New Whitepaper Shows How AI Can Power Sustainable Business Transformation

Alibaba Introduces Qwen3, Setting New Benchmark in Open-Source AI with Hybrid Reasoning

Alibaba Unveils its Latest Open-Source Video Generation Model

Alibaba Cloud Unveils New Research Model for Enhanced Visual Reasoning

Never Miss a Story

Popular Stories

Sign Up For Our Newsletter