Alibaba Cloud Unveils New Research Model for Enhanced Visual Reasoning

Alibaba Cloud has recently introduced QVQ-72B-Preview ("QVQ"), an open-source, experimental research model designed to advance visual reasoning capabilities.

QVQ is an open-weight model for multimodal reasoning that has delivered strong performance across several benchmarks. It scored 70.3% on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark, underscoring its multidisciplinary understanding and reasoning abilities. On MathVision, a multimodal mathematical reasoning test set, QVQ also made significant gains, surpassing its predecessor, the Qwen2-VL-72B model. Its results on OlympiadBench, an Olympiad-level bilingual multimodal science benchmark, further highlight QVQ's ability to tackle complex and challenging problems.

Performance

Through step-by-step reasoning, QVQ shows enhanced capabilities in visual reasoning tasks and performs particularly well in scenarios that demand advanced analytical thinking. Despite this promising performance, QVQ has limitations. For instance, during multi-step visual reasoning, the model may gradually lose focus on the image content, which can lead to hallucinations.

QVQ has been open-sourced and is available for experimentation on Hugging Face, GitHub, and ModelScope, Alibaba's open-source model community.

[Video: demo of QVQ solving a math problem]

Last month, Alibaba Cloud released its reasoning AI model QwQ (Qwen with Questions). The released version, QwQ-32B-Preview, is an open-source experimental research model with 32 billion parameters that shows strong analytical capabilities and excels at solving complex problems in mathematics and programming.

Since the Qwen family of models was first open-sourced in 2023, developers have built more than 78,000 derivative models based on it on Hugging Face, making it one of the most widely adopted open-source model families globally.
