Alibaba Cloud has introduced Wanx 2.1, the latest iteration of its multimodal large model Tongyi Wanxiang (Wanx), which first debuted in July 2023. Designed to generate high-quality images and videos from text input, Wanx 2.1 represents a significant leap forward in AI-driven visual content creation.
The new model generates realistic visuals by handling complex movements accurately, improving pixel quality, adhering to physical rules, and following instructions more precisely. This precision has propelled Wanx 2.1 to the top of the VBench leaderboard, a comprehensive benchmark suite for video generative models. According to VBench, Wanx 2.1 has an overall score of 84.7% and leads in key dimensions such as dynamic degree, spatial relationships, and multi-object interactions.
To maximize visual generation quality, the research team behind Wanx 2.1 has made significant technical progress on several fronts. First, by leveraging a proprietary VAE (Variational Autoencoder) and DiT (Diffusion Transformer) framework, Wanx 2.1 strengthens temporal and spatial relationships and thereby achieves higher visual realism in scenes that involve complicated motion and physical rules.
By employing a full space-time attention mechanism, the model can also mimic the complex dynamics of the real world with remarkable accuracy.
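For readers unfamiliar with the term, the idea behind full space-time attention can be illustrated with a minimal sketch. This is not Wanx 2.1's implementation, which has not been published in this announcement. It assumes a toy video stored as nested lists, uses the token features directly as queries, keys, and values (a real model would apply learned projections), and the function name `full_spacetime_attention` is hypothetical. The point it shows is that every token attends to every other token across all frames and all spatial positions at once, rather than attending within a frame and across frames in separate steps.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def full_spacetime_attention(video):
    """Toy full space-time self-attention (illustrative assumption only).

    video[t][h][w] is a feature vector (list of floats).
    Returns a structure of the same shape in which each output vector is a
    weighted mix of ALL tokens from ALL frames and positions.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])

    # Flatten the (time, height, width) grid into one token sequence.
    tokens = [video[t][h][w] for t in range(T) for h in range(H) for w in range(W)]
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)

    out = []
    for q in tokens:
        # Scaled dot-product scores against every token in the whole clip.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of values (here the values are the tokens themselves).
        out.append([sum(wt * v[i] for wt, v in zip(weights, tokens)) for i in range(d)])

    # Restore the original (T, H, W) layout.
    it = iter(out)
    return [[[next(it) for _ in range(W)] for _ in range(H)] for _ in range(T)]


# Two frames, each 1x2 pixels, with 2-dimensional features.
video = [
    [[[1.0, 0.0], [0.0, 1.0]]],
    [[[1.0, 1.0], [0.0, 0.0]]],
]
mixed = full_spacetime_attention(video)
```

Because the sequence covers the whole clip, changing a single token in the second frame changes the outputs for the first frame too. The trade-off is cost: the number of attention pairs grows with the square of the total token count across all frames.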
Innovative approaches have also been adopted to accelerate the model’s training process using ultra-long context. This ensures seamless integration of text instructions into video generation, enabling faster and more intuitive content creation.
Additionally, Wanx 2.1 has achieved a groundbreaking milestone by becoming the first video generation model to support text effects in both Chinese and English, meeting the diverse creative needs of industries such as advertising design and short video production.
As a result of these approaches, Wanx 2.1 can generate videos with large-scale bodily movements and complex rotations. Even in challenging scenarios such as figure skating, swimming, and diving, the model maintains body coordination and adheres to realistic motion trajectories, setting a new benchmark for video generation.
Wanx 2.1 is currently available for free on its official Chinese website. Individual developers and corporate users can explore its potential through Alibaba Cloud’s generative AI platform, Model Studio. This empowers users to create high-quality visual content tailored to their unique needs, further bridging the gap between AI technology and creative industries.