Alibaba Cloud Unveiled Wanx 2.1: Redefining AI-Driven Video Generation

Alibaba Cloud has introduced Wanx 2.1, the latest iteration of its multimodal large model Tongyi Wanxiang (Wanx), which first debuted in July 2023. Designed to generate high-quality images and videos from text input, Wanx 2.1 represents a significant leap forward in AI-driven visual content creation.

The new model generates realistic visuals by handling complex movements accurately, improving pixel quality, adhering to physical rules, and following instructions more precisely. That precision has propelled Wanx 2.1 to the top of the VBench leaderboard, a comprehensive benchmark suite for video generative models. According to VBench, Wanx 2.1 has an overall score of 84.7% and leads in key dimensions such as dynamic degree, spatial relationships, and multi-object interactions.

Image: VBench leaderboard

To maximize visual generation quality, the research team behind Wanx 2.1 has made significant technical progress on several fronts. By leveraging a proprietary VAE (Variational Autoencoder) and DiT (Diffusion Transformer) framework, Wanx 2.1 strengthens the modeling of temporal and spatial relationships and thereby achieves higher visual realism in scenes that involve complicated motion and physical rules.

By employing a full space-time attention mechanism, the model can also mimic the complex dynamics of the real world with remarkable accuracy.
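For readers unfamiliar with the term, the following is a minimal sketch of what "full space-time attention" generally means: every patch of every frame attends to every other patch in the whole clip, rather than only to patches in its own frame. It is an illustration only and does not reflect the actual Wanx 2.1 implementation, which has not been published in this article. The function names and the toy data layout (`video[t][p]` as the feature vector of patch `p` in frame `t`) are assumptions made for this example, and learned query/key/value projections are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(out_dim)]
        )
    return outputs


def full_space_time_attention(video):
    """Hypothetical helper: attend jointly over all frames and all patches.

    The clip is flattened into one sequence of T * P tokens, so a patch in
    the last frame can draw directly on a patch in the first frame.
    """
    patches_per_frame = len(video[0])
    tokens = [patch for frame in video for patch in frame]
    mixed = attention(tokens, tokens, tokens)
    # Restore the (frame, patch) layout of the input.
    return [
        mixed[start:start + patches_per_frame]
        for start in range(0, len(mixed), patches_per_frame)
    ]


def spatial_only_attention(video):
    """Hypothetical baseline: each frame attends only within itself."""
    return [attention(frame, frame, frame) for frame in video]
```

In this toy setup, changing a single patch in the first frame alters the output for the last frame under `full_space_time_attention`, but leaves it untouched under `spatial_only_attention`. That cross-frame information flow is the property generally associated with more coherent motion, at the cost of attention over a much longer token sequence.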

Innovative approaches have also been adopted to accelerate the model’s training process using ultra-long context. This ensures seamless integration of text instructions into video generation, enabling faster and more intuitive content creation.

Additionally, Wanx 2.1 has achieved a groundbreaking milestone by becoming the first video generation model to support text effects in both Chinese and English, meeting the diverse creative needs of industries such as advertising design and short video production.

Image: Figure skating
Text Prompt:「平拍一位女性花样滑冰运动员在冰场上进行表演的全景。她穿着紫色的滑冰服,脚踩白色的滑冰鞋,正在进行一个旋转动作。她的手臂张开,身体向后倾斜,展现了她的技巧和优雅」。English translation: “A panoramic shot of a female figure skater performing on an ice rink. She is wearing a purple skating outfit and white skates, executing a spinning move. Her arms are outstretched, and her body leans backward, showcasing her skill and grace.”

As a result of these approaches, Wanx 2.1 can generate videos with large-scale bodily movements and complex rotations. Even in challenging scenarios such as figure skating, swimming, and diving, the model maintains body coordination and adheres to realistic motion trajectories, setting a new benchmark for video generation.

Wanx 2.1 is currently available for free on its official Chinese website. Individual developers and corporate users can explore its potential through Alibaba Cloud’s generative AI platform, Model Studio. This empowers users to create high-quality visual content tailored to their unique needs, further bridging the gap between AI technology and creative industries.
