Alibaba Cloud Open Sources its AI Models for Video Generation

  • Alibaba Cloud open-sources the 14-billion- and 1.3-billion-parameter versions of its video foundation model Wan2.1
  • Wan2.1 tops the VBench leaderboard and is the only open-source video generation model among the top five

Generated by Qwen

Alibaba Cloud said on Wednesday that it has made its AI models for video generation freely available as part of its latest efforts to contribute to the open-source community.

The cloud computing company is open-sourcing four models from the Wan2.1 series, the latest iteration of its video foundation model Tongyi Wanxiang (Wan), in 14-billion-parameter (14B) and 1.3-billion-parameter (1.3B) versions.

The four models, T2V-14B, T2V-1.3B, I2V-14B-720P, and I2V-14B-480P, are designed to generate high-quality images and videos from text and image inputs. They are available for download on Alibaba Cloud’s AI model community ModelScope and the collaborative AI platform Hugging Face, accessible to academics, researchers, and commercial institutions worldwide.
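For readers who want to try the models, a minimal sketch of fetching the weights with the huggingface_hub Python client is shown below; the repo id is an assumption based on Hugging Face naming conventions, so check the official model cards for the exact identifiers.

```python
# Minimal sketch: downloading Wan2.1 weights from Hugging Face.
# The repo id is an assumption; verify it on the official model card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Wan-AI/Wan2.1-T2V-1.3B",  # assumed id for the 1.3B text-to-video model
    local_dir="./Wan2.1-T2V-1.3B",     # where to place the checkpoint files
)
print(f"Model files downloaded to {local_dir}")
```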

Unveiled earlier this year, the Wan2.1 series is the first video generation model to support text effects in both Chinese and English. It excels at generating realistic visuals by accurately handling complex movements, enhancing pixel quality, adhering to physical principles, and closely following instructions. That instruction-following precision has propelled Wan2.1 to the top of VBench, a comprehensive benchmark suite for video generative models, where it is the only open-source video generation model among the top five on the Hugging Face leaderboard.

According to VBench, the Wan2.1 series, with an overall score of 86.22%, leads in key dimensions such as dynamic degree, spatial relationships, color, and multi-object interactions.

Wan2.1

Training video foundation models requires immense computing resources and vast amounts of high-quality training data. Open access helps lower the barrier for more businesses to leverage AI, enabling them to create high-quality visual content tailored to their needs cost-effectively.

Of the two text-to-video models, T2V-14B is better suited to creating high-quality visuals with substantial motion dynamics, while T2V-1.3B balances generation quality against computational cost, making it ideal for a broad range of developers doing secondary development and academic research. For example, the T2V-1.3B model lets users with a standard personal laptop generate a 5-second 480p video in as little as about four minutes; a code sketch of this workflow follows the example prompt below.

Text prompt: 一名男子在跳台上做专业跳水动作。全景平拍镜头中，他穿着红色泳裤，身体呈倒立状态，双臂伸展，双腿并拢。镜头下移，他跳入水中，溅起水花。背景中是蓝色的泳池。English translation: “A man performs a professional dive from the platform. In a wide, eye-level shot, he wears red swim trunks, his body inverted, arms extended and legs together. The camera pans down as he plunges into the water, sending up splashes, with the blue pool in the background.”
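To illustrate what that workflow might look like in practice, here is a hedged sketch that generates a clip like the one above through Hugging Face’s diffusers library; the repo id, output resolution, and frame count are assumptions rather than details confirmed in this article, so defer to the official model card for the supported parameters.

```python
# Hypothetical sketch: a ~5-second 480p clip with T2V-1.3B via diffusers.
# The repo id, resolution, and frame count below are assumptions.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # assumed diffusers-format repo id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle weights to CPU to fit consumer GPUs

prompt = (
    "A man performs a professional dive from the platform. In a wide, "
    "eye-level shot, he wears red swim trunks, his body inverted, arms "
    "extended and legs together. The camera pans down as he plunges into "
    "the water, sending up splashes, with the blue pool in the background."
)

frames = pipe(
    prompt=prompt,
    height=480,
    width=832,      # assumed 480p output size
    num_frames=81,  # roughly 5 seconds at 16 frames per second
).frames[0]

export_to_video(frames, "dive.mp4", fps=16)
```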

The I2V-14B-720P and I2V-14B-480P models support text-to-video generation and also offer image-to-video capabilities: to produce dynamic video content, users need only supply a single image along with a brief text description. The models accept input images of any standard size and aspect ratio.
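The image-to-video flow can be sketched the same way; again, the repo id, the image argument, and the frame count are assumptions based on diffusers’ generic pipeline interface rather than details confirmed in this article.

```python
# Hypothetical sketch: animating a still image with I2V-14B-480P via diffusers.
# The repo id and call arguments are assumptions.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video, load_image

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",  # assumed diffusers-format repo id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = load_image("diver.png")  # hypothetical input still

frames = pipe(
    image=image,
    prompt="The diver leaps off the platform and splashes into the pool.",
    num_frames=81,  # assumed: roughly 5 seconds at 16 fps
).frames[0]

export_to_video(frames, "diver_animated.mp4", fps=16)
```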

Alibaba Cloud was one of the first major global tech companies to open-source a self-developed large-scale AI model, releasing its first open model, Qwen-7B, in August 2023. Qwen’s open models have consistently topped the Hugging Face Open LLM Leaderboard, with performance matching that of leading global AI models across various benchmarks.

As of now, more than 100,000 derivative models based on the Qwen family of models have been developed on Hugging Face, making it one of the most prominent AI model families worldwide.

Learn more about Alibaba’s Open Source AI Journey
