Alibaba has unveiled its latest open-source video generation model, Wan2.1-FLF2V-14B, which accepts user-supplied start and end frames to simplify video creation. This gives short-video creators greater creative control and helps them develop their own AI models and applications efficiently and affordably.
The model is part of Alibaba’s Wan2.1 series of foundation models, designed specifically to generate high-quality images and videos from text and image inputs. It is now open-sourced on Hugging Face and GitHub, as well as on Alibaba Cloud’s open-source community, ModelScope.
The model follows user instructions closely, keeps the generated video consistent with the first frame, and delivers smooth transitions between the first and last frames, producing realistic, natural renderings of complex movements. Users can create a five-second video at 720p resolution for free by prompting the model on Wan’s official website.
The key technology behind the model is an additional control-conditioning mechanism: the user-provided first and last frames of a sequence serve as control conditions, enabling smooth and precise transitions between the start and end frames.
To ensure visual stability, the mechanism injects semantic features from the first and last frames into the generation process, so the model maintains consistency in style, content and structure while the frames transform dynamically.
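The conditioning idea can be illustrated with a toy sketch: each intermediate frame receives a blend of the two anchor frames' semantic features, weighted by its position in the clip. This is a hypothetical, simplified illustration of first/last-frame conditioning in general, not Alibaba's actual Wan2.1-FLF2V architecture; the function name and the linear-interpolation scheme are assumptions made for clarity.

```python
import numpy as np

def blend_frame_conditions(first_feat, last_feat, num_frames):
    """Toy illustration of first/last-frame conditioning:
    build a per-frame conditioning vector by linearly interpolating
    between the semantic features of the user-supplied first and
    last frames (hypothetical scheme, not the real model's design)."""
    # weight 0.0 at the first frame, 1.0 at the last frame
    weights = np.linspace(0.0, 1.0, num_frames)
    # each intermediate frame gets a position-dependent mix of both anchors
    return [(1.0 - w) * first_feat + w * last_feat for w in weights]

# example: 4-dimensional "semantic" features for a 5-frame clip
first = np.array([1.0, 0.0, 0.0, 0.0])
last = np.array([0.0, 0.0, 0.0, 1.0])
conds = blend_frame_conditions(first, last, num_frames=5)
```

The endpoints of the schedule reproduce the anchor features exactly, which is the property the article describes: the generated video stays consistent with the supplied first frame and converges smoothly onto the supplied last frame.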
As one of the earliest major global tech companies to open-source its self-developed large-scale AI models, Alibaba Cloud open-sourced four Wan2.1 models in February 2025. To date, the models have attracted over 2.2 million downloads on Hugging Face and ModelScope.
Unveiled earlier this year, the Wan2.1 series was the first video generation model to support text effects in both Chinese and English. It tops the VBench leaderboard, a comprehensive benchmark suite for video generative models.
Alibaba Cloud released its first open large language model (LLM), Qwen-7B, in August 2023. Qwen’s open models have consistently topped the Hugging Face Open LLM Leaderboards, with performance matching that of leading global AI models across various benchmarks.
Over the past few years, Alibaba Cloud has open-sourced more than 200 generative AI models. To date, more than 100,000 derivative models based on the Qwen family have been developed on Hugging Face, making Qwen one of the most prominent AI model families worldwide.