Wan (Wan 2.1)

Alibaba's open-source video AI that beat Sora on VBench — run it locally, free, with an Apache 2.0 licence.

Video & Audio · Freemium · Has API · Open Source
RECATOOLS Score: 8.4 / 10
Capability: 8.8
Value for money: 9.5
Ease of use: 6.5
ASEAN readiness: 7.5
API quality: 7.5
Founded: 2025
HQ: Hangzhou, China
Users: 16,000+ GitHub stars; 1M+ model downloads across Hugging Face and ModelScope
Launched: February 25, 2025 (open-source weights release)
Developer: Alibaba Group (Alibaba Cloud / Tongyi Lab)

Overview

Wan 2.1 is a suite of open-source video foundation models released by Alibaba's Wan Team in February 2025 under the Apache 2.0 licence. The flagship 14B-parameter model topped the VBench leaderboard with a score of 86.22%, ahead of OpenAI Sora (84.28%), Tencent HunyuanVideo (83.24%), and Runway Gen-3 (82.32%), and was the only open-source model in the global top five at launch. A lightweight 1.3B variant runs on as little as 8.19 GB of VRAM, which puts local generation within reach of consumer GPUs such as the RTX 3080 or RTX 4070.

The model family covers text-to-video, image-to-video, first-last-frame-to-video, and video editing tasks, with native bilingual text rendering in both Chinese and English inside generated frames. Models are freely downloadable from Hugging Face and ModelScope, and the official wan.video platform offers a hosted consumer product with tiered paid plans. A technical report (arXiv 2503.20314) by Team Wan — comprising more than 60 named researchers at Alibaba — details the diffusion-transformer architecture, Wan-VAE, and flow-matching training pipeline that underpin the series.


Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 16 Jun 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

Free
Free
Local self-hosting is fully free (Apache 2.0). wan.video offers limited hosted free generation.

Use cases

  • Indie filmmakers and content creators generating B-roll or stylised clips locally without subscription costs
  • Developers building video-generation pipelines or SaaS products who need open weights and fine-tuning capability
  • Researchers studying video diffusion models, VAE architectures, and flow-matching training at scale
  • Marketing teams producing short social-media videos from product photos using the image-to-video feature
  • Educators and students experimenting with AI video generation on consumer hardware in under-resourced settings

What you can produce with Wan (Wan 2.1)

  • 5-second text-to-video clips at 480p or 720p resolution
  • Image-to-video animations from a single still photograph
  • First-and-last-frame interpolation videos for controlled scene transitions
  • Video editing and stylisation via VACE model variant
  • Bilingual video content with legible Chinese and English text rendered inside frames
  • Locally hosted inference pipeline on consumer NVIDIA GPUs (RTX 3080 and above)
  • Fine-tuned derivative models via Apache 2.0 open weights
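
For readers wondering what a locally hosted pipeline might look like, here is a minimal sketch. It assumes the Hugging Face diffusers integration (`WanPipeline`, `AutoencoderKLWan`) and the Hub repository id `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`, none of which are named in this listing. The 16 fps rate, the 832x480 frame size, the guidance scale, and the "4k + 1" frame-count rule are likewise assumptions, and `frames_for_clip` and `generate_clip` are hypothetical helper names. Treat it as a starting point to check against the official documentation, not as the project's reference workflow.

```python
def frames_for_clip(seconds: float, fps: int = 16) -> int:
    """Return the frame count of the form 4k + 1 closest to seconds * fps.

    Assumption: the pipeline expects (num_frames - 1) to be divisible by 4.
    """
    k = round((seconds * fps - 1) / 4)
    return 4 * max(k, 0) + 1


def generate_clip(prompt: str, out_path: str = "wan_clip.mp4", seconds: float = 5.0) -> str:
    """Generate a short 480p clip with the 1.3B model and save it as an MP4."""
    # Heavy imports are deferred so frames_for_clip works without a GPU stack installed.
    import torch
    from diffusers import AutoencoderKLWan, WanPipeline
    from diffusers.utils import export_to_video

    model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"  # assumed repository id
    # The VAE is kept in float32 while the transformer runs in bfloat16.
    vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
    pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    fps = 16  # assumed output frame rate
    frames = pipe(
        prompt=prompt,
        height=480,
        width=832,
        num_frames=frames_for_clip(seconds, fps),
        guidance_scale=5.0,  # assumed value, tune to taste
    ).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate_clip("a red kite over a rice field at dawn")` would download the weights on first use and needs a CUDA GPU; `frames_for_clip` can be tried on any machine.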

ASEAN Perspective

Wan (Wan 2.1) in Southeast Asia

Wan 2.1 holds notable relevance for the ASEAN region: it is developed by Alibaba, whose Cloud division has its deepest Southeast Asia footprint in Singapore and Malaysia (five data centres in Malaysia alone as of mid-2025), and it natively supports Chinese text rendering — useful for Chinese-language content creators across Singapore, Malaysia, and Vietnam. The model is actively deployed by ASEAN-facing AI platforms such as GladCube's Dra Vis and TabSpace.ai via Alibaba Cloud's Model Studio. Its zero-cost local deployment path is a significant equaliser for developers and SMEs in cost-sensitive markets like Indonesia, the Philippines, and Vietnam, where commercial video AI subscriptions are prohibitively expensive for most creators.

RECATOOLS Verdict

Wan 2.1 is arguably the most significant open-source video model released to date. At launch it was the only freely downloadable model family that could genuinely challenge commercial incumbents, topping VBench with 86.22% against Sora's 84.28%, and its Apache 2.0 licence allows commercial deployment, fine-tuning, and on-premise hosting with no royalties. The 1.3B variant's 8.19 GB VRAM floor puts serious AI video generation within reach of hobbyists and indie studios for the first time. Value for money is exceptional: the weights are free, and the hosted API is competitively priced against Runway or Kling.

Caveats are real. Generation on consumer hardware is slow (roughly four minutes for a five-second 480p clip on an RTX 4090 without quantisation), and local setup requires comfort with Python, CUDA, and Hugging Face tooling. The 14B model demands a high-end GPU (24 GB+ VRAM recommended). By mid-2026 the competitive gap has narrowed: Kling 3.0 leads on character consistency and 1080p quality, while Sora 2 dominates photorealism and long-form generation. Wan 2.1 remains the go-to for developers needing open weights, fine-tuning access, or cost-controlled API integration.
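
To put the speed caveat in numbers, the short sketch below works out what "four minutes for five seconds" implies for batch work. The figures are the ones quoted in this listing; the function name `generation_stats` is hypothetical, and real timings will vary with resolution, settings, and hardware.

```python
def generation_stats(clip_seconds: float, wall_clock_seconds: float) -> dict[str, float]:
    """Summarise throughput for a generator that needs wall_clock_seconds per clip."""
    clips_per_hour = 3600 / wall_clock_seconds
    return {
        # How many seconds of compute are spent per second of finished video.
        "slowdown_vs_realtime": wall_clock_seconds / clip_seconds,
        "clips_per_hour": clips_per_hour,
        "footage_seconds_per_hour": clips_per_hour * clip_seconds,
    }


# Figures quoted in this listing: a 5-second clip in roughly 4 minutes on an RTX 4090.
stats = generation_stats(clip_seconds=5, wall_clock_seconds=4 * 60)
print(stats)
```

On those figures a single GPU produces about 15 clips, or 75 seconds of footage, per hour, which is worth budgeting for before planning a large batch.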

Independent AI-assisted assessment by RECATOOLS.

What people say

Wan 2.1 was the benchmark-setter for open-source video AI at its February 2025 release, achieving a verified 86.22% VBench score that surpassed Sora and Luma at launch. Its Apache 2.0 licence, freely downloadable weights, and 8.19 GB VRAM minimum make it uniquely accessible. Generation speed on consumer hardware is slow, setup requires technical proficiency, and the 14B model demands high-end GPUs. By 2026 newer closed models have reclaimed quality leadership in specific domains. For developers and cost-focused creators, Wan 2.1 remains the most compelling open-source option available.

Summary of public user & expert reviews, compiled by RECATOOLS.

Notable facts

  • Wan 2.1 was the only open-source model to rank in the global top five on the VBench video benchmark leaderboard at its February 2025 launch.
  • The 1.3B lightweight variant requires just 8.19 GB of VRAM — enough to run on a standard RTX 3080 or 4070 gaming GPU.
  • The technical paper has more than 60 named co-authors, making it one of the largest team-authored AI model reports of 2025.
  • Wan 2.1 was the first video generation model capable of rendering legible text in both Chinese and English characters directly inside generated video frames.
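
The VRAM figures in this listing (8.19 GB for the 1.3B variant, 24 GB+ recommended for the 14B model) can be turned into a quick pre-flight check. This is a rough sketch under those stated numbers only: `VARIANT_VRAM_GB` and `largest_variant_that_fits` are hypothetical names, the 14B value is a recommendation and not a hard floor, and usable VRAM on a real card is somewhat lower than its nominal capacity.

```python
from typing import Optional

# VRAM figures as quoted in this listing (GB). The 14B entry is a recommendation.
VARIANT_VRAM_GB = {
    "1.3B": 8.19,
    "14B": 24.0,
}


def largest_variant_that_fits(gpu_vram_gb: float) -> Optional[str]:
    """Return the largest variant whose listed VRAM figure fits, or None if none do."""
    fitting = [name for name, need in VARIANT_VRAM_GB.items() if gpu_vram_gb >= need]
    if not fitting:
        return None
    # "Largest" here means the variant with the highest listed VRAM requirement.
    return max(fitting, key=lambda name: VARIANT_VRAM_GB[name])


for vram in (8, 12, 24):
    print(f"{vram} GB card -> {largest_variant_that_fits(vram)}")
```

Note that a nominal 8 GB card falls just short of the 8.19 GB figure, so the check returns `None` for it.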

Frequently asked questions

Can I use Wan 2.1 commercially without paying Alibaba?
Yes. The weights are released under Apache 2.0, which permits commercial use, fine-tuning, and redistribution, provided the licence text and attribution notices are retained. You only pay if you use Alibaba's hosted DashScope API or the wan.video consumer platform.
What GPU do I need to run Wan 2.1 locally?
The 1.3B text-to-video model needs at least 8.19 GB of VRAM (e.g. RTX 3080/4070). For the 14B model, 24 GB or more of VRAM is recommended (RTX 3090/4090 or equivalent). Generating a 5-second 480p clip on an RTX 4090 takes roughly 4 minutes without quantisation.
How does Wan 2.1 compare to Runway, Kling, and Sora?
At its February 2025 launch, Wan 2.1's 14B model scored 86.22% on VBench, ahead of Sora (84.28%), Runway Gen-3 (82.32%), and HunyuanVideo (83.24%). By 2026, Kling 3.0 leads on character consistency and 1080p quality, and Sora 2 leads on photorealism. Wan 2.1's key differentiator remains free open weights and fine-tuning support.
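
The launch-time VBench gaps are easier to compare as percentage-point margins. The sketch below uses only the scores quoted in this listing; `VBENCH_SCORES` and `margins_over` are hypothetical names, and the numbers describe the February 2025 snapshot, not the current leaderboard.

```python
# VBench scores (percent) as quoted in this listing for the February 2025 launch.
VBENCH_SCORES = {
    "Wan 2.1 (14B)": 86.22,
    "OpenAI Sora": 84.28,
    "Tencent HunyuanVideo": 83.24,
    "Runway Gen-3": 82.32,
}


def margins_over(baseline: str, scores: dict[str, float]) -> dict[str, float]:
    """Return the baseline's lead over each other model, in percentage points."""
    base = scores[baseline]
    return {
        name: round(base - score, 2)  # round away floating-point noise
        for name, score in scores.items()
        if name != baseline
    }


for name, lead in margins_over("Wan 2.1 (14B)", VBENCH_SCORES).items():
    print(f"Wan 2.1 (14B) leads {name} by {lead:.2f} points")
```

The leads work out to roughly two to four points, which helps explain how later closed models could overtake it in specific areas.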

About this listing


This entry was compiled from publicly available data including Wan (Wan 2.1)'s official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Wan (Wan 2.1) unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to Wan (Wan 2.1) directly.

