While OpenAI keeps teasing Sora after months of delays, Tencent has quietly dropped a model that already shows results comparable to existing top-tier video generators.
Tencent has unveiled Hunyuan Video, a free and open-source AI video generator, strategically timed during OpenAI's 12-day announcement campaign, which is widely expected to include the debut of Sora, the company's long-awaited video tool.
“We present Hunyuan Video, a novel open-source video foundation model that exhibits performance in video generation that is comparable to, if not superior to, leading closed-source models,” Tencent said in its official announcement.
The Shenzhen, China-based tech giant claims its model “outperforms” Runway Gen-3, Luma 1.6, and “three top-performing Chinese video generative models,” based on professional human evaluation results.
The timing couldn't be more apt.
Before its video generator, somewhere between the SDXL and Flux eras of open-source image generation, Tencent released an image generator with a similar name. Hunyuan-DiT delivered excellent results and improved bilingual text understanding, but it was never widely adopted. A family of large language models rounded out the lineup.
Hunyuan Video uses a decoder-only Multimodal Large Language Model as its text encoder instead of the usual CLIP and T5-XXL combo found in other AI video tools and image generators.
Tencent says this helps the model follow instructions better, grasp image details more precisely, and learn new tasks on the fly without additional training. Its causal attention setup also gets a boost from a special token refiner that helps it understand prompts more thoroughly than traditional encoders.
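To make the distinction concrete, here is a rough conceptual sketch, not Tencent's actual pipeline code, contrasting a conventional CLIP-style text encoder with the idea of using a decoder-only language model's hidden states as the conditioning signal; GPT-2 stands in for the real multimodal LLM purely for illustration.

```python
# Conceptual sketch only, not Tencent's actual pipeline code. It contrasts a
# conventional CLIP-style text encoder with the Hunyuan-style approach of
# taking per-token hidden states from a decoder-only (causal) language model
# as the conditioning signal. GPT-2 stands in for the real multimodal LLM.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          CLIPTextModel, CLIPTokenizer)

prompt = "A man walking his dog"

# Conventional route: a dedicated text encoder produces embeddings that
# condition the video diffusion model.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
clip_states = clip_enc(**clip_tok(prompt, return_tensors="pt")).last_hidden_state

# Hunyuan-style route (simplified): a decoder-only LLM with causal attention
# reads the prompt, and its final-layer hidden states become the conditioning
# fed to the video generator (after the token refiner in the real model).
llm_tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the MLLM
llm = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
with torch.no_grad():
    out = llm(**llm_tok(prompt, return_tensors="pt"))
text_conditioning = out.hidden_states[-1]  # shape: [1, seq_len, hidden_dim]

print(clip_states.shape, text_conditioning.shape)
```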
It also rewrites prompts to make them richer and improve the quality of its generations. For example, a prompt that simply says “A man walking his dog” can be enriched with added details covering the scene setup, lighting conditions, quality descriptors, and the subject's appearance, among other elements.
Free for the masses
Like Meta's Llama 3, Hunyuan is free to use and monetize until you hit 100 million users, a threshold most developers won't need to worry about anytime soon.
The catch? You'll need a beefy computer with at least 60GB of GPU memory (think Nvidia H800 or H20 cards) to run its 13-billion-parameter model locally. That's more VRAM than most gaming PCs have in total.
For those without a supercomputer lying around, cloud services are already jumping on board.
FAL.ai, a generative media platform tailored for developers, has integrated Hunyuan, charging $0.50 per video. Other cloud providers, including Replicate and GoEnhance, have also started offering access to the model. The official Hunyuan Video server offers 150 credits for $10, with each video generation costing a minimum of 15 credits.
And, of course, users can run the model on a rented GPU using services like Runpod or Vast.ai.
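For developers curious about the hosted route, the call is short. The snippet below is a minimal sketch using FAL's Python client; the endpoint id and the argument and response keys are assumptions drawn from FAL's usual conventions rather than confirmed details, so check the platform's documentation before relying on them.

```python
# Minimal sketch of the hosted route using FAL's Python client. The endpoint
# id and argument/response keys here are assumptions based on FAL's usual
# conventions, not confirmed details; check FAL's docs before relying on them.
import fal_client

result = fal_client.subscribe(
    "fal-ai/hunyuan-video",  # assumed endpoint id
    arguments={"prompt": "A man walking his dog through a park at golden hour"},
)
print(result)  # the response typically includes a URL to the generated video
```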
Early tests show Hunyuan matching the quality of commercial heavyweights like Luma Labs Dream Machine or Kling AI. Videos take about 15 minutes to generate, producing photorealistic sequences with natural-looking human and animal motion.
“RIP Sora.. It's only been a few hours since Hunyuan-Video launched, I've tested out and it's insane. Here are 8 Wild examples: pic.twitter.com/AeQ2BwZhqv” — el.cine (@EHuanglu) December 4, 2024
Testing reveals one current weakness: the model's grasp of English prompts isn't as sharp as that of its competitors. However, because it is open source, developers can now tinker with and improve the model.
Tencent says its text encoder achieves text-video alignment rates of up to 68.5% (a measure of how closely the output matches what users ask for) while maintaining visual quality scores of 96.4%, based on its internal testing.
The complete source code and pre-trained weights are available for download on GitHub and Hugging Face.
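For anyone grabbing the weights directly, the Hugging Face hub client is the quickest path. The snippet below is a minimal sketch that assumes the repository id is tencent/HunyuanVideo, the name listed on Hugging Face at launch; downloading the files is the easy part, while running inference still calls for the roughly 60GB of GPU memory mentioned above.

```python
# Quick sketch for pulling the released weights locally with the Hugging Face
# hub client; assumes the repository id is "tencent/HunyuanVideo" as listed at
# launch. Downloading is easy; running inference still needs ~60GB GPU memory.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="tencent/HunyuanVideo", local_dir="./HunyuanVideo")
```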
Edited by Sebastian Sinclair