AIsphere, a Chinese startup, has raised more than USD 60 million in a Series B funding round led by Alibaba, with participation from Fortune Capital, Shenzhen Capital Group, the Beijing Artificial Intelligence Industry Investment Fund, Hunan Broadcasting System, Giant Network, and Antler. It is the largest funding round to date for a Chinese company in the video generation space.
AIsphere’s flagship product, PixVerse, has surpassed 100 million global users, up from 60 million just months ago. Founder and CEO Wang Changhu told 36Kr that subscription revenues from its products already cover costs.
Other companies are also scaling quickly. Kuaishou reported that revenue from its video generation unit, Kling AI, reached RMB 250 million (USD 35 million) in the second quarter of 2025, contributing 4.8% of total revenue.
A year ago, such growth seemed uncertain. The launch of OpenAI’s Sora, alongside moves by Chinese internet giants, raised doubts about whether startups could survive. In February 2024, 36Kr reported that many investors expected video generation models to deliver negative returns in the near term and anticipated that smaller companies would be squeezed out.
AIsphere was founded in April 2023, amid this climate of skepticism. Video generation was still a contrarian bet, and founders Wang and Xie Xuzhang were not the wunderkinds typically favored by investors.
Wang attributes the company’s progress to two decisions.
First, it avoided heavy promotional spending, relying instead on product-driven growth. PixVerse’s recent surge in users was aided by viral special effects templates such as “Venom transformation,” which have drawn more than ten billion views since November 2024.
Second, it resisted blind scaling. Each training cycle at AIsphere had to deliver measurable improvements while controlling costs. By analyzing user behavior, the company found that most single-shot videos, whether professional or casual, were under ten seconds, with little demand for one-minute clips. This led it to prioritize speed, quality, and instruction following.
To reduce generation times from minutes to seconds, AIsphere applied distribution matching distillation. To stabilize image quality, it incorporated feature self-regularization loss into training.
According to Wang, limited capital prevented AIsphere from training a model comparable to Sora. “Looking back, we should have raised more money earlier to accelerate development,” he said.
Still, entering the market before video generation became a consensus bet allowed the company to avoid inflated valuations and hype-driven funding rounds. By late 2024, as Sora’s diffusion transformer (DiT)-based architecture became the global standard, multiple contenders had already started to target the segment. By then, PixVerse had ten million global users and was generating revenue.
PixVerse V5, released on August 27, now ranks third worldwide in text-to-video generation and first in image-to-video, according to Artificial Analysis benchmarks as of September 11. AIsphere also introduced a consumer-facing “agent creator assistant” that enables users with minimal editing skills to generate videos using simple prompts and template libraries.


Can the video generation industry create its own Canva and Photoshop?
Since its launch, AIsphere has faced a central question: can startups compete with tech giants in video generation? In July 2024, shortly after Kuaishou introduced Kling AI, AIsphere released PixVerse V2, which it described as the first DiT-based video model from a Chinese startup ready for practical use.
Wang argues that video generation is not simply about using artificial intelligence to replace existing content on platforms like TikTok. Instead, it enables new forms of interaction:
“Once video can be generated in or close to real time, users will be able to modify and create content as they watch. That shift in interaction will give rise to a new wave of mass market products, just as the short video format gave rise to TikTok.”
The competitive landscape is expanding. Liu Yu, formerly head of SenseTime’s Miaohua, founded Vivix Group, whose product TipTap focuses on AI-generated pornography. Cao Yue, former co-founder of Moonshot AI, launched Sand AI, which uses an autoregressive approach instead of DiT to create infinitely extendable video.
AIsphere has concentrated on the consumer market, targeting users without production expertise. Wang likens PixVerse to Canva, offering simple tools for mass adoption, while other platforms resemble Photoshop or Figma, catering to professionals.
Templates have helped drive viral engagement since late 2024, but Wang said they are part of a broader effort to lower creative barriers. User-generated templates now constitute a core part of PixVerse’s ecosystem. On its domestic platform, a user-created “closet transformation” template has already drawn more than one million views.
For AIsphere, the most important measures are user growth, retention, and community engagement, especially the scale of user-generated content.
Consolidation may be close. A 2025 report from Bessemer Venture Partners projected that the video generation market is likely to narrow within a year, mirroring the trajectory of large language model development. Wang believes the companies that endure will be those that can cover costs while steadily expanding their user base.
KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Zhou Xinyu for 36Kr.