Why world models alone will not solve robotics’ deployment problem

When asked to review ACE Robotics’ progress over the past six months, Wang Xiaogang, chairman of ACE Robotics and co-founder of SenseTime, spoke for more than ten minutes without pause.

Founded in July 2025, ACE Robotics was a late entrant to the embodied intelligence sector. But over the past year, the company has become one of the field’s more active new players.

On the model side, ACE Robotics’ newly released Kairos 3.0, a world model for robotics, achieved what the company described as state-of-the-art results in four global embodied intelligence benchmark tests. Its open-source Kairos 3.0-4B was also reportedly the first open-source model of its size to directly control an embodied intelligence robot body on device.

On the data side, ACE Robotics has proposed a “human-centered” environmental data collection approach. Instead of relying only on humans teleoperating robots, the company collects human interactions with real environments at scale. It said this has expanded world model training data to one million hours, ten times the volume of traditional real-robot data collection.

Then there is deployment. Half a year ago, ACE Robotics’ A1 module was mainly deployed in quadruped robots for road inspection. Today, A1 has entered hotels, unmanned retail stores, unmanned logistics warehouses, and other scenarios through robots of different forms.

On June 15, ACE Robotics announced that it had completed an angel-plus funding round, just four months after announcing its previous round. Investors include Fortune Capital, Shenzhen Capital Group, Shanghai Sci-Tech Innovation Center Capital, MetaX, Sharewin, Fosun RZ Capital, Tsinghua Holdings Capital, Lingang New Area Fund, and Yuzi Zhangquan. Existing shareholder SenseCapital also increased its investment, while Gaojie Capital served as long-term financial advisor.

The round brought ACE Robotics’ cumulative financing in 2026 to a nine-figure USD sum. According to 36Kr, ACE Robotics has also become one of the fastest embodied intelligence companies to reach unicorn status.

ACE Robotics is hardly alone. Qianjue Technology, incubated by Tsinghua University, is targeting real-world projects such as hotel cleaning, commercial services, and precision indoor operations. Tars, founded by Chen Yilun, former chief scientist of Huawei’s automotive business unit, is focused on wire harness assembly.

“The embodied intelligence industrial chain is very long. It is hard for one company to do everything,” Wang told 36Kr. “So figuring out how to mobilize more resources and take control of an ecosystem position in the embodied intelligence industrial chain is critical.”

Yet during deployment, Wang has found that the combination of hardware, data, and models is still not enough.

Outside China, leading embodied intelligence companies such as Figure and Tesla integrate hardware R&D, data collection, and model training internally to improve collaborative iteration.

In China, that loop has not yet formed. Wang said many robot body companies remain cautious about scenario deployment because of technological immaturity and pressure from resource investment. Upstream data collection standards have not been unified, and the supply of high-quality data that can be used directly for embodied model training remains insufficient. At the same time, hardware iteration cycles are far longer than model cycles, making design difficult to coordinate.

ACE Robotics’ current methodology is to identify scalable deployment scenarios and robot body manufacturers with which it can work deeply.

In Wang’s plan, ACE Robotics will first go deeper into road inspection and unmanned logistics warehouse scenarios, then expand into more complex consumer home scenarios with higher safety requirements.

The advantage is that ACE Robotics can first collect enough scenario data from business settings, improve its world model capabilities, and quickly form scalable solutions that help robot body manufacturers enter new scenarios.

Wang spoke with 36Kr about ACE Robotics’ progress and his observations on the embodied intelligence sector.

Wang Xiaogang, co-founder and executive director of SenseTime and chairman of ACE Robotics. Photo source: SenseTime.

The following transcript has been edited and consolidated for brevity and clarity.

36Kr: Embodied intelligence and world models are among the hottest sectors in the primary market this year. Compared with the time when ACE Robotics was founded, has fundraising become more difficult in this round?

Wang Xiaogang (WX): The advantage of raising money at this moment is that the market is hot and there is attention.

But conversely, there are too many companies, and sometimes investors do not understand where each company’s value lies. So we need to spend more effort explaining our development path and technical thinking to investors.

36Kr: ACE Robotics was founded in July 2025. At the time, did you feel it was late to enter the embodied intelligence sector?

WX: We chose that timing because we saw a change in the research paradigm for embodied intelligence. The original mainstream VLA (vision-language-action) paradigm had limitations and lacked a structured understanding of the physical world. World models happen to solve that problem. So by entering at that time, we had a chance to overtake on the curve.

Also, when the technology was still immature, many players had wasted a lot of data, model training resources, and manpower while exploring technical paradigms, especially for embodied intelligence. So by entering last year, we could avoid some detours and actually have a latecomer advantage.

36Kr: Relatively speaking, does entering late make the competition more intense?

WX: The embodied intelligence industrial chain is very long. It is hard for one company to do everything. So figuring out how to mobilize more resources and take control of an ecosystem position in the embodied intelligence industrial chain is critical.

Before ACE Robotics was founded last year, we interviewed many embodied intelligence companies. I found that, at the time, embodied intelligence companies were generally cautious about entering scenarios.

But scenarios play a key role in the development of embodied intelligence. The field is divided by scenarios. As long as closed-loop validation is completed in one scenario, it becomes easy to replicate globally at scale. During scalable replication, the volume of data collection and the scale of hardware can increase by several orders of magnitude.

36Kr: Why are embodied intelligence companies reluctant to enter scenarios?

WX: On one hand, the technology is not yet mature enough. On the other, solving problems in scenarios involves large investments in data collection, R&D, and other resources. In addition, many emerging companies do not yet have a deep enough understanding of industries and scenarios.

So the attitude of many companies is: raise money first, then wait for the sector to mature before catching up. But by the time that moment appears, the first-mover opportunity will already have been taken by someone else.

36Kr: When the technology was not mature, how did you talk to leading customers in these scenarios?

WX: It is important to find the boundaries of the technology. We need to identify real-world boundaries based on the maturity of the technology, software, and hardware.

If you enter a B2C scenario, such as Level 4 autonomous driving, your technology cannot have boundaries. But if you enter a B2B scenario, with various controllable conditions, the technology can be deployed.

We also have to judge which scenarios can be solved directly and which can be solved through certain methods. In addition, the solutions for these scenarios need to be replicable. If the scenario you find is not replicable, then after finishing one, you have to customize the next one again. That is not a good choice.

36Kr: How do you judge whether a scenario is replicable?

WX: For example, we prioritize retail and warehousing because their business systems and needs can be replicated nationwide. Hotels are also a replicable scenario. There are many hotels across the country, and what we deliver is the same set of inspection, navigation robot, and quadruped robot systems.

36Kr: Will competition in these scenarios become more intense?

WX: Although everyone is targeting these scenarios, many companies do not go deep enough. The result is that their costs cannot be controlled or reduced at the margin.

You can make a demo to show off the technology, but it does not meet the prerequisites for scaling.

36Kr: What kind of deployment model counts as going “deep” into a scenario?

WX: First, you need close ecosystem partners. For example, in unmanned retail, we work with Shanhui, a company in the SenseTime ecosystem, to provide unmanned retail solutions.

Shanhui first puts forward requirements around cost, battery life, energy conservation, and emissions reduction. Then, in specific complex scenarios, it gives us a lot of technical feedback. These requirements and feedback help us form a data loop and iterate quickly within the scenario.

After doing the necessary groundwork with an ecosystem partner, we can also determine which parts of a solution are necessary, which can be omitted, and which can be compensated for through other approaches.

Once the solution matures, we can expand business cooperation to other leading retail-related companies and bring costs down through scale. Using this approach, ACE Robotics can currently reduce the cost of its solutions to one-third of the industry level.

36Kr: You have mentioned that ACE Robotics’ deployment plan follows the sequence of road inspection, unmanned logistics, and home scenarios. What is the thinking behind that sequence?

WX: On one hand, we consider the difficulty of technical implementation. On the other, we still follow a B2B-before-B2C strategy. Consumer scenarios have weak rule boundaries and contain many unstructured situations. But business-side scenarios are controlled and can ensure safety.

So after accumulating more experience in B2B scenarios, we will move toward B2C settings.

36Kr: In the early stages of entrepreneurship, you proposed many new ideas. For example, when VLA was still the mainstream paradigm for embodied intelligence, you chose to build world models. You also proposed a human-centered data collection paradigm. How did you judge that this paradigm was feasible?

WX: The direction was very clear. First, compared with VLA, only a generative model like a world model has the capacity for intelligence emergence. So from day one, we chose the world model direction for embodied intelligence.

Second, only real human data, whether in terms of collection efficiency, scale, or human-like authenticity, can meet the requirements for training world models.

But many details only became clear gradually through practice. For example, when building world models, our main focus at first was generation capability. But in real scenarios, a world model not only needs to generate data, it also needs to control real robots and interact with the physical world through robots. That places higher demands on a world model’s physical intelligence and spatial intelligence.

That is why we recently released the open-source general spatial intelligence model ACE-Brain-0 and the physical 3D generation framework PhysX-Omni, to improve world models’ spatial intelligence and physical intelligence.

36Kr: Video generation models, VLA systems, and others all call themselves “world models.” How do you define a world model?

WX: Simply put, a world model needs to have three capabilities: understanding, generation, and prediction. Only when it has all three can the model self-evolve, self-correct, and improve itself.

Why does everyone say they are a world model? Because there is no evaluation system for world models yet. For example, the sector lacks a benchmark for long-horizon complex task execution performance.

Some so-called “world models” only promote what they are good at, but they are actually missing other capabilities. VLA lacks generation capability, while video generation models lack understanding of physics and space.

36Kr: How do you internally evaluate the capabilities of a world model?

WX: We are working with academic institutions and embodied intelligence companies to establish a world model benchmark. Its evaluation dimensions include cross-embodiment generalization capability and simulation capability. Ultimately, these dimensions are meant to measure the model’s capabilities in understanding, generation, and prediction.

36Kr: ACE Robotics’ world model Kairos has been updated to 3.0. If compared with language models, what stage is it at?

WX: We are still gradually iterating Kairos along three dimensions: understanding, generation, and prediction. In the earliest stage, Kairos was mainly used for video generation. Later, it gradually began to control real robots. Correspondingly, we also needed to improve its understanding of spatial and physical attributes.

36Kr: At the current stage of world model development, which has the biggest impact on model capability: data volume, data quality, annotation, or later evaluation?

WX: World models are still in the zero-to-one stage, and there is very little data available for training. So at this stage, data volume has a more obvious effect on improving performance. When training data increases by 10 or 100 times, I can immediately see improvements in model capability.

But when world models begin to show intelligence emergence, the data will need to be finely filtered and annotated in more detail. This is similar to the development of LLMs.

36Kr: What is the key to increasing the scale of data collection?

WX: It is still scenario scaling, which requires industrial partners to enter.

For industrial partners, data collection is also an entry point into the embodied intelligence industry. Because they have scenarios, if they do data collection, they can monetize it immediately and generate value right away. Then, by training models and introducing robots, industrial partners can also improve efficiency in their scenarios.

36Kr: There is a view that a robot, once bought, serves only as a mascot and has no practical use.

WX: Beyond quality issues, one important reason is that embodied intelligence companies have not deeply iterated and polished their products for specific scenarios.

Robot companies today keep releasing new models every year. But these models are not iterated for application scenarios, so old problems are not solved and new problems appear instead.

When problems do not converge, repair rates increase, and robots run into issues after working for only a few hundred hours. So the large-scale rollout of robots faces major problems today.

36Kr: What is the solution that would make embodied intelligence companies iterate according to scenarios?

WX: Once scenarios can scale and hardware can be mass produced, that will force embodied intelligence companies to concentrate resources on scenario-specific iteration.

36Kr: What other problems does the sector face today?

WX: First, today’s combination of models, data, and hardware is not enough.

Foundation model companies, world model companies, and data companies are each building their own data collection plans. But in the future, robot bodies will be driven by data, not by rules from real machines or physical models.

So these questions become critical: How do you collect data from humans? What data should be collected to drive the robot hardware body? How should the robot hardware body be designed to meet the requirements of human behavior? Once the design is too complex and humans cannot perform the corresponding actions, there will be no data in the future to drive the robot body.

Figure and Tesla in the US are taking a vertically integrated technical route. They build models, data, and hardware themselves, so internal iteration is more efficient. Today, we need to find a model that allows the three to be combined relatively well.

Second, the combination of embodied intelligence and scenarios currently faces many difficulties. Scenarios are actually China’s advantage. Many downstream scenarios are replicable, and the pace of embodied intelligence deployment will be very fast in the future.

For embodied intelligence to achieve a scenario breakthrough, it needs a lot of industry know-how. But many industrial players in these scenarios do not have knowledge or technical reserves in embodied intelligence. So we need to find a new model that strongly combines hardware, the “brain,” and scenarios.

36Kr: What kind of “new model”?

WX: On one hand, it means forming strategic partnerships with leading companies in industries. SenseTime, which is behind us, has thousands of customers across many industry directions. Once we secure resources from leading customers, our data collection and solutions can scale.

On the other hand, we have also visited many robot body manufacturers, studied their design thinking, and formed deep cooperation with them to help them enter scenarios.

36Kr: As a world model developer, are you now getting closer to the hardware side?

WX: Yes. So we still need to communicate fully with robot body manufacturers, visit them, and discuss their technical details.

At present, we still have many differences in understanding with robot body manufacturers when it comes to data collection plans.

For example, we will have members of the data collection team wear gloves that can capture tactile information. But some robot body manufacturers’ designs for “hands” are not human hands. They may be grippers or only three fingers. In that case, our data collection plan needs to take the corresponding design into account.

36Kr: It sounds like, at this stage, you still need to adapt to the designs of robot body manufacturers. But you previously said the model should define the robot body. Compared with letting the robot body define the model, what are the benefits?

WX: In the future, robot bodies will still have to be driven by data. If the hardware is designed with very high complexity, what data will drive it in the future?

Also, the development cycle for robot hardware bodies is very long. It is unlike model software, which can be realized quickly. So hardware needs to be planned in advance according to the direction of model iteration. Whoever can think ahead and plan ahead in the iteration direction will gain the first-mover opportunity.

At this stage, the best form of cooperation is for robot body manufacturers and model companies to bind themselves deeply together. When robot body manufacturers design next-generation robots, we can also sort out the corresponding data collection plans and model solutions in advance.

36Kr: Among data collection, model iteration, and scenario expansion, how do you prioritize these businesses right now?

WX: Data and scenarios are relatively critical. Because the embodied intelligence industrial chain is very long, you need to quickly occupy an ecosystem position.

So at this moment, we are using our own solutions to form close cooperation with local governments and leading companies in scenarios. This is the strategic foothold.

The model itself is still evolving and may not be as urgent. But building models also helps us secure leading companies in scenarios, because compared with robot hardware bodies, we are closer to data and scenarios. So while iterating the model, we still have to seize scenarios rather than stay confined to the lab.

KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Zhou Xinyu for 36Kr.