At its Tech Day event, Xpeng, the automaker turned technology company, unveiled what it calls the world’s first large model for the physical world that can be mass produced. CEO He Xiaopeng described it as a second-generation vision-language-action (VLA) model that removes the linguistic intermediary layer. The system functions in two ways: as a motion generation model and as a physics-based understanding and reasoning engine.
The launch reflects He’s belief that large models will form the operating system for a new era of “physical artificial intelligence,” serving as the foundation for all related applications.
Specifically, Xpeng aims to give machines the ability to gradually understand, interact with, and reshape the physical world. These core technologies will be applied across its smart vehicles, humanoid robots, and flying cars.
The concept aligns closely with today’s dominant tech narrative. Elon Musk has repeatedly described Tesla as the world’s leading physical AI company, defining the field as AI capable of understanding and interacting with the real world. Tesla has invested heavily in robotaxis and humanoid robots, and it recently announced plans for a flying car prototype.
Xpeng’s narrative framework resembles Tesla’s, but not entirely.
After the event, CEO He emphasized that Xpeng’s ecosystem is more open than Tesla’s. The company will partner with Amap to build a robotaxi network. The benefit, he explained, is that each branch of Xpeng’s mobility business can independently test viability, reallocate resources quickly, and ease financial pressure.
Following two years of restructuring, Xpeng’s vehicle deliveries are back on a growth track, exceeding 40,000 units for two consecutive months. This stability, he said, allows him to “tell a bigger story.”
“Xpeng has never believed that simply making a car bigger, prettier, cheaper, and higher quality guarantees success. That’s just a necessary condition, not a sufficient one. Hardware is the foundation, but software completes it.”
The story of physical AI is alluring, but challenges remain. A series of quality recalls this year—from Xpeng, Xiaomi, and Li Auto—has reminded automakers that manufacturing fundamentals haven’t changed. Building cars still requires a delicate balance between innovation and reliability.
Even the robotaxi business faces persistent regulatory and technical hurdles. No company has yet demonstrated profitability at scale. As for humanoid robots, they remain a long-term vision rather than a commercial reality.
While Xpeng shares much of Tesla’s technical roadmap, it must still prove that it can sustain long-term investment in physical AI on its own financial strength.
The following transcript has been edited and consolidated for brevity and clarity.
36Kr: Why does Xpeng insist on making humanoid robots? They are expensive. How do you justify the cost?
He Xiaopeng (HX): I believe that advanced robots will take many forms, but humanoid robots have three major advantages.
- First, to make robots smart, they must be driven by AI, not rules, and the best training data comes from the human world.
- Second, most environments, whether homes or factories, are built for humans. The more humanlike a robot is, the better it can operate in those spaces.
- Third, consumers relate to humanoid robots emotionally, which helps drive adoption. Higher sales volumes lower costs and create a positive feedback loop.
36Kr: What’s the component overlap between Xpeng’s robots and its cars?
HX: I don’t have an exact figure, but many systems are shared, like perception, domain controllers, and about 70% of the AI software. Of course, components such as joints or synthetic skin don’t exist in cars.
36Kr: How much revenue do you expect from physical AI compared with the automotive business?
HX: The global car market is worth about USD 10 trillion, with 90 million vehicles produced annually. I believe the robot market could reach USD 20 trillion, though that might take 10–20 years. It won’t happen overnight.
Automotive growth is linear and highly regulated, but once robots hit a technological inflection point, growth will be exponential. I don’t know how many robots we’ll sell in a decade, but I’m confident the number will exceed cars.
36Kr: Robotaxi firms are still mostly unprofitable. How will Xpeng make its robotaxi business work?
HX: Xpeng’s approach is different. We use production-ready vehicles, not prototypes. Our focus isn’t on technology for its own sake but on whether a product delivers real business and user value.
We’ll also release consumer-grade “robo-drive” models to spread bill-of-material and R&D costs. Our robotaxi and retail car businesses share the same hardware and software, giving us a cost advantage.
Moreover, we don’t rely on high-definition maps, street-level scanning, or LiDAR (light detection and ranging). Our system perceives and reasons more like humans, making it broader, more generalizable, and cheaper to deploy.
I believe the future of mobility will combine shared and private vehicles. Not every car will be a robotaxi. Xpeng will provide the “toolbox” comprising vehicles, software, and SDK interfaces for partners worldwide to operate their own fleets.
36Kr: Why partner with Amap on the robotaxi ecosystem?
HX: I used to oversee Amap—it’s my former company—and it’s one of China’s largest mobility platforms. It makes sense: Amap manages operations, and we supply the toolbox. It’s strategically aligned for both sides.
36Kr: You announced three robotaxi models for next year. How do they differ?
HX: They target different price ranges and user groups, including five-, six-, and seven-seater versions. That’s all I can share for now.
36Kr: You went straight to Level 4 automation, bypassing Level 3. Why?
HX: Because the future will only have Level 2 and Level 4. Level 3 is stuck in between—it’s neither here nor there.
36Kr: Has the second-generation VLA model completely eliminated the “L” (language) component?
HX: We still use vision plus language, but not human language. It’s a new physical-world language that’s richer and more efficient. We can also decode the reasoning process. For example, we can explain why the system decided not to turn left, even when it seemed logical to do so. We’ve already achieved this in testing.
36Kr: What other physical AI applications do you foresee? And given the growing US-China tech rivalry, who holds the upper hand in this field?
HX: Whoever starts earlier has the advantage, and data will be the key asset. Progress at major digital model firms has slowed recently, not because of algorithms or compute power, but because of limited data.
Anyone can lead in physical AI, but success depends on execution: who can engineer systems well, deliver real-world performance, and offer better user experiences. That’s what drives the feedback loop.
That’s also why Xpeng believes making a car bigger, prettier, or cheaper isn’t enough. Hardware is just the start; software is equally important. For this reason, I spent nearly two hours at Tech Day discussing how we aim to merge the physical and digital worlds to enhance user experience. The rise of physical AI has only just begun.
36Kr: Tesla is exploring flying cars, too. What sets Xpeng apart?
HX: Both Tesla and Xpeng are pursuing cross-domain integration to redefine categories. For example, we’re embedding VLA technology into robots. Traditional robots use reinforcement learning for limb control, so their legs, waists, and hands don’t coordinate well. We’ve rebuilt that logic, as it was no longer optimal under the new paradigm.
I don’t know Tesla’s exact approach to flying cars, but we’re both innovating across domains in our own ways.
KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Xiao Man for 36Kr.
