Based in Beijing, Robot Era was founded in August 2023 by Chen Jianyu, an assistant professor at Tsinghua University’s Institute for Interdisciplinary Information Sciences. On July 7, the company announced it raised nearly RMB 500 million (USD 70 million) in a Series A funding round co-led by CDH VGC and Haier Capital. Houxue Capital, Meridian Capital, Xianghe Capital, and Fore also participated, with earlier investors Crystal Stream and Tsinghua Holdings joining the round as well.
Despite being less than two years old, Robot Era has already released a series of hardware products including dexterous hands, wheeled platforms, and humanoid robots. This has led some to mistakenly categorize the company as a specialized robotics hardware maker. “Some people think we’re just a company that builds dexterous hands,” Chen said.
But that’s not what he envisions for Robot Era.
Chen’s goal, set nearly a decade ago after coming across AlphaGo, was to build a general-purpose intelligent robot. Specifically, one that’s not just a mechanical form, but a system with a “brain” capable of adapting to various real-world environments.
“Building both the brain and the body may look difficult, but it’s a natural choice for me, because I can do both,” Chen said.
Among founders in the embodied intelligence space, Chen’s interdisciplinary background is unusual. His academic training spans both physical systems and intelligent control. In 2011, he was admitted to Tsinghua’s Department of Precision Instruments, a pioneer in bipedal humanoid robot research in China. Later, while pursuing his PhD at the University of California, Berkeley, he shifted focus to model predictive control (MPC) and end-to-end reinforcement learning—two foundational elements in robotic intelligence today.
Chen is perhaps better known for his algorithmic work than his hardware designs. He developed DWL, a next-generation learning framework for humanoid robots that was nominated for an award at the Robotics: Science and Systems (RSS) conference. His team also introduced VPP, an embodied AI architecture built on generative world models, which was highlighted at the International Conference on Machine Learning (ICML).
In a three-hour interview with 36Kr, about half the time was spent discussing algorithms and what Chen refers to as “brains.”
But Chen is adamant that neither brains nor bodies alone will be enough to build a true general-purpose humanoid robot. His focus is on building an integrated system: a complete software-hardware stack.
On the software side, Robot Era has developed ERA-42, a vision-language-action (VLA) model that fuses perception with generative capabilities. This allows robots to interpret their surroundings and anticipate events in real time.
On the hardware side, the company is developing modular platforms that can be reconfigured like Lego blocks into bipedal, wheeled, or humanoid forms depending on the task.
Because robotics supply chains are still underdeveloped, Robot Era is designing and producing its own foundational components: joint modules, control units, motors, and reducers.
This two-pronged strategy explains the company’s speed in hardware development. Robot Era already offers three commercial products: the XHand 1, the Q5, and the STAR1.
Chen often references a favorite idea: “laying eggs along the way.” The concept is to release each usable component as an independent product. This helps recoup costs, reduces financial pressure, and generates real-world data to feed back into the company’s research loop.
As of June, Robot Era had delivered over 200 units, with hundreds more in production. Its clients include nine of the world's ten most valuable tech companies, along with firms such as Haier Smart Home, Lenovo, and BZS Technology Development.
The following transcript has been edited and consolidated for brevity and clarity.
36Kr: Given your academic background spans both robotics and artificial intelligence, did you ever consider focusing on just one area when you started Robot Era? Or was it never really a question?
Chen Jianyu (CJ): It was never a question. I made two judgments early on:
- First, does a robot need both brain and body? Absolutely. A robot without a brain is just scrap metal. A brain without a body isn’t a robot. To commercialize properly, we need to deliver both.
- Second, are we capable of doing both? Yes. Because I’ve done both, it’s a natural path for me.
My own journey started with hardware and mechatronics. In grad school, I moved into systems integration and control. I’ve been working on AI for robots for nearly a decade, starting around the AlphaGo era, in 2016 or 2017.
36Kr: How did the rise of large AI models in 2022 shift your direction?
CJ: We’ve moved through several phases.
In the first phase, right after ChatGPT launched, we began using it as if it were the robot's brain, tasking it with planning how the robot should use its sensors, recognize targets, and execute steps. It worked surprisingly well. We wrote the world's first paper on integrating large language models (LLMs) with humanoid robots.
We then addressed alignment issues between high-level language planning and low-level reinforcement learning strategies.
The second phase, inspired by Google’s work, focused on VLA models. We were among the first in China to replicate RT-2. We saw its limitations and introduced our own solution: a two-system VLA architecture combining “slow” cognitive planning with a “fast” action system for fine-motor control.
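The split Chen describes is often implemented as two loops running at different rates. Below is a minimal sketch of that pattern, assuming a hypothetical SlowPlanner (a large vision-language model replanning at around 2 Hz) and FastController (a small action policy at around 50 Hz); the names and rates are illustrative, not Robot Era's implementation.

```python
import time

class SlowPlanner:
    """'Slow' system: a large vision-language model that replans at a low
    rate (here ~2 Hz) and emits a goal for the fast controller."""
    def plan(self, image, instruction):
        # In practice: run a VLM over the camera frame and the instruction.
        return {"goal": f"latent goal for: {instruction}"}

class FastController:
    """'Fast' system: a small action policy that turns the latest goal plus
    fresh observations into joint-level commands at ~50 Hz."""
    def act(self, observation, goal):
        # In practice: a learned policy head emitting joint targets.
        return [0.0] * 7  # placeholder 7-DoF action

planner, controller = SlowPlanner(), FastController()
goal = None
for step in range(100):
    observation = {"image": None, "proprio": None}  # read sensors here
    if step % 25 == 0:  # replan 25x less often than the control loop runs
        goal = planner.plan(observation["image"], "pick up the cup")
    action = controller.act(observation, goal)  # fast loop never waits
    # send `action` to the robot, then sleep until the next control tick
    time.sleep(0.02)  # ~50 Hz
```

The key property is that the fast loop never blocks on the slow one: it always acts on the most recent goal while the planner catches up.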
In September 2024, we published the PAD framework, introducing the idea of world model fusion. Later that year, we released the VPP architecture, merging pretrained video prediction models with PAD. Both were accepted by ICML.
In January this year, we introduced UP-VLA, which is a unified model that integrates understanding, prediction, and policy learning. It can predict future frames and generate precise, joint-level robot actions. It’s essentially a brain that’s always anticipating the next move.
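As a rough illustration of a unified prediction-plus-action objective in the spirit of what Chen describes, here is a hedged sketch: one shared trunk supervised both to predict the next frame and to output the expert's action. The architecture and losses are simplified assumptions, not the published UP-VLA model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedModel(nn.Module):
    """One trunk, two heads: anticipate the future and act on it."""
    def __init__(self, obs_dim=512, act_dim=7):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, 256)      # stand-in for a vision backbone
        self.frame_head = nn.Linear(256, obs_dim)   # predicts next-frame features
        self.action_head = nn.Linear(256, act_dim)  # predicts joint-level actions

    def forward(self, obs):
        h = torch.relu(self.encoder(obs))
        return self.frame_head(h), self.action_head(h)

model = UnifiedModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a dummy batch: the shared trunk is supervised both
# to predict what happens next and to output the expert's action.
obs = torch.randn(32, 512)
next_obs = torch.randn(32, 512)
expert_action = torch.randn(32, 7)

pred_frame, pred_action = model(obs)
loss = F.mse_loss(pred_frame, next_obs) + F.mse_loss(pred_action, expert_action)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```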
36Kr: Sounds like you’ve got the software groundwork nailed down. So what’s missing?
CJ: Data. LLMs had the advantage of pre-existing corpora. Robots don’t. Waymo recently released a huge driving dataset, but even that’s tiny compared to what’s needed for robot training.
Unlike autonomous driving, where vehicles already generate massive real-world data, robotics lacks that infrastructure. We’d need to collect data for thousands of years to match ChatGPT-level corpora.
36Kr: Are you using teleoperation to generate training data?
CJ: We use a hybrid approach. We start with massive video datasets to pretrain generalist models, then fine-tune using high-quality teleoperation data. This way, we don’t rely solely on costly real-world interactions.
36Kr: What kind of data is actually useful?
CJ: Diversity is key. If you only train on clean, ideal data, your model won’t handle messy or dangerous situations.
For example, when pouring water, you can’t just use the same cup, from the same position, every time. Different cups, angles, and backgrounds all affect how the water behaves. We need multidimensional variation to train models that generalize.
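The kind of variation Chen describes is often scripted as randomized episode configurations before data collection. A toy sketch, with hypothetical objects and parameter ranges:

```python
import random

CUPS = ["paper_cup", "glass", "mug", "thermos"]
BACKGROUNDS = ["kitchen", "office_desk", "factory_bench"]

def sample_pouring_episode():
    """Sample one randomized configuration for a pouring-water demo."""
    return {
        "cup": random.choice(CUPS),
        "cup_position_xy": (random.uniform(-0.2, 0.2), random.uniform(-0.2, 0.2)),
        "tilt_angle_deg": random.uniform(20, 80),  # how steeply the robot pours
        "fill_level": random.uniform(0.2, 0.9),    # affects how the water behaves
        "background": random.choice(BACKGROUNDS),
    }

# A dataset built this way covers many cup/angle/background combinations
# instead of the same cup in the same position every time.
episodes = [sample_pouring_episode() for _ in range(10_000)]
```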
36Kr: How important is it that a robot looks human?
CJ: Very. Training humanoids gives you a powerful base you can downscale to other forms. Many components are reusable across different configurations.
So we’re building humanoid robots not as an end, but as a means. The human form lets us leverage vast human video data. It aligns with our method: direct learning from large-scale human behavior datasets.
36Kr: Some founders say the brain doesn’t matter as long as the body is great. What do you think?
CJ: To train AI, you need a body first, then data collection, then learning. That’s slower than just building the body.
But as a startup, we believe in monetizing along the way. Our dexterous hand products are already profitable. When we scale up production of humanoid robots, unit economics will improve. Meanwhile, we’re commercializing our machines, and eventually, our models and platforms too.
36Kr: Could a “brain-only” company outperform you?
CJ: I doubt it. Without a commercialization loop, there’s no resource inflow. If you’re training on multiple third-party platforms, you’ll also need to integrate new data pipelines for each, and that makes it hard to scale.
36Kr: The VLA framework is hot, but critics say it’s fragmented and limited by data. Your thoughts?
CJ: Right now, the “L” (language) is too dominant. The models are first trained as language models, then extended to vision, then to action.
But evolution didn’t work that way. Control came first, then vision, then language. Even monkeys can perform dexterous tasks without speaking.
Today's models have reversed that order. And many robotic tasks don't need language at all. You want a robot that can act, not talk. So we're now researching joint pretraining of language, vision, and action from the beginning.
36Kr: Some companies structure robotic brains by layering tasks or functions. Do you agree with this method?
CJ: People often split models vertically (perception, prediction, control) or horizontally (by task type). The problem is that segmentation prevents synergy. Even if you build 1,000 task-specific models, nothing emergent comes out of it.
We aim for unification. Our generalist model, when fine-tuned for a vertical task, outperforms specialized small models and learns faster.
36Kr: What’s the role of reinforcement learning in all this?
CJ: Most VLA models today don’t use reinforcement learning. They are trained offline by watching others.
It’s like learning ping pong by watching videos and having a coach show you a few moves. You might still suck.
Reinforcement learning means watching and learning, then practicing and adjusting in real time. For physically grounded tasks especially, you can't master complexity without it.
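To make the distinction concrete, the sketch below contrasts an offline imitation step ("watching") with a bare-bones REINFORCE-style update ("practicing and adjusting"). It is a generic pattern under assumed dimensions, not Robot Era's training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 7))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Stage 1, "watching": offline imitation regresses expert actions from
# logged observations. No trial and error is involved.
obs = torch.randn(256, 64)
expert_act = torch.randn(256, 7)
imitation_loss = F.mse_loss(policy(obs), expert_act)
optimizer.zero_grad()
imitation_loss.backward()
optimizer.step()

# Stage 2, "practicing and adjusting": the robot tries the task itself and
# a reward (e.g. did the water land in the cup?) reweights its behavior.
obs = torch.randn(256, 64)
with torch.no_grad():
    action = policy(obs) + 0.1 * torch.randn(256, 7)  # explore around the policy
    reward = -((action - torch.randn(256, 7)) ** 2).mean(dim=1)  # stand-in reward
log_prob = -((action - policy(obs)) ** 2).sum(dim=1)  # Gaussian log-prob, up to a constant
rl_loss = -(log_prob * (reward - reward.mean())).mean()
optimizer.zero_grad()
rl_loss.backward()
optimizer.step()
```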
36Kr: You mentioned earlier that there’s a disconnect between how outsiders perceive Robot Era and what you’re actually building. Do you still feel that way?
CJ: Absolutely. People haven’t seen the full picture. We’ve built a pretty comprehensive system. It’s universal. But some haven’t pieced it together, or maybe we just haven’t communicated it clearly enough.
36Kr: What do you mean by “system”?
CJ: It’s both hardware and software.
Think of the hardware like Lego blocks. We’ve developed all the smallest components in-house: joint modules, motors, gearboxes, controllers. We’ve also made everything modular and interoperable.
For example, our robotic hand product is a module that can be used across different robot types. Even the joints can be pulled out and reconfigured into a totally different robot.
Our software is also general-purpose. It can adapt quickly to different forms and tasks. One stack for everything.
36Kr: With that base in place, is it easy to expand to new robot forms?
CJ: Very easy. Any robot form you can imagine, we can build using the same foundational components.
36Kr: Sounds like you’re modularizing the entire robot. So how do you define product form?
CJ: The humanoid robot will probably be the most common form in the long run. But needs vary by scenario, so we build for different shapes.
If a task requires stairs, we use bipedal robots. If it’s flat ground, wheeled robots work better. In a 3C factory (computers, communications, and consumer electronics), where you’re replacing a single workstation, maybe you just need a torso.
36Kr: What’s your current shipment volume?
CJ: Over 200 units. Our customer base is wide. Nine of the world’s top ten most valuable tech companies are our clients. Some buy dozens at a time and are already using them.
36Kr: How do you select which scenarios to target?
CJ: Two filters: high value and reusability.
High value means jobs where human workers earn high wages. We look for high-value tasks within the limits of robot capability. Right now, we’re focused on two types: industrial and service.
Our industrial robot is a mix of mobility, strength, agility, and intelligence. Our service robot is smaller and more focused on aesthetics and interaction, which are important in service-related sectors.
36Kr: How smart are your current robots?
CJ: We have two levels: product-grade and demo-grade.
Our demo bots can handle tasks like driving screws, scanning barcodes, and scooping water with high success rates.
Product-grade robots face stricter requirements. In logistics settings, they can now locate labels, scan codes, and sort items with solid success rates. They are now being deployed in real-world environments.
36Kr: Beyond logistics, what’s the next promising scenario?
CJ: Manufacturing. It requires more precision.
Logistics is mostly moving boxes, which is fairly straightforward. Manufacturing demands fine-motor skills and tool use. Tasks like flipping parts, applying labels, or using custom tools.
36Kr: For humanoid robots, are most parts off-the-shelf or custom?
CJ: We don’t manufacture custom parts ourselves; that’s too costly. But we do deep internal design, right down to the motor. Our motors, gears, control boards, and drivers are all self-designed. We own the blueprints.
36Kr: In factory settings, what jobs can’t be replaced by robots yet? And what if one day they all can?
CJ: In theory, any repetitive line task can be replaced, but it’s still hard right now.
If they are eventually replaced, it will cause massive social change. But I think that’s a good thing. Robots can take over the dirty, dangerous, and dull jobs that young people no longer want.
That frees people up to do more meaningful work while improving efficiency. Everything gets cheaper.
Robots themselves will become a new kind of consumer endpoint. Somewhere between smartphones and cars in scale. In five years, families might start owning one or two robots for service or companionship.
36Kr: What does this latest funding round mean to you?
CJ: It’s about preparing for the future. Competition will get fierce, even though commercialization has barely started.
Today’s funding in robotics is tiny compared to electric vehicles or large models. But eventually, we’ll need manufacturing and AI capabilities on a much larger scale. We’re just early.
36Kr: How soon before robots are common in the home?
CJ: Gradually. In three to five years, you’ll probably see early forms in high-net-worth households.
For the mass market, requirements will be stricter. We’ll need something affordable and highly generalizable.
36Kr: Haier and Midea have talked about robotics for years. How do you view large corporations in this space?
CJ: It’s always competition and cooperation. With internet giants, we can be a hardware supplier. With traditional manufacturers, maybe we’ll provide software.
36Kr: Every carmaker has an autonomous driving team. Now many are looking at end-to-end architectures. Will the automotive industry pivot to robotics?
CJ: Robotics is a natural extension of smart vehicles, but not all will make the leap.
Big companies are careful with new bets. Right now, their investments are small, roughly on par with ours in terms of personnel.
36Kr: LLMs have converged around a few major players. Will robotics follow the same pattern?
CJ: No. Robotics is more fragmented.
LLMs are universal. Release one and everyone can use it instantly. It centralizes power.
Robots are hardware. They are messy, local, and diverse. That makes the market wide open, and more forgiving. There will be many players.
KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Qiu Xiaofen for 36Kr.