Is teleoperation holding robotics development back? MindOn Robotics makes its case

Zhu Qingxu, born after 1995, is a former researcher at Tencent Robotics X. In June, he left Robotics X to start MindOn Robotics, an embodied intelligence company focused on algorithms.

Citing exclusive information, 36Kr reported that MindOn completed three funding rounds within its first four months, raising close to RMB 100 million (USD 14 million):

The first round was backed solely by Oriza Seed.
The second was led by MSA Capital, with participation from Innoangel Fund, Yuanbio Venture Capital, Genspark, and existing shareholder Oriza Seed, which made an oversized follow-on investment.
The third round was led by Genspark, followed by Plum Ventures and G&O, with Innoangel Fund making another oversized follow-on investment and MSA Capital also joining.

Recently, MindOn combined its algorithms with Unitree Robotics’ hardware and released a series of demo videos. From clearing dust mites while sprawled across a bed to watering plants perched high on a greenhouse frame, the robot performed household tasks with a fluidity close to that of a real person. The videos were not sped up.

The tasks were inspired by a theme Zhu noticed on Xiaohongshu: a day in the life of a mother raising a child alone. He selected several of the most difficult chores, particularly those requiring coordinated use of hands and feet, because these actions strongly test embodied intelligence control. The videos were widely shared after posting.

A Unitree humanoid robot programmed with MindOn’s algorithms opening window blinds in a poised manner. Graphic source: MindOn Robotics.

In an interview with 36Kr, Zhu offered a contrarian view:

“I believe the right robot form factor for real household scenarios is still a bipedal humanoid, and it should be achievable within three to five years.”

Household environments involve diverse and non-standardized tasks and layouts, which complicate learning and generalization for embodied intelligence models. Bipedal robots also face challenges in motion control, balance, and engineering complexity. For these reasons, many in the industry believe humanoid robots will not be able to perform real household tasks for another five to ten years.

Zhu remains convinced that the bipedal form better suits home environments. He argues that the human world is designed around the human body. Human-like forms can better reuse human data and adapt to complex household layouts, especially for climbing, stepping over objects, crouching, and other movements that wheeled robots find difficult.

MindOn uses human behavioral data to train robots to emulate subconscious coordination needed for complex tasks, such as crouching to pick up an object. Graphic source: MindOn Robotics.

Asked why his timeline differs from the broader industry view, Zhu replied:

“Human-form robot training is progressing slowly because the mainstream teleoperation-based approach has a fundamental flaw.”

In teleoperation, an operator controls the robot with a handheld device. Because the operator must consciously think through each action, movements that should be instinctive become slow and jerky. Zhu argues that robots trained on such data inherit the same lack of fluidity.

His perspective stems from years of academic and industry experience in robot control. In 2021, he graduated from a joint program between ETH Zurich and RWTH Aachen University and joined Robotics X that same year. Over four years, he and his team collected data through multiple methods and trained embodied intelligence models at scale. They found that models trained on teleoperation data executed tasks inefficiently.

In May this year, Boston Dynamics also questioned teleoperation, arguing that the approach relies too heavily on a person’s conscious mental processes during data collection. The company said this leads to inefficient behavior, limited dynamics, and unnecessary micro movements. The critique reinforced Zhu’s technical direction.

At MindOn, Zhu adopts an architecture combining a “cerebellum” and a “cortex.” The cerebellum manages motion control, while the cortex handles planning and generalization. The company is currently focused more heavily on advancing the cerebellum, which Zhu said has been less explored in the industry. By building a comprehensive “human motion library,” MindOn can rapidly collect movement data and train robots to learn a wide range of foundational actions.

For real-world data collection, MindOn uses an approach that differs from teleoperation. It relies on optical motion capture combined with a universal manipulation interface (UMI) system. Operators wear motion capture suits and perform natural movements in a controlled capture space as multiple cameras record their trajectories. This generates high-fidelity data that reflects subconscious human coordination while improving data collection efficiency in the lab.

Next, operators use handheld UMI grippers to manipulate objects in real-world settings, producing large-scale hand-object interaction data. Combined with motion capture recordings, the system yields high-quality, scalable datasets.

Through MindOn’s approach, bipedal robots could even become playmates for activities such as a game of frisbee. Graphic source: MindOn Robotics.

Discussing the company’s financing, Zhu said MindOn’s technical differentiation is the main reason investors committed quickly. He noted that these investors are already heavily involved in embodied intelligence but chose MindOn because its technology complements their broader portfolios.

Zhu predicts that as training efficiency improves, the timeline for humanoid robots entering households will shorten to three to five years. Before that, he expects bipedal versions to enter retail or fast casual dining environments, such as unmanned fast food outlets, within one to two years. These environments feature fixed tasks and controlled conditions that support rapid validation and commercial deployment.

Asked about MindOn’s moat, Zhu said:

“When everyone believed in teleoperation, we recognized its fundamental flaw and found another path. Our real moat is the ability to take an immature idea and turn it into reality, step by step.”

The following transcript has been edited and consolidated for brevity and clarity.

36Kr: Why do you believe teleoperation has a fundamental flaw?

Zhu Qingxu (ZQ): At its core, teleoperation forces a person to use the brain’s “slow system” to control a robot. The operator must observe, think, and then act. That process is inherently slow, jerky, and full of unnecessary pauses.

Training a robot this way is like having it learn from a teacher who moves unnaturally. It caps the robot’s performance. All those robot videos you see that require sped-up playback? That is the root cause.

For tasks requiring tactile sensitivity, such as twisting a bottle cap, teleoperation is even worse. Without true force feedback, operators cannot tell whether the robot’s hand is gripping correctly, which lowers efficiency.

36Kr: If teleoperation is flawed, why did it become so widely adopted?

ZQ: The initial goal was simple: to get robots to manipulate real objects and collect real-world data. Teleoperation was the first feasible solution.

36Kr: How does your system work, and what are its advantages?

ZQ: It balances data quality with data scale. Motion capture records natural, instinctive full-body movements. UMI allows operators to manipulate objects at scale, capturing detailed hand-object interactions. Together, they generate far higher quality data than video-only approaches and in far larger quantities than teleoperation. Essentially, it captures the subconscious actions robots should be learning.

36Kr: Once the data is collected, how does your architecture work?

ZQ: It mirrors how intelligence forms. The “cerebellum” learns basic human actions such as walking, running, crouching, grasping, and pulling. Using motion capture data, we train these in simulation. Once built, the library becomes universal.

The “cortex” handles perception, instructions, and planning, calling the cerebellum’s skills to execute actions. They evolve together. The richer the cerebellum’s skills, the more the cortex can do, and the smarter the cortex becomes, the more precisely it can deploy those skills.
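The division of labor Zhu describes can be illustrated with a minimal sketch. It is not MindOn's implementation: the class names, the `build_demo` helper, and the lookup-table planner are all hypothetical, and the skills simply return strings where a real system would run controllers trained in simulation. It only shows the calling relationship, with a "cortex" that plans a sequence of steps and a "cerebellum" that executes each one from a shared skill library.

```python
from typing import Callable, Dict, List, Tuple

# A skill takes a target name and returns a log line. This is a stand-in for a
# learned low-level controller (assumption: real skills would drive motors).
Skill = Callable[[str], str]


class Cerebellum:
    """Hypothetical skill library: maps skill names to low-level controllers."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, name: str, skill: Skill) -> None:
        self._skills[name] = skill

    def execute(self, name: str, target: str) -> str:
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        return self._skills[name](target)


class Cortex:
    """Hypothetical planner: looks up a fixed plan for each known instruction.

    A real planner would perceive the scene and generalize to new
    instructions; a lookup table keeps this sketch small.
    """

    def __init__(
        self,
        cerebellum: Cerebellum,
        plans: Dict[str, List[Tuple[str, str]]],
    ) -> None:
        self._cerebellum = cerebellum
        self._plans = plans

    def run(self, instruction: str) -> List[str]:
        steps = self._plans.get(instruction)
        if steps is None:
            raise ValueError(f"no plan for instruction: {instruction}")
        # Each planned step is delegated to the cerebellum's skill library.
        return [self._cerebellum.execute(skill, target) for skill, target in steps]


def make_skill(verb: str) -> Skill:
    """Build a placeholder skill that only reports what it would do."""
    return lambda target: f"{verb} {target}"


def build_demo() -> Cortex:
    cerebellum = Cerebellum()
    for verb in ("walk_to", "crouch", "grasp"):
        cerebellum.register(verb, make_skill(verb))
    plans = {
        "pick up the toy": [
            ("walk_to", "toy"),
            ("crouch", "floor"),
            ("grasp", "toy"),
        ]
    }
    return Cortex(cerebellum, plans)


if __name__ == "__main__":
    for line in build_demo().run("pick up the toy"):
        print(line)
```

In this toy version, adding a skill to the library lets the planner compose more tasks without changing the planner itself, which is the co-evolution Zhu points to.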

36Kr: You mentioned humanoid robots entering unmanned stores in one to two years. How would that work?

ZQ: Tasks in places such as unmanned fast food outlets are limited and enumerable. We can capture every action in the motion capture lab. Because our data quality is high, robots learn these primitive actions efficiently. It takes only two to three days to learn all the roles. Then we collect UMI data in the real store to support generalization. Teleoperation cannot match this efficiency.

36Kr: Motion capture requires a studio filled with cameras. You are not going to build one inside a kitchen, so how do you handle scenario-specific teaching?

ZQ: You do not need to. Actions can be captured fully in the lab. Household or kitchen tasks break down into finite movements such as grasp, place, lift, and shake. These can all be captured with motion capture equipment. Then we gather UMI and environment data on-site.

36Kr: What is the biggest challenge in generalizing from controlled environments to real households?

ZQ: Generalization is the biggest challenge. We need to address three types: object generalization, positional generalization, and scene generalization. This requires substantial and diverse scene data. We believe in scaling laws, but only when data quality and quantity are sufficient.

36Kr: Why will the embodied intelligence system that enters households likely take the humanoid form?

ZQ: Humanoid robots have challenges such as stability and control, but their advantages outweigh the downsides. Homes are designed for people. Many layouts and tasks are unfriendly to wheels, including stairs, uneven levels, and thick carpets. Tasks that involve changing heights also favor the humanoid form. For non-human forms, you would need to enumerate and solve shape-specific problems continuously. Training relies on human motion data, and such datasets do not exist for non-human forms.

36Kr: How will users teach robots new tasks once they enter homes?

ZQ: Ideally, robots will come equipped with the most commonly used skills. For new tasks, we may later release a simple demonstration tool that allows users to teach a task once through hands-on demonstration, after which the robot can learn through observation and minimal practice.

36Kr: Some say algorithms are weak products and that hardware companies will eventually build their own algorithms. Where is MindOn’s moat?

ZQ: Technology has no permanent moat, only temporary leads. Our moat is judgment and iteration. When everyone believed in teleoperation, we recognized its fundamental flaw and found another path. Our real moat is the ability to take an immature idea and turn it into reality, step by step.

36Kr: What is your outlook for the sector?

ZQ: The sector will go through a shakeout. The companies that survive will be the ones “working out,” not “putting on makeup.” For us, we aim to deepen our technical foundation and focus on long-term capability. When the industry matures, we want to remain standing.

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Fu Chong for 36Kr.