Learning robust perceptive locomotion for quadrupedal robots in the wild

Takahiro Miki1*, Joonho Lee1, Jemin Hwangbo2, Lorenz Wellhausen1, Vladlen Koltun3, Marco Hutter1

1 Robotic Systems Lab, ETH Zurich
2 Robotics and Artificial Intelligence Lab, KAIST
3 Intelligent Systems Lab, Intel
* Corresponding author: tamiki@ethz.ch

Paper Link: Science Robotics 10.1126/scirobotics.abk2822 (2022)
arXiv: 2201.08117
Our Version: PDF

Cite (Bibtex)
author = {Takahiro Miki and Joonho Lee and Jemin Hwangbo and Lorenz Wellhausen and Vladlen Koltun and Marco Hutter},
title = {Learning robust perceptive locomotion for quadrupedal robots in the wild},
journal = {Science Robotics},
volume = {7},
number = {62},
pages = {eabk2822},
year = {2022},
doi = {10.1126/scirobotics.abk2822},
URL = {https://www.science.org/doi/abs/10.1126/scirobotics.abk2822},
eprint = {https://www.science.org/doi/pdf/10.1126/scirobotics.abk2822},


Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into under-explored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, utilizing exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step, or are missing altogether due to high reflectance. Additionally, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed, because the robot has to physically feel out the terrain before adapting its gait accordingly. Here we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end-to-end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed. The controller was tested in a variety of challenging natural and urban environments over multiple seasons and completed an hour-long hike in the Alps in the time recommended for human hikers.
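The attention-based recurrent encoder mentioned above can be illustrated with a minimal NumPy sketch. It assumes a simplified form: a GRU consumes proprioception together with a feature vector computed from the height samples, and a sigmoid gate computed from the recurrent hidden state scales how much of that exteroceptive feature enters the belief state. The class names (`GRUCell`, `GatedBeliefEncoder`), layer sizes, and exact gating formula are illustrative assumptions; the weights are random rather than trained, and this is not the authors' network.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell acting on a single (unbatched) input vector."""

    def __init__(self, input_dim, hidden_dim, rng):
        s = 1.0 / np.sqrt(hidden_dim)
        shape = (hidden_dim, input_dim + hidden_dim)
        self.W_z, self.W_r, self.W_n = (rng.uniform(-s, s, shape) for _ in range(3))

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh)                          # update gate
        r = sigmoid(self.W_r @ xh)                          # reset gate
        n = np.tanh(self.W_n @ np.concatenate([x, r * h]))  # candidate state
        return (1.0 - z) * n + z * h


class GatedBeliefEncoder:
    """Hypothetical attention-gated recurrent encoder (illustrative only)."""

    def __init__(self, proprio_dim, extero_dim, hidden_dim, belief_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_e = rng.normal(0.0, 0.1, (belief_dim, extero_dim))  # height samples -> features
        self.gru = GRUCell(proprio_dim + belief_dim, hidden_dim, rng)
        self.W_gate = rng.normal(0.0, 0.1, (belief_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.1, (belief_dim, hidden_dim))
        self.h = np.zeros(hidden_dim)

    def step(self, proprio, extero):
        e = np.tanh(self.W_e @ extero)                    # exteroceptive features
        self.h = self.gru(np.concatenate([proprio, e]), self.h)
        gate = sigmoid(self.W_gate @ self.h)              # per-feature trust in (0, 1)
        belief = self.W_out @ self.h + gate * e           # gated fusion of both modalities
        return belief, gate


enc = GatedBeliefEncoder(proprio_dim=4, extero_dim=8, hidden_dim=16, belief_dim=6)
for _ in range(3):  # the hidden state carries information across time steps
    belief, gate = enc.step(np.zeros(4), np.ones(8))
```

Because the gate depends on the recurrent state rather than on the current height samples alone, a trained network of this shape could learn to down-weight the map when the recent proprioceptive history contradicts it; in this untrained sketch the gate values are arbitrary.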



An hour-long hiking loop on the Etzel mountain in Switzerland. The hiking route was 2.2 km long, with an elevation gain of 120 m. The robot reached the summit in 31 minutes, faster than the expected human hiking duration indicated on the official signage (35 minutes), and finished the entire route in 78 minutes, virtually the same duration suggested by a hiking planner (76 minutes), which rates the hike as “difficult”. The planner assigns one of three difficulty levels (“easy”, “moderate”, or “difficult”) by combining the required fitness level, sport type, and technical complexity.
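The timings in the caption can be put side by side with a little arithmetic. The snippet below (function names are illustrative) computes the mean speed over the full 78-minute loop and the robot's relative time difference from each human reference.

```python
def mean_speed_kmh(distance_km, minutes):
    """Average speed over the whole duration, in km/h."""
    return distance_km / (minutes / 60.0)


def relative_difference(robot_min, reference_min):
    """Relative time difference; negative means the robot was faster."""
    return (robot_min - reference_min) / reference_min


speed = mean_speed_kmh(2.2, 78)        # full loop
ascent = relative_difference(31, 35)   # summit vs. official signage
loop = relative_difference(78, 76)     # full loop vs. hiking planner
print(f"mean speed: {speed:.2f} km/h, ascent: {ascent:+.1%}, loop: {loop:+.1%}")
```

This gives a mean speed of about 1.69 km/h over the loop; the ascent took about 11.4% less time than the signage estimate, and the full loop about 2.6% more than the planner's estimate.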

DARPA Subterranean Challenge

Our controller was used as the default controller in the DARPA Subterranean Challenge missions of team CERBERUS, which won first prize in the finals. In this challenge, our controller enabled ANYmal robots to operate autonomously over extended periods of time in underground environments with rough terrain, obstructions, and degraded sensing in the presence of dust, fog, water, and smoke. It played a crucial role, enabling four ANYmals to explore more than 1700 m in all three types of courses (tunnel, urban, and cave) without a single fall.

Robustness against unreliable perception

The robot perceives the environment in the form of height samples from an elevation map constructed from point cloud input.
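A minimal sketch of this representation, under simplifying assumptions: the point cloud is already expressed in a robot-centric frame, each grid cell simply keeps the highest point that falls into it (the onboard mapping pipeline fuses measurements over time, which this ignores), and heights are read back by nearest-cell lookup at sample points arranged in rings around a foot. The function names, ring pattern, grid size, and resolution are hypothetical choices for illustration.

```python
import numpy as np


def build_elevation_map(points, resolution, size):
    """Bin an (N, 3) robot-centric point cloud into a square 2.5D height grid.

    Each cell stores the highest z that falls into it; cells without points stay NaN,
    which is how missing map information shows up in this sketch.
    """
    n = int(round(size / resolution))
    grid = np.full((n, n), np.nan)
    half = size / 2.0
    ix = np.floor((points[:, 0] + half) / resolution).astype(int)
    iy = np.floor((points[:, 1] + half) / resolution).astype(int)
    inside = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    # fmax ignores NaN, so the first point in a cell simply replaces the NaN.
    np.fmax.at(grid, (ix[inside], iy[inside]), points[inside, 2])
    return grid


def sample_heights(grid, resolution, size, query_xy):
    """Nearest-cell height lookup at (M, 2) query points; NaN outside the map."""
    n = grid.shape[0]
    half = size / 2.0
    ix = np.floor((query_xy[:, 0] + half) / resolution).astype(int)
    iy = np.floor((query_xy[:, 1] + half) / resolution).astype(int)
    valid = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    out = np.full(len(query_xy), np.nan)
    out[valid] = grid[ix[valid], iy[valid]]
    return out


def ring_pattern(center_xy, radii, points_per_ring):
    """Hypothetical sampling pattern: concentric rings of points around one foot."""
    angles = 2.0 * np.pi * np.arange(points_per_ring) / points_per_ring
    offsets = np.column_stack([np.cos(angles), np.sin(angles)])
    return np.vstack([center_xy + r * offsets for r in radii])


# Flat ground with a 0.2 m step for x > 0.5 m, sampled on a regular 5 cm lattice.
xs = np.linspace(-0.975, 0.975, 40)
gx, gy = np.meshgrid(xs, xs, indexing="ij")
gz = np.where(gx > 0.5, 0.2, 0.0)
cloud = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

grid = build_elevation_map(cloud, resolution=0.1, size=2.0)
foot = np.array([0.3, 0.0])
heights = sample_heights(grid, 0.1, 2.0, ring_pattern(foot, radii=[0.1, 0.3], points_per_ring=8))
```

The outer ring reaches past x = 0.5 m and therefore picks up the step, while the inner ring sees flat ground; this is the kind of look-ahead information the height samples provide before a foot makes contact.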

We encountered many circumstances in which exteroception provides incomplete or misleading input. The estimated elevation map can be unreliable due to sensing failures, limitations of the 2.5D height map representation, or viewpoint restrictions imposed by onboard sensing.

Exteroceptive representation and challenges. Our locomotion controller perceives the environment through height samples (red dots) from an elevation map (A). The controller is robust to many perception challenges commonly encountered in the field: missing map information due to sensing failure (B, C, G) and misleading map information due to non-rigid terrain (D, E) and pose estimation drift (F).
Exteroceptive challenges
Animations of unreliable exteroception.
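To make a controller robust to failure cases like these, training can expose it to corrupted height samples. The function below is a hypothetical noise model written for illustration: the name `corrupt_height_samples`, its parameters, and their magnitudes are assumptions rather than the paper's values. It only perturbs heights vertically, whereas real pose drift also shifts the map horizontally.

```python
import numpy as np


def corrupt_height_samples(heights, rng, drift_std=0.05, point_std=0.02,
                           missing_prob=0.1, soft_prob=0.1, soft_height=0.3,
                           fill_value=0.0):
    """Hypothetical corruption of clean height samples, loosely mirroring panels B-G.

    Returns the corrupted samples and a boolean mask of samples marked as missing.
    """
    heights = np.asarray(heights, dtype=float)
    # Pose-estimation drift (F): one vertical offset shared by every sample.
    noisy = heights + rng.normal(0.0, drift_std)
    # Independent per-sample sensor noise.
    noisy = noisy + rng.normal(0.0, point_std, size=heights.shape)
    # Non-rigid terrain (D, E): some samples report a surface above the ground a foot would reach.
    soft = rng.random(heights.shape) < soft_prob
    noisy[soft] += soft_height
    # Missing map information (B, C, G): replace samples with a placeholder value.
    missing = rng.random(heights.shape) < missing_prob
    noisy[missing] = fill_value
    return noisy, missing


rng = np.random.default_rng(0)
noisy, missing = corrupt_height_samples(np.linspace(0.0, 1.0, 50), rng)
```

Setting every noise parameter to zero returns the clean samples unchanged, which makes it easy to anneal the corruption strength during training if desired.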

Training overview

Overview of the training methods and deployment. We first train a teacher policy with access to privileged simulation data using reinforcement learning (RL). This teacher policy is then distilled into a student policy, which is trained to imitate the teacher’s actions and to reconstruct the ground-truth environment state from noisy observations. We deploy the student policy zero-shot on real hardware using height samples from a robot-centric elevation map.
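The two objectives of the student can be illustrated with a toy, fully linear stand-in: a fixed matrix plays the teacher, which sees the ground-truth state, while the student sees a noisy copy and learns one head that imitates the teacher's action and another that reconstructs the state. Everything here (the function name `distill`, dimensions, noise level, learning rate, the loss weight `recon_weight`, and plain gradient descent) is an assumption chosen for illustration; the actual teacher is trained with RL and the student uses the recurrent encoder, neither of which is modeled.

```python
import numpy as np


def distill(steps=500, lr=0.1, recon_weight=0.5, noise_std=0.1, seed=0):
    """Toy teacher-student distillation with linear models; returns the loss history."""
    rng = np.random.default_rng(seed)
    state_dim, action_dim, batch = 6, 3, 256
    K = rng.normal(size=(action_dim, state_dim))  # stand-in teacher policy: a = K s
    W_a = np.zeros((action_dim, state_dim))       # student action head
    W_r = np.zeros((state_dim, state_dim))        # student state-reconstruction head
    history = []
    for _ in range(steps):
        S = rng.normal(size=(batch, state_dim))           # privileged ground-truth state
        O = S + rng.normal(0.0, noise_std, S.shape)       # noisy student observation
        A = S @ K.T                                       # teacher actions from privileged state
        err_a = O @ W_a.T - A                             # imitation error
        err_r = O @ W_r.T - S                             # reconstruction error
        loss = np.mean(err_a ** 2) + recon_weight * np.mean(err_r ** 2)
        # Exact gradients of the two mean-squared-error terms.
        W_a -= lr * 2.0 * err_a.T @ O / err_a.size
        W_r -= lr * recon_weight * 2.0 * err_r.T @ O / err_r.size
        history.append(loss)
    return history


losses = distill()
print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.4f}")
```

The loss does not reach zero: because the student only sees a noisy copy of the state, part of both errors is irreducible, which is one reason the real student benefits from a memory of past observations.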


Walking over stairs in different directions

To traverse stairs, other quadrupedal robots typically require a dedicated stair mode to be engaged and must be properly oriented with respect to the stairs. In contrast, our controller needs no special mode for stairs and traverses them natively in any direction and orientation, including sideways, diagonally, and turning around on the stairway.

Baseline comparison

We have demonstrated the extreme robustness of our controller in the real world, but does exteroceptive input actually help improve locomotion performance? To answer this, we conducted controlled experiments to quantitatively evaluate the contribution of exteroception. We compared our controller to a proprioceptive baseline that does not use exteroception.

Robustness evaluation

To examine how our controller integrates proprioception and exteroception, we conducted a number of controlled experiments and visualized the features reconstructed from the belief state.
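Reconstructions like these come from decoding the belief state. A minimal sketch, assuming a decoder that mirrors a gated encoder: it predicts heights from the belief and adds the raw map samples scaled by a second gate. The function name `decode_heights`, the weights, and the bias values are hypothetical and hand-set. With the gate nearly closed, a phantom obstacle in the raw map (for example, vegetation that only looks solid) is suppressed; with the gate open, the raw map passes through.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_heights(belief, raw_heights, W_recon, W_gate, b_gate):
    """Hypothetical gated decoder: belief-based prediction plus gated raw map samples."""
    gate = sigmoid(W_gate @ belief + b_gate)
    return W_recon @ belief + gate * raw_heights, gate


belief = np.array([1.0, 0.0])
raw = np.array([0.0, 0.0, 0.3, 0.3])  # map reports a 0.3 m "obstacle" on the last two samples
W_recon = np.zeros((4, 2))            # belief predicts flat ground at height 0
W_gate = np.zeros((4, 2))

closed, g_closed = decode_heights(belief, raw, W_recon, W_gate, b_gate=-10.0)
opened, g_open = decode_heights(belief, raw, W_recon, W_gate, b_gate=10.0)
print(np.round(closed, 3))  # phantom obstacle suppressed
print(np.round(opened, 3))  # raw map passed through
```

In a trained network, the gate would be produced from the belief itself, so plotting the reconstruction next to the raw map shows where the controller chose to trust or override exteroception.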

Slippery surfaces and soft obstacles


Author contributions

T.M. formulated the main idea of combining inputs from multiple modalities. J.L. and J.H. designed and tested the initial setup. T.M. developed the software and trained the controller. T.M. and L.W. set up the perception pipeline on the robot. T.M. conducted most of the indoor experiments. T.M., J.L., and L.W. conducted the outdoor experiments. All authors refined ideas, contributed to the experiment design, analyzed the data, and wrote the paper.


The project was funded, in part, by the Intel Network on Intelligent Systems, the Swiss National Science Foundation (SNF) through the National Centre of Competence in Research Robotics and project No. 188596, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement No. 852044, No. 780883 and No. 101016970. The work has been conducted as part of ANYmal Research, a community to advance legged robotics.