Learning Quadrupedal Locomotion over Challenging Terrain

Joonho Lee¹,*, Jemin Hwangbo¹,²,†, Lorenz Wellhausen¹, Vladlen Koltun³, and Marco Hutter¹

¹ Robotic Systems Lab, ETH Zurich
² Robotics and Artificial Intelligence Lab, KAIST
³ Intelligent Systems Lab, Intel

† A substantial part of the work was carried out during his stay at the Robotic Systems Lab, ETH Zurich (affiliation 1).
* Corresponding author: jolee@ethz.ch

Paper links

Science Robotics, Vol. 5, eabc5986 (2020)
Author's version PDF

ABSTRACT

Legged locomotion can extend the operational domain of robots to some of the most challenging environments on Earth. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have increased in complexity but fallen short of the generality and robustness of animal locomotion. Here, we present a robust controller for blind quadrupedal locomotion in challenging natural environments. Our approach incorporates proprioceptive feedback in locomotion control and demonstrates zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. The controller is driven by a neural network policy that acts on a stream of proprioceptive signals. The controller retains its robustness under conditions that were never encountered during training: deformable terrains such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work indicates that robust locomotion in natural environments can be achieved by training in simple domains.

Summary video

Highlights

Zero-shot generalization to unknown environments

Figure 1. A number of specific deployments (paper Figure 2)

The presented controller has been deployed in diverse natural environments. These include steep mountain trails, creeks with running water, mud, thick vegetation, loose rubble, snow-covered hills, and a damp forest. A number of specific scenarios are further highlighted in Fig. 1A-F. These environments have characteristics that the policy does not experience during training. The terrains can deform and crumble, with significant variation of material properties over the surface. The robot's legs are subjected to frequent disturbances due to vegetation, rubble, and sticky mud. Existing terrain estimation pipelines that use cameras or LiDAR fail in environments with snow (Fig. 1A), water (Fig. 1C), or dense vegetation (Fig. 1F). Our controller does not rely on exteroception and is immune to such failure. The controller learns omnidirectional locomotion based on a history of proprioceptive observations and is robust in zero-shot deployment on terrains with characteristics that were never experienced during training.
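As a rough illustration of what "a history of proprioceptive observations" can look like as a policy input, the following minimal sketch keeps a fixed-length rolling window of observation vectors and flattens it into a single input. The class name `ProprioceptiveHistory`, the observation dimension, and the window length are hypothetical choices for illustration. They are not taken from the paper, and the sketch makes no claim about the network architecture actually used.

```python
from collections import deque
from typing import Deque, List, Sequence


class ProprioceptiveHistory:
    """Fixed-length rolling window of proprioceptive observation vectors.

    Hypothetical helper for illustration only; not the authors' implementation.
    """

    def __init__(self, obs_dim: int, horizon: int) -> None:
        self.obs_dim = obs_dim
        self.horizon = horizon
        # Pre-fill with zeros so the policy input has a constant size from step 0.
        self._buffer: Deque[List[float]] = deque(
            ([0.0] * obs_dim for _ in range(horizon)), maxlen=horizon
        )

    def push(self, obs: Sequence[float]) -> None:
        """Append the newest observation; the oldest one is dropped automatically."""
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} values, got {len(obs)}")
        self._buffer.append(list(obs))

    def as_policy_input(self) -> List[float]:
        """Flatten the window into one vector, oldest observation first."""
        return [value for step in self._buffer for value in step]


# Usage: a 3-dimensional observation (assumed size) with a window of 4 steps.
history = ProprioceptiveHistory(obs_dim=3, horizon=4)
for t in range(6):
    history.push([float(t)] * 3)
policy_input = history.as_policy_input()
```

After six pushes, only the four most recent observations remain in the window, so the policy always receives an input of the same length regardless of how long the robot has been running.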

Our controller was used by the Cerberus team for the DARPA Subterranean Challenge Urban Circuit (Fig. 1G). It replaced a model-based controller that the team had used in the past. The objective of the competition is to develop robotic systems that rapidly map, navigate, and search complex underground environments, including tunnels, urban underground, and cave networks. Human operators are not allowed to physically assist the robots during the competition; only teleoperation is allowed. Accordingly, the locomotion controller needs to perform without failure over extended mission durations. The presented controller drove two ANYmal-B robots in four missions of 60 minutes. The controller exhibited a zero failure rate throughout the competition. A steep staircase that was traversed by one of the robots during the competition is shown in Fig. 1G.

Foot-trapping reflex

Movie 1. Step experiment (paper Movie S3)

The learned controller manifests a foot-trapping reflex, as shown in Movie 1. The policy identifies the trapping of the foot purely from proprioceptive observations and lifts the foot over the obstacle. Such reflexes were not specified in any way during training: they developed adaptively. This distinguishes the presented approach from conventional controller design methods, which explicitly build in such reflexes and orchestrate their execution by a higher-level state machine.

Robustness to model mismatch

Movie 2. Payload experiment (paper Movie S4)

We tested the controllers in the presence of substantial model mismatch by attaching a 10 kg payload. This payload is 22.7% of the total weight of the robot and was never simulated during training. As shown in Movie 2, the presented controller can still traverse steps of up to 13.4 cm despite the model mismatch. With the payload, the baseline cannot traverse any steps at any command speed.

Robustness to foot slippage

Movie 3. Foot slippage experiment (paper Movie S5)

Next, we tested robustness to foot slippage. To introduce slippage, we used a moistened whiteboard. The results are shown in Movie 3. The baseline quickly loses balance, aggressively swings the legs, and falls. In contrast, the presented controller adapts to the slippery terrain and successfully locomotes in the commanded direction.

Acknowledgment

Author contributions: J.L. formulated the main idea of the training and control methods, implemented the controller, set up the simulation, and trained control policies. J.L. performed the indoor experiments. J.H. contributed to setting up the simulation. J.L. and L.W. performed the outdoor experiments together. J.L., J.H., L.W., M.H., and V.K. refined ideas, contributed to the experiment design, and analyzed the data.

Funding: The project was funded, in part, by the Intel Network on Intelligent Systems, the Swiss National Science Foundation (SNF) through the National Centre of Competence in Research Robotics, and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant agreements No. 852044 and No. 780883. The work has been conducted as part of ANYmal Research, a community to advance legged robotics.