Identifying Terrain Physical Parameters from Vision

Towards Physical-Parameter-Aware Locomotion and Navigation

Robotic Systems Lab, ETH Zurich | Autonomous Learning Group, MPI for Intelligent Systems
Accepted for publication in IEEE Robotics and Automation Letters (RA-L), 2024

Supplementary Video of the Paper

Abstract

Identifying the physical properties of the surrounding environment is essential for robotic locomotion and navigation to deal with non-geometric hazards such as slippery and deformable terrains. It would be of great benefit for robots to anticipate these extreme physical properties before contact; however, estimating environmental physical parameters from vision is still an open challenge. Animals achieve this by using prior experience and knowledge of what they have seen and how it felt. In this work, we propose a cross-modal self-supervised learning framework for vision-based environmental physical parameter estimation, which paves the way for future physical-property-aware locomotion and navigation. We bridge the gap between existing policies trained in simulation and the identification of physical terrain parameters from vision. We propose to train a physical decoder in simulation to predict friction and stiffness from multi-modal input. The trained network then labels real-world images with physical parameters in a self-supervised manner, so that a visual network can be further trained during deployment to densely predict friction and stiffness from image data. We validate our physical decoder in simulation and in the real world on the quadrupedal ANYmal robot, outperforming an existing baseline method. We show that our visual network can predict the physical properties in indoor and outdoor experiments while allowing fast adaptation to new environments.
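To make the first stage of the pipeline concrete, the sketch below shows a minimal simulation-trained physical decoder that regresses per-foot friction and stiffness from a history of multi-modal robot observations. It is an illustrative sketch only: the GRU encoder, the layer sizes, the observation dimension, and the MSE regression loss are assumptions for this example, not the paper's exact architecture or training setup.

import torch
import torch.nn as nn

class PhysicalDecoder(nn.Module):
    """Regresses per-foot friction and stiffness from a multi-modal
    observation history (e.g., proprioception plus exteroception)."""

    def __init__(self, obs_dim=128, hidden_dim=128, num_feet=4):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        # One regression head per physical parameter, one value per foot.
        self.friction_head = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, num_feet))
        self.stiffness_head = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, num_feet))

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim) stacked sensor history.
        _, hidden = self.encoder(obs_seq)
        features = hidden[-1]  # last hidden state of the final GRU layer
        return self.friction_head(features), self.stiffness_head(features)

# One training step against ground-truth simulator parameters.
decoder = PhysicalDecoder()
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)
obs = torch.randn(32, 50, 128)                    # simulated rollout batch
gt_friction, gt_stiffness = torch.rand(32, 4), torch.rand(32, 4)
pred_friction, pred_stiffness = decoder(obs)
loss = (nn.functional.mse_loss(pred_friction, gt_friction)
        + nn.functional.mse_loss(pred_stiffness, gt_stiffness))
optimizer.zero_grad()
loss.backward()
optimizer.step()

Because ground-truth friction and stiffness are freely available in simulation, the decoder can be trained with plain supervised regression; sim-to-real transfer then hinges on how closely the simulated observations match the real sensor data.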

System Overview

Overview of the two-stage self-supervised terrain physical parameter learning framework. A twin-structured physical decoder is trained in simulation to predict simulated friction and stiffness parameters per foot. The physical decoder is then transferred to the real world, where it provides self-supervised labels (within the supervision mask) to train a visual network on real-world image data. During training, the visual network receives weak supervision only on the foothold pixels. During inference, the visual pipeline processes all pixel features within an image and outputs the corresponding dense prediction of the simulated physical parameters together with a confidence mask.
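The second, real-world stage can be sketched similarly, under the same caveat that names, shapes, and the confidence mechanism are illustrative assumptions rather than the paper's exact pipeline: the decoder's per-foot predictions label only the foothold pixels (the supervision mask), a small head on per-pixel visual features is trained on those sparse labels, and at inference the head produces a dense map gated by a confidence score. A sigmoid confidence head is used here as a simplified stand-in for the confidence mask described above.

import torch
import torch.nn as nn

class VisualHead(nn.Module):
    """Maps per-pixel visual features to a physical parameter
    and a confidence score in [0, 1]."""

    def __init__(self, feat_dim=384):
        super().__init__()
        self.param = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.conf = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, feats):
        # feats: (batch, H, W, feat_dim) per-pixel features from a vision backbone.
        return self.param(feats).squeeze(-1), self.conf(feats).squeeze(-1)

head = VisualHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# Weakly supervised training: the loss touches only foothold pixels.
feats = torch.randn(1, 240, 320, 384)
supervision_mask = torch.zeros(1, 240, 320, dtype=torch.bool)
supervision_mask[0, 120:124, 160:164] = True   # pixels the feet touched
labels = torch.full((1, 240, 320), 0.6)        # decoder-provided friction labels
pred, _ = head(feats)
loss = nn.functional.mse_loss(pred[supervision_mask], labels[supervision_mask])
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference: dense prediction over all pixels, gated by the confidence mask.
with torch.no_grad():
    dense_pred, confidence = head(feats)
    masked_pred = torch.where(confidence > 0.5, dense_pred,
                              torch.full_like(dense_pred, float("nan")))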

Off-road evaluation

Dense friction prediction in a hiking scenario.


Dense stiffness prediction on outdoor ground covered by heavy snow.


Dense stiffness prediction on indoor rigid ground (indoor part of the snowy scenario).

BibTeX

@ARTICLE{Chen24physical,
        AUTHOR    = {Jiaqi Chen and Jonas Frey and Ruyi Zhou and Takahiro Miki and Georg Martius and Marco Hutter},
        TITLE     = {Identifying Terrain Physical Parameters from Vision - Towards Physical-Parameter-Aware Locomotion and Navigation},
        JOURNAL   = {IEEE Robotics and Automation Letters},
        YEAR      = {2024}
      }