Guided Reinforcement Learning for
Robust Multi-Contact Loco-Manipulation

ETH Zurich and NVIDIA

CoRL 2024 (Oral)

Abstract

Reinforcement learning (RL) often necessitates a meticulous Markov Decision Process (MDP) design tailored to each task. This work aims to address this challenge by proposing a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks, such as navigating spring-loaded doors and manipulating heavy dishwashers. We define a task-independent MDP to train RL policies using only a single demonstration per task generated from a model-based trajectory optimizer. Our approach incorporates an adaptive phase dynamics formulation to robustly track the demonstrations while accommodating dynamic uncertainties and external disturbances. We compare our method against prior motion imitation RL works and show that the learned policies achieve higher success rates across all considered tasks. These policies learn recovery maneuvers that are not present in the demonstration, such as re-grasping objects or recovering from slippage during execution. Finally, we successfully transfer the policies to a real robot, demonstrating the practical viability of our approach.

Overview

We generate a single demonstration for each task using the multi-contact planner (MCP) from Sleiman et al. This planner takes as input the task specification, a set of user-defined object affordances (e.g., handles or surfaces), and the robot's end-effectors available for interaction (e.g., the arm-mounted tool or the feet). It then searches for possible robot-object interactions to produce a physically consistent demonstration based on a nominal robot and object model.
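To make the planner's inputs concrete, the sketch below shows one way the task specification could be structured. The class and field names here are purely illustrative assumptions, not the actual MCP interface from Sleiman et al.

```python
from dataclasses import dataclass, field

@dataclass
class Affordance:
    """Hypothetical user-defined object affordance (illustrative only)."""
    name: str       # e.g. "handle" or "front_surface"
    geometry: str   # e.g. "grasp_frame" or "contact_surface"

@dataclass
class TaskSpec:
    """Hypothetical task specification handed to the planner."""
    goal: str                                   # e.g. "door_angle > 90 deg"
    affordances: list = field(default_factory=list)    # object parts available for contact
    end_effectors: list = field(default_factory=list)  # e.g. ["arm_tool", "LF_foot"]

# Example: a spring-loaded door task with a handle affordance
door_task = TaskSpec(
    goal="door_angle > 90 deg",
    affordances=[Affordance("handle", "grasp_frame")],
    end_effectors=["arm_tool", "LF_foot"],
)
```

Given such a specification, the planner searches over which end-effector contacts which affordance, and when, to yield one demonstration per task.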

Subsequently, we train an RL policy to reliably track these behaviors while leveraging only one pre-computed trajectory per task as an "expert demonstration". We train this policy entirely in simulation with domain randomization to achieve a successful transfer to the real robot. In contrast to prior motion imitation works, we propose state-dependent adaptive phase dynamics to facilitate successful task execution despite modeling inaccuracies and significant external disturbances.
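The core idea behind state-dependent adaptive phase dynamics can be sketched as follows: instead of advancing the demonstration's phase variable at a fixed rate, its progression is slowed when the robot deviates from the reference, so the demonstration "waits" for the robot rather than running ahead. The gating function and gain below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def adaptive_phase_step(phase, tracking_error, dt, duration, k=5.0):
    """Advance the phase variable with a state-dependent rate (sketch).

    phase          : current phase in [0, 1]
    tracking_error : scalar deviation from the demonstration (>= 0)
    dt             : control time step [s]
    duration       : nominal demonstration duration [s]
    k              : assumed sensitivity gain (illustrative choice)
    """
    # Gate in (0, 1]: ~1 when tracking is good, -> 0 when the error is large,
    # so the reference slows down instead of leaving the robot behind.
    gate = math.exp(-k * tracking_error)
    phase = phase + dt * gate / duration
    return min(phase, 1.0)

# With zero error the phase advances at the nominal rate dt/duration;
# with a large error it barely advances at all.
p_nominal = adaptive_phase_step(0.0, 0.0, 0.1, 10.0)   # ~0.01
p_slowed = adaptive_phase_step(0.0, 2.0, 0.1, 10.0)    # much smaller
```

This kind of mechanism is what lets recovery maneuvers emerge: when a grasp slips and the error spikes, the phase stalls, giving the policy time to re-grasp before the reference motion continues.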

Hardware Deployment

Traversing a spring-loaded push door

Traversing a spring-loaded pull door

Opening a dishwasher

Closing a dishwasher

BibTeX

@inproceedings{sleiman2024guided,
  title={Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation},
  author={Jean Pierre Sleiman and Mayank Mittal and Marco Hutter},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024},
  url={https://openreview.net/forum?id=9aZ4ehSTRc}
}