We generate a single demonstration for each task using the multi-contact planner (MCP) from Sleiman et al. The planner takes as inputs the task specification, a set of user-defined object affordances (e.g., handles or surfaces), and the robot's end-effectors available for interaction (e.g., the arm-mounted tool or the feet). It then searches over possible robot-object interactions to produce a physically consistent demonstration based on nominal robot and object models.
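Conceptually, the planner maps a task description and a set of contact options to a single reference trajectory. The sketch below illustrates this interface under stated assumptions; the names `Affordance`, `Demonstration`, and `plan_demonstration` are hypothetical and do not reflect the released MCP code.

```python
from dataclasses import dataclass

# Hypothetical types for illustration; the actual MCP interface may differ.
@dataclass
class Affordance:
    object_name: str   # e.g., "door"
    frame: str         # e.g., "handle", or a pushable surface

@dataclass
class Demonstration:
    times: list[float]       # sampled timestamps along the plan
    robot_states: list       # joint positions/velocities per sample
    contact_schedule: list   # which end-effector engages which affordance, and when

def plan_demonstration(task_spec, affordances: list[Affordance],
                       end_effectors: list[str]) -> Demonstration:
    """Search over possible robot-object interactions (assumed planner
    behavior) and return one physically consistent reference trajectory
    computed from nominal robot and object models."""
    ...  # placeholder for the planner's discrete/continuous search
```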
Subsequently, we train an RL policy to reliably track these behaviors, using only one pre-computed trajectory per task as an "expert demonstration". The policy is trained entirely in simulation with domain randomization to achieve successful transfer to the real robot. In contrast to prior motion-imitation works, we propose state-dependent adaptive phase dynamics that facilitate successful task execution despite modeling inaccuracies and significant external disturbances.
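One common way to realize state-dependent phase dynamics is to advance the phase at its nominal rate but slow it down as tracking error grows, so the reference effectively "waits" for the robot after a disturbance instead of running ahead. The sketch below follows that idea; the gain `k` and the simulated error signal are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def advance_phase(phase: float, dt: float, tracking_err: float,
                  duration: float, k: float = 10.0) -> float:
    """Advance the motion phase at a nominal rate 1/duration, slowed by
    the current tracking error (assumed form; k is an illustrative gain)."""
    nominal_rate = 1.0 / duration
    rate = nominal_rate / (1.0 + k * tracking_err)
    return float(np.clip(phase + rate * dt, 0.0, 1.0))

# Toy rollout: a disturbance mid-task raises the tracking error, which
# stalls phase progression until the robot recovers.
phase, dt, duration = 0.0, 0.02, 5.0
for step in range(500):
    err = 0.8 if 200 <= step < 260 else 0.05  # simulated tracking error
    phase = advance_phase(phase, dt, err, duration)
print(f"final phase: {phase:.2f}")
```

The reference state fed to the tracking reward is then indexed by this phase rather than by wall-clock time, which is what allows the policy to recover and resume the demonstrated behavior.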
@inproceedings{sleiman2024guided,
  title     = {Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation},
  author    = {Jean Pierre Sleiman and Mayank Mittal and Marco Hutter},
  booktitle = {8th Annual Conference on Robot Learning},
  year      = {2024},
  url       = {https://openreview.net/forum?id=9aZ4ehSTRc}
}