MPC-Net¶
MPC-Net is an imitation learning approach that uses solutions from MPC to guide the policy search. The main idea is to imitate MPC by minimizing the control Hamiltonian while representing the corresponding control inputs by a parametrized policy. MPC-Net can be used to clone a model predictive controller into a neural network policy, which can be evaluated much faster than MPC. Therefore, MPC-Net is a useful proxy for MPC in computationally demanding applications that do not require the most exact solution.
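Concretely, if H(x, u, t) denotes the control Hamiltonian of the underlying optimal control problem and π_θ the parametrized policy, the training objective has roughly the form

$$\min_{\theta} \; \mathbb{E}_{(t,x) \sim \mathcal{D}} \big[ H\big(x, \pi_{\theta}(x, t), t\big) \big]$$

where $\mathcal{D}$ is the dataset of time-state pairs collected along MPC rollouts. This is a schematic summary of the cited publications, not the exact implemented loss.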
The multi-threaded data generation and policy evaluation run asynchronously with the policy training. Data generation and policy evaluation are implemented in C++ and run on the CPU, while the policy training is implemented in Python and runs on the GPU. The control Hamiltonian is represented by a linear-quadratic approximation, so the training can run entirely on the GPU, exploiting batch processing and avoiding callbacks into the OCS2 C++ code on the CPU to evaluate the Hamiltonian.
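Schematically, each training sample stores the coefficients of this linear-quadratic Hamiltonian approximation, so the loss reduces to plain batched tensor algebra. The following PyTorch sketch illustrates the idea; the function and tensor names are illustrative, not the actual ocs2_mpcnet_core API:

import torch

def lq_hamiltonian(dx, du, Hxx, Hux, Huu, Hx, Hu, H0):
    """Batched linear-quadratic approximation of the control Hamiltonian.

    dx: (B, nx) state deviations, du: (B, nu) input deviations,
    Hxx: (B, nx, nx), Hux: (B, nu, nx), Huu: (B, nu, nu),
    Hx: (B, nx), Hu: (B, nu), H0: (B,) constant terms.
    """
    quadratic = 0.5 * torch.einsum('bi,bij,bj->b', dx, Hxx, dx) \
        + torch.einsum('bi,bij,bj->b', du, Hux, dx) \
        + 0.5 * torch.einsum('bi,bij,bj->b', du, Huu, du)
    linear = torch.einsum('bi,bi->b', Hx, dx) + torch.einsum('bi,bi->b', Hu, du)
    # average Hamiltonian over the batch serves as the imitation loss
    return (quadratic + linear + H0).mean()

Since only precomputed coefficients enter the loss, all tensors can stay on the GPU and no callback into the C++ solver is needed during training.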
Robots¶
MPC-Net has been implemented for the following Robotic Examples:
Robot | Recom. CPU Cores | Recom. GPU Memory | RaiSim | Training Time
---|---|---|---|---
ballbot | 4 | 2 GB | No | 0m 20s
legged_robot | 12 | 8 GB | Yes / No | 7m 40s
Setup¶
Make sure to follow the Installation page and install all required dependencies. Among the optional dependencies, ONNX Runtime and the virtual environment are needed for MPC-Net; RaiSim can optionally be set up as well.
To build all MPC-Net packages, build the meta package:
cd <path_to_catkin_ws>
catkin build ocs2_mpcnet
# Example:
cd ~/catkin_ws
catkin build ocs2_mpcnet
To build a robot-specific package, replace <robot> with the robot name:
cd <path_to_catkin_ws>
catkin build ocs2_<robot>_mpcnet
# Example:
cd ~/catkin_ws
catkin build ocs2_ballbot_mpcnet
Training¶
To train an MPC-Net policy, run:
cd <path_to_ocs2_repo>/ocs2_mpcnet/ocs2_<robot>_mpcnet/python/ocs2_<robot>_mpcnet
source <path_to_catkin_ws>/devel/setup.bash
source <path_to_venvs>/mpcnet/bin/activate
python3 train.py
# Example:
cd ~/git/ocs2/ocs2_mpcnet/ocs2_ballbot_mpcnet/python/ocs2_ballbot_mpcnet
source ~/catkin_ws/devel/setup.bash
source ~/venvs/mpcnet/bin/activate
python3 train.py
To monitor the training progress with TensorBoard, run:
cd <path_to_ocs2_repo>/ocs2_mpcnet/ocs2_<robot>_mpcnet/python/ocs2_<robot>_mpcnet
source <path_to_venvs>/mpcnet/bin/activate
tensorboard --logdir=runs
# Example:
cd ~/git/ocs2/ocs2_mpcnet/ocs2_ballbot_mpcnet/python/ocs2_ballbot_mpcnet
source ~/venvs/mpcnet/bin/activate
tensorboard --logdir=runs
If you use RaiSim, you can visualize the data generation and policy evaluation rollouts with RaiSim Unity; pre-built executables are provided in RaiSim’s raisimUnity folder. For example, on Linux run:
<path_to_raisimLib_repo>/raisimUnity/linux/raisimUnity.x86_64
# Example:
~/git/raisimLib/raisimUnity/linux/raisimUnity.x86_64
Deployment¶
To deploy the default policy stored in the robot-specific package’s policy folder, run:
cd <path_to_catkin_ws>
source devel/setup.bash
roslaunch ocs2_<robot>_mpcnet <robot>_mpcnet.launch
# Example:
cd ~/catkin_ws
source devel/setup.bash
roslaunch ocs2_ballbot_mpcnet ballbot_mpcnet.launch
To deploy a new policy stored in the robot-specific package’s python/ocs2_<robot>_mpcnet/runs folder, replace <path> with the absolute file path to the final policy and run:
cd <path_to_catkin_ws>
source devel/setup.bash
roslaunch ocs2_<robot>_mpcnet <robot>_mpcnet.launch policyFile:=<path>
# Example:
cd ~/catkin_ws
source devel/setup.bash
roslaunch ocs2_ballbot_mpcnet ballbot_mpcnet.launch policyFile:='/home/user/git/ocs2/ocs2_mpcnet/ocs2_ballbot_mpcnet/python/ocs2_ballbot_mpcnet/runs/2022-04-01_12-00-00_ballbot_description/final_policy.onnx'
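Independently of ROS, an exported policy can be sanity-checked by loading the .onnx file directly with ONNX Runtime in Python. This is a minimal sketch; the file path, input name, and observation dimension are robot-specific and merely illustrative here:

import numpy as np
import onnxruntime as ort

# load the exported policy (replace with the path to your final_policy.onnx)
session = ort.InferenceSession("final_policy.onnx")

# inspect the robot-specific input signature
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# run a forward pass with a dummy observation (dimension is illustrative)
obs_dim = 10
observation = np.zeros((1, obs_dim), dtype=np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: observation})
print(outputs)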
How to Set Up a New Robot¶
Setting up MPC-Net for a new robot is relatively easy, as the ocs2_mpcnet_core package takes care of the data generation and policy evaluation rollouts and implements the important learning components, such as the memory, the policy, and the loss function.
This section assumes that you already have the packages for the robot-specific MPC implementation:
ocs2_<robot>: Provides the library with the robot-specific MPC implementation.
ocs2_<robot>_ros: Wraps around the MPC implementation with ROS to define ROS nodes.
ocs2_<robot>_raisim: (Optional) interface between the robot-specific MPC implementation and RaiSim.
For the actual ocs2_<robot>_mpcnet package, follow the structure of existing robot-specific MPC-Net packages. The most important classes/files that have to be implemented are:
<Robot>MpcnetDefinition: Defines how OCS2 state variables are transformed to the policy observations and how policy actions are transformed to OCS2 control inputs (see the sketch after this list).
<Robot>MpcnetInterface: Provides the interface between C++ and Python, allowing data and policies to be exchanged.
<robot>.yaml: Stores the configuration parameters.
mpcnet.py: Adds robot-specific methods for the MPC-Net training, e.g. implements the tasks that the robot should execute.
train.py: The main script that starts the training.
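The observation and action maps in <Robot>MpcnetDefinition are implemented in C++. The following self-contained Python sketch only illustrates the concept on a hypothetical planar robot whose state is [position (2), yaw, linear velocity (2)]; everything here is illustrative, not the actual interface:

import numpy as np

def state_to_observation(x):
    # hypothetical example: rotate world-frame quantities into a
    # heading-aligned frame so the policy is invariant to the yaw angle
    yaw = x[2]
    c, s = np.cos(yaw), np.sin(yaw)
    world_to_body = np.array([[c, s], [-s, c]])
    position = world_to_body @ x[0:2]
    velocity = world_to_body @ x[3:5]
    return np.concatenate([position, velocity])

def action_to_input(a, x):
    # hypothetical example: rotate the heading-frame action back into the
    # world frame expected by the underlying MPC problem
    yaw = x[2]
    c, s = np.cos(yaw), np.sin(yaw)
    body_to_world = np.array([[c, -s], [s, c]])
    return body_to_world @ a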
Known Issues¶
Stiff inequality constraints can lead to very large Hamiltonians and Hamiltonian gradients near the log barrier. This can obstruct the learning process, and the policy might not learn anything useful. In that case, enable gradient clipping in the robot’s MPC-Net YAML configuration file and tune the gradient clipping value.
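In PyTorch, gradient clipping by norm follows the standard pattern below; this is a generic, self-contained sketch of the mechanism, with a stand-in network and loss, not the actual trainer (which reads the clipping value from the YAML configuration):

import torch

model = torch.nn.Linear(10, 3)  # stand-in for the policy network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = model(torch.randn(32, 10)).pow(2).mean()  # stand-in loss
optimizer.zero_grad()
loss.backward()
# clip the gradient norm before the optimizer step to bound the update
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()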
References¶
This part of the toolbox has been developed based on the following publications:
Jan Carius, Farbod Farshidian, and Marco Hutter. MPC-Net: A First Principles Guided Policy Search. IEEE Robotics and Automation Letters, 5(2):2897–2904, 2020.
Alexander Reske, Jan Carius, Yuntao Ma, Farbod Farshidian, and Marco Hutter. Imitation Learning from MPC for Quadrupedal Multi-Gait Control. In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.