How to Profile the Solvers

This document lists some profiling and benchmarking tools and how to use them with OCS2.

Test conditions

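To improve accuracy and make comparisons fair, set the CPU frequency governor to performance and disable turbo boost while benchmarking, then restore the previous settings afterwards. Both steps need root, and the no_turbo switch only exists when the intel_pstate driver is in use (Intel CPUs):

sudo cpupower frequency-set --governor performance
echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
# run your benchmark
sudo cpupower frequency-set --governor powersave
echo "0" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo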
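Before a long benchmark run it can help to verify that these settings actually took effect. A minimal C++ sketch, assuming the standard Linux sysfs locations for cpu0's governor and the intel_pstate turbo switch; readSysfsToken and benchmarkConditionsOk are hypothetical helpers, not part of OCS2:

```cpp
#include <fstream>
#include <optional>
#include <string>

// Default sysfs locations (cpu0 only; every core has its own cpuN entry).
const std::string kGovernorPath = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
const std::string kNoTurboPath = "/sys/devices/system/cpu/intel_pstate/no_turbo";

// Returns the first whitespace-delimited token of a small text file such as a
// sysfs attribute, or std::nullopt if the file cannot be opened or is empty.
std::optional<std::string> readSysfsToken(const std::string& path) {
  std::ifstream file(path);
  std::string token;
  if (!(file >> token)) {
    return std::nullopt;
  }
  return token;
}

// True if the governor reads "performance" and no_turbo reads "1".
// The paths are parameters so the check can target other cores or test files.
bool benchmarkConditionsOk(const std::string& governorPath, const std::string& noTurboPath) {
  return readSysfsToken(governorPath) == std::string("performance") &&
         readSysfsToken(noTurboPath) == std::string("1");
}
```

Calling benchmarkConditionsOk(kGovernorPath, kNoTurboPath) at the start of a benchmark executable, and warning when it returns false, helps avoid comparing runs taken under different frequency settings.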

You may also want to build in Release mode and enable architecture-specific optimizations (binaries built with -march=native may not run on other CPUs):

catkin config --cmake-args -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=native -mtune=native"

Use the built-in timers

OCS2 keeps internal statistics of the solver timings. If enabled, the solver prints a summary in its destructor. You can enable the summary in the task.info file:

; DDP settings
ddp
{
  ; ...
  displayShortSummary   true
  ; ...
}

Alternatively, you can access the statistics directly through the getBenchmarkingInfo() method.
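When you want timings of your own code around the solver rather than the solver's internal statistics, the same idea (accumulate durations over many calls, then report count, mean and maximum) fits in a few lines. The RepeatedTimer class below is a hypothetical sketch built on std::chrono::steady_clock, not an OCS2 API:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of OCS2): accumulates wall-clock durations
// over repeated start()/stop() intervals and reports count, mean and max.
class RepeatedTimer {
 public:
  using Clock = std::chrono::steady_clock;

  void start() { startTime_ = Clock::now(); }

  void stop() {
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - startTime_;
    addInterval(elapsed.count());
  }

  // Records one interval given in milliseconds.
  void addInterval(double ms) {
    ++count_;
    totalMs_ += ms;
    maxMs_ = std::max(maxMs_, ms);
  }

  std::size_t count() const { return count_; }
  double totalMs() const { return totalMs_; }
  double maxMs() const { return maxMs_; }
  double averageMs() const { return count_ == 0 ? 0.0 : totalMs_ / static_cast<double>(count_); }

 private:
  Clock::time_point startTime_{};
  std::size_t count_ = 0;
  double totalMs_ = 0.0;
  double maxMs_ = 0.0;
};
```

Wrap each call of interest in start() and stop(), then print averageMs() and maxMs() once the run is finished. Reporting the maximum next to the mean matters for MPC, where a single slow iteration can be more harmful than a slightly higher average.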

Linux perf tools

Overview:

  • perf collects statistical data about a running process

  • perf stat: reads CPU performance counters such as

    • instructions

    • branches, branch-misses

    • L1 and last-level cache (LLC) loads and misses

    • context-switches

  • perf record: samples the call stacks of the process

    • shows in which functions the time is spent

  • Note: perf does not give exact measurements but only a statistical approximation!
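To check that perf stat reports the counters you expect before pointing it at an MPC node, a small workload with a known hot spot helps. A sketch of such a toy (not OCS2 code; all names are hypothetical): summing only values >= 128 contains a data-dependent branch that is hard to predict on random data and easy to predict on sorted data. Optimizing compilers may turn this branch into a branchless conditional move, in which case both inputs show similar branch-miss counts.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Uniformly distributed values in [0, 255]; a fixed seed keeps runs comparable.
std::vector<int> makeRandomBytes(std::size_t n, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(0, 255);
  std::vector<int> data(n);
  for (auto& x : data) {
    x = dist(gen);
  }
  return data;
}

// Same values in ascending order: the branch outcome then changes only once.
std::vector<int> sortedCopy(std::vector<int> data) {
  std::sort(data.begin(), data.end());
  return data;
}

// Sums the values >= 128. On random input the if-statement is taken about half
// the time in no predictable pattern, which shows up as branch-misses.
std::int64_t sumLargeValues(const std::vector<int>& data) {
  std::int64_t sum = 0;
  for (int x : data) {
    if (x >= 128) {
      sum += x;
    }
  }
  return sum;
}
```

Call sumLargeValues repeatedly from a small main() on the random vector, run the program under perf stat -e branches,branch-misses ./your_toy (the binary name is a placeholder), then repeat with the sorted copy and compare the branch-miss counts.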

Installation:

sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

Usage:

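  • Compile with frame pointers so perf can reconstruct call stacks: -DCMAKE_CXX_FLAGS=-fno-omit-frame-pointer (when also using the Release flags above, pass all flags together, e.g. -DCMAKE_CXX_FLAGS="-march=native -mtune=native -fno-omit-frame-pointer")

  • Start the target process and look up its PID, e.g. ps -eo pid,command | grep ocs2_anymal_croc_mpc_node | grep -v grep (or pgrep -f ocs2_anymal_croc_mpc_node)

  • To record call graphs into perf.data: sudo perf record -g -p PID

  • To gather performance counter statistics: sudo perf stat -d -p PID

  • Stop either command with Ctrl-C. perf record writes its samples to perf.data; perf stat prints its counter summary to stderr.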

  • Analyze perf.data with: sudo perf report -g 'graph,0.5,caller' -i perf.data

  • Alternatively, use rosrun ocs2_benchmark run_perf.py stat timeout=60 output=stat.csv, which

    • automatically finds and attaches to the process

    • runs for the given timeout
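In optimized builds, small functions are often inlined, so perf report attributes their samples to the caller. When you need a function to show up as its own frame during an investigation, GCC and Clang accept __attribute__((noinline)). A sketch with two hypothetical phases of an iteration (the names and workloads are made up for illustration):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// GCC/Clang attribute that keeps a function out of line, so perf record -g
// attributes its samples to its own frame instead of folding them into the caller.
#define PROFILE_NOINLINE __attribute__((noinline))

// Hypothetical costly phase: many square roots per element.
PROFILE_NOINLINE double expensivePhase(const std::vector<double>& x) {
  double acc = 0.0;
  for (std::size_t rep = 0; rep < 200; ++rep) {
    for (double v : x) {
      acc += std::sqrt(v + static_cast<double>(rep));
    }
  }
  return acc;
}

// Hypothetical cheap phase: a single pass over the data.
PROFILE_NOINLINE double cheapPhase(const std::vector<double>& x) {
  double acc = 0.0;
  for (double v : x) {
    acc += v;
  }
  return acc;
}

double iteration(const std::vector<double>& x) { return expensivePhase(x) + cheapPhase(x); }
```

Recorded with sudo perf record -g and viewed with perf report, expensivePhase would be expected to dominate the samples under iteration. Remove the attribute afterwards, since preventing inlining can itself change the performance you are trying to measure.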

Generate a flamegraph from perf.data:

  • git clone https://github.com/brendangregg/FlameGraph.git

  • cd FlameGraph

  • perf script --max-stack=20 -i path/to/perf.data | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg

  • open the interactive SVG in a web browser
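The stackcollapse-perf.pl step turns perf's raw samples into the folded format that flamegraph.pl reads: one line per unique call stack, frames joined by ';' from root to leaf, followed by a space and the number of samples. A C++ sketch of that aggregation step, assuming the stacks are already available root-first (the frame names in the usage example are hypothetical):

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Aggregates sampled call stacks (each listed root-first) into folded lines
// "root;...;leaf count". std::map keeps the lines sorted by stack.
std::string foldStacks(const std::vector<std::vector<std::string>>& samples) {
  std::map<std::string, long> counts;
  for (const auto& stack : samples) {
    std::string key;
    for (std::size_t i = 0; i < stack.size(); ++i) {
      if (i > 0) {
        key += ';';
      }
      key += stack[i];
    }
    ++counts[key];
  }
  std::ostringstream out;
  for (const auto& entry : counts) {
    out << entry.first << ' ' << entry.second << '\n';
  }
  return out.str();
}
```

For instance, two samples in main;solve;lqr and one in main;solve;rollout fold to the lines main;solve;lqr 2 and main;solve;rollout 1, and flamegraph.pl then draws the lqr frame twice as wide as rollout.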


Valgrind

Installation:

sudo apt-get install valgrind

Massif: A Heap Profiler

Valgrind Massif is a heap profiler: it records the heap usage of the program in snapshots taken at regular intervals (by default measured in executed instructions). If debug symbols are available, memory usage can be traced down to individual functions and source lines.

Usage:

  • Launch the node with launch-prefix="valgrind --tool=massif" in its launch file

  • Retrieve the output file ~/.ros/massif.out.PID (ROS nodes run in ~/.ros by default)

  • Inspect it with ms_print massif.out.PID | less -S

massif-visualizer GUI:

  • sudo apt-get install massif-visualizer

  • massif-visualizer massif.out.PID
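A toy program with a known allocation pattern makes it easier to learn how ms_print and massif-visualizer present snapshots. The sketch below (not OCS2 code; names hypothetical) allocates a temporary workspace in every phase while a persistent buffer keeps growing:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Temporary per-phase workspace; while it is alive, this allocation
// should show up in the stack traces of massif's detailed snapshots.
std::unique_ptr<double[]> allocateWorkspace(std::size_t n) {
  return std::make_unique<double[]>(n);  // value-initialized to 0.0
}

// Each phase allocates a workspace, uses it and frees it, while the
// persistent vector grows by `growth` elements per phase.
std::size_t runPhases(std::size_t phases, std::size_t workspaceSize, std::size_t growth) {
  std::vector<double> persistent;
  for (std::size_t p = 0; p < phases; ++p) {
    auto workspace = allocateWorkspace(workspaceSize);
    workspace[0] = static_cast<double>(p);  // requires workspaceSize >= 1
    persistent.insert(persistent.end(), growth, workspace[0]);
    // workspace is released here, at the end of each iteration
  }
  return persistent.size();
}
```

With large enough sizes, e.g. runPhases(20, 1 << 20, 1 << 16) called from a small main(), the snapshot graph should show a rising baseline from the persistent vector with the workspace allocation on top of it.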


Cachegrind: A Cache and Branch-prediction Profiler

Usage:

  • Compile with debug info: -DCMAKE_BUILD_TYPE=RelWithDebInfo

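  • Launch the node with launch-prefix="valgrind --tool=cachegrind". Branch simulation is off by default (enable it with --branch-sim=yes), and newer Valgrind releases also disable cache simulation by default (enable it with --cache-sim=yes)

  • Retrieve the output file ~/.ros/cachegrind.out.PID

  • Inspect it with cg_annotate cachegrind.out.PID | less -S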
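A minimal toy workload for trying out cachegrind (not OCS2 code; names hypothetical): the same sum over a row-major matrix, traversed either along rows, which reads memory contiguously, or along columns, which jumps by a whole row on every access:

```cpp
#include <cstddef>
#include <vector>

// Row-wise traversal of a row-major matrix: consecutive accesses are adjacent
// in memory, so each loaded cache line is fully used.
double sumRowMajor(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
  double s = 0.0;
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      s += m[r * cols + c];
    }
  }
  return s;
}

// Column-wise traversal of the same data: consecutive accesses are `cols`
// elements apart, so large matrices evict cache lines before they are reused.
double sumColumnMajor(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
  double s = 0.0;
  for (std::size_t c = 0; c < cols; ++c) {
    for (std::size_t r = 0; r < rows; ++r) {
      s += m[r * cols + c];
    }
  }
  return s;
}
```

For a matrix much larger than the L1 data cache, cg_annotate would be expected to attribute considerably more D1 read misses (the D1mr column) to sumColumnMajor than to sumRowMajor, even though both compute the same result.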
