Configuration
RSL-RL is configured with a dictionary that is passed to RSL-RL’s runner during initialization. The dictionary is usually read from a YAML file or constructed from Python dataclasses, such as in Isaac Lab. It is nested to reflect the structure of the library, and follows this pattern:
The top level represents the runner configuration, which is composed of general settings and configuration dictionaries for the algorithm (e.g. PPO), as well as for the models used by the algorithm (e.g. actor and critic). The algorithm dictionary contains the parameters of the algorithm, and may contain one or more configuration dictionaries for extensions. The model dictionaries contain the parameters of the models, and may contain a configuration dictionary for a distribution.
In the following sections, we list the available settings for each configuration component, provide a minimal example configuration in YAML format, and explain how observations are configured.
Runner Configuration

Currently, RSL-RL implements two runner classes: OnPolicyRunner and DistillationRunner. The OnPolicyRunner is configured as follows:

| Key | Type | Default | Description |
|---|---|---|---|
| `num_steps_per_env` | int | required | Number of environment steps collected per iteration. |
| `obs_groups` | dict[str, list[str]] | required | Mapping from observation sets to observation groups coming from the environment. See the Observation Configuration section below for more details. |
| `save_interval` | int | required | Number of iterations between checkpoints. |
|  | str |  | Logging service to use. Valid values: |
|  | str | required for W&B | W&B project name used by the W&B writer. |
|  | str | required for Neptune | Neptune project name used by the Neptune writer. |
|  | str | missing | Optional run label shown in the console output. |
|  | bool |  | Whether to check for NaN values coming from the environment. |
| `algorithm` | dict | required | RL algorithm configuration. |
| `actor` | dict | required | Actor model configuration. |
| `critic` | dict | required | Critic model configuration. |
For the DistillationRunner, the actor and critic keys are simply
replaced by student and teacher keys, respectively:
| Key | Type | Default | Description |
|---|---|---|---|
| … | … | … | … |
| `student` | dict | required | Student model configuration. |
| `teacher` | dict | required | Teacher model configuration. |
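By analogy with the PPO example in the Example Configuration section, a minimal DistillationRunner configuration might look as follows. This is a hedged sketch: the observation-set names `student` and `teacher` mirror the runner keys above, and the values are placeholders rather than recommendations.

```yaml
# Sketch only: observation-set names and values are illustrative.
runner:
  num_steps_per_env: 24
  obs_groups: {"student": ["policy"], "teacher": ["policy", "privileged"]}
  save_interval: 100
  algorithm:
    class_name: Distillation
  student:
    class_name: MLPModel
  teacher:
    class_name: MLPModel
```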
Algorithm Configuration
RSL-RL implements two algorithms, PPO and
Distillation, which are configured as follows.
PPO

| Key | Type | Default | Description |
|---|---|---|---|
| `class_name` | str | required | Algorithm class name. Valid values: |
|  | str |  | Optimizer used for policy/value updates. Valid values: see |
|  | float |  | Optimizer learning rate. |
|  | int |  | Number of optimization epochs per iteration. |
|  | int |  | Number of mini-batches per iteration. |
|  | str |  | Learning-rate schedule. Valid values: |
|  | float |  | Coefficient for the value-function loss. |
|  | float |  | PPO clipping parameter for surrogate/value clipping. |
|  | bool |  | Whether to clip the value loss. |
|  | float |  | Target KL divergence used by the adaptive learning-rate schedule. |
|  | float |  | Entropy regularization coefficient. |
|  | float |  | Discount factor. |
|  | float |  | GAE lambda parameter. |
|  | float |  | Maximum gradient norm for gradient clipping. |
|  | bool |  | Whether to normalize advantages per mini-batch instead of across the entire rollout. |
|  | bool |  | Whether to share the CNN networks between actor and critic in case the |
|  | dict \| None |  | Optional RND extension configuration. |
|  | dict \| None |  | Optional symmetry extension configuration. |
Distillation

| Key | Type | Default | Description |
|---|---|---|---|
| `class_name` | str | required | Algorithm class name. Valid values: |
|  | str |  | Optimizer used for student updates. Valid values: see |
|  | float |  | Optimizer learning rate. |
|  | int |  | Number of optimization epochs per iteration. |
|  | int |  | Gradient backpropagation length. |
|  | float \| None |  | Maximum gradient norm for gradient clipping. |
|  | str |  | Loss type. Valid values: |
Model Configuration
Different algorithms use models for different purposes. For example, PPO uses an actor
and a critic, while Distillation uses a student and a teacher. Even though
their function might be different, they can all use the same underlying model classes. RSL-RL currently implements
three different models: MLPModel, RNNModel, and
CNNModel, which are configured as follows.
MLPModel

| Key | Type | Default | Description |
|---|---|---|---|
| `class_name` | str | required | Model class name. Valid values: |
|  | tuple[int] \| list[int] |  | Hidden dimensions of the MLP. |
|  | str |  | Activation function of the MLP. Valid values: see |
|  | bool |  | Whether to normalize the observations before passing them to the MLP. |
| `distribution_cfg` | dict \| None |  | Optional output distribution configuration. If provided, the model can output stochastic values sampled from the distribution. |
The distribution_cfg dictionary contains all parameters required by a specific distribution. RSL-RL implements two distributions by default: a simple Gaussian distribution (GaussianDistribution) and a Gaussian distribution with state-dependent standard deviation (HeteroscedasticGaussianDistribution). Both require the same parameters:

| Key | Type | Default | Description |
|---|---|---|---|
| `class_name` | str | required | Distribution class name. Valid values: |
|  | float |  | Initial standard deviation. |
|  | str |  | Parameterization of the standard deviation. Valid values: |
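For example, switching a model from the simple to the heteroscedastic Gaussian only changes the distribution's class name. A minimal sketch, based on the actor configuration shown in the Example Configuration section:

```yaml
# Sketch: only class_name differs from the GaussianDistribution case;
# the remaining distribution parameters keep their defaults.
actor:
  class_name: MLPModel
  distribution_cfg:
    class_name: HeteroscedasticGaussianDistribution
```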
RNNModel

The RNNModel inherits from the MLPModel and thus shares the same configuration keys, with the addition of the following keys:

| Key | Type | Default | Description |
|---|---|---|---|
| `class_name` | str | required | Model class name. Valid values: |
| … | … | … | … |
|  | str |  | Type of RNN network. Valid values: |
|  | int |  | Hidden dimension of the RNN. |
|  | int |  | Number of RNN layers. |
CNNModel

The CNNModel inherits from the MLPModel and thus shares the same configuration keys, with the addition of the following keys:

| Key | Type | Default | Description |
|---|---|---|---|
| `class_name` | str | required | Model class name. Valid values: |
| … | … | … | … |
| `cnn_cfg` | dict[str, dict] \| dict[str, Any] \| None |  | Configuration of the CNN encoder(s). |
Instead of directly passing the CNN parameters to the CNNModel (similar to how it is
done for the MLPModel and RNNModel), the parameters
are grouped in a dictionary cnn_cfg. This enables passing multiple CNN configurations for different observations
(e.g. different cameras). If only one CNN is needed or all CNNs have the same configuration, the dictionary may directly
contain the CNN parameters. If multiple CNNs with different configurations are needed, the dictionary must contain a
dictionary for each CNN configuration, with the key being the observation the configuration applies to. The
CNNModel will then create CNNs based on the provided configurations. A CNN
configuration includes the following parameters:
| Key | Type | Default | Description |
|---|---|---|---|
|  | tuple[int] \| list[int] | required | Output channels for each convolutional layer. |
|  | int \| tuple[int] \| list[int] | required | Kernel size for each convolutional layer, or a single kernel size for all layers. |
|  | int \| tuple[int] \| list[int] |  | Stride for each convolutional layer, or a single stride for all layers. |
|  | int \| tuple[int] \| list[int] |  | Dilation for each convolutional layer, or a single dilation for all layers. |
|  | str |  | Padding type to use. Valid values: |
|  | str \| tuple[str] \| list[str] |  | Normalization type for each convolutional layer, or a single normalization type for all layers. Valid values: |
|  | str |  | Activation function to use. Valid values: see |
|  | bool \| tuple[bool] \| list[bool] |  | Whether to apply max pooling after each convolutional layer, or a single boolean for all layers. |
|  | str |  | Global pooling type to apply at the end. Valid values: |
|  | bool |  | Whether to flatten the output tensor. |
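The two accepted shapes of cnn_cfg described above can be sketched as follows. The parameter names (`out_channels`, `kernel_size`) and observation names (`camera_front`, `camera_wrist`) are hypothetical placeholders, not the library's actual key names:

```yaml
# Shape 1: a single CNN (or identical CNNs for all image observations).
# The parameters sit directly in cnn_cfg; key names here are hypothetical.
cnn_cfg:
  out_channels: [32, 64]
  kernel_size: 3
---
# Shape 2: one sub-dictionary per CNN, keyed by the observation the
# configuration applies to (observation names here are hypothetical).
cnn_cfg:
  camera_front:
    out_channels: [32, 64]
    kernel_size: 3
  camera_wrist:
    out_channels: [16, 32]
    kernel_size: 5
```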
Extension Configuration

RSL-RL currently features two extensions for PPO: RandomNetworkDistillation and Symmetry, which may be configured as follows.
Random Network Distillation

| Key | Type | Default | Description |
|---|---|---|---|
|  | float |  | Initial weight of the RND reward. |
|  | dict \| None |  | Weight schedule for the RND reward. Valid values: see |
|  | float |  | Learning rate for the RND optimizer. |
|  | tuple[int] \| list[int] | required | Hidden dimensions of the RND predictor network. |
|  | tuple[int] \| list[int] | required | Hidden dimensions of the RND target network. |
|  | int | required | Number of outputs of the RND networks. |
|  | str |  | Activation function for the RND networks. Valid values: see |
|  | bool |  | Whether to normalize the RND state. |
|  | bool |  | Whether to normalize the RND reward. |
Symmetry Augmentation

| Key | Type | Default | Description |
|---|---|---|---|
|  | bool | required | Whether to add symmetric trajectories to the batch. |
|  | str \| callable \| None | required | Function to generate symmetric trajectories. Resolved using |
|  | bool | required | Whether to add a symmetry loss term to the loss function. |
|  | float | required | Coefficient for the symmetry loss. |
Example Configuration

While the previous sections make it seem rather complicated to set up a configuration, the required configuration to run a training with, e.g., PPO is actually quite simple. The following configuration is already sufficient:

```yaml
runner:
  num_steps_per_env: 24
  obs_groups: {"actor": ["policy"], "critic": ["policy", "privileged"]}
  save_interval: 100
  algorithm:
    class_name: PPO
  actor:
    class_name: MLPModel
    distribution_cfg:
      class_name: GaussianDistribution
  critic:
    class_name: MLPModel
```
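The same configuration can also be built as a plain Python dictionary, e.g. when constructing it programmatically instead of reading it from YAML. How the dictionary is then handed to the runner depends on your setup:

```python
# The YAML example above, mirrored as a nested Python dictionary.
train_cfg = {
    "runner": {
        "num_steps_per_env": 24,
        "obs_groups": {"actor": ["policy"], "critic": ["policy", "privileged"]},
        "save_interval": 100,
        "algorithm": {"class_name": "PPO"},
        "actor": {
            "class_name": "MLPModel",
            "distribution_cfg": {"class_name": "GaussianDistribution"},
        },
        "critic": {"class_name": "MLPModel"},
    }
}

# The nesting follows the pattern described at the top of this page:
# runner -> algorithm / model dictionaries -> (optional) distribution.
assert train_cfg["runner"]["algorithm"]["class_name"] == "PPO"
assert (
    train_cfg["runner"]["actor"]["distribution_cfg"]["class_name"]
    == "GaussianDistribution"
)
```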
Observation Configuration
RSL-RL expects the step() method of the environment to return observations as a
TensorDict. This dictionary contains one or more tensors with observation data, referred
to as observation groups in RSL-RL and Isaac Lab.
The obs_groups dictionary of the runner configuration defines which observation groups
are used for which purpose. Each purpose defines its own observation set, which is simply a list of observation
groups. In other words, the obs_groups dictionary maps from observation sets to lists of observation groups.
As the above definition is quite abstract, let’s consider a simple example for a
PPO training. The step() method of our environment
might return the following observations:
```python
obs = TensorDict(
    {
        "policy": torch.tensor([1.0, 2.0, 3.0]),  # available during robot deployment
        "privileged": torch.tensor([4.0, 5.0, 6.0]),  # only available during training
    }
)
```
Let’s assume the “policy” observation group is meant for both actor and critic. The “privileged” observation group is
only available during training and therefore cannot be used by the actor model, but may still improve learning
performance when passed to the critic. Thus, the obs_groups dictionary would be configured as follows:
obs_groups: {"actor": ["policy"], "critic": ["policy", "privileged"]}
With this configuration, the actor would receive the “policy” tensor as input, while the critic would receive both the “policy” and the “privileged” tensor as input.
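The resolution of observation sets can be sketched in plain Python. Lists stand in for tensors here, and the concatenation of groups within a set is illustrative rather than a statement about RSL-RL internals:

```python
# Sketch of how obs_groups maps observation sets to model inputs.
# Plain lists stand in for the tensors of the TensorDict example above.
obs = {
    "policy": [1.0, 2.0, 3.0],
    "privileged": [4.0, 5.0, 6.0],
}

obs_groups = {"actor": ["policy"], "critic": ["policy", "privileged"]}


def collect(obs_set: str) -> list[float]:
    """Gather the observation groups of one set, concatenated in order."""
    out: list[float] = []
    for group in obs_groups[obs_set]:
        out.extend(obs[group])
    return out


actor_input = collect("actor")    # the "policy" data only
critic_input = collect("critic")  # "policy" followed by "privileged" data
```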
Depending on the algorithm and extensions used, RSL-RL expects different observation sets to be present in the
obs_groups dictionary. Currently, the following observation sets may be required, depending on the configuration:
| Key | Description |
|---|---|
| `actor` | Observations used as input to the actor model. |
| `critic` | Observations used as input to the critic model. |
| `student` | Observations used as input to the student model. |
| `teacher` | Observations used as input to the teacher model. |
|  | Observations used as input to the RND extension. |
Incomplete or incorrect configurations are handled in resolve_obs_groups(), which provides
detailed information on how errors are resolved.