Modules

MLP

class rsl_rl.modules.mlp.MLP[source]

Multi-Layer Perceptron.

The MLP network is a sequence of linear layers and activation functions. The final layer is linear and outputs the desired dimension, unless a last activation function is specified.

It provides additional conveniences:
  • If a hidden dimension has a value of -1, it is inferred from the input dimension.

  • If the output dimension is a tuple, the output is reshaped to the desired shape.

__init__(input_dim, output_dim, hidden_dims, activation='elu', last_activation=None)[source]

Initialize the MLP.

Parameters:
  • input_dim (int) – Dimension of the input.

  • output_dim (int | tuple[int, ...] | list[int]) – Dimension of the output.

  • hidden_dims (tuple[int, ...] | list[int]) – Dimensions of the hidden layers. A value of -1 indicates that the dimension should be inferred from the input dimension.

  • activation (str) – Activation function.

  • last_activation (str | None) – Activation function of the last layer. None results in a linear last layer.

Return type:

None
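The two conveniences above can be sketched with plain PyTorch. This is a minimal illustration, not the library's implementation: `build_mlp` is a hypothetical helper that mimics the documented behavior (a hidden dim of -1 inherits the input dimension; a tuple output dim is flattened for the last linear layer and reshaped after the forward pass).

```python
import math

import torch
import torch.nn as nn


def build_mlp(input_dim, output_dim, hidden_dims, activation=nn.ELU):
    # A hidden dim of -1 is replaced by the input dimension.
    dims = [input_dim] + [input_dim if d == -1 else d for d in hidden_dims]
    layers = []
    for in_d, out_d in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(in_d, out_d), activation()]
    # A tuple output dim is flattened for the final linear layer.
    flat_out = output_dim if isinstance(output_dim, int) else math.prod(output_dim)
    layers.append(nn.Linear(dims[-1], flat_out))  # linear last layer
    return nn.Sequential(*layers)


net = build_mlp(8, (2, 3), [64, -1])
y = net(torch.randn(5, 8)).view(5, 2, 3)  # reshape to the tuple output dim
```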

init_weights(scales)[source]

Initialize the weights of the MLP.

Parameters:

scales (float | tuple[float]) – Scale factor for the weights.

Return type:

None

forward(x)[source]

Forward pass of the MLP.

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

RNN

rsl_rl.modules.rnn.HiddenState

Type alias for the hidden state of RNNs (GRU/LSTM).

For GRUs, this is a single tensor while for LSTMs, this is a tuple of two tensors (hidden state and cell state).

alias of Tensor | tuple[Tensor, Tensor] | None
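The alias can be illustrated directly with PyTorch's built-in recurrent layers: a GRU carries a single hidden tensor, while an LSTM carries a (hidden state, cell state) tuple.

```python
import torch
import torch.nn as nn

gru, lstm = nn.GRU(4, 8), nn.LSTM(4, 8)
x = torch.randn(3, 1, 4)  # (seq_len, batch, input_size)

# GRU hidden state: a single tensor of shape (num_layers, batch, hidden).
_, h_gru = gru(x)
# LSTM hidden state: a (hidden, cell) tuple of two such tensors.
_, h_lstm = lstm(x)
```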

class rsl_rl.modules.rnn.RNN[source]

Recurrent Neural Network.

This network is used to store the hidden state of the policy. It currently supports GRU and LSTM.

__init__(input_size, hidden_dim=256, num_layers=1, type='lstm')[source]

Initialize a GRU or LSTM module with internal hidden-state storage.

Parameters:
  • input_size (int)

  • hidden_dim (int)

  • num_layers (int)

  • type (str)

Return type:

None

forward(input, masks=None, hidden_state=None)[source]

Run recurrent inference in rollout mode or batched update mode.

Parameters:
  • input (torch.Tensor)

  • masks (torch.Tensor | None)

  • hidden_state (HiddenState)

Return type:

torch.Tensor

reset(dones=None, hidden_state=None)[source]

Reset hidden states for all or done environments.

Parameters:
  • dones (torch.Tensor | None)

  • hidden_state (HiddenState)

Return type:

None
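A per-environment reset of this kind can be sketched as follows (an illustration of the idea, not the library's code): the hidden state is zeroed only for environments whose episode just ended.

```python
import torch

# Hidden state laid out as (num_layers, num_envs, hidden_dim).
hidden = torch.ones(1, 4, 8)
dones = torch.tensor([False, True, False, True])

# Zero the hidden state of done environments; others keep their state.
hidden[:, dones] = 0.0
```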

detach_hidden_state(dones=None)[source]

Detach hidden states for all or done environments from the computation graph.

Parameters:

dones (torch.Tensor | None)

Return type:

None

CNN

class rsl_rl.modules.cnn.CNN[source]

Convolutional Neural Network.

The CNN network is a sequence of convolutional layers, optional normalization layers, optional activation functions, and optional pooling. The final output can be flattened.

__init__(input_dim, input_channels, output_channels, kernel_size, stride=1, dilation=1, padding='none', norm='none', activation='elu', max_pool=False, global_pool='none', flatten=True)[source]

Initialize the CNN.

Parameters:
  • input_dim (tuple[int, int]) – Height and width of the input.

  • input_channels (int) – Number of input channels.

  • output_channels (tuple[int, ...] | list[int]) – List of output channels for each convolutional layer.

  • kernel_size (int | tuple[int, ...] | list[int]) – List of kernel sizes for each convolutional layer or a single kernel size for all layers.

  • stride (int | tuple[int, ...] | list[int]) – List of strides for each convolutional layer or a single stride for all layers.

  • dilation (int | tuple[int, ...] | list[int]) – List of dilations for each convolutional layer or a single dilation for all layers.

  • padding (str) – Padding type to use. Either ‘none’, ‘zeros’, ‘reflect’, ‘replicate’, or ‘circular’.

  • norm (str | tuple[str] | list[str]) – List of normalization types for each convolutional layer or a single type for all layers. Either ‘none’, ‘batch’, or ‘layer’.

  • activation (str) – Activation function to use.

  • max_pool (bool | tuple[bool] | list[bool]) – List of booleans indicating whether to apply max pooling after each convolutional layer or a single boolean for all layers.

  • global_pool (str) – Global pooling type to apply at the end. Either ‘none’, ‘max’, or ‘avg’.

  • flatten (bool) – Whether to flatten the output tensor.

Return type:

None

property output_channels: int | None

Get the number of output channels or None if output is flattened.

property output_dim: tuple[int, int] | int

Get the output height and width or total output dimension if output is flattened.
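A property like this rests on the standard convolution output-size arithmetic; the sketch below shows that formula (the library may compute it differently internally).

```python
def conv_out(size, kernel, stride=1, dilation=1, padding=0):
    # Standard conv arithmetic: floor((size + 2p - d*(k-1) - 1) / s) + 1
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Two stacked 3x3, stride-2 convolutions on a 64-pixel input.
h = conv_out(conv_out(64, kernel=3, stride=2), kernel=3, stride=2)
```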

init_weights()[source]

Initialize the weights of the CNN with Kaiming initialization.

Return type:

None

forward(x)[source]

Forward pass of the CNN.

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

Normalization

class rsl_rl.modules.normalization.EmpiricalNormalization[source]

Normalize mean and variance of values based on empirical values.

__init__(shape, eps=0.01, until=None)[source]

Initialize EmpiricalNormalization module.

Note

The normalization parameters are computed over the whole batch, not for each environment separately.

Parameters:
  • shape (int | tuple[int, ...] | list[int]) – Shape of input values except batch axis.

  • eps (float) – Small value for stability.

  • until (int | None) – If this arg is specified, the module learns input values until the sum of batch sizes exceeds it.

Return type:

None

property mean: torch.Tensor

Return the current running mean.

property std: torch.Tensor

Return the current running standard deviation.

forward(x)[source]

Normalize input values using the running empirical mean and standard deviation.

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

update(x)

Update the normalization statistics from input values without computing their normalized outputs.

Parameters:

x (torch.Tensor)

Return type:

None

inverse(y)

De-normalize values based on empirical values.

Parameters:

y (torch.Tensor)

Return type:

torch.Tensor
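The mechanics of empirical normalization can be sketched in a few lines. This is a minimal running-statistics sketch (using a Chan-et-al.-style batch merge), not the library's implementation; `RunningNorm` is a hypothetical stand-in.

```python
import torch


class RunningNorm:
    def __init__(self, shape, eps=0.01):
        self.mean = torch.zeros(shape)
        self.var = torch.ones(shape)
        self.count = 0
        self.eps = eps

    def update(self, x):
        # Merge batch moments into the running moments over the whole batch.
        batch = x.shape[0]
        new_count = self.count + batch
        delta = x.mean(0) - self.mean
        self.mean = self.mean + delta * batch / new_count
        self.var = (
            self.count * self.var
            + batch * x.var(0, unbiased=False)
            + delta**2 * self.count * batch / new_count
        ) / new_count
        self.count = new_count

    def __call__(self, x):
        # Normalize with the running statistics; eps guards small std.
        return (x - self.mean) / (self.var.sqrt() + self.eps)


norm = RunningNorm(3)
x = torch.randn(100, 3) * 5 + 2
norm.update(x)
y = norm(x)
```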

class rsl_rl.modules.normalization.EmpiricalDiscountedVariationNormalization[source]

Reward normalization from Pathak’s large scale study on PPO.

Since the reward function is non-stationary, it is useful to normalize the scale of the rewards so that the value function can learn quickly. This is done by dividing the rewards by a running estimate of the standard deviation of the sum of discounted rewards.

__init__(shape, eps=0.01, gamma=0.99, until=None)[source]

Initialize discounted-reward normalization with running moments.

Parameters:
  • shape (int | tuple[int, ...] | list[int])

  • eps (float)

  • gamma (float)

  • until (int | None)

Return type:

None

forward(rew)[source]

Normalize rewards using the running std of discounted returns.

Parameters:

rew (torch.Tensor)

Return type:

torch.Tensor
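The scheme described above can be sketched with plain Python: accumulate a discounted sum of rewards step by step and divide each raw reward by an estimate of that sum's standard deviation (plus eps for stability). This is an illustration of the idea, not the library's running-moment implementation.

```python
import statistics

gamma, eps = 0.99, 0.01
running_return, discounted = 0.0, []
rewards = [1.0, 0.5, -0.2, 1.5, 0.3]

for r in rewards:
    # Discounted sum of rewards: R_t = gamma * R_{t-1} + r_t
    running_return = gamma * running_return + r
    discounted.append(running_return)

# Scale raw rewards by the std of the discounted returns.
std = statistics.pstdev(discounted)
normalized = [r / (std + eps) for r in rewards]
```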

Distribution

class rsl_rl.modules.distribution.Distribution[source]

Base class for distribution modules.

Distribution modules encapsulate the stochastic output of a neural model. They define the output structure expected from the MLP, manage learnable distribution parameters, and provide methods for sampling, log probability computation, and entropy calculation.

Subclasses must implement all abstract methods and properties to define a specific distribution type.

__init__(output_dim)[source]

Initialize the distribution module.

Parameters:

output_dim (int) – Dimension of the action/output space.

Return type:

None

update(mlp_output)[source]

Update the distribution parameters given the MLP output.

Parameters:

mlp_output (torch.Tensor) – Raw output from the MLP.

Return type:

None

sample()[source]

Sample from the distribution.

Returns:

Sampled values.

Return type:

torch.Tensor

deterministic_output(mlp_output)[source]

Extract the deterministic (mean) output from the raw MLP output.

Parameters:

mlp_output (torch.Tensor) – Raw output from the MLP.

Returns:

The deterministic output (typically the distribution mean).

Return type:

torch.Tensor

as_deterministic_output_module()[source]

Return an export-friendly module that extracts the deterministic output from the MLP output.

Return type:

torch.nn.Module

property input_dim: int | list[int]

Return the input dimension required by the distribution.

property mean: torch.Tensor

Return the mean of the distribution.

property std: torch.Tensor

Return the standard deviation (or spread measure) of the distribution.

property entropy: torch.Tensor

Return the entropy of the distribution, summed over the last dimension.

property params: tuple[torch.Tensor, ...]

Return the distribution parameters as a tuple of tensors.

These are the distribution-specific parameters needed to reconstruct the distribution (e.g., mean and std for Gaussian, alpha and beta for Beta). They are stored during rollouts and used for KL divergence computation.

log_prob(outputs)[source]

Compute the log probability of the given outputs, summed over the last dimension.

Parameters:

outputs (torch.Tensor) – Values to compute the log probability for.

Returns:

Log probability summed over the last dimension.

Return type:

torch.Tensor

kl_divergence(old_params, new_params)[source]

Compute the KL divergence KL(old || new) between two distributions of this type.

The KL divergence measures how the old distribution diverges from the new distribution. This is used for adaptive learning rate scheduling in policy optimization.

Parameters:
  • old_params (tuple[torch.Tensor, ...]) – Parameters of the old distribution (as returned by params).

  • new_params (tuple[torch.Tensor, ...]) – Parameters of the new distribution (as returned by params).

Returns:

KL divergence summed over the last dimension.

Return type:

torch.Tensor

init_mlp_weights(mlp)[source]

Initialize distribution-specific weights in the MLP.

This is called after MLP creation to set up any special weight initialization required by the distribution (e.g., initializing std head weights).

Parameters:

mlp (torch.nn.Module) – The MLP module whose weights may need initialization.

Return type:

None

class rsl_rl.modules.distribution.GaussianDistribution[source]

Gaussian (Normal) distribution module with state-independent standard deviation.

This distribution parameterizes actions using a multivariate Gaussian with diagonal covariance. The standard deviation is a learnable parameter that is independent of the model input. It can be parameterized in either “scalar” space (directly) or “log” space.

__init__(output_dim, init_std=1.0, std_type='scalar')[source]

Initialize the Gaussian distribution module.

Parameters:
  • output_dim (int) – Dimension of the action/output space.

  • init_std (float) – Initial standard deviation.

  • std_type (str) – Parameterization of the standard deviation: “scalar” or “log”.

Return type:

None

update(mlp_output)[source]

Update the Gaussian distribution from MLP output.

Parameters:

mlp_output (torch.Tensor)

Return type:

None

sample()[source]

Sample from the Gaussian distribution.

Return type:

torch.Tensor

deterministic_output(mlp_output)[source]

Extract the mean from the MLP output.

Parameters:

mlp_output (torch.Tensor)

Return type:

torch.Tensor

as_deterministic_output_module()[source]

Return an export-friendly module that extracts the mean from the MLP output.

Return type:

torch.nn.Module

property input_dim: int

Return the input dimension required by the distribution.

property mean: torch.Tensor

Return the mean of the Gaussian distribution.

property std: torch.Tensor

Return the standard deviation of the Gaussian distribution.

property entropy: torch.Tensor

Return the entropy of the Gaussian distribution, summed over the last dimension.

property params: tuple[torch.Tensor, ...]

Return (mean, std) of the current Gaussian distribution.

log_prob(outputs)[source]

Compute the log probability under the Gaussian, summed over the last dimension.

Parameters:

outputs (torch.Tensor)

Return type:

torch.Tensor

kl_divergence(old_params, new_params)[source]

Compute KL(old || new) between two Gaussian distributions using torch.distributions.

Parameters:
  • old_params (tuple[torch.Tensor, ...])

  • new_params (tuple[torch.Tensor, ...])

Return type:

torch.Tensor
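The documented behavior maps closely onto torch.distributions, which the class is stated to use for the KL divergence. The sketch below illustrates the key operations: a state-independent std, log probability and entropy summed over the last (action) dimension, and KL(old || new).

```python
import torch
from torch.distributions import Normal, kl_divergence

mean = torch.zeros(4, 2)           # per-sample means from the MLP
std = 0.5 * torch.ones(2)          # learnable, input-independent std
dist = Normal(mean, std.expand_as(mean))

actions = dist.sample()
logp = dist.log_prob(actions).sum(dim=-1)   # summed over action dim
entropy = dist.entropy().sum(dim=-1)        # summed over action dim

# KL(old || new) between two Gaussians, summed over the last dimension.
old = Normal(mean, torch.ones_like(mean))
kl = kl_divergence(old, dist).sum(dim=-1)
```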

class rsl_rl.modules.distribution.HeteroscedasticGaussianDistribution[source]

Gaussian (Normal) distribution module with state-dependent standard deviation.

This distribution parameterizes actions using a multivariate Gaussian with diagonal covariance. The standard deviation is output by the MLP alongside the mean, making it state-dependent (heteroscedastic). It can be parameterized in either “scalar” space (directly) or “log” space.

__init__(output_dim, init_std=1.0, std_type='scalar')[source]

Initialize the heteroscedastic Gaussian distribution module.

Parameters:
  • output_dim (int) – Dimension of the action/output space.

  • init_std (float) – Initial standard deviation (used to initialize MLP std head bias).

  • std_type (str) – Parameterization of the standard deviation: “scalar” or “log”.

Return type:

None

update(mlp_output)[source]

Update the Gaussian distribution from MLP output.

Parameters:

mlp_output (torch.Tensor)

Return type:

None

deterministic_output(mlp_output)[source]

Extract the mean from the MLP output (first slice of the second-to-last dimension).

Parameters:

mlp_output (torch.Tensor)

Return type:

torch.Tensor

as_deterministic_output_module()[source]

Return export-friendly module that extracts the mean from the MLP output.

Return type:

torch.nn.Module

property input_dim: list[int]

Return the input dimension required by the distribution.

The MLP must output a tensor of shape [..., 2, output_dim] where the first slice along the second-to-last dimension is the mean and the second is the standard deviation (or log standard deviation).
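The layout described above can be sliced apart as follows (a sketch of the documented tensor layout, not the library's code), here assuming the "log" parameterization of the standard deviation:

```python
import torch

output_dim = 3
# MLP output of shape [..., 2, output_dim]: slice 0 is the mean,
# slice 1 the log standard deviation.
mlp_output = torch.randn(5, 2, output_dim)
mean = mlp_output[..., 0, :]
std = mlp_output[..., 1, :].exp()  # exp of the log-std slice, always positive
```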

init_mlp_weights(mlp)[source]

Initialize the std head weights in the MLP.

Parameters:

mlp (torch.nn.Module)

Return type:

None