Modules

MLP

class rsl_rl.modules.mlp.MLP[source]

Multi-Layer Perceptron.

The MLP network is a sequence of linear layers and activation functions. The final layer is linear and outputs the desired dimension, unless a last activation function is specified.

It provides additional conveniences:
  • If a hidden dimension has a value of -1, it is inferred from the input dimension.

  • If the output dimension is a tuple, the output is reshaped to the desired shape.

__init__(input_dim, output_dim, hidden_dims, activation='elu', last_activation=None)[source]

Initialize the MLP.

Parameters:
  • input_dim (int) – Dimension of the input.

  • output_dim (int | tuple[int, ...] | list[int]) – Dimension of the output.

  • hidden_dims (tuple[int, ...] | list[int]) – Dimensions of the hidden layers. A value of -1 indicates that the dimension should be inferred from the input dimension.

  • activation (str) – Activation function.

  • last_activation (str | None) – Activation function of the last layer. None results in a linear last layer.

Return type:

None
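The two conveniences above can be sketched with plain PyTorch. This is a minimal illustration, not the library's implementation: `build_mlp` is a hypothetical helper that mimics the documented behavior (a hidden dim of -1 inherits the input dimension; a tuple output dim is flattened for the last linear layer and reshaped after the forward pass).

```python
import math

import torch
import torch.nn as nn


def build_mlp(input_dim, output_dim, hidden_dims, activation=nn.ELU):
    # A hidden dim of -1 is replaced by the input dimension.
    dims = [input_dim] + [input_dim if d == -1 else d for d in hidden_dims]
    layers = []
    for in_d, out_d in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(in_d, out_d), activation()]
    # A tuple output dim is flattened for the final linear layer.
    flat_out = output_dim if isinstance(output_dim, int) else math.prod(output_dim)
    layers.append(nn.Linear(dims[-1], flat_out))  # linear last layer
    return nn.Sequential(*layers)


net = build_mlp(8, (2, 3), [64, -1])
y = net(torch.randn(5, 8)).view(5, 2, 3)  # reshape to the tuple output dim
```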

init_weights(scales)[source]

Initialize the weights of the MLP.

Parameters:

scales (float | tuple[float]) – Scale factor for the weights.

Return type:

None

forward(x)[source]

Forward pass of the MLP.

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

RNN

rsl_rl.modules.rnn.HiddenState

Type alias for the hidden state of RNNs (GRU/LSTM).

For GRUs, this is a single tensor while for LSTMs, this is a tuple of two tensors (hidden state and cell state).

alias of Tensor | tuple[Tensor, Tensor] | None
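The alias can be illustrated directly with PyTorch's built-in recurrent layers: a GRU carries a single hidden tensor, while an LSTM carries a (hidden state, cell state) tuple.

```python
import torch
import torch.nn as nn

gru, lstm = nn.GRU(4, 8), nn.LSTM(4, 8)
x = torch.randn(3, 1, 4)  # (seq_len, batch, input_size)

# GRU hidden state: a single tensor of shape (num_layers, batch, hidden).
_, h_gru = gru(x)
# LSTM hidden state: a (hidden, cell) tuple of two such tensors.
_, h_lstm = lstm(x)
```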

class rsl_rl.modules.rnn.RNN[source]

Recurrent Neural Network.

This network is used to store the hidden state of the policy. It currently supports GRU and LSTM.

__init__(input_size, hidden_dim=256, num_layers=1, type='lstm')[source]

Initialize a GRU or LSTM module with internal hidden-state storage.

Parameters:
  • input_size (int)

  • hidden_dim (int)

  • num_layers (int)

  • type (str)

Return type:

None

forward(input, masks=None, hidden_state=None)[source]

Run recurrent inference in rollout mode or batched update mode.

Parameters:
  • input (torch.Tensor)

  • masks (torch.Tensor | None)

  • hidden_state (HiddenState)

Return type:

torch.Tensor

reset(dones=None, hidden_state=None)[source]

Reset hidden states for all or done environments.

Parameters:
  • dones (torch.Tensor | None)

  • hidden_state (HiddenState)

Return type:

None
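A per-environment reset of this kind can be sketched as follows (an illustration of the idea, not the library's code): the hidden state is zeroed only for environments whose episode just ended.

```python
import torch

# Hidden state laid out as (num_layers, num_envs, hidden_dim).
hidden = torch.ones(1, 4, 8)
dones = torch.tensor([False, True, False, True])

# Zero the hidden state of done environments; others keep their state.
hidden[:, dones] = 0.0
```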

detach_hidden_state(dones=None)[source]

Detach hidden states for all or done environments from the computation graph.

Parameters:

dones (torch.Tensor | None)

Return type:

None

CNN

class rsl_rl.modules.cnn.CNN[source]

Convolutional Neural Network.

The CNN network is a sequence of convolutional layers, optional normalization layers, optional activation functions, and optional pooling. The final output can be flattened.

__init__(input_dim, input_channels, output_channels, kernel_size, stride=1, dilation=1, padding='none', norm='none', activation='elu', max_pool=False, global_pool='none', flatten=True)[source]

Initialize the CNN.

Parameters:
  • input_dim (tuple[int, int]) – Height and width of the input.

  • input_channels (int) – Number of input channels.

  • output_channels (tuple[int, ...] | list[int]) – List of output channels for each convolutional layer.

  • kernel_size (int | tuple[int, ...] | list[int]) – List of kernel sizes for each convolutional layer or a single kernel size for all layers.

  • stride (int | tuple[int, ...] | list[int]) – List of strides for each convolutional layer or a single stride for all layers.

  • dilation (int | tuple[int, ...] | list[int]) – List of dilations for each convolutional layer or a single dilation for all layers.

  • padding (str) – Padding type to use. Either ‘none’, ‘zeros’, ‘reflect’, ‘replicate’, or ‘circular’.

  • norm (str | tuple[str] | list[str]) – List of normalization types for each convolutional layer or a single type for all layers. Either ‘none’, ‘batch’, or ‘layer’.

  • activation (str) – Activation function to use.

  • max_pool (bool | tuple[bool] | list[bool]) – List of booleans indicating whether to apply max pooling after each convolutional layer or a single boolean for all layers.

  • global_pool (str) – Global pooling type to apply at the end. Either ‘none’, ‘max’, or ‘avg’.

  • flatten (bool) – Whether to flatten the output tensor.

Return type:

None

property output_channels: int | None

Get the number of output channels or None if output is flattened.

property output_dim: tuple[int, int] | int

Get the output height and width or total output dimension if output is flattened.
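A property like this rests on the standard convolution output-size arithmetic; the sketch below shows that formula (the library may compute it differently internally).

```python
def conv_out(size, kernel, stride=1, dilation=1, padding=0):
    # Standard conv arithmetic: floor((size + 2p - d*(k-1) - 1) / s) + 1
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Two stacked 3x3, stride-2 convolutions on a 64-pixel input.
h = conv_out(conv_out(64, kernel=3, stride=2), kernel=3, stride=2)
```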

init_weights()[source]

Initialize the weights of the CNN with Kaiming initialization.

Return type:

None

forward(x)[source]

Forward pass of the CNN.

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

Normalization

class rsl_rl.modules.normalization.EmpiricalNormalization[source]

Normalize mean and variance of values based on empirical values.

__init__(shape, eps=0.01, until=None)[source]

Initialize EmpiricalNormalization module.

Note

The normalization parameters are computed over the whole batch, not for each environment separately.

Parameters:
  • shape (int | tuple[int, ...] | list[int]) – Shape of input values except batch axis.

  • eps (float) – Small value for stability.

  • until (int | None) – If this arg is specified, the module learns input values until the sum of batch sizes exceeds it.

Return type:

None

property mean: torch.Tensor

Return the current running mean.

property std: torch.Tensor

Return the current running standard deviation.

forward(x)[source]

Normalize input values using the running empirical mean and standard deviation.

Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

update(x)

Update the normalization statistics from input values without computing their normalized outputs.

Parameters:

x (torch.Tensor)

Return type:

None

inverse(y)

De-normalize values based on empirical values.

Parameters:

y (torch.Tensor)

Return type:

torch.Tensor
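The mechanics of empirical normalization can be sketched in a few lines. This is a minimal running-statistics sketch (using a Chan-et-al.-style batch merge), not the library's implementation; `RunningNorm` is a hypothetical stand-in.

```python
import torch


class RunningNorm:
    def __init__(self, shape, eps=0.01):
        self.mean = torch.zeros(shape)
        self.var = torch.ones(shape)
        self.count = 0
        self.eps = eps

    def update(self, x):
        # Merge batch moments into the running moments over the whole batch.
        batch = x.shape[0]
        new_count = self.count + batch
        delta = x.mean(0) - self.mean
        self.mean = self.mean + delta * batch / new_count
        self.var = (
            self.count * self.var
            + batch * x.var(0, unbiased=False)
            + delta**2 * self.count * batch / new_count
        ) / new_count
        self.count = new_count

    def __call__(self, x):
        # Normalize with the running statistics; eps guards small std.
        return (x - self.mean) / (self.var.sqrt() + self.eps)


norm = RunningNorm(3)
x = torch.randn(100, 3) * 5 + 2
norm.update(x)
y = norm(x)
```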

class rsl_rl.modules.normalization.EmpiricalDiscountedVariationNormalization[source]

Reward normalization from Pathak’s large scale study on PPO.

Since the reward function is non-stationary, it is useful to normalize the scale of the rewards so that the value function can learn quickly. This is done by dividing the rewards by a running estimate of the standard deviation of the sum of discounted rewards.

__init__(shape, eps=0.01, gamma=0.99, until=None)[source]

Initialize discounted-reward normalization with running moments.

Parameters:
  • shape (int | tuple[int, ...] | list[int])

  • eps (float)

  • gamma (float)

  • until (int | None)

Return type:

None

forward(rew)[source]

Normalize rewards using the running std of discounted returns.

Parameters:

rew (torch.Tensor)

Return type:

torch.Tensor
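The scheme described above can be sketched with plain Python: accumulate a discounted sum of rewards step by step and divide each raw reward by an estimate of that sum's standard deviation (plus eps for stability). This is an illustration of the idea, not the library's running-moment implementation.

```python
import statistics

gamma, eps = 0.99, 0.01
running_return, discounted = 0.0, []
rewards = [1.0, 0.5, -0.2, 1.5, 0.3]

for r in rewards:
    # Discounted sum of rewards: R_t = gamma * R_{t-1} + r_t
    running_return = gamma * running_return + r
    discounted.append(running_return)

# Scale raw rewards by the std of the discounted returns.
std = statistics.pstdev(discounted)
normalized = [r / (std + eps) for r in rewards]
```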

Distribution

class rsl_rl.modules.distribution.Distribution[source]

Base class for distribution modules.

Distribution modules encapsulate the stochastic output of a neural model. They define the output structure expected from the MLP, manage learnable distribution parameters, and provide methods for sampling, log probability computation, and entropy calculation.

Subclasses must implement all abstract methods and properties to define a specific distribution type.

__init__(output_dim)[source]

Initialize the distribution module.

Parameters:

output_dim (int) – Dimension of the action/output space.

Return type:

None

update(mlp_output)[source]

Update the distribution parameters given the MLP output.

Parameters:

mlp_output (torch.Tensor) – Raw output from the MLP.

Return type:

None

sample()[source]

Sample from the distribution.

Returns:

Sampled values.

Return type:

torch.Tensor

deterministic_output(mlp_output)[source]

Extract the deterministic (mean) output from the raw MLP output.

Parameters:

mlp_output (torch.Tensor) – Raw output from the MLP.

Returns:

The deterministic output (typically the distribution mean).

Return type:

torch.Tensor

as_deterministic_output_module()[source]

Return an export-friendly module that extracts the deterministic output from the MLP output.

Return type:

torch.nn.Module

property input_dim: int | list[int]

Return the input dimension required by the distribution.

property mean: torch.Tensor

Return the mean of the distribution.

property std: torch.Tensor

Return the standard deviation (or spread measure) of the distribution.

property entropy: torch.Tensor

Return the entropy of the distribution, summed over the last dimension.

property params: tuple[torch.Tensor, ...]

Return the distribution parameters as a tuple of tensors.

These are the distribution-specific parameters needed to reconstruct the distribution (e.g., mean and std for Gaussian, alpha and beta for Beta). They are stored during rollouts and used for KL divergence computation.

log_prob(outputs)[source]

Compute the log probability of the given outputs, summed over the last dimension.

Parameters:

outputs (torch.Tensor) – Values to compute the log probability for.

Returns:

Log probability summed over the last dimension.

Return type:

torch.Tensor

kl_divergence(old_params, new_params)[source]

Compute the KL divergence KL(old || new) between two distributions of this type.

The KL divergence measures how the old distribution diverges from the new distribution. This is used for adaptive learning rate scheduling in policy optimization.

Parameters:
  • old_params (tuple[torch.Tensor, ...]) – Parameters of the old distribution (as returned by params).

  • new_params (tuple[torch.Tensor, ...]) – Parameters of the new distribution (as returned by params).

Returns:

KL divergence summed over the last dimension.

Return type:

torch.Tensor

init_mlp_weights(mlp)[source]

Initialize distribution-specific weights in the MLP.

This is called after MLP creation to set up any special weight initialization required by the distribution (e.g., initializing std head weights).

Parameters:

mlp (torch.nn.Module) – The MLP module whose weights may need initialization.

Return type:

None

class rsl_rl.modules.distribution.GaussianDistribution[source]

Gaussian (Normal) distribution module with state-independent standard deviation.

This distribution parameterizes actions using a multivariate Gaussian with diagonal covariance. The standard deviation is a learnable parameter that is independent of the model input. It can be parameterized in either “scalar” space (directly) or “log” space.

__init__(output_dim, init_std=1.0, std_type='scalar')[source]

Initialize the Gaussian distribution module.

Parameters:
  • output_dim (int) – Dimension of the action/output space.

  • init_std (float) – Initial standard deviation.

  • std_type (str) – Parameterization of the standard deviation: “scalar” or “log”.

Return type:

None

update(mlp_output)[source]

Update the Gaussian distribution from MLP output.

Parameters:

mlp_output (torch.Tensor)

Return type:

None

sample()[source]

Sample from the Gaussian distribution.

Return type:

torch.Tensor

deterministic_output(mlp_output)[source]

Extract the mean from the MLP output.

Parameters:

mlp_output (torch.Tensor)

Return type:

torch.Tensor

as_deterministic_output_module()[source]

Return an export-friendly module that extracts the mean from the MLP output.

Return type:

torch.nn.Module

property input_dim: int

Return the input dimension required by the distribution.

property mean: torch.Tensor

Return the mean of the Gaussian distribution.

property std: torch.Tensor

Return the standard deviation of the Gaussian distribution.

property entropy: torch.Tensor

Return the entropy of the Gaussian distribution, summed over the last dimension.

property params: tuple[torch.Tensor, ...]

Return (mean, std) of the current Gaussian distribution.

log_prob(outputs)[source]

Compute the log probability under the Gaussian, summed over the last dimension.

Parameters:

outputs (torch.Tensor)

Return type:

torch.Tensor

kl_divergence(old_params, new_params)[source]

Compute KL(old || new) between two Gaussian distributions using torch.distributions.

Parameters:
  • old_params (tuple[torch.Tensor, ...])

  • new_params (tuple[torch.Tensor, ...])

Return type:

torch.Tensor
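The documented behavior maps closely onto torch.distributions, which the class is stated to use for the KL divergence. The sketch below illustrates the key operations: a state-independent std, log probability and entropy summed over the last (action) dimension, and KL(old || new).

```python
import torch
from torch.distributions import Normal, kl_divergence

mean = torch.zeros(4, 2)           # per-sample means from the MLP
std = 0.5 * torch.ones(2)          # learnable, input-independent std
dist = Normal(mean, std.expand_as(mean))

actions = dist.sample()
logp = dist.log_prob(actions).sum(dim=-1)   # summed over action dim
entropy = dist.entropy().sum(dim=-1)        # summed over action dim

# KL(old || new) between two Gaussians, summed over the last dimension.
old = Normal(mean, torch.ones_like(mean))
kl = kl_divergence(old, dist).sum(dim=-1)
```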

class rsl_rl.modules.distribution.HeteroscedasticGaussianDistribution[source]

Gaussian (Normal) distribution module with state-dependent standard deviation.

This distribution parameterizes actions using a multivariate Gaussian with diagonal covariance. The standard deviation is output by the MLP alongside the mean, making it state-dependent (heteroscedastic). It can be parameterized in either “scalar” space (directly) or “log” space.

__init__(output_dim, init_std=1.0, std_type='scalar')[source]

Initialize the heteroscedastic Gaussian distribution module.

Parameters:
  • output_dim (int) – Dimension of the action/output space.

  • init_std (float) – Initial standard deviation (used to initialize MLP std head bias).

  • std_type (str) – Parameterization of the standard deviation: “scalar” or “log”.

Return type:

None

update(mlp_output)[source]

Update the Gaussian distribution from MLP output.

Parameters:

mlp_output (torch.Tensor)

Return type:

None

deterministic_output(mlp_output)[source]

Extract the mean from the MLP output (first slice of the second-to-last dimension).

Parameters:

mlp_output (torch.Tensor)

Return type:

torch.Tensor

as_deterministic_output_module()[source]

Return export-friendly module that extracts the mean from the MLP output.

Return type:

torch.nn.Module

property input_dim: list[int]

Return the input dimension required by the distribution.

The MLP must output a tensor of shape [..., 2, output_dim] where the first slice along the second-to-last dimension is the mean and the second is the standard deviation (or log standard deviation).
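The layout described above can be sliced apart as follows (a sketch of the documented tensor layout, not the library's code), here assuming the "log" parameterization of the standard deviation:

```python
import torch

output_dim = 3
# MLP output of shape [..., 2, output_dim]: slice 0 is the mean,
# slice 1 the log standard deviation.
mlp_output = torch.randn(5, 2, output_dim)
mean = mlp_output[..., 0, :]
std = mlp_output[..., 1, :].exp()  # exp of the log-std slice, always positive
```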

init_mlp_weights(mlp)[source]

Initialize the std head weights in the MLP.

Parameters:

mlp (torch.nn.Module)

Return type:

None