Modules¶
MLP¶
- class rsl_rl.modules.mlp.MLP[source]¶
Multi-Layer Perceptron.
The MLP network is a sequence of linear layers and activation functions. The last layer is a linear layer that outputs the desired dimension unless the last activation function is specified.
It provides additional conveniences:
- If a hidden dimension has a value of -1, the dimension is inferred from the input dimension.
- If the output dimension is a tuple, the output is reshaped to the desired shape.
- __init__(input_dim, output_dim, hidden_dims, activation='elu', last_activation=None)[source]¶
Initialize the MLP.
- Parameters:
input_dim (int) – Dimension of the input.
output_dim (int | tuple[int, ...] | list[int]) – Dimension of the output.
hidden_dims (tuple[int, ...] | list[int]) – Dimensions of the hidden layers. A value of -1 indicates that the dimension should be inferred from the input dimension.
activation (str) – Activation function.
last_activation (str | None) – Activation function of the last layer. None results in a linear last layer.
- Return type:
None
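As a rough illustration, the layer stack that a call like MLP(input_dim=8, output_dim=4, hidden_dims=[64, 64]) describes can be sketched with plain torch.nn modules (a hypothetical reimplementation, assuming the default ELU activation and a linear last layer; not the library's actual code):

```python
import torch
import torch.nn as nn

# Sketch of the stack MLP(input_dim=8, output_dim=4, hidden_dims=[64, 64])
# would build: Linear -> ELU -> Linear -> ELU -> final Linear with no
# activation, since last_activation defaults to None.
layers = []
dims = [8, 64, 64]
for in_dim, out_dim in zip(dims[:-1], dims[1:]):
    layers += [nn.Linear(in_dim, out_dim), nn.ELU()]
layers.append(nn.Linear(dims[-1], 4))  # linear output layer
net = nn.Sequential(*layers)

x = torch.randn(32, 8)
print(net(x).shape)  # torch.Size([32, 4])
```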
RNN¶
- rsl_rl.modules.rnn.HiddenState¶
Type alias for the hidden state of RNNs (GRU/LSTM).
For GRUs, this is a single tensor while for LSTMs, this is a tuple of two tensors (hidden state and cell state).
alias of Tensor | tuple[Tensor, Tensor] | None
- class rsl_rl.modules.rnn.RNN[source]¶
Recurrent Neural Network.
This network is used to store the hidden state of the policy. It currently supports GRU and LSTM.
- __init__(input_size, hidden_dim=256, num_layers=1, type='lstm')[source]¶
Initialize a GRU or LSTM module with internal hidden-state storage.
- Parameters:
input_size (int)
hidden_dim (int)
num_layers (int)
type (str)
- Return type:
None
- forward(input, masks=None, hidden_state=None)[source]¶
Run recurrent inference in rollout mode or batched update mode.
- Parameters:
input (torch.Tensor)
masks (torch.Tensor | None)
hidden_state (HiddenState)
- Return type:
torch.Tensor
- reset(dones=None, hidden_state=None)[source]¶
Reset hidden states for all or done environments.
- Parameters:
dones (torch.Tensor | None)
hidden_state (HiddenState)
- Return type:
None
- detach_hidden_state(dones=None)[source]¶
Detach hidden states for all or done environments from the computation graph.
- Parameters:
dones (torch.Tensor | None)
- Return type:
None
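The rollout-mode pattern behind forward() and reset() can be sketched with a plain nn.LSTM (a simplified illustration with made-up shapes, not the module's actual implementation): one step per call, with the hidden state zeroed for environments whose episode just ended.

```python
import torch
import torch.nn as nn

# Sketch: single-step rollout with per-environment hidden-state reset.
num_envs, input_size, hidden_dim = 4, 16, 32
lstm = nn.LSTM(input_size, hidden_dim, num_layers=1)
h = torch.zeros(1, num_envs, hidden_dim)
c = torch.zeros(1, num_envs, hidden_dim)

obs = torch.randn(1, num_envs, input_size)        # one time step per call
out, (h, c) = lstm(obs, (h, c))

dones = torch.tensor([False, True, False, True])  # envs 1 and 3 terminated
with torch.no_grad():
    h[:, dones] = 0.0                             # reset their hidden state
    c[:, dones] = 0.0
print(out.shape)  # torch.Size([1, 4, 32])
```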
CNN¶
- class rsl_rl.modules.cnn.CNN[source]¶
Convolutional Neural Network.
The CNN network is a sequence of convolutional layers, optional normalization layers, optional activation functions, and optional pooling. The final output can be flattened.
- __init__(input_dim, input_channels, output_channels, kernel_size, stride=1, dilation=1, padding='none', norm='none', activation='elu', max_pool=False, global_pool='none', flatten=True)[source]¶
Initialize the CNN.
- Parameters:
input_dim (tuple[int, int]) – Height and width of the input.
input_channels (int) – Number of input channels.
output_channels (tuple[int, ...] | list[int]) – List of output channels for each convolutional layer.
kernel_size (int | tuple[int, ...] | list[int]) – List of kernel sizes for each convolutional layer or a single kernel size for all layers.
stride (int | tuple[int, ...] | list[int]) – List of strides for each convolutional layer or a single stride for all layers.
dilation (int | tuple[int, ...] | list[int]) – List of dilations for each convolutional layer or a single dilation for all layers.
padding (str) – Padding type to use. Either ‘none’, ‘zeros’, ‘reflect’, ‘replicate’, or ‘circular’.
norm (str | tuple[str] | list[str]) – List of normalization types for each convolutional layer or a single type for all layers. Either ‘none’, ‘batch’, or ‘layer’.
activation (str) – Activation function to use.
max_pool (bool | tuple[bool] | list[bool]) – List of booleans indicating whether to apply max pooling after each convolutional layer or a single boolean for all layers.
global_pool (str) – Global pooling type to apply at the end. Either ‘none’, ‘max’, or ‘avg’.
flatten (bool) – Whether to flatten the output tensor.
- Return type:
None
- property output_channels: int | None¶
Get the number of output channels or None if output is flattened.
- property output_dim: tuple[int, int] | int¶
Get the output height and width or total output dimension if output is flattened.
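To see how the flattened output dimension comes about, here is a sketch of the stack a call like CNN(input_dim=(32, 32), input_channels=3, output_channels=[16, 32], kernel_size=3, stride=2) roughly describes, built with plain torch.nn modules (an illustration with made-up shapes, assuming ELU activations, no padding, and a flattened output; not the library's actual code):

```python
import torch
import torch.nn as nn

# Sketch: two conv layers with ELU, then flatten.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ELU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ELU(),
    nn.Flatten(),
)
x = torch.randn(8, 3, 32, 32)
y = net(x)
# Each conv with k=3, s=2, no padding maps H -> (H - 3) // 2 + 1:
# 32 -> 15 -> 7, so the flattened output dimension is 32 * 7 * 7 = 1568.
print(y.shape)  # torch.Size([8, 1568])
```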
Normalization¶
- class rsl_rl.modules.normalization.EmpiricalNormalization[source]¶
Normalize mean and variance of values based on empirical values.
- __init__(shape, eps=0.01, until=None)[source]¶
Initialize EmpiricalNormalization module.
Note
The normalization parameters are computed over the whole batch, not for each environment separately.
- Parameters:
shape (int | tuple[int, ...] | list[int]) – Shape of input values except batch axis.
eps (float) – Small value for stability.
until (int | None) – If this arg is specified, the module learns input values until the sum of batch sizes exceeds it.
- Return type:
None
- property mean: torch.Tensor¶
Return the current running mean.
- property std: torch.Tensor¶
Return the current running standard deviation.
- forward(x)[source]¶
Normalize mean and variance of values based on empirical values.
- Parameters:
x (torch.Tensor)
- Return type:
torch.Tensor
- update(x)¶
Update the running statistics from input values without computing their normalized outputs.
- Parameters:
x (torch.Tensor)
- Return type:
None
- inverse(y)¶
De-normalize values based on empirical values.
- Parameters:
y (torch.Tensor)
- Return type:
torch.Tensor
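The forward()/inverse() pair can be illustrated with plain tensor arithmetic (a sketch assuming the usual (x - mean) / (std + eps) convention with batch statistics; the module itself maintains running estimates rather than per-batch ones):

```python
import torch

# Sketch: normalize with batch statistics, then invert the mapping.
eps = 0.01
x = torch.randn(1024, 3) * 5.0 + 2.0
mean, std = x.mean(dim=0), x.std(dim=0)

y = (x - mean) / (std + eps)     # forward(): roughly zero-mean, unit-std
x_rec = y * (std + eps) + mean   # inverse(): recovers the original values
print(torch.allclose(x, x_rec, atol=1e-4))  # True
```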
- class rsl_rl.modules.normalization.EmpiricalDiscountedVariationNormalization[source]¶
Reward normalization from Pathak's large-scale study on PPO.
Since the reward function is non-stationary, it is useful to normalize the scale of the rewards so that the value function can learn quickly. This is done by dividing the rewards by a running estimate of the standard deviation of the sum of discounted rewards.
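The scheme described above can be sketched as follows (an illustration with hypothetical reward data and a batch estimate in place of the module's running estimate):

```python
import torch

# Sketch: track a discounted return per environment and scale rewards by
# the standard deviation of that return.
gamma = 0.99
rewards = torch.randn(100, 8) * 10.0   # [time, num_envs], made-up scale
disc_return = torch.zeros(8)
returns = []
for r in rewards:
    disc_return = gamma * disc_return + r
    returns.append(disc_return.clone())

std = torch.stack(returns).std()
normalized = rewards / (std + 1e-8)    # discounted returns now ~unit scale
print(normalized.shape)  # torch.Size([100, 8])
```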
Distribution¶
- class rsl_rl.modules.distribution.Distribution[source]¶
Base class for distribution modules.
Distribution modules encapsulate the stochastic output of a neural model. They define the output structure expected from the MLP, manage learnable distribution parameters, and provide methods for sampling, log probability computation, and entropy calculation.
Subclasses must implement all abstract methods and properties to define a specific distribution type.
- __init__(output_dim)[source]¶
Initialize the distribution module.
- Parameters:
output_dim (int) – Dimension of the action/output space.
- Return type:
None
- update(mlp_output)[source]¶
Update the distribution parameters given the MLP output.
- Parameters:
mlp_output (torch.Tensor) – Raw output from the MLP.
- Return type:
None
- deterministic_output(mlp_output)[source]¶
Extract the deterministic (mean) output from the raw MLP output.
- Parameters:
mlp_output (torch.Tensor) – Raw output from the MLP.
- Returns:
The deterministic output (typically the distribution mean).
- Return type:
torch.Tensor
- as_deterministic_output_module()[source]¶
Return an export-friendly module that extracts the deterministic output from the MLP output.
- Return type:
torch.nn.Module
- property input_dim: int | list[int]¶
Return the input dimension required by the distribution.
- property mean: torch.Tensor¶
Return the mean of the distribution.
- property std: torch.Tensor¶
Return the standard deviation (or spread measure) of the distribution.
- property entropy: torch.Tensor¶
Return the entropy of the distribution, summed over the last dimension.
- property params: tuple[torch.Tensor, ...]¶
Return the distribution parameters as a tuple of tensors.
These are the distribution-specific parameters needed to reconstruct the distribution (e.g., mean and std for Gaussian, alpha and beta for Beta). They are stored during rollouts and used for KL divergence computation.
- log_prob(outputs)[source]¶
Compute the log probability of the given outputs, summed over the last dimension.
- Parameters:
outputs (torch.Tensor) – Values to compute the log probability for.
- Returns:
Log probability summed over the last dimension.
- Return type:
torch.Tensor
- kl_divergence(old_params, new_params)[source]¶
Compute the KL divergence KL(old || new) between two distributions of this type.
The KL divergence measures how the old distribution diverges from the new distribution. This is used for adaptive learning rate scheduling in policy optimization.
- init_mlp_weights(mlp)[source]¶
Initialize distribution-specific weights in the MLP.
This is called after MLP creation to set up any special weight initialization required by the distribution (e.g., initializing std head weights).
- Parameters:
mlp (torch.nn.Module) – The MLP module whose weights may need initialization.
- Return type:
None
- class rsl_rl.modules.distribution.GaussianDistribution[source]¶
Gaussian (Normal) distribution module with state-independent standard deviation.
This distribution parameterizes actions using a multivariate Gaussian with diagonal covariance. The standard deviation is a learnable parameter that is independent of the model input. It can be parameterized in either “scalar” space (directly) or “log” space.
- __init__(output_dim, init_std=1.0, std_type='scalar')[source]¶
Initialize the Gaussian distribution module.
- Parameters:
output_dim (int) – Dimension of the action/output space.
init_std (float) – Initial standard deviation.
std_type (str) – Parameterization of the standard deviation: “scalar” or “log”.
- Return type:
None
- update(mlp_output)[source]¶
Update the Gaussian distribution from MLP output.
- Parameters:
mlp_output (torch.Tensor)
- Return type:
None
- deterministic_output(mlp_output)[source]¶
Extract the mean from the MLP output.
- Parameters:
mlp_output (torch.Tensor)
- Return type:
torch.Tensor
- as_deterministic_output_module()[source]¶
Return an export-friendly module that extracts the mean from the MLP output.
- Return type:
torch.nn.Module
- property input_dim: int¶
Return the input dimension required by the distribution.
- property mean: torch.Tensor¶
Return the mean of the Gaussian distribution.
- property std: torch.Tensor¶
Return the standard deviation of the Gaussian distribution.
- property entropy: torch.Tensor¶
Return the entropy of the Gaussian distribution, summed over the last dimension.
- property params: tuple[torch.Tensor, ...]¶
Return (mean, std) of the current Gaussian distribution.
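The behavior of this head can be sketched with torch.distributions (an illustration assuming the "scalar" std parameterization and made-up shapes; the std is a free learnable parameter, independent of the model input):

```python
import torch
from torch.distributions import Normal

# Sketch: state-independent Gaussian head.
output_dim, init_std = 4, 1.0
std = torch.nn.Parameter(init_std * torch.ones(output_dim))

mlp_output = torch.randn(32, output_dim)       # plays the role of the mean
dist = Normal(mlp_output, std)
actions = dist.sample()
log_prob = dist.log_prob(actions).sum(dim=-1)  # summed over last dimension
entropy = dist.entropy().sum(dim=-1)           # likewise summed
print(log_prob.shape, entropy.shape)  # torch.Size([32]) torch.Size([32])
```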
- class rsl_rl.modules.distribution.HeteroscedasticGaussianDistribution[source]¶
Gaussian (Normal) distribution module with state-dependent standard deviation.
This distribution parameterizes actions using a multivariate Gaussian with diagonal covariance. The standard deviation is output by the MLP alongside the mean, making it state-dependent (heteroscedastic). It can be parameterized in either “scalar” space (directly) or “log” space.
- __init__(output_dim, init_std=1.0, std_type='scalar')[source]¶
Initialize the heteroscedastic Gaussian distribution module.
- Parameters:
output_dim (int) – Dimension of the action/output space.
init_std (float) – Initial standard deviation (used to initialize MLP std head bias).
std_type (str) – Parameterization of the standard deviation: “scalar” or “log”.
- Return type:
None
- update(mlp_output)[source]¶
Update the Gaussian distribution from MLP output.
- Parameters:
mlp_output (torch.Tensor)
- Return type:
None
- deterministic_output(mlp_output)[source]¶
Extract the mean from the MLP output (first slice of the second-to-last dimension).
- Parameters:
mlp_output (torch.Tensor)
- Return type:
torch.Tensor
- as_deterministic_output_module()[source]¶
Return export-friendly module that extracts the mean from the MLP output.
- Return type:
torch.nn.Module
- property input_dim: list[int]¶
Return the input dimension required by the distribution.
The MLP must output a tensor of shape [..., 2, output_dim] where the first slice along the second-to-last dimension is the mean and the second is the standard deviation (or log standard deviation).
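The slicing convention described above can be sketched as follows (an illustration with made-up shapes, assuming the "log" std parameterization for the exp step):

```python
import torch

# Sketch: the MLP output stacks mean and std along the second-to-last dim.
output_dim = 6
mlp_output = torch.randn(32, 2, output_dim)
mean = mlp_output[..., 0, :]        # deterministic_output() returns this slice
std = mlp_output[..., 1, :].exp()   # assuming the "log" parameterization
print(mean.shape, std.shape)  # torch.Size([32, 6]) torch.Size([32, 6])
```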