Models¶
MLP Model¶
- class rsl_rl.models.mlp_model.MLPModel[source]¶
MLP-based neural model.
This model uses a simple multi-layer perceptron (MLP) to process 1D observation groups. Observations can be normalized before being passed to the MLP. The output of the model can be either deterministic or stochastic, in which case a distribution module is used to sample the outputs.
- is_recurrent: bool = False¶
Whether the model contains a recurrent module.
- __init__(obs, obs_groups, obs_set, output_dim, hidden_dims=(256, 256, 256), activation='elu', obs_normalization=False, distribution_cfg=None)[source]¶
Initialize the MLP-based model.
- Parameters:
obs (tensordict.TensorDict) – Observation dictionary.
obs_groups (dict[str, list[str]]) – Dictionary mapping observation sets to lists of observation groups.
obs_set (str) – Observation set to use for this model (e.g., “actor” or “critic”).
output_dim (int) – Dimension of the output.
hidden_dims (tuple[int, ...] | list[int]) – Hidden dimensions of the MLP.
activation (str) – Activation function of the MLP.
obs_normalization (bool) – Whether to normalize the observations before feeding them to the MLP.
distribution_cfg (dict | None) – Configuration dictionary for the output distribution. If provided, the model outputs stochastic values sampled from the distribution.
- Return type:
None
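To make the obs_groups/obs_set pairing concrete, here is a small pure-Python sketch (not rsl_rl code; the group names "policy" and "privileged" and the helper select_groups are illustrative only) of how a model resolves its own observation set to a flat input:

```python
# Stand-ins for 1D observation tensors, keyed by observation group.
obs = {
    "policy": [0.1, 0.2, 0.3],
    "privileged": [0.4, 0.5],
}

# Each observation set (e.g., "actor", "critic") maps to the groups it consumes.
obs_groups = {
    "actor": ["policy"],
    "critic": ["policy", "privileged"],
}

def select_groups(obs, obs_groups, obs_set):
    """Concatenate the observation groups assigned to one observation set."""
    flat = []
    for group in obs_groups[obs_set]:
        flat.extend(obs[group])
    return flat

actor_input = select_groups(obs, obs_groups, "actor")    # 3 values
critic_input = select_groups(obs, obs_groups, "critic")  # 5 values
```

A model constructed with obs_set="actor" would thus see only the "policy" group, while the critic additionally receives "privileged" observations.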
- forward(obs, masks=None, hidden_state=None, stochastic_output=False)[source]¶
Forward pass of the MLP model.
- Note:
The stochastic_output flag only has an effect if the model has a distribution (i.e., distribution_cfg was provided). It defaults to False, meaning that even stochastic models return deterministic outputs by default.
- Parameters:
obs (TensorDict)
masks (torch.Tensor | None)
hidden_state (HiddenState)
stochastic_output (bool)
- Return type:
torch.Tensor
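The interaction between the distribution and the stochastic_output flag can be sketched as follows (a simplified stand-in, not the rsl_rl implementation; a scalar Gaussian replaces the actual distribution module):

```python
import random

def forward(mean, std, stochastic_output=False, has_distribution=True):
    """Mimic the documented output behavior: sample only when the model
    has a distribution AND stochastic_output=True; otherwise return the
    deterministic value (here, the distribution mean)."""
    if has_distribution and stochastic_output:
        return random.gauss(mean, std)
    return mean
```

With the default stochastic_output=False, even a model configured with a distribution returns its deterministic mean, which is the behavior described in the note above.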
- get_latent(obs, masks=None, hidden_state=None)[source]¶
Build the model latent by concatenating and normalizing selected observation groups.
- Parameters:
obs (TensorDict)
masks (torch.Tensor | None)
hidden_state (HiddenState)
- Return type:
torch.Tensor
- reset(dones=None, hidden_state=None)[source]¶
Reset the internal recurrent state (a no-op for the non-recurrent MLP model).
- Parameters:
dones (torch.Tensor | None)
hidden_state (HiddenState)
- Return type:
None
Return the recurrent hidden state (None for MLP).
- Return type:
torch.Tensor | tuple[torch.Tensor, torch.Tensor] | None
Detach the recurrent hidden state for truncated backpropagation (no-op).
- Parameters:
dones (torch.Tensor | None)
- Return type:
None
- property output_mean: torch.Tensor¶
Return the mean of the current output distribution.
- property output_std: torch.Tensor¶
Return the standard deviation of the current output distribution.
- property output_entropy: torch.Tensor¶
Return the entropy of the current output distribution.
- property output_distribution_params: tuple[torch.Tensor, ...]¶
Return raw parameters of the current output distribution.
- get_output_log_prob(outputs)[source]¶
Compute log-probabilities of outputs under the current distribution.
- Parameters:
outputs (torch.Tensor)
- Return type:
torch.Tensor
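For intuition, the log-probability of an output under a Gaussian distribution has the familiar closed form below (a plain-Python illustration, not the library's tensor implementation, which operates on batched torch tensors):

```python
import math

def gaussian_log_prob(x, mean, std):
    """log N(x | mean, std^2) for a scalar Gaussian."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2 * math.pi))
```

At x = mean the quadratic term vanishes, leaving -log(std) - 0.5 * log(2π).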
- get_kl_divergence(old_params, new_params)[source]¶
Compute KL divergence between two parameterizations of the distribution.
- Parameters:
old_params (tuple[torch.Tensor, ...])
new_params (tuple[torch.Tensor, ...])
- Return type:
torch.Tensor
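For the common case of a Gaussian output distribution, the KL divergence between old and new parameterizations has a closed form, sketched here for scalars (illustrative only; the actual method consumes the raw parameter tuples returned by output_distribution_params):

```python
import math

def gaussian_kl(mean_old, std_old, mean_new, std_new):
    """KL(N_old || N_new) for scalar Gaussians, in closed form."""
    return (math.log(std_new / std_old)
            + (std_old ** 2 + (mean_old - mean_new) ** 2) / (2 * std_new ** 2)
            - 0.5)
```

This quantity is zero when the two parameterizations coincide and grows as the means or standard deviations diverge, which is what makes it useful as a trust-region or adaptive-learning-rate signal in PPO-style algorithms.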
- as_jit()[source]¶
Return a version of the model compatible with Torch JIT export.
- Return type:
torch.nn.Module
RNN Model¶
- class rsl_rl.models.rnn_model.RNNModel[source]¶
RNN-based neural model.
This model uses a recurrent neural network (RNN) to process 1D observation groups before passing the resulting latent to an MLP. Available RNN types are “lstm” and “gru”. Observations can be normalized before being passed to the RNN. The output of the model can be either deterministic or stochastic, in which case a distribution module is used to sample the outputs.
- is_recurrent: bool = True¶
Whether the model contains a recurrent module.
- __init__(obs, obs_groups, obs_set, output_dim, hidden_dims=(256, 256, 256), activation='elu', obs_normalization=False, distribution_cfg=None, rnn_type='lstm', rnn_hidden_dim=256, rnn_num_layers=1)[source]¶
Initialize the RNN-based model.
- Parameters:
obs (tensordict.TensorDict) – Observation dictionary.
obs_groups (dict[str, list[str]]) – Dictionary mapping observation sets to lists of observation groups.
obs_set (str) – Observation set to use for this model (e.g., “actor” or “critic”).
output_dim (int) – Dimension of the output.
hidden_dims (tuple[int, ...] | list[int]) – Hidden dimensions of the MLP.
activation (str) – Activation function of the MLP.
obs_normalization (bool) – Whether to normalize the observations before feeding them to the RNN.
distribution_cfg (dict | None) – Configuration dictionary for the output distribution.
rnn_type (str) – Type of RNN to use (“lstm” or “gru”).
rnn_hidden_dim (int) – Dimension of the RNN hidden state.
rnn_num_layers (int) – Number of RNN layers.
- Return type:
None
- get_latent(obs, masks=None, hidden_state=None)[source]¶
Build the model latent by passing normalized observation groups through the RNN.
- Parameters:
obs (TensorDict)
masks (torch.Tensor | None)
hidden_state (HiddenState)
- Return type:
torch.Tensor
- reset(dones=None, hidden_state=None)[source]¶
Reset the recurrent hidden state of the RNN.
- Parameters:
dones (torch.Tensor | None)
hidden_state (HiddenState)
- Return type:
None
Return the recurrent hidden state of the RNN.
- Return type:
torch.Tensor | tuple[torch.Tensor, torch.Tensor] | None
Detach the recurrent hidden state for truncated backpropagation.
- Parameters:
dones (torch.Tensor | None)
- Return type:
None
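The per-environment reset on episode termination can be sketched as follows (a simplified stand-in, not rsl_rl code: the hidden state is a flat list of floats rather than the layered LSTM/GRU tensors the library actually manages):

```python
def reset_hidden(hidden, dones):
    """Zero the hidden state of each environment whose episode ended.

    hidden: one scalar hidden value per environment (illustrative).
    dones:  one boolean per environment.
    """
    return [0.0 if done else h for h, done in zip(hidden, dones)]
```

Resetting only the terminated environments lets the RNN keep its memory for environments whose episodes are still running, which is essential when many environments step in parallel.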
CNN Model¶
- class rsl_rl.models.cnn_model.CNNModel[source]¶
CNN-based neural model.
This model uses one or more convolutional neural network (CNN) encoders to process one or more 2D observation groups before passing the resulting latent to an MLP. Any 1D observation groups are directly concatenated with the CNN latent and passed to the MLP. 1D observations can be normalized before being passed to the MLP. The output of the model can be either deterministic or stochastic, in which case a distribution module is used to sample the outputs.
- __init__(obs, obs_groups, obs_set, output_dim, hidden_dims=(256, 256, 256), activation='elu', obs_normalization=False, distribution_cfg=None, cnn_cfg=None, cnns=None)[source]¶
Initialize the CNN-based model.
- Parameters:
obs (TensorDict) – Observation dictionary.
obs_groups (dict[str, list[str]]) – Dictionary mapping observation sets to lists of observation groups.
obs_set (str) – Observation set to use for this model (e.g., “actor” or “critic”).
output_dim (int) – Dimension of the output.
hidden_dims (tuple[int, ...] | list[int]) – Hidden dimensions of the MLP.
activation (str) – Activation function of the CNN and MLP.
obs_normalization (bool) – Whether to normalize the observations before feeding them to the MLP.
distribution_cfg (dict | None) – Configuration dictionary for the output distribution.
cnn_cfg (dict[str, dict] | dict[str, Any] | None) – Configuration of the CNN encoder(s).
cnns (nn.ModuleDict | dict[str, nn.Module] | None) – CNN modules to use, e.g., for sharing CNNs between actor and critic. If None, new CNNs are created.
- Return type:
None
- get_latent(obs, masks=None, hidden_state=None)[source]¶
Build the model latent by combining normalized 1D and CNN-encoded 2D observation groups.
- Parameters:
obs (TensorDict)
masks (torch.Tensor | None)
hidden_state (HiddenState)
- Return type:
torch.Tensor
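When configuring the CNN encoder(s), it helps to know how each convolution shrinks the 2D input, since the flattened encoder output feeds the MLP. The standard per-dimension output-size formula is sketched below (generic Conv2d arithmetic, not an rsl_rl function; the actual cnn_cfg keys are not documented here):

```python
def conv2d_out_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of one Conv2d layer along one dimension:
    floor((in + 2*padding - kernel) / stride) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

# Example: a 64x64 input through a 3x3 conv with stride 2 and padding 1
# halves the spatial resolution to 32x32 per dimension.
out = conv2d_out_size(64, kernel=3, stride=2, padding=1)
```

Chaining this formula across the configured conv layers gives the final spatial size, and multiplying by the last layer's channel count gives the length of the CNN latent that is concatenated with the 1D observations.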