Utils
Logger
- class rsl_rl.utils.logger.Logger
Logger to save the learning metrics to different logging services.
- __init__(log_dir, cfg, env_cfg, num_envs, is_distributed, gpu_world_size, gpu_global_rank, device)
Initialize buffers and logging state for a training run.
- Parameters:
log_dir (str | None)
cfg (dict)
env_cfg (dict | object)
num_envs (int)
is_distributed (bool)
gpu_world_size (int)
gpu_global_rank (int)
device (str)
- Return type:
None
- init_logging_writer()
Initialize the logging writer and save the code state.
Note
The writer is constructed from cfg["logger"], which should be a dict with a "class_name" key plus any additional constructor kwargs (see LogWriter). The plain string aliases "wandb" and "neptune" are deprecated; use "WandbLogWriter" and "NeptuneLogWriter" in the dict form instead. "tensorboard" (the default) is still accepted as a plain string.
- Return type:
None
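As a rough illustration of the note above, the logger entry of a training config might look like the following. Only the "class_name" key, the writer class names, and the "tensorboard" string come from the note; "project_name" mirrors the WandbLogWriter constructor parameter documented further down, and its value is a placeholder. Whether log_dir must appear in the dict or is supplied by Logger is not stated here, so it is left out.

```python
# Hypothetical config fragments; values are placeholders for illustration.
cfg_wandb = {
    "logger": {
        "class_name": "WandbLogWriter",   # dict form, replaces the deprecated "wandb"
        "project_name": "my_project",     # assumed to be forwarded as a constructor kwarg
    },
}

# The default backend is still accepted as a plain string.
cfg_tensorboard = {"logger": "tensorboard"}
```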
- process_env_step(rewards, dones, extras, intrinsic_rewards=None)
Add metrics from the environment step to the buffers.
- Parameters:
rewards (torch.Tensor)
dones (torch.Tensor)
extras (dict)
intrinsic_rewards (torch.Tensor | None)
- Return type:
None
- log(it, start_it, total_it, collect_time, learn_time, loss_dict, learning_rate, action_std, rnd_weight, print_minimal=False, width=80, pad=40)
Log the training metrics to the logging service and print them to the console.
If videos are available, they are uploaded to the logging service (W&B) as well.
- Parameters:
it (int)
start_it (int)
total_it (int)
collect_time (float)
learn_time (float)
loss_dict (dict)
learning_rate (float)
action_std (torch.Tensor)
rnd_weight (float | None)
print_minimal (bool)
width (int)
pad (int)
- Return type:
None
Log Writer
- class rsl_rl.utils.log_writer.LogWriter
Abstract base class for logging backends.
Log writers are configured via cfg["logger"], a dict with a "class_name" key pointing to the subclass and any additional keys forwarded as constructor kwargs. The class is resolved via resolve_callable(). Only add_scalar() must be implemented; all other methods are no-ops.
- abstractmethod add_scalar(tag, scalar_value, global_step)
Log a scalar metric.
- Parameters:
tag (str) – Name of the metric.
scalar_value (float) – Value of the metric.
global_step (int) – Current training iteration.
- Return type:
None
- store_config(env_cfg, train_cfg)
Upload environment and training configuration. Called once at training start.
- Parameters:
env_cfg (dict | object)
train_cfg (dict)
- Return type:
None
- save_model(model_path, it)
Upload a model checkpoint.
- Parameters:
model_path (str)
it (int)
- Return type:
None
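A minimal sketch of a custom backend, based only on the interface described above. To stay runnable without rsl_rl installed, it subclasses a local stand-in (LogWriterStandIn) that mirrors the documented methods; in real use one would presumably subclass rsl_rl.utils.log_writer.LogWriter instead. InMemoryLogWriter and the tag name are hypothetical.

```python
from abc import ABC, abstractmethod


class LogWriterStandIn(ABC):
    """Local stand-in mirroring the documented LogWriter interface (assumption)."""

    @abstractmethod
    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None: ...

    def store_config(self, env_cfg, train_cfg) -> None:
        pass  # documented as a no-op unless overridden

    def save_model(self, model_path: str, it: int) -> None:
        pass  # documented as a no-op unless overridden


class InMemoryLogWriter(LogWriterStandIn):
    """Hypothetical backend that keeps scalars in a dict keyed by tag."""

    def __init__(self) -> None:
        self.scalars = {}

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None:
        # Store (step, value) pairs so the history of each metric is preserved.
        self.scalars.setdefault(tag, []).append((global_step, float(scalar_value)))


writer = InMemoryLogWriter()
writer.add_scalar("Train/mean_reward", 1.5, global_step=0)
writer.store_config(env_cfg={}, train_cfg={})  # inherited no-op
```

Only add_scalar() is overridden here; the other methods fall back to the no-op defaults, which matches the minimum the base class is described as requiring.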
Wandb Log Writer
- class rsl_rl.utils.wandb_log_writer.WandbLogWriter
Summary writer for W&B.
- __init__(log_dir, project_name)
Initialize a W&B run for logging.
- Parameters:
log_dir (str)
project_name (str)
- Return type:
None
- add_scalar(tag, scalar_value, global_step=None, walltime=None, new_style=False)
Log a scalar to both TensorBoard and W&B.
- Parameters:
tag (str)
scalar_value (float)
global_step (int | None)
walltime (float | None)
new_style (bool)
- Return type:
None
- store_config(env_cfg, train_cfg)
Upload environment and training configuration to W&B.
- Parameters:
env_cfg (dict | object)
train_cfg (dict)
- Return type:
None
- save_model(model_path, it)
Upload a model checkpoint artifact to W&B.
- Parameters:
model_path (str)
it (int)
- Return type:
None
- save_file(path)
Upload an arbitrary file artifact to W&B.
- Parameters:
path (str)
- Return type:
None
Neptune Log Writer
- class rsl_rl.utils.neptune_log_writer.NeptuneLogWriter
Summary writer for Neptune.
- __init__(log_dir, project_name)
Initialize a Neptune run for logging.
- Parameters:
log_dir (str)
project_name (str)
- Return type:
None
- add_scalar(tag, scalar_value, global_step=None, walltime=None, new_style=False)
Log a scalar to both TensorBoard and Neptune.
- Parameters:
tag (str)
scalar_value (float)
global_step (int | None)
walltime (float | None)
new_style (bool)
- Return type:
None
- store_config(env_cfg, train_cfg)
Upload environment and training configuration to Neptune.
- Parameters:
env_cfg (dict | object)
train_cfg (dict)
- Return type:
None
- save_model(model_path, it)
Upload a model checkpoint artifact to Neptune.
- Parameters:
model_path (str)
it (int)
- Return type:
None
Utils
- rsl_rl.utils.utils.get_param(param, idx)
Get a parameter for the given index.
- Parameters:
param (Any) – Parameter or list/tuple of parameters.
idx (int) – Index to get the parameter for.
- Return type:
Any
- rsl_rl.utils.utils.resolve_nn_activation(act_name)
Resolve the activation function from the name.
Valid activation function names are: "elu", "selu", "relu", "crelu", "lrelu", "tanh", "sigmoid", "softplus", "gelu", "swish", "mish", "identity".
- Parameters:
act_name (str) – Name of the activation function.
- Returns:
The activation function.
- Raises:
ValueError – If the activation function is not found.
- Return type:
torch.nn.Module
- rsl_rl.utils.utils.resolve_optimizer(optimizer_name)
Resolve the optimizer from the name.
Valid optimizer names are: "adam", "adamw", "sgd", "rmsprop".
- Parameters:
optimizer_name (str) – Name of the optimizer.
- Returns:
The optimizer.
- Raises:
ValueError – If the optimizer is not found.
- Return type:
torch.optim.Optimizer
- rsl_rl.utils.utils.resolve_callable(callable_or_name)
Resolve a callable from a string or type, or return the callable directly.
This function supports resolving callables from a direct callable input or from a string in one of these formats:
- Direct callable: pass a type or function directly (for example, MyClass or my_func).
- Qualified name with colon: "module.path:Attr.Nested" (explicit, recommended).
- Qualified name with dot: "module.path.ClassName" (implicit).
- Simple name: for example "PPO" or "ActorCritic" (searched within rsl_rl).
- Parameters:
callable_or_name (type | Callable | str) – A callable (type/function) or string name.
- Returns:
The resolved callable.
- Raises:
TypeError – If input is neither a callable nor a string.
ImportError – If the module cannot be imported.
AttributeError – If the attribute cannot be found in the module.
ValueError – If a simple name cannot be found in rsl_rl packages.
- Return type:
Callable
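The two qualified-name formats can be illustrated with a small standard-library sketch. This is not the library's implementation: resolve_by_name is a hypothetical helper, it does not cover the simple-name search within rsl_rl, and the assumption that the dot form treats the last component as the attribute is mine.

```python
import importlib
from typing import Any, Callable


def resolve_by_name(name: str) -> Callable[..., Any]:
    """Hypothetical helper: resolve "module:Attr.Nested" or "module.Attr"."""
    if ":" in name:
        # Explicit form: everything before the colon is the module path.
        module_path, attr_path = name.split(":", 1)
    else:
        # Implicit form: assume the last dotted component is the attribute.
        module_path, _, attr_path = name.rpartition(".")
    obj: Any = importlib.import_module(module_path)
    for part in attr_path.split("."):
        obj = getattr(obj, part)  # AttributeError if a component is missing
    if not callable(obj):
        raise TypeError(f"{name!r} did not resolve to a callable")
    return obj


ordered_dict_cls = resolve_by_name("collections:OrderedDict")
join_fn = resolve_by_name("os.path.join")
fromkeys = resolve_by_name("collections:OrderedDict.fromkeys")
```

The last call suggests why the colon form is described as explicit: with a nested attribute such as OrderedDict.fromkeys, the colon marks where the module path ends, whereas a purely dotted string leaves that boundary to be guessed.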
- rsl_rl.utils.utils.resolve_obs_groups(obs, obs_groups, default_sets)
Validate the observation configuration and resolve missing observation sets.
The input is an observation dictionary obs containing observation groups and a configuration dictionary obs_groups where the keys are the observation sets and the values are lists of observation groups.
The configuration dictionary could for example look like:
{
    "actor": ["group_1", "group_2"],
    "critic": ["group_1", "group_3"],
}
This means that the ‘actor’ observation set will contain the observations “group_1” and “group_2” and the ‘critic’ observation set will contain the observations “group_1” and “group_3”. This function will check that all the observations in the ‘actor’ and ‘critic’ observation sets are present in the observation dictionary from the environment.
Additionally, if one of the default_sets, e.g. “critic”, is not present in the configuration dictionary, this function will:
1. Check if a group with the same name exists in the observations and assign this group to the observation set.
2. If 1. fails, it will assign the ‘policy’ observation group to the missing observation set.
3. If 2. fails, an error is raised.
- Parameters:
obs (tensordict.TensorDict) – Observations from the environment in the form of a dictionary.
obs_groups (dict[str, list[str]]) – Dictionary mapping observation sets to lists of observation groups.
default_sets (list[str]) – Default observation set names used by the algorithm. If not provided in obs_groups, a default behavior gets triggered.
- Returns:
The resolved observation groups.
- Raises:
ValueError – If any observation set is an empty list.
ValueError – If any observation set contains an observation term that is not present in the observations.
ValueError – If a default observation set cannot be resolved according to the rules above.
- Return type:
dict[str, list[str]]
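The fallback rules for missing default sets can be restated as a short sketch. fill_default_sets is a hypothetical function that works on a plain set of observation group names rather than a TensorDict, and it omits the validation of empty or unknown groups that the real function is documented to perform.

```python
def fill_default_sets(obs_keys, obs_groups, default_sets):
    """Hypothetical restatement of the documented fallback rules."""
    resolved = {name: list(groups) for name, groups in obs_groups.items()}
    for set_name in default_sets:
        if set_name in resolved:
            continue  # already configured explicitly
        if set_name in obs_keys:
            # Rule 1: a group with the same name exists in the observations.
            resolved[set_name] = [set_name]
        elif "policy" in obs_keys:
            # Rule 2: fall back to the 'policy' observation group.
            resolved[set_name] = ["policy"]
        else:
            # Rule 3: nothing left to fall back to.
            raise ValueError(f"Cannot resolve observation set {set_name!r}")
    return resolved


same_name = fill_default_sets({"policy", "critic"}, {"actor": ["policy"]}, ["actor", "critic"])
policy_fallback = fill_default_sets({"policy"}, {"actor": ["policy"]}, ["actor", "critic"])
```

In the first call the missing "critic" set picks up the "critic" group; in the second, no such group exists, so it falls back to "policy".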
- rsl_rl.utils.utils.check_nan(obs, rewards, dones)
Raise ValueError if any environment output contains NaN.
- Parameters:
obs (tensordict.TensorDict)
rewards (torch.Tensor)
dones (torch.Tensor)
- Return type:
None
- rsl_rl.utils.utils.compile_model(model, mode=None)
Wrap a model with torch.compile(), validating the compile mode.
- Parameters:
model (torch.nn.Module) – The model to compile.
mode (str | None) – The torch.compile() mode. CUDA-graph modes ("reduce-overhead", "max-autotune") are rejected because they are incompatible with the multi-model forward patterns used by the algorithms (graph replay overwrites the previous call's output buffer). Use "default" or "max-autotune-no-cudagraphs" instead. Defaults to None, in which case compilation is disabled.
- Returns:
The compiled model, or the original model if mode is None.
- Raises:
ValueError – If mode is one of the unsupported CUDA-graph modes.
- Return type:
torch.nn.Module
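The mode validation described for compile_model can be sketched without torch. check_compile_mode is a hypothetical stand-in that only reproduces the documented accept/reject behavior; the actual function additionally wraps the model with torch.compile(), which is left out here.

```python
# Modes named in the parameter description as unsupported CUDA-graph modes.
CUDA_GRAPH_MODES = ("reduce-overhead", "max-autotune")


def check_compile_mode(mode):
    """Hypothetical validator mirroring the documented behavior."""
    if mode in CUDA_GRAPH_MODES:
        raise ValueError(
            f"Compile mode {mode!r} uses CUDA graphs; "
            'use "default" or "max-autotune-no-cudagraphs" instead.'
        )
    return mode  # None means compilation is disabled


disabled = check_compile_mode(None)
allowed = check_compile_mode("max-autotune-no-cudagraphs")
```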
- rsl_rl.utils.utils.split_and_pad_trajectories(tensor, dones)
Split trajectories at done indices.
Split trajectories, concatenate them and pad with zeros up to the length of the longest trajectory. Return masks corresponding to valid parts of the trajectories.
- Example (transposed for readability):
Input:
    [[a1, a2, a3, a4 | a5, a6],
     [b1, b2 | b3, b4, b5 | b6]]
Output:
    [[a1, a2, a3, a4], | [[True, True, True, True],
     [a5, a6, 0, 0],   |  [True, True, False, False],
     [b1, b2, 0, 0],   |  [True, True, False, False],
     [b3, b4, b5, 0],  |  [True, True, True, False],
     [b6, 0, 0, 0]]    |  [True, False, False, False]]
Assumes that the input has the following order of dimensions: [time, number of envs, additional dimensions]
- Parameters:
tensor (torch.Tensor | TensorDict)
dones (torch.Tensor)
- Return type:
tuple[torch.Tensor | TensorDict, torch.Tensor]
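The split-and-pad behavior in the example can be reproduced with plain Python lists. This is an illustrative sketch, not the tensor implementation: split_and_pad is hypothetical, it uses the transposed [env][time] layout from the example rather than the [time, number of envs, ...] layout the real function expects, and it assumes a done flag is set on the final step of each trajectory.

```python
def split_and_pad(per_env_values, per_env_dones, pad_value=0):
    """Hypothetical list-based illustration; inputs are laid out [env][time]."""
    trajectories = []
    for values, dones in zip(per_env_values, per_env_dones):
        current = []
        for value, done in zip(values, dones):
            current.append(value)
            if done:  # a done flag closes the trajectory after this step
                trajectories.append(current)
                current = []
        if current:  # trailing steps without a final done still form a trajectory
            trajectories.append(current)
    max_len = max(len(t) for t in trajectories)
    # Pad every trajectory to the longest one and mark which entries are real.
    padded = [t + [pad_value] * (max_len - len(t)) for t in trajectories]
    masks = [[i < len(t) for i in range(max_len)] for t in trajectories]
    return padded, masks


values = [["a1", "a2", "a3", "a4", "a5", "a6"],
          ["b1", "b2", "b3", "b4", "b5", "b6"]]
dones = [[0, 0, 0, 1, 0, 1],
         [0, 1, 0, 0, 1, 1]]
padded, masks = split_and_pad(values, dones)
```

With these inputs, padded and masks contain the five rows shown in the example, in the same order.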