Utils

Logger

class rsl_rl.utils.logger.Logger[source]

Logger to save the learning metrics to different logging services.

__init__(log_dir, cfg, env_cfg, num_envs, is_distributed, gpu_world_size, gpu_global_rank, device)[source]

Initialize buffers and logging state for a training run.

Parameters:
  • log_dir (str | None)

  • cfg (dict)

  • env_cfg (dict | object)

  • num_envs (int)

  • is_distributed (bool)

  • gpu_world_size (int)

  • gpu_global_rank (int)

  • device (str)

Return type:

None

init_logging_writer()[source]

Initialize the logging writer and save the code state.

Note

The writer is constructed from cfg["logger"], which should be a dict with a "class_name" key plus any additional constructor kwargs (see LogWriter). The plain string aliases "wandb" and "neptune" are deprecated; use "WandbLogWriter" and "NeptuneLogWriter" in the dict form instead. "tensorboard" (the default) is still accepted as a plain string.

Return type:

None
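As an illustration of the configuration shapes described in the note above, the following sketch converts the deprecated string aliases into the dict form. The helper name normalize_logger_cfg is hypothetical and not part of rsl_rl, and the "project_name" value is a placeholder; which constructor arguments the Logger supplies itself (for example log_dir) is not stated here.

```python
import warnings

# Deprecated plain-string aliases and the class names recommended in the note.
_DEPRECATED_ALIASES = {"wandb": "WandbLogWriter", "neptune": "NeptuneLogWriter"}


def normalize_logger_cfg(logger_cfg):
    """Hypothetical helper: return the dict form for deprecated string aliases."""
    if isinstance(logger_cfg, dict):
        if "class_name" not in logger_cfg:
            raise ValueError('logger config dict needs a "class_name" key')
        return dict(logger_cfg)
    if isinstance(logger_cfg, str) and logger_cfg in _DEPRECATED_ALIASES:
        class_name = _DEPRECATED_ALIASES[logger_cfg]
        warnings.warn(
            f'logger alias "{logger_cfg}" is deprecated; use {{"class_name": "{class_name}"}}',
            DeprecationWarning,
            stacklevel=2,
        )
        return {"class_name": class_name}
    # "tensorboard" is still accepted as a plain string, so it is passed through.
    return logger_cfg


# Recommended dict form: "class_name" plus extra constructor kwargs.
cfg = {"logger": {"class_name": "WandbLogWriter", "project_name": "my_project"}}
```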

process_env_step(rewards, dones, extras, intrinsic_rewards=None)[source]

Add metrics from the environment step to the buffers.

Parameters:
  • rewards (torch.Tensor)

  • dones (torch.Tensor)

  • extras (dict)

  • intrinsic_rewards (torch.Tensor | None)

Return type:

None

log(it, start_it, total_it, collect_time, learn_time, loss_dict, learning_rate, action_std, rnd_weight, print_minimal=False, width=80, pad=40)[source]

Log the training metrics to the logging service and print them to the console.

If videos are available, they are uploaded to the logging service (W&B) as well.

Parameters:
  • it (int)

  • start_it (int)

  • total_it (int)

  • collect_time (float)

  • learn_time (float)

  • loss_dict (dict)

  • learning_rate (float)

  • action_std (torch.Tensor)

  • rnd_weight (float | None)

  • print_minimal (bool)

  • width (int)

  • pad (int)

Return type:

None

save_model(path, it)[source]

Save the model to the external logging service, if one is configured.

Parameters:
  • path (str)

  • it (int)

Return type:

None

stop_logging_writer()[source]

Stop the logging writer.

Return type:

None

Log Writer

class rsl_rl.utils.log_writer.LogWriter[source]

Abstract base class for logging backends.

Log writers are configured via cfg["logger"], a dict with a "class_name" key pointing to the subclass and any additional keys forwarded as constructor kwargs. The class is resolved via resolve_callable(). Only add_scalar() must be implemented; all other methods are no-ops.

abstractmethod add_scalar(tag, scalar_value, global_step)[source]

Log a scalar metric.

Parameters:
  • tag (str) – Name of the metric.

  • scalar_value (float) – Value of the metric.

  • global_step (int) – Current training iteration.

Return type:

None

store_config(env_cfg, train_cfg)[source]

Upload environment and training configuration. Called once at training start.

Parameters:
  • env_cfg (dict | object)

  • train_cfg (dict)

Return type:

None

save_model(model_path, it)[source]

Upload a model checkpoint.

Parameters:
  • model_path (str)

  • it (int)

Return type:

None

save_file(path)[source]

Upload an arbitrary file.

Parameters:

path (str)

Return type:

None

save_video(video, it)[source]

Upload a video file.

Parameters:
  • video (Path)

  • it (int)

Return type:

None

stop()[source]

Finalize and close the logging run.

Return type:

None
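To illustrate the LogWriter contract described above (only add_scalar() is required, everything else defaults to a no-op), here is a minimal sketch. The LogWriter class below is a local stand-in that mirrors the documented method names so the example runs without rsl_rl installed; it is not the library class. InMemoryLogWriter and its prefix argument are hypothetical.

```python
from abc import ABC, abstractmethod


class LogWriter(ABC):
    """Stand-in mirroring the documented interface; NOT the rsl_rl class."""

    @abstractmethod
    def add_scalar(self, tag, scalar_value, global_step):
        """Log a scalar metric."""

    # The remaining methods are no-ops unless a backend overrides them.
    def store_config(self, env_cfg, train_cfg):
        pass

    def save_model(self, model_path, it):
        pass

    def save_file(self, path):
        pass

    def save_video(self, video, it):
        pass

    def stop(self):
        pass


class InMemoryLogWriter(LogWriter):
    """Hypothetical backend that keeps scalars in a dict, e.g. for unit tests."""

    def __init__(self, prefix=""):
        self.prefix = prefix
        self.scalars = {}

    def add_scalar(self, tag, scalar_value, global_step):
        # Store (step, value) pairs per tag.
        key = self.prefix + tag
        self.scalars.setdefault(key, []).append((global_step, float(scalar_value)))


writer = InMemoryLogWriter(prefix="run1/")
writer.add_scalar("Loss/value", 0.25, global_step=0)
writer.add_scalar("Loss/value", 0.125, global_step=1)
writer.stop()  # inherited no-op
```

In an actual project the subclass would derive from rsl_rl.utils.log_writer.LogWriter and be selected through cfg["logger"], for example with a "class_name" such as "my_pkg.writers:InMemoryLogWriter" (a hypothetical module path) and "prefix" as an extra key.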

Wandb Log Writer

class rsl_rl.utils.wandb_log_writer.WandbLogWriter[source]

Summary writer for W&B.

__init__(log_dir, project_name)[source]

Initialize a W&B run for logging.

Parameters:
  • log_dir (str)

  • project_name (str)

Return type:

None

add_scalar(tag, scalar_value, global_step=None, walltime=None, new_style=False)[source]

Log a scalar to both TensorBoard and W&B.

Parameters:
  • tag (str)

  • scalar_value (float)

  • global_step (int | None)

  • walltime (float | None)

  • new_style (bool)

Return type:

None

store_config(env_cfg, train_cfg)[source]

Upload environment and training configuration to W&B.

Parameters:
  • env_cfg (dict | object)

  • train_cfg (dict)

Return type:

None

save_model(model_path, it)[source]

Upload a model checkpoint artifact to W&B.

Parameters:
  • model_path (str)

  • it (int)

Return type:

None

save_file(path)[source]

Upload an arbitrary file artifact to W&B.

Parameters:

path (str)

Return type:

None

save_video(video, it)[source]

Upload a video artifact once per filename to W&B.

Parameters:
  • video (Path)

  • it (int)

Return type:

None

stop()[source]

Finish the active W&B run.

Return type:

None

Neptune Log Writer

class rsl_rl.utils.neptune_log_writer.NeptuneLogWriter[source]

Summary writer for Neptune.

__init__(log_dir, project_name)[source]

Initialize a Neptune run for logging.

Parameters:
  • log_dir (str)

  • project_name (str)

Return type:

None

add_scalar(tag, scalar_value, global_step=None, walltime=None, new_style=False)[source]

Log a scalar to both TensorBoard and Neptune.

Parameters:
  • tag (str)

  • scalar_value (float)

  • global_step (int | None)

  • walltime (float | None)

  • new_style (bool)

Return type:

None

store_config(env_cfg, train_cfg)[source]

Upload environment and training configuration to Neptune.

Parameters:
  • env_cfg (dict | object)

  • train_cfg (dict)

Return type:

None

save_model(model_path, it)[source]

Upload a model checkpoint artifact to Neptune.

Parameters:
  • model_path (str)

  • it (int)

Return type:

None

save_file(path)[source]

Upload an arbitrary file artifact to Neptune.

Parameters:

path (str)

Return type:

None

stop()[source]

Finish the active Neptune run.

Return type:

None

Utils

rsl_rl.utils.utils.get_param(param, idx)[source]

Get a parameter for the given index.

Parameters:
  • param (Any) – Parameter or list/tuple of parameters.

  • idx (int) – Index to get the parameter for.

Return type:

Any
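A small sketch of the behavior suggested by the description above: a list or tuple is indexed, while any other value is returned unchanged. This is an assumed reading of the docstring, written as a standalone stand-in (get_param_sketch) rather than a call into rsl_rl.

```python
def get_param_sketch(param, idx):
    """Assumed behavior: index into lists/tuples, pass anything else through."""
    if isinstance(param, (list, tuple)):
        return param[idx]
    return param


hidden_dims = [256, 128, 64]        # one value per layer
activation = "elu"                  # one value shared by all layers

second_dim = get_param_sketch(hidden_dims, 1)   # 128
second_act = get_param_sketch(activation, 1)    # "elu"
```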

rsl_rl.utils.utils.resolve_nn_activation(act_name)[source]

Resolve the activation function from the name.

Valid activation function names are: "elu", "selu", "relu", "crelu", "lrelu", "tanh", "sigmoid", "softplus", "gelu", "swish", "mish", "identity".

Parameters:

act_name (str) – Name of the activation function.

Returns:

The activation function.

Raises:

ValueError – If the activation function is not found.

Return type:

torch.nn.Module

rsl_rl.utils.utils.resolve_optimizer(optimizer_name)[source]

Resolve the optimizer from the name.

Valid optimizer names are: "adam", "adamw", "sgd", "rmsprop".

Parameters:

optimizer_name (str) – Name of the optimizer.

Returns:

The optimizer.

Raises:

ValueError – If the optimizer is not found.

Return type:

torch.optim.Optimizer

rsl_rl.utils.utils.resolve_callable(callable_or_name)[source]

Resolve a callable from a string or type, or return the input directly if it is already callable.

This function supports resolving callables from a direct callable input or from a string in one of these formats:

  • Direct callable: pass a type or function directly (for example, MyClass or my_func).

  • Qualified name with colon: "module.path:Attr.Nested" (explicit, recommended).

  • Qualified name with dot: "module.path.ClassName" (implicit).

  • Simple name: for example "PPO" or "ActorCritic" (searched within rsl_rl).

Parameters:

callable_or_name (type | Callable | str) – A callable (type/function) or string name.

Returns:

The resolved callable.

Raises:
  • TypeError – If input is neither a callable nor a string.

  • ImportError – If the module cannot be imported.

  • AttributeError – If the attribute cannot be found in the module.

  • ValueError – If a simple name cannot be found in rsl_rl packages.

Return type:

Callable
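The string formats listed above can be illustrated with a simplified resolver built on importlib. This is a sketch under stated assumptions, not the rsl_rl implementation: the name resolve_callable_sketch is hypothetical, the dotted form is split at the last dot only, and the simple-name search within rsl_rl is left out (it raises ValueError here).

```python
import collections
import importlib
import os.path


def resolve_callable_sketch(callable_or_name):
    """Simplified resolver for the colon and dot formats; simple names are not handled."""
    if callable(callable_or_name):
        return callable_or_name  # direct callable: return as-is
    if not isinstance(callable_or_name, str):
        raise TypeError(f"expected a callable or a string, got {type(callable_or_name).__name__}")

    if ":" in callable_or_name:
        # "module.path:Attr.Nested" -> explicit module / attribute split
        module_path, attr_path = callable_or_name.split(":", 1)
    elif "." in callable_or_name:
        # "module.path.ClassName" -> assume the last component is the attribute
        module_path, attr_path = callable_or_name.rsplit(".", 1)
    else:
        raise ValueError("simple names are not handled in this sketch")

    obj = importlib.import_module(module_path)  # may raise ImportError
    for part in attr_path.split("."):
        obj = getattr(obj, part)                # may raise AttributeError
    return obj


ordered_dict_cls = resolve_callable_sketch("collections:OrderedDict")
join_fn = resolve_callable_sketch("os.path.join")
same_fn = resolve_callable_sketch(len)
```

The colon form avoids guessing where the module path ends, which is presumably why it is the recommended one.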

rsl_rl.utils.utils.resolve_obs_groups(obs, obs_groups, default_sets)[source]

Validate the observation configuration and resolve missing observation sets.

The input is an observation dictionary obs containing observation groups and a configuration dictionary obs_groups where the keys are the observation sets and the values are lists of observation groups.

The configuration dictionary could for example look like:

{
    "actor": ["group_1", "group_2"],
    "critic": ["group_1", "group_3"],
}

This means that the "actor" observation set contains the observation groups "group_1" and "group_2", and the "critic" observation set contains "group_1" and "group_3". The function checks that every group listed in the "actor" and "critic" observation sets is present in the observation dictionary from the environment.

Additionally, if one of the default_sets, e.g. "critic", is not present in the configuration dictionary, this function will:

  1. Check whether a group with the same name exists in the observations and, if so, assign that group to the observation set.

  2. If step 1 fails, assign the "policy" observation group to the missing observation set.

  3. If step 2 fails, raise an error.

Parameters:
  • obs (tensordict.TensorDict) – Observations from the environment in the form of a dictionary.

  • obs_groups (dict[str, list[str]]) – Dictionary mapping observation sets to lists of observation groups.

  • default_sets (list[str]) – Default observation set names used by the algorithm. If not provided in obs_groups, a default behavior gets triggered.

Returns:

The resolved observation groups.

Raises:
  • ValueError – If any observation set is an empty list.

  • ValueError – If any observation set contains an observation group that is not present in the observations.

  • ValueError – If a default observation set cannot be resolved according to the rules above.

Return type:

dict[str, list[str]]
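The validation and fallback rules described above can be sketched as follows. This is an illustrative reimplementation of the documented rules, not the library code: resolve_obs_groups_sketch is a hypothetical name, and a plain dict stands in for the TensorDict of observations.

```python
def resolve_obs_groups_sketch(obs, obs_groups, default_sets):
    """Apply the documented validation and default-set fallback rules."""
    resolved = {name: list(groups) for name, groups in obs_groups.items()}

    # Validate the sets that were configured explicitly.
    for set_name, groups in resolved.items():
        if not groups:
            raise ValueError(f"observation set '{set_name}' is empty")
        for group in groups:
            if group not in obs:
                raise ValueError(f"group '{group}' in set '{set_name}' is not in the observations")

    # Fill in missing default sets.
    for set_name in default_sets:
        if set_name in resolved:
            continue
        if set_name in obs:          # rule 1: a group with the same name exists
            resolved[set_name] = [set_name]
        elif "policy" in obs:        # rule 2: fall back to the "policy" group
            resolved[set_name] = ["policy"]
        else:                        # rule 3: nothing to fall back to
            raise ValueError(f"cannot resolve default observation set '{set_name}'")
    return resolved


# Plain dict as a stand-in for the environment's TensorDict.
obs = {"policy": [0.1, 0.2], "critic": [0.3]}
resolved = resolve_obs_groups_sketch(obs, {"actor": ["policy"]}, ["actor", "critic"])
# resolved == {"actor": ["policy"], "critic": ["critic"]}
```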

rsl_rl.utils.utils.check_nan(obs, rewards, dones)[source]

Raise ValueError if any environment output contains NaN.

Parameters:
  • obs (tensordict.TensorDict)

  • rewards (torch.Tensor)

  • dones (torch.Tensor)

Return type:

None

rsl_rl.utils.utils.compile_model(model, mode=None)[source]

Wrap a model with torch.compile(), validating the compile mode.

Parameters:
  • model (torch.nn.Module) – The model to compile.

  • mode (str | None) – The torch.compile() mode. CUDA-graph modes ("reduce-overhead", "max-autotune") are rejected because they are incompatible with the multi-model forward patterns used by the algorithms (graph replay overwrites the previous call's output buffer). Use "default" or "max-autotune-no-cudagraphs" instead. Defaults to None, in which case compilation is disabled.

Returns:

The compiled model, or the original model if mode is None.

Raises:

ValueError – If mode is one of the unsupported CUDA-graph modes.

Return type:

torch.nn.Module
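The mode check described above can be sketched without torch. Only the validation step is shown; the helper name check_compile_mode and its boolean return value are hypothetical, and the actual function goes on to wrap the model with torch.compile().

```python
# Modes that rely on CUDA graphs and are documented as rejected.
_CUDA_GRAPH_MODES = ("reduce-overhead", "max-autotune")


def check_compile_mode(mode):
    """Hypothetical helper: validate the mode and report whether compilation is enabled."""
    if mode in _CUDA_GRAPH_MODES:
        raise ValueError(
            f'compile mode "{mode}" uses CUDA graphs and is not supported; '
            'use "default" or "max-autotune-no-cudagraphs" instead'
        )
    # None disables compilation; any other accepted mode enables it.
    return mode is not None


enabled_default = check_compile_mode("default")   # True
enabled_none = check_compile_mode(None)           # False
```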

rsl_rl.utils.utils.split_and_pad_trajectories(tensor, dones)[source]

Split trajectories at done indices.

Split trajectories, concatenate them and pad with zeros up to the length of the longest trajectory. Return masks corresponding to valid parts of the trajectories.

Example (transposed for readability):

Input:  [[a1, a2, a3, a4 | a5, a6],
         [b1, b2 | b3, b4, b5 | b6]]

Output: [[a1, a2, a3, a4],   |   [[True, True, True, True],
         [a5, a6, 0, 0],     |    [True, True, False, False],
         [b1, b2, 0, 0],     |    [True, True, False, False],
         [b3, b4, b5, 0],    |    [True, True, True, False],
         [b6, 0, 0, 0]]      |    [True, False, False, False]]

Assumes that the input has the following order of dimensions: [time, number of envs, additional dimensions].

Parameters:
  • tensor (torch.Tensor | TensorDict)

  • dones (torch.Tensor)

Return type:

tuple[torch.Tensor | TensorDict, torch.Tensor]
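The splitting and padding logic from the example above can be reproduced with plain Python lists. This is a sketch for illustration only: split_and_pad_sketch is a hypothetical name, it works on nested lists instead of tensors, and it returns the trajectories in the transposed (trajectory-major) layout used in the example rather than the time-major layout of the real function. It assumes that the last time step always closes a trajectory.

```python
def split_and_pad_sketch(values, dones, pad_value=0):
    """Split time-major [time][env] data at done flags and pad to the longest trajectory."""
    num_steps, num_envs = len(values), len(values[0])

    trajectories = []
    for env in range(num_envs):
        current = []
        for t in range(num_steps):
            current.append(values[t][env])
            # A done flag (or the final step) closes the current trajectory.
            if dones[t][env] or t == num_steps - 1:
                trajectories.append(current)
                current = []

    max_len = max(len(traj) for traj in trajectories)
    padded = [traj + [pad_value] * (max_len - len(traj)) for traj in trajectories]
    masks = [[i < len(traj) for i in range(max_len)] for traj in trajectories]
    return padded, masks


# Two environments, six time steps, written per environment and then made time-major.
env_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
env_b = ["b1", "b2", "b3", "b4", "b5", "b6"]
done_a = [False, False, False, True, False, False]
done_b = [False, True, False, False, True, False]

values = [list(step) for step in zip(env_a, env_b)]
dones = [list(step) for step in zip(done_a, done_b)]
padded, masks = split_and_pad_sketch(values, dones)
```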

rsl_rl.utils.utils.unpad_trajectories(trajectories, masks)[source]

Do the inverse operation of split_and_pad_trajectories().

Parameters:
  • trajectories (torch.Tensor | TensorDict)

  • masks (torch.Tensor)

Return type:

torch.Tensor | TensorDict