Environment

class rsl_rl.env.vec_env.VecEnv

Abstract class for a vectorized environment.

A vectorized environment is a collection of environments that are stepped synchronously. Every environment receives the same type of action and returns the same type of observation.

num_envs: int

Number of environments.

num_actions: int

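Number of actions, i.e. the size of the action vector of each environment (the second dimension of the actions passed to step()).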

max_episode_length: int | torch.Tensor

Maximum episode length.

The maximum episode length can be a scalar or a tensor. If it is a scalar, it is the same for all environments. If it is a tensor, it is the maximum episode length for each environment. This is useful for dynamic episode lengths.

episode_length_buf: torch.Tensor

Buffer for current episode lengths.
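As a minimal sketch of how a scalar and a per-environment max_episode_length can be handled by the same code, the comparison below relies on PyTorch broadcasting. The helper name compute_time_outs and the use of >= as the timeout condition are assumptions made for illustration; an actual environment may define timeouts differently.

```python
import torch


def compute_time_outs(episode_length_buf, max_episode_length):
    """Flag environments whose current episode has reached its length limit.

    Hypothetical helper, not part of rsl_rl.
    """
    # Broadcasting covers both cases: an int is compared against every entry,
    # while a tensor of shape (num_envs,) is compared element-wise.
    return episode_length_buf >= max_episode_length


episode_length_buf = torch.tensor([10, 250, 500, 1000])

# Scalar limit shared by all environments.
shared = compute_time_outs(episode_length_buf, 500)

# Per-environment limits (dynamic episode lengths).
per_env = compute_time_outs(
    episode_length_buf, torch.tensor([1000, 250, 1000, 1000])
)
```

In both calls the result is a boolean tensor of shape (num_envs,), so downstream code does not need to know which form of max_episode_length was configured.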

device: torch.device | str

Device to use.

cfg: dict | object

Configuration object.

abstractmethod get_observations()

Return the current observations.

Returns:

The observations from the environment.

Return type:

tensordict.TensorDict

abstractmethod step(actions)

Apply the input actions to the environment.

Parameters:

actions (torch.Tensor) – Input actions to apply. Shape: (num_envs, num_actions)

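Returns:

A tuple (observations, rewards, dones, extras):

  • observations: Observations from the environment.

  • rewards: Rewards from the environment. Shape: (num_envs,)

  • dones: Done flags from the environment. Shape: (num_envs,)

  • extras: Extra information from the environment.

Return type:

tuple[tensordict.TensorDict, torch.Tensor, torch.Tensor, dict]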
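The step() contract can be illustrated with a toy environment. This is a sketch under stated assumptions, not the library's implementation: ToyEnv is hypothetical and does not subclass VecEnv, a plain dict stands in for the TensorDict so that the example depends on torch only, and the group name "policy", the reward, and the reset logic are invented for illustration.

```python
import torch


class ToyEnv:
    """Hypothetical environment that mimics the step() contract."""

    def __init__(self, num_envs=4, num_actions=2, max_episode_length=3):
        self.num_envs = num_envs
        self.num_actions = num_actions
        self.max_episode_length = max_episode_length
        self.device = "cpu"
        self.episode_length_buf = torch.zeros(num_envs, dtype=torch.long)
        self.state = torch.zeros(num_envs, num_actions)

    def get_observations(self):
        # Assumption: a dict stands in for a TensorDict with one
        # observation group named "policy".
        return {"policy": self.state.clone()}

    def step(self, actions):
        assert actions.shape == (self.num_envs, self.num_actions)
        self.state += actions
        self.episode_length_buf += 1

        rewards = -self.state.abs().sum(dim=1)  # shape: (num_envs,)
        time_outs = self.episode_length_buf >= self.max_episode_length
        dones = time_outs.clone()  # this toy env only ends on time limits

        # Reset the environments that just finished.
        self.state[dones] = 0.0
        self.episode_length_buf[dones] = 0

        extras = {"time_outs": time_outs}
        return self.get_observations(), rewards, dones, extras


env = ToyEnv()
for _ in range(3):
    actions = torch.ones(env.num_envs, env.num_actions)
    obs, rewards, dones, extras = env.step(actions)
```

After the third step every environment reaches the length limit, so dones and extras["time_outs"] are set for all environments and the returned observations belong to the freshly reset episodes.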

Observations:

The observations TensorDict usually contains multiple observation groups. The obs_groups dictionary of the runner configuration specifies which observation groups are used for which purpose, i.e., it maps from required observation sets (e.g. actor) to lists of observation groups. The observation sets (keys of the obs_groups dictionary) currently used by rsl_rl are:

  • “actor”: Specified observation groups are used as input to the actor model.

  • “critic”: Specified observation groups are used as input to the critic model.

  • “student”: Specified observation groups are used as input to the student model.

  • “teacher”: Specified observation groups are used as input to the teacher model.

  • “rnd_state”: Specified observation groups are used as input to the RND extension.

Incomplete or incorrect configurations are handled in the resolve_obs_groups() function in rsl_rl/utils/utils.py, which provides detailed information on the expected configuration.
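For illustration, an obs_groups dictionary might look like the sketch below. The group names "policy", "privileged" and "images" are hypothetical, and find_missing_groups is an illustrative helper written for this example; it is not resolve_obs_groups() and does not reproduce its behavior.

```python
# Observation groups a (hypothetical) environment returns, e.g. the keys of
# the TensorDict from get_observations().
available_groups = {"policy", "privileged", "images"}

# Maps observation sets (keys) to lists of observation groups (values).
obs_groups = {
    "actor": ["policy"],
    "critic": ["policy", "privileged"],
}


def find_missing_groups(obs_groups, available_groups):
    """Return {observation set: [groups the environment does not provide]}.

    Illustrative helper only; not the library's resolve_obs_groups().
    """
    missing = {}
    for obs_set, groups in obs_groups.items():
        absent = [g for g in groups if g not in available_groups]
        if absent:
            missing[obs_set] = absent
    return missing


ok = find_missing_groups(obs_groups, available_groups)
bad = find_missing_groups({"actor": ["policy", "lidar"]}, available_groups)
```

Here the critic receives an extra group that the actor does not see, which is one common reason to configure the two observation sets separately.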

Extras:

The extras dictionary includes metrics such as the episode reward, episode length, etc. The following dictionary keys are used by rsl_rl:

  • “time_outs” (torch.Tensor): Timeouts for the environments. These correspond to terminations that happen due to time limits and not due to the environment reaching a terminal state. This is useful for environments that have a fixed episode length.

  • “log” (dict[str, float | torch.Tensor]): Additional information for logging and debugging purposes. The key should be a string and start with “/” for namespacing. The value can be a scalar or a tensor. If it is a tensor, the mean of the tensor is used for logging.
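A minimal sketch of an extras dictionary and of the mean reduction described for “log” values follows. The key names "/episode/reward" and "/episode/length" and the helper reduce_log_values are assumptions made for this example; the logging code in rsl_rl may differ in detail.

```python
import torch


def reduce_log_values(log):
    """Convert each logged value to a float, averaging tensor values.

    Hypothetical helper, not part of rsl_rl.
    """
    reduced = {}
    for key, value in log.items():
        if isinstance(value, torch.Tensor):
            # Tensors are reduced to their mean before logging.
            reduced[key] = value.float().mean().item()
        else:
            reduced[key] = float(value)
    return reduced


extras = {
    # One flag per environment: True where the episode ended on a time limit.
    "time_outs": torch.tensor([False, True, False, False]),
    "log": {
        "/episode/reward": torch.tensor([1.0, 2.0, 3.0, 6.0]),
        "/episode/length": 250,
    },
}

reduced = reduce_log_values(extras["log"])
```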