Storage

class rsl_rl.storage.rollout_storage.RolloutStorage[source]

Storage for the data collected during a rollout.

The rollout storage is populated by adding transitions during the rollout phase. It then provides a generator for the learning phase; which generator is appropriate depends on the algorithm (RL vs. distillation) and the policy architecture (feedforward vs. recurrent).

class Transition[source]

Storage for a single state transition.

This class is populated incrementally during the rollout phase and then passed to RolloutStorage.add_transition() to record the data.

__init__()[source]

Initialize an empty transition container.

Return type:

None

observations: TensorDict | None

Observations at the current step.

actions: torch.Tensor | None

Actions taken at the current step.

rewards: torch.Tensor | None

Rewards received after the action.

dones: torch.Tensor | None

Done flags indicating episode termination.

values: torch.Tensor | None

Value estimates at the current step (RL only).

actions_log_prob: torch.Tensor | None

Log probability of the taken actions (RL only).

distribution_params: tuple[torch.Tensor, ...] | None

Parameters of the action distribution (RL only).

privileged_actions: torch.Tensor | None

Privileged (teacher) actions (distillation only).

hidden_states: tuple[HiddenState, HiddenState]

Hidden states for recurrent networks, as an (actor, critic) pair.

clear()[source]

Reset all transition fields to None.

Return type:

None
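Conceptually, the transition container is a small record whose fields all start as None and are reset by clear() so it can be reused each step. A minimal pure-Python sketch of that pattern (the names `TransitionSketch` and the toy list values are illustrative; the real class holds TensorDict and torch.Tensor data):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class TransitionSketch:
    """Toy stand-in for RolloutStorage.Transition: all fields start empty."""
    observations: Optional[Any] = None
    actions: Optional[Any] = None
    rewards: Optional[Any] = None
    dones: Optional[Any] = None
    values: Optional[Any] = None
    actions_log_prob: Optional[Any] = None

    def clear(self) -> None:
        # Reset every field to None so the container can be reused
        # for the next environment step.
        for name in self.__dataclass_fields__:
            setattr(self, name, None)

t = TransitionSketch()
t.actions = [0.1, -0.2]
t.rewards = [1.0]
t.clear()
print(t.actions is None and t.rewards is None)  # True
```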

class Batch[source]

A batch of data yielded by the rollout storage generators.

This class provides named access to mini-batch fields. Fields are optional to support different training modes (RL vs distillation) and architectures (feedforward vs recurrent).

__init__(observations=None, actions=None, values=None, advantages=None, returns=None, old_actions_log_prob=None, old_distribution_params=None, hidden_states=(None, None), masks=None, privileged_actions=None, dones=None)[source]

Initialize a batch container over rollout data.

Parameters:
  • observations (TensorDict | None)

  • actions (torch.Tensor | None)

  • values (torch.Tensor | None)

  • advantages (torch.Tensor | None)

  • returns (torch.Tensor | None)

  • old_actions_log_prob (torch.Tensor | None)

  • old_distribution_params (tuple[torch.Tensor, ...] | None)

  • hidden_states (tuple[HiddenState, HiddenState])

  • masks (torch.Tensor | None)

  • privileged_actions (torch.Tensor | None)

  • dones (torch.Tensor | None)

Return type:

None

observations: TensorDict | None

Batch of observations.

actions: torch.Tensor | None

Batch of actions.

values: torch.Tensor | None

Batch of value estimates (RL only).

advantages: torch.Tensor | None

Batch of advantage estimates (RL only).

returns: torch.Tensor | None

Batch of return targets (RL only).

old_actions_log_prob: torch.Tensor | None

Batch of log probabilities of the old actions (RL only).

old_distribution_params: tuple[torch.Tensor, ...] | None

Batch of parameters of the old action distribution (RL only).

privileged_actions: torch.Tensor | None

Batch of privileged (teacher) actions (distillation only).

dones: torch.Tensor | None

Batch of done flags (distillation only).

hidden_states: tuple[HiddenState, HiddenState]

Batch of hidden states for recurrent networks (RL recurrent only).

masks: torch.Tensor | None

Batch of trajectory masks for recurrent networks (RL recurrent only).
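Because every field is optional, one batch type can serve both training modes: a consumer simply checks which fields are populated. A toy sketch of that access pattern (the class name `BatchSketch` and the list values are hypothetical stand-ins for the real tensor-valued fields):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class BatchSketch:
    """Toy Batch: all fields optional so one type serves RL and distillation."""
    observations: Optional[Any] = None
    actions: Optional[Any] = None
    advantages: Optional[Any] = None           # RL only
    privileged_actions: Optional[Any] = None   # distillation only

rl_batch = BatchSketch(observations=[0.0], actions=[1.0], advantages=[0.5])
distill_batch = BatchSketch(observations=[0.0], actions=[1.0],
                            privileged_actions=[0.9])

# A consumer branches on which optional fields are present.
mode = "rl" if rl_batch.advantages is not None else "distillation"
print(mode)  # rl
```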

__init__(training_type, num_envs, num_transitions_per_env, obs, actions_shape, device='cpu')[source]

Allocate rollout buffers for a specific training mode and batch shape.

Parameters:
  • training_type (str)

  • num_envs (int)

  • num_transitions_per_env (int)

  • obs (tensordict.TensorDict)

  • actions_shape (tuple[int, ...] | list[int])

  • device (str)

Return type:

None

add_transition(transition)[source]

Add one transition to the storage at the current step index.

Parameters:

transition (Transition)

Return type:

None
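The collection phase can be pictured as a loop that fills one transition per step and records it until the buffer holds num_transitions_per_env steps. A list-backed toy illustration of that contract (the class `ToyStorage` and the dict-valued transitions are hypothetical; the real storage writes into preallocated tensors at a step cursor):

```python
class ToyStorage:
    """List-backed stand-in for RolloutStorage: append up to a fixed capacity."""
    def __init__(self, num_transitions_per_env):
        self.capacity = num_transitions_per_env
        self.steps = []

    def add_transition(self, transition):
        if len(self.steps) >= self.capacity:
            raise RuntimeError("Rollout storage is full; call clear() first.")
        # Copy the fields so the caller can reuse and clear its transition.
        self.steps.append(dict(transition))

    def clear(self):
        self.steps = []

storage = ToyStorage(num_transitions_per_env=4)
for step in range(4):
    transition = {"actions": [step], "rewards": [1.0], "dones": [False]}
    storage.add_transition(transition)

print(len(storage.steps))  # 4
storage.clear()
print(len(storage.steps))  # 0
```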

clear()[source]

Reset the write cursor for the next rollout.

Return type:

None

generator()[source]

Yield per-timestep batches for distillation training.

Return type:

Generator[Batch, None, None]
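The iteration order can be sketched as a walk along the time axis, yielding one batch per stored step; this toy generator only illustrates that ordering, not the real batch contents:

```python
def per_timestep_generator(steps):
    # Yield one batch (here: one stored step dict) per timestep, in order.
    for step in steps:
        yield step

steps = [{"t": 0}, {"t": 1}, {"t": 2}]
order = [batch["t"] for batch in per_timestep_generator(steps)]
print(order)  # [0, 1, 2]
```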

mini_batch_generator(num_mini_batches, num_epochs=8)[source]

Yield shuffled flat mini-batches for feedforward RL updates.

Parameters:
  • num_mini_batches (int)

  • num_epochs (int)

Return type:

Generator[Batch, None, None]
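The flat scheme is easiest to see as index arithmetic: the num_envs * num_transitions_per_env samples are flattened, a random permutation is drawn, and each of the num_epochs passes yields num_mini_batches slices of it. A pure-Python sketch of that indexing (the real generator slices torch tensors; whether the permutation is redrawn per epoch is an implementation detail, and here it is drawn once):

```python
import random

def mini_batch_indices(num_envs, num_transitions_per_env,
                       num_mini_batches, num_epochs=8, seed=0):
    """Yield index lists in the pattern of a flat mini-batch generator."""
    rng = random.Random(seed)
    total = num_envs * num_transitions_per_env
    mini_batch_size = total // num_mini_batches  # remainder samples are dropped
    indices = list(range(total))
    rng.shuffle(indices)  # one shared permutation, reused across epochs
    for _ in range(num_epochs):
        for i in range(num_mini_batches):
            yield indices[i * mini_batch_size:(i + 1) * mini_batch_size]

batches = list(mini_batch_indices(num_envs=4, num_transitions_per_env=6,
                                  num_mini_batches=3, num_epochs=2))
print(len(batches))     # 2 epochs * 3 mini-batches = 6
print(len(batches[0]))  # 24 // 3 = 8 samples per mini-batch
```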

recurrent_mini_batch_generator(num_mini_batches, num_epochs=8)[source]

Yield trajectory mini-batches with masks and recurrent hidden states.

Parameters:
  • num_mini_batches (int)

  • num_epochs (int)

Return type:

Generator[Batch, None, None]
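Recurrent updates cannot flatten the time axis, so mini-batches are formed over whole per-environment trajectories, with masks marking valid steps once trajectories are padded to a common length. A toy sketch of the padding and mask construction (the helper `pad_trajectories` is hypothetical; the real generator works on torch tensors and also slices the stored hidden states):

```python
def pad_trajectories(trajectories, pad_value=0.0):
    """Pad variable-length trajectories to a common length and build masks."""
    max_len = max(len(traj) for traj in trajectories)
    padded, masks = [], []
    for traj in trajectories:
        pad = max_len - len(traj)
        padded.append(list(traj) + [pad_value] * pad)
        # True marks a real step; False marks padding to be ignored in the loss.
        masks.append([True] * len(traj) + [False] * pad)
    return padded, masks

padded, masks = pad_trajectories([[1.0, 2.0, 3.0], [4.0]])
print(padded)  # [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
print(masks)   # [[True, True, True], [True, False, False]]
```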