Storage

class rsl_rl.storage.rollout_storage.RolloutStorage[source]

Storage for the data collected during a rollout.

The rollout storage is populated by adding transitions during the rollout phase. It then provides a generator for the learning phase; which generator is appropriate depends on the algorithm (RL vs. distillation) and the policy architecture (feedforward vs. recurrent).

class Transition[source]

Storage for a single state transition.

This class is populated incrementally during the rollout phase and then passed to RolloutStorage.add_transition() to record the data.

__init__()[source]

Initialize an empty transition container.

Return type:

None

observations: TensorDict | None

Observations at the current step.

actions: torch.Tensor | None

Actions taken at the current step.

rewards: torch.Tensor | None

Rewards received after the action.

dones: torch.Tensor | None

Done flags indicating episode termination.

values: torch.Tensor | None

Value estimates at the current step (RL only).

actions_log_prob: torch.Tensor | None

Log probability of the taken actions (RL only).

distribution_params: tuple[torch.Tensor, ...] | None

Parameters of the action distribution (RL only).

privileged_actions: torch.Tensor | None

Privileged (teacher) actions (distillation only).

hidden_states: tuple[HiddenState, HiddenState]

Hidden states for recurrent networks, as an (actor, critic) pair.

clear()[source]

Reset all transition fields to None.

Return type:

None
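Conceptually, the transition container is a small record whose fields all start as None and are reset by clear() so it can be reused each step. A minimal pure-Python sketch of that pattern (the names `TransitionSketch` and the toy list values are illustrative; the real class holds TensorDict and torch.Tensor data):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class TransitionSketch:
    """Toy stand-in for RolloutStorage.Transition: all fields start empty."""
    observations: Optional[Any] = None
    actions: Optional[Any] = None
    rewards: Optional[Any] = None
    dones: Optional[Any] = None
    values: Optional[Any] = None
    actions_log_prob: Optional[Any] = None

    def clear(self) -> None:
        # Reset every field to None so the container can be reused
        # for the next environment step.
        for name in self.__dataclass_fields__:
            setattr(self, name, None)

t = TransitionSketch()
t.actions = [0.1, -0.2]
t.rewards = [1.0]
t.clear()
print(t.actions is None and t.rewards is None)  # True
```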

class Batch[source]

A batch of data yielded by the rollout storage generators.

This class provides named access to mini-batch fields. Fields are optional to support different training modes (RL vs distillation) and architectures (feedforward vs recurrent).

__init__(observations=None, actions=None, values=None, advantages=None, returns=None, old_actions_log_prob=None, old_distribution_params=None, hidden_states=(None, None), masks=None, privileged_actions=None, dones=None)[source]

Initialize a batch container over rollout data.

Parameters:
  • observations (TensorDict | None)

  • actions (torch.Tensor | None)

  • values (torch.Tensor | None)

  • advantages (torch.Tensor | None)

  • returns (torch.Tensor | None)

  • old_actions_log_prob (torch.Tensor | None)

  • old_distribution_params (tuple[torch.Tensor, ...] | None)

  • hidden_states (tuple[HiddenState, HiddenState])

  • masks (torch.Tensor | None)

  • privileged_actions (torch.Tensor | None)

  • dones (torch.Tensor | None)

Return type:

None

observations: TensorDict | None

Batch of observations.

actions: torch.Tensor | None

Batch of actions.

values: torch.Tensor | None

Batch of value estimates (RL only).

advantages: torch.Tensor | None

Batch of advantage estimates (RL only).

returns: torch.Tensor | None

Batch of return targets (RL only).

old_actions_log_prob: torch.Tensor | None

Batch of log probabilities of the old actions (RL only).

old_distribution_params: tuple[torch.Tensor, ...] | None

Batch of parameters of the old action distribution (RL only).

privileged_actions: torch.Tensor | None

Batch of privileged (teacher) actions (distillation only).

dones: torch.Tensor | None

Batch of done flags (distillation only).

hidden_states: tuple[HiddenState, HiddenState]

Batch of hidden states for recurrent networks (RL recurrent only).

masks: torch.Tensor | None

Batch of trajectory masks for recurrent networks (RL recurrent only).
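Because every field is optional, one batch type can serve both training modes: a consumer simply checks which fields are populated. A toy sketch of that access pattern (the class name `BatchSketch` and the list values are hypothetical stand-ins for the real tensor-valued fields):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class BatchSketch:
    """Toy Batch: all fields optional so one type serves RL and distillation."""
    observations: Optional[Any] = None
    actions: Optional[Any] = None
    advantages: Optional[Any] = None           # RL only
    privileged_actions: Optional[Any] = None   # distillation only

rl_batch = BatchSketch(observations=[0.0], actions=[1.0], advantages=[0.5])
distill_batch = BatchSketch(observations=[0.0], actions=[1.0],
                            privileged_actions=[0.9])

# A consumer branches on which optional fields are present.
mode = "rl" if rl_batch.advantages is not None else "distillation"
print(mode)  # rl
```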

__init__(training_type, num_envs, num_transitions_per_env, obs, actions_shape, device='cpu')[source]

Allocate rollout buffers for a specific training mode and batch shape.

Parameters:
  • training_type (str)

  • num_envs (int)

  • num_transitions_per_env (int)

  • obs (tensordict.TensorDict)

  • actions_shape (tuple[int, ...] | list[int])

  • device (str)

Return type:

None

add_transition(transition)[source]

Add one transition to the storage at the current step index.

Parameters:

transition (Transition)

Return type:

None
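The collection phase can be pictured as a loop that fills one transition per step and records it until the buffer holds num_transitions_per_env steps. A list-backed toy illustration of that contract (the class `ToyStorage` and the dict-valued transitions are hypothetical; the real storage writes into preallocated tensors at a step cursor):

```python
class ToyStorage:
    """List-backed stand-in for RolloutStorage: append up to a fixed capacity."""
    def __init__(self, num_transitions_per_env):
        self.capacity = num_transitions_per_env
        self.steps = []

    def add_transition(self, transition):
        if len(self.steps) >= self.capacity:
            raise RuntimeError("Rollout storage is full; call clear() first.")
        # Copy the fields so the caller can reuse and clear its transition.
        self.steps.append(dict(transition))

    def clear(self):
        self.steps = []

storage = ToyStorage(num_transitions_per_env=4)
for step in range(4):
    transition = {"actions": [step], "rewards": [1.0], "dones": [False]}
    storage.add_transition(transition)

print(len(storage.steps))  # 4
storage.clear()
print(len(storage.steps))  # 0
```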

clear()[source]

Reset the write cursor for the next rollout.

Return type:

None

generator()[source]

Yield per-timestep batches for distillation training.

Return type:

Generator[Batch, None, None]
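The iteration order can be sketched as a walk along the time axis, yielding one batch per stored step; this toy generator only illustrates that ordering, not the real batch contents:

```python
def per_timestep_generator(steps):
    # Yield one batch (here: one stored step dict) per timestep, in order.
    for step in steps:
        yield step

steps = [{"t": 0}, {"t": 1}, {"t": 2}]
order = [batch["t"] for batch in per_timestep_generator(steps)]
print(order)  # [0, 1, 2]
```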

mini_batch_generator(num_mini_batches, num_epochs=8)[source]

Yield shuffled flat mini-batches for feedforward RL updates.

Parameters:
  • num_mini_batches (int)

  • num_epochs (int)

Return type:

Generator[Batch, None, None]
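The flat scheme is easiest to see as index arithmetic: the num_envs * num_transitions_per_env samples are flattened, a random permutation is drawn, and each of the num_epochs passes yields num_mini_batches slices of it. A pure-Python sketch of that indexing (the real generator slices torch tensors; whether the permutation is redrawn per epoch is an implementation detail, and here it is drawn once):

```python
import random

def mini_batch_indices(num_envs, num_transitions_per_env,
                       num_mini_batches, num_epochs=8, seed=0):
    """Yield index lists in the pattern of a flat mini-batch generator."""
    rng = random.Random(seed)
    total = num_envs * num_transitions_per_env
    mini_batch_size = total // num_mini_batches  # remainder samples are dropped
    indices = list(range(total))
    rng.shuffle(indices)  # one shared permutation, reused across epochs
    for _ in range(num_epochs):
        for i in range(num_mini_batches):
            yield indices[i * mini_batch_size:(i + 1) * mini_batch_size]

batches = list(mini_batch_indices(num_envs=4, num_transitions_per_env=6,
                                  num_mini_batches=3, num_epochs=2))
print(len(batches))     # 2 epochs * 3 mini-batches = 6
print(len(batches[0]))  # 24 // 3 = 8 samples per mini-batch
```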

recurrent_mini_batch_generator(num_mini_batches, num_epochs=8)[source]

Yield trajectory mini-batches with masks and recurrent hidden states.

Parameters:
  • num_mini_batches (int)

  • num_epochs (int)

Return type:

Generator[Batch, None, None]
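Recurrent updates cannot flatten the time axis, so mini-batches are formed over whole per-environment trajectories, with masks marking valid steps once trajectories are padded to a common length. A toy sketch of the padding and mask construction (the helper `pad_trajectories` is hypothetical; the real generator works on torch tensors and also slices the stored hidden states):

```python
def pad_trajectories(trajectories, pad_value=0.0):
    """Pad variable-length trajectories to a common length and build masks."""
    max_len = max(len(traj) for traj in trajectories)
    padded, masks = [], []
    for traj in trajectories:
        pad = max_len - len(traj)
        padded.append(list(traj) + [pad_value] * pad)
        # True marks a real step; False marks padding to be ignored in the loss.
        masks.append([True] * len(traj) + [False] * pad)
    return padded, masks

padded, masks = pad_trajectories([[1.0, 2.0, 3.0], [4.0]])
print(padded)  # [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
print(masks)   # [[True, True, True], [True, False, False]]
```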