Storage¶
- class rsl_rl.storage.rollout_storage.RolloutStorage[source]¶
Storage for the data collected during a rollout.
The rollout storage is populated by adding transitions during the rollout phase. It then provides a generator over the stored data for the learning phase; the form of the generator depends on the algorithm and the policy architecture.
- class Transition[source]¶
Storage for a single state transition.
This class is populated incrementally during the rollout phase and then passed to RolloutStorage.add_transition() to record the data.
- observations: TensorDict | None¶
Observations at the current step.
- actions: torch.Tensor | None¶
Actions taken at the current step.
- rewards: torch.Tensor | None¶
Rewards received after the action.
- dones: torch.Tensor | None¶
Done flags indicating episode termination.
- values: torch.Tensor | None¶
Value estimates at the current step (RL only).
- actions_log_prob: torch.Tensor | None¶
Log probability of the taken actions (RL only).
- distribution_params: tuple[torch.Tensor, ...] | None¶
Parameters of the action distribution (RL only).
- privileged_actions: torch.Tensor | None¶
Privileged (teacher) actions (distillation only).
- hidden_states: tuple[HiddenState, HiddenState]¶
Hidden states for recurrent networks, e.g., (actor, critic).
- class Batch[source]¶
A batch of data yielded by the rollout storage generators.
This class provides named access to mini-batch fields. Fields are optional to support different training modes (RL vs distillation) and architectures (feedforward vs recurrent).
- __init__(observations=None, actions=None, values=None, advantages=None, returns=None, old_actions_log_prob=None, old_distribution_params=None, hidden_states=(None, None), masks=None, privileged_actions=None, dones=None)[source]¶
Initialize a batch container over rollout data.
- Parameters:
observations (TensorDict | None)
actions (torch.Tensor | None)
values (torch.Tensor | None)
advantages (torch.Tensor | None)
returns (torch.Tensor | None)
old_actions_log_prob (torch.Tensor | None)
old_distribution_params (tuple[torch.Tensor, ...] | None)
hidden_states (tuple[HiddenState, HiddenState])
masks (torch.Tensor | None)
privileged_actions (torch.Tensor | None)
dones (torch.Tensor | None)
- Return type:
None
- observations: TensorDict | None¶
Batch of observations.
- actions: torch.Tensor | None¶
Batch of actions.
- values: torch.Tensor | None¶
Batch of value estimates (RL only).
- advantages: torch.Tensor | None¶
Batch of advantage estimates (RL only).
- returns: torch.Tensor | None¶
Batch of return targets (RL only).
- old_actions_log_prob: torch.Tensor | None¶
Batch of log probabilities of the old actions (RL only).
- old_distribution_params: tuple[torch.Tensor, ...] | None¶
Batch of parameters of the old action distribution (RL only).
- privileged_actions: torch.Tensor | None¶
Batch of privileged (teacher) actions (distillation only).
- dones: torch.Tensor | None¶
Batch of done flags (distillation only).
- hidden_states: tuple[HiddenState, HiddenState]¶
Batch of hidden states for recurrent networks (RL recurrent only).
- masks: torch.Tensor | None¶
Batch of trajectory masks for recurrent networks (RL recurrent only).
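Because every field is optional, a consumer can branch on which fields are populated to tell training modes apart. A minimal illustration of that named-access pattern (a toy dataclass, not the library's Batch, with `training_mode` as a hypothetical helper):

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Toy analogue: RL-only fields stay None in distillation mode, and vice versa."""
    observations: object = None
    actions: object = None
    values: object = None               # RL only
    advantages: object = None           # RL only
    privileged_actions: object = None   # distillation only


def training_mode(batch: Batch) -> str:
    """Infer the mode from which optional fields were filled in."""
    return "distillation" if batch.privileged_actions is not None else "rl"


rl_batch = Batch(observations=[0.1], actions=[1.0], values=[0.9], advantages=[0.2])
distill_batch = Batch(observations=[0.1], actions=[1.0], privileged_actions=[0.8])
```

Named fields with `None` defaults keep one batch type usable across RL and distillation instead of maintaining two parallel container classes.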
- __init__(training_type, num_envs, num_transitions_per_env, obs, actions_shape, device='cpu')[source]¶
Allocate rollout buffers for a specific training mode and batch shape.
- Parameters:
training_type (str)
num_envs (int)
num_transitions_per_env (int)
obs (tensordict.TensorDict)
actions_shape (tuple[int, ...] | list[int])
device (str)
- Return type:
None
- add_transition(transition)[source]¶
Add one transition to the storage at the current step index.
- Parameters:
transition (Transition)
- Return type:
None
- generator()[source]¶
Yield per-timestep batches for distillation training.
- Return type:
Generator[Batch, None, None]
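The per-timestep yielding behavior can be sketched as follows. This is an assumed simplification: the real generator yields Batch objects with tensor fields drawn from the pre-allocated buffers, while here dicts over lists stand in for both.

```python
from typing import Generator


def generator(transitions: list[dict]) -> Generator[dict, None, None]:
    """Analogue of RolloutStorage.generator(): yield one batch per stored timestep."""
    for tr in transitions:
        # For distillation, each batch carries the observations and the
        # privileged (teacher) actions recorded at that step.
        yield {
            "observations": tr["observations"],
            "privileged_actions": tr["privileged_actions"],
        }


rollout = [
    {"observations": [0.0], "privileged_actions": [1.0]},
    {"observations": [0.1], "privileged_actions": [0.9]},
]
batches = list(generator(rollout))
```

A typical distillation loop would iterate these batches in order and regress the student policy's actions onto `privileged_actions`.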