D3: Divide, Discover, Deploy - Factorized Skill Learning

Abstract

Unsupervised Skill Discovery (USD) allows agents to autonomously learn diverse behaviors without task-specific rewards. While recent USD methods have shown promise, their application to real-world robotics remains underexplored.

We propose a modular USD framework that uses a user-defined factorization of the state space to learn disentangled skill representations. Each factor is assigned a skill discovery algorithm (METRA or DIAYN) based on the desired behavior. We introduce symmetry-based inductive biases tailored to individual factors and incorporate a style factor with regularization penalties to promote safe and robust behaviors.

We evaluate our framework using a quadrupedal robot (ANYmal-D) and demonstrate zero-shot transfer of learned skills to real hardware. Our results show that factorization and symmetry lead to structured, human-interpretable behaviors, while the style factor enhances safety and diversity. The learned skills perform on par with oracle policies on downstream navigation tasks.

Overview

Our framework builds on factorized MDPs where the state space is divided into N factors (e.g., position, heading, height). Each factor is paired with a skill component and an appropriate USD algorithm: METRA for unbounded factors like position (encouraging state-space exploration), and DIAYN for bounded factors like orientation (promoting distinguishable behaviors).
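The pairing of factors with algorithms can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the flat state layout [x, y, heading, height], the skill dimensions, and the choice of DIAYN for heading and height are all hypothetical, and the names Factor, FACTORS, and split_state are introduced here for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Factor:
    name: str
    state_indices: Tuple[int, ...]  # hypothetical positions in the flat state
    algorithm: str                  # "METRA" or "DIAYN"
    skill_dim: int                  # hypothetical size of the skill component


# Assumed flat state layout: [x, y, heading, height].
# Position is unbounded, so it is paired with METRA; the bounded factors
# are paired with DIAYN (an assumption for heading and height).
FACTORS = [
    Factor("position", (0, 1), "METRA", 2),
    Factor("heading", (2,), "DIAYN", 1),
    Factor("height", (3,), "DIAYN", 1),
]


def split_state(state: Sequence[float], factors: Sequence[Factor]) -> Dict[str, List[float]]:
    """Return the slice of the flat state that belongs to each factor."""
    return {f.name: [state[i] for i in f.state_indices] for f in factors}


def total_skill_dim(factors: Sequence[Factor]) -> int:
    """Size of the full skill vector, one component per factor."""
    return sum(f.skill_dim for f in factors)


if __name__ == "__main__":
    print(split_state([1.0, 2.0, 0.5, 0.6], FACTORS))
    print(total_skill_dim(FACTORS))
```

Each USD objective would then see only its own slice of the state and its own skill component, which is what keeps the learned skill dimensions disentangled.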

Key innovations include: (1) Factor weighting to handle conflicting skills, (2) Symmetry augmentation exploiting robot morphology, (3) Style factor for safe fallback behaviors, and (4) Regularization penalties for hardware compatibility.
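Two of these ideas, factor weighting and symmetry augmentation, can be sketched in a few lines of Python. The page does not give the exact formulation, so everything here is an assumption made for illustration: the weighted sum of per-factor intrinsic rewards, the [x, y, heading, height] state layout, and the left-right mirroring rule. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def combine_rewards(rewards: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-factor intrinsic rewards (an assumed form of factor weighting)."""
    missing = set(rewards) - set(weights)
    if missing:
        raise KeyError(f"no weight given for factors: {sorted(missing)}")
    return sum(weights[name] * value for name, value in rewards.items())


def mirror_left_right(state: Sequence[float]) -> List[float]:
    """Reflect a hypothetical [x, y, heading, height] state across the sagittal plane.

    Lateral position and heading change sign; forward position and height do not.
    """
    x, y, heading, height = state
    return [x, -y, -heading, height]


def augment_with_mirrors(batch: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the original states followed by their mirrored copies."""
    return [list(s) for s in batch] + [mirror_left_right(s) for s in batch]


if __name__ == "__main__":
    print(combine_rewards({"position": 1.0, "heading": 2.0},
                          {"position": 0.5, "heading": 0.25}))
    print(augment_with_mirrors([[1.0, 2.0, 0.5, 0.6]]))
```

In a full pipeline the skill variable would have to be mirrored consistently with the state, which this sketch leaves out.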

Hardware Deployment

Our learned skills transfer zero-shot from simulation to real hardware on the ANYmal-D quadruped. The structured skill space enables intuitive control where each skill dimension corresponds to a specific behavior. Below we showcase various skills discovered through our framework, all from a single policy conditioned on different skill variables.

Key results:
• 85% reduction in unsafe contacts
• 90% of oracle performance on navigation
• 20x better diversity with mixed USD algorithms

Walking
Controlled by position factor

Pitching
Controlled by orientation factor

Walking + Pitching
Controlled by position + orientation factors

Rotation
Controlled by heading factor

Crouching
Controlled by height factor

Crouching + Rotation
Controlled by height + heading factors

All behaviors were learned entirely in simulation without any task-specific rewards and deployed zero-shot on real hardware. Skills are controlled by directly commanding the corresponding latent skill variables.
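Commanding a skill then amounts to filling in the relevant part of the skill vector that conditions the policy. The Python sketch below assumes a hypothetical layout with one skill component per factor and treats zero as a neutral value for factors that are not commanded; neither detail is stated on this page, and build_skill_command is a name invented for the example.

```python
from typing import Dict, List, Sequence

# Hypothetical skill layout: factor name -> size of its skill component.
SKILL_LAYOUT: Dict[str, int] = {"position": 2, "heading": 1, "height": 1}


def build_skill_command(layout: Dict[str, int], **components: Sequence[float]) -> List[float]:
    """Concatenate per-factor skill components into one skill vector.

    Factors that are not commanded are filled with zeros (an assumed neutral value).
    """
    unknown = set(components) - set(layout)
    if unknown:
        raise KeyError(f"unknown factors: {sorted(unknown)}")
    command: List[float] = []
    for name, dim in layout.items():
        values = list(components.get(name, [0.0] * dim))
        if len(values) != dim:
            raise ValueError(f"factor {name!r} expects {dim} values, got {len(values)}")
        command.extend(values)
    return command


if __name__ == "__main__":
    # A "walking" style command touches only the position component.
    print(build_skill_command(SKILL_LAYOUT, position=[1.0, 0.0]))
    # A combined command sets two components at once.
    print(build_skill_command(SKILL_LAYOUT, height=[-1.0], heading=[0.5]))
```

Combined behaviors such as Crouching + Rotation correspond to setting two components of the same vector at once.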

BibTeX

@inproceedings{cathomen2025d3,
  author    = {Cathomen, Rafael and Mittal, Mayank and Vlastelica, Marin and Hutter, Marco},
  title     = {Divide, Discover, Deploy: Factorized Skill Learning with Symmetry and Style Priors},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2025},
}
