Less is More 🍋: Scalable Visual Navigation from Limited Data
Abstract
Imitation learning provides a powerful framework for goal-conditioned visual navigation in mobile robots, enabling obstacle avoidance while respecting human preferences and social norms. However, its effectiveness depends critically on the quality and diversity of training data. In this work, we show how classical geometric planners can be leveraged to generate synthetic trajectories that complement costly human demonstrations. We train Less is More (LiMo), a transformer-based visual navigation policy that predicts goal-conditioned SE(2) trajectories from a single RGB observation, and find that augmenting limited expert demonstrations with planner-generated supervision yields substantial performance gains. Through ablations and complementary qualitative and quantitative analyses, we characterize how dataset scale and diversity affect planning performance. We demonstrate real-robot deployment and argue that robust visual navigation is enabled not by simply collecting more demonstrations, but by strategically curating diverse, high-quality datasets. Our results suggest that scalable, embodiment-specific geometric supervision is a practical path toward data-efficient visual navigation.
Dataset Curation
The use of a geometric planner enables the automatic, scalable generation of diverse expert demonstrations. In addition to the real-world path walked by the robot during data collection, we sample 10 random goals in front of the robot and annotate each with a path from a geometric planner operating on co-registered elevation maps.
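To make this concrete, the generation loop might look like the following sketch. Everything here is illustrative rather than the authors' code: plan_on_elevation_map is a hypothetical stand-in for the geometric planner (stubbed with straight-line interpolation), and the goal-sampling ranges are assumptions.

```python
import numpy as np

def plan_on_elevation_map(elevation_map, start, goal, n_points=10):
    # Hypothetical stand-in for the geometric planner, stubbed here as
    # straight-line interpolation. The real system plans on a co-registered
    # elevation map and may reject unreachable goals (returning None).
    alphas = np.linspace(0.0, 1.0, n_points)[:, None]
    return (1.0 - alphas) * start + alphas * goal

def generate_planner_demos(image, elevation_map, robot_pose, num_goals=10, rng=None):
    """Sample goals in front of the robot and annotate each with a planned path."""
    rng = rng or np.random.default_rng()
    demos = []
    for _ in range(num_goals):
        # Sample a robot-centric goal pose (x, y, theta), biased forward.
        goal = np.array([
            rng.uniform(1.0, 5.0),        # forward distance [m] (assumed range)
            rng.uniform(-2.0, 2.0),       # lateral offset [m] (assumed range)
            rng.uniform(-np.pi, np.pi),   # goal heading [rad]
        ])
        path = plan_on_elevation_map(elevation_map, start=robot_pose, goal=goal)
        if path is not None:  # keep only goals the planner can reach
            demos.append({"image": image, "goal": goal, "trajectory": path})
    return demos
```

Each resulting record pairs the same onboard image with a different goal and planner path, which is what lets a single real-world traversal yield many supervision targets.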
Policy Architecture
LiMo takes a single RGB image and a robot-centric goal pose (x,y,θ) as input. Image features are extracted with a frozen DINOv2 encoder and combined with learned positional embeddings. A transformer decoder, conditioned on the goal embedding, predicts a sequence of waypoint embeddings, which are linearly projected to N robot-centric waypoints (x,y,θ) forming the output trajectory.
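The architecture described above can be written down as a compact PyTorch module. The sketch below is one plausible instantiation under stated assumptions (a DINOv2 ViT-S/14 backbone, a standard transformer decoder, illustrative layer counts and dimensions); it is not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class LiMoPolicy(nn.Module):
    def __init__(self, num_waypoints=10, d_model=384, n_heads=6,
                 n_layers=4, max_patches=256):
        super().__init__()
        # Frozen DINOv2 backbone (ViT-S/14 assumed; embedding dim 384).
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Learned positional embeddings added to the patch features.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_patches, d_model))
        # Project the robot-centric goal pose (x, y, theta) to a conditioning vector.
        self.goal_proj = nn.Linear(3, d_model)
        # One learned query per output waypoint.
        self.query_embed = nn.Parameter(torch.zeros(num_waypoints, d_model))
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        # Linear projection from waypoint embeddings to (x, y, theta).
        self.head = nn.Linear(d_model, 3)

    def forward(self, image, goal):
        # image: (B, 3, H, W) with H, W divisible by 14; goal: (B, 3).
        with torch.no_grad():
            feats = self.backbone.forward_features(image)["x_norm_patchtokens"]
        memory = feats + self.pos_embed[:, : feats.shape[1]]
        # Condition the waypoint queries on the goal embedding.
        queries = self.query_embed.unsqueeze(0) + self.goal_proj(goal).unsqueeze(1)
        out = self.decoder(queries, memory)
        return self.head(out)  # (B, num_waypoints, 3) robot-centric waypoints
```

For a 224×224 input, the ViT-S/14 backbone yields 16×16 = 256 patch tokens of dimension 384, so the defaults above are mutually consistent; a regression loss on the predicted waypoints would complete the training setup.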
Qualitative Performance and Embodied Behavior
LiMo demonstrates strong geometric understanding and embodiment-aware navigation behavior. The policy plans feasible trajectories through complex environments, up staircases, and over rough natural terrain. By leveraging scalable, automatically generated geometric demonstrations during training, LiMo develops a sophisticated understanding of the underlying scene geometry and embodiment, enabling it to plan trajectories that are not only geometrically feasible but also aligned with ANYmal's specific locomotion capabilities and physical constraints.
The videos below show the predictions of LiMo on GrandTour missions.
Deployment
We deploy LiMo in closed-loop on an ANYmal D quadruped robot in real-world environments not seen during training. The policy runs at 6 Hz on an NVIDIA Jetson Orin on board the robot, using purely vision-based inputs to generate collision-free local trajectories. A simple lookahead path follower node tracks the predicted waypoints; a minimal sketch of such a follower is given below.
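As a rough illustration, the sketch below implements a generic pure-pursuit-style lookahead controller; the lookahead distance, velocity limit, and turning gain are assumptions, not the deployed node's parameters.

```python
import numpy as np

def lookahead_follower(waypoints, lookahead=0.5, v_max=0.8, k_omega=1.5):
    """Compute a (v, omega) velocity command toward a lookahead point.

    waypoints: (N, 3) array of robot-centric (x, y, theta) predictions,
    so the robot sits at the origin facing +x.
    """
    # Pick the first waypoint at least `lookahead` metres away;
    # fall back to the final waypoint if all are closer.
    dists = np.linalg.norm(waypoints[:, :2], axis=1)
    ahead = dists >= lookahead
    idx = int(np.argmax(ahead)) if np.any(ahead) else len(waypoints) - 1
    target = waypoints[idx]
    # Steer toward the target; slow down as the heading error grows.
    heading_error = np.arctan2(target[1], target[0])
    v = v_max * max(0.0, np.cos(heading_error))
    omega = k_omega * heading_error
    return v, omega
```

Called at the 6 Hz inference rate on each fresh trajectory prediction, a controller of this form keeps the robot tracking the most recent plan without any map or odometry beyond the robot-centric waypoints themselves.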
Corridor Following: Navigating tight corridors with constrained lateral space
Dynamic Obstacle Avoidance: Reacting to moving obstacles in real-time
Obstacle Course Navigation: Planning non-trivial paths through cluttered environments
BibTeX
@misc{inglin2026morescalablevisualnavigation,
title={Less Is More: Scalable Visual Navigation from Limited Data},
author={Yves Inglin and Jonas Frey and Changan Chen and Marco Hutter},
year={2026},
eprint={2601.17815},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2601.17815},
}