WildOS: Open-Vocabulary Object Search in the Wild
Abstract
Autonomous navigation in complex, unstructured outdoor environments requires robots to operate over long ranges without prior maps and with limited depth sensing. In such settings, relying solely on geometric frontiers for exploration is often insufficient; the ability to reason semantically about where to go and what is safe to traverse is crucial for robust, efficient exploration. This work presents WildOS, a unified system for long-range, open-vocabulary object search that combines safe geometric exploration with semantic visual reasoning. WildOS builds a sparse navigation graph to maintain spatial memory, while a foundation-model-based vision module, ExploRFM, scores the graph's frontier nodes. ExploRFM simultaneously predicts traversability, visual frontiers, and object similarity in image space, enabling real-time, onboard semantic navigation. The resulting vision-scored graph lets the robot explore semantically meaningful directions while ensuring geometric safety. Furthermore, we introduce a particle-filter-based method for coarse localization of the open-vocabulary target query that estimates candidate goal positions beyond the robot's immediate depth horizon, enabling effective planning toward distant goals. Extensive closed-loop field experiments across diverse off-road and urban terrains demonstrate that WildOS enables robust navigation, significantly outperforming purely geometric and purely vision-based baselines in both efficiency and autonomy. Our results highlight the potential of vision foundation models to drive open-world robotic behaviors that are both semantically informed and geometrically grounded.
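The particle-filter-based coarse goal localization described above can be sketched as follows. This is a minimal, hypothetical illustration: the observation model (a Gaussian over bearing mismatch weighted by the similarity score), the particle count, and all parameter values are assumptions for the sketch, not the system's actual implementation.

```python
import numpy as np

def update_particles(particles, weights, bearing, similarity, sigma=0.3):
    """One particle-filter update for coarse goal localization.

    particles : (N, 2) candidate goal positions in the robot frame.
    weights   : (N,) importance weights.
    bearing   : direction (rad) in which the open-vocabulary query
                scored `similarity` in [0, 1] in image space.
    Hypothetical observation model for illustration only.
    """
    # Likelihood: particles whose bearing matches the high-similarity
    # direction gain weight; a small floor keeps others alive.
    angles = np.arctan2(particles[:, 1], particles[:, 0])
    diff = np.angle(np.exp(1j * (angles - bearing)))  # wrap to [-pi, pi]
    likelihood = similarity * np.exp(-0.5 * (diff / sigma) ** 2) \
        + (1.0 - similarity) * 0.05
    weights = weights * likelihood
    weights /= weights.sum()

    # Systematic resampling when the effective sample size collapses.
    n = len(weights)
    if 1.0 / np.sum(weights ** 2) < n / 2:
        positions = (np.arange(n) + np.random.uniform()) / n
        idx = np.searchsorted(np.cumsum(weights), positions)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Coarse goal estimate: weighted mean of the particles, which can lie
# well beyond the robot's immediate depth horizon.
rng = np.random.default_rng(0)
particles = rng.uniform(-20, 20, size=(500, 2))
weights = np.full(500, 1.0 / 500)
particles, weights = update_particles(particles, weights,
                                      bearing=0.5, similarity=0.9)
goal = weights @ particles
```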
Method Overview
Frontier Annotations on GrandTour
Red regions (visual frontiers) in the image indicate candidate locations for further exploration.
Example predictions of Visual Traversability and Frontiers from ExploRFM in varied terrains.
Open-Vocabulary Object Search
Q1: Does the complete WildOS system enable successful end-to-end object search from language queries?
Key Insight
WildOS successfully integrates language grounding, vision-based localization, and geometric planning to enable end-to-end open-vocabulary object search operating in real time on a deployed robot platform.
Fence Approach: Vision-Guided Navigation
Q2: Does integrating vision-based scoring with the navigation graph improve navigation performance compared to pure-geometry approaches?
Key Insight
WildOS achieves lower average distance and time, with notably smaller variance, than the baselines. Vision-based scoring enables the robot to plan efficient routes around obstructions rather than heading straight toward blocked directions, replicating human-like reasoning that prefers feasible, traversable directions over direct but blocked ones.
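One way to picture how vision-based scoring combines with the navigation graph is a blended frontier score: each frontier node gets a semantic score from the vision module and a geometric term measuring alignment with the goal direction. The weighting and functional form below are assumptions for illustration; the system's actual scoring function may differ.

```python
import math

def score_frontier(node, goal_dir, semantic_score, w_sem=0.7, w_geo=0.3):
    """Blend a semantic similarity score with geometric goal alignment.

    node           : (x, y) frontier position relative to the robot.
    goal_dir       : unit vector toward the estimated goal.
    semantic_score : vision-module similarity in [0, 1] for this node.
    Hypothetical weighting for illustration only.
    """
    norm = math.hypot(*node)
    if norm == 0.0:
        return 0.0
    alignment = (node[0] * goal_dir[0] + node[1] * goal_dir[1]) / norm
    geo = 0.5 * (alignment + 1.0)  # map cosine similarity to [0, 1]
    return w_sem * semantic_score + w_geo * geo

# A frontier that is both goal-aligned and semantically promising wins,
# so the robot detours around blocked directions instead of pushing
# straight into them.
frontiers = {"A": ((5.0, 0.0), 0.9),   # aligned with goal, high score
             "B": ((0.0, 5.0), 0.4)}  # perpendicular, low score
best = max(frontiers,
           key=lambda k: score_frontier(frontiers[k][0], (1.0, 0.0),
                                        frontiers[k][1]))
```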
Dead End Recovery: Spatial Memory
Q3: Does the navigation graph improve robustness and memory compared to purely vision-based navigation?
Key Insight
Persistent spatial memory is essential for long-horizon autonomy. By maintaining a structured representation of previously explored regions and deferred frontiers, WildOS can recover from dead-ends and replan effectively, whereas memoryless vision-only strategies remain prone to oscillation and repeated failure.
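The recovery behavior described above can be sketched as a search over the stored navigation graph: when the current direction dead-ends, the planner backtracks through previously explored nodes to the nearest deferred frontier. The unweighted BFS below is a simplified stand-in; the actual planner may use weighted or scored search.

```python
from collections import deque

def recover_from_dead_end(graph, current, deferred_frontiers):
    """Backtrack through spatial memory to the nearest deferred frontier.

    graph              : adjacency dict {node: [neighbors]} over the
                         sparse navigation graph.
    deferred_frontiers : set of frontier nodes scored earlier but not
                         yet visited.
    Returns the path from `current` to the frontier, or None.
    Hypothetical BFS sketch for illustration only.
    """
    queue, parent = deque([current]), {current: None}
    while queue:
        node = queue.popleft()
        if node in deferred_frontiers:
            # Reconstruct the path back through explored space.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no deferred frontier reachable

# A memoryless vision-only policy would oscillate at the dead end;
# the graph lets the robot retrace to a deferred frontier instead.
graph = {"dead_end": ["junction"],
         "junction": ["dead_end", "f1"],
         "f1": ["junction"]}
path = recover_from_dead_end(graph, "dead_end", {"f1"})
```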
Urban Navigation: Cross-Terrain Generalization
Q4: Does WildOS generalize effectively across diverse outdoor terrains?
Key Insight
WildOS exhibits strong generalization across diverse terrains — from off-road unstructured environments to urban settings — enabled by foundation-model features. The system adapts seamlessly without requiring retraining or environment-specific tuning, highlighting the potential of vision foundation models to drive open-world robotic behaviors.
Visualization Legend
Visual Outputs: Frontier and Traversability maps from ExploRFM are thresholded and shown in the Jet and Inverse Jet colormaps, respectively. When the full-model visualization is shown (outputs from all three cameras), we overlay the full, unthresholded heatmaps on the image using a Jet colormap for both.
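The threshold-then-overlay step can be sketched in a few lines of NumPy. The threshold value and the red-channel blend (a crude stand-in for the high end of a Jet colormap) are illustrative choices, not the visualization pipeline's actual parameters.

```python
import numpy as np

def threshold_heatmap(heatmap, thresh=0.5):
    """Binary mask of confident regions; threshold value is illustrative.

    heatmap: (H, W) scores in [0, 1] from a frontier or traversability head.
    """
    return heatmap >= thresh

def overlay(image, heatmap, mask, alpha=0.5):
    """Alpha-blend heatmap intensity onto an RGB image where the mask
    is set, pushing masked pixels toward red in proportion to the score
    (approximating the hot end of a Jet colormap)."""
    out = image.astype(float).copy()
    out[mask, 0] = (1 - alpha) * out[mask, 0] + alpha * 255 * heatmap[mask]
    return out.astype(np.uint8)

hm = np.array([[0.2, 0.9],
               [0.6, 0.1]])
mask = threshold_heatmap(hm)          # keeps only the 0.9 and 0.6 cells
img = np.zeros((2, 2, 3), dtype=np.uint8)
vis = overlay(img, hm, mask)
```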
Navigation Graph: Edges are shown in red, free nodes in green, and frontier nodes in blue.
Scored Graph: Frontier nodes are surrounded by a score ring indicating the score in the goal direction (colored according to the Jet colormap).
Goal & Planning: The triangulated goal is shown as a cyan sphere, with projected particles in white. The planned high-level path is shown in green.
This color scheme is followed throughout the visualizations unless stated otherwise.
BibTeX
@article{YourPaperKey2024,
  title   = {WildOS: Open-Vocabulary Object Search in the Wild},
  author  = {Shah, Hardik and Tevere, Erica and Atha, Deegan and Kaufmann, Marcel and Khattak, Shehryar and Patel, Manthan and Hutter, Marco and Frey, Jonas and Spieler, Patrick},
  journal = {Conference/Journal Name},
  year    = {2026},
  url     = {https://leggedrobotics.github.io/wildos/}
}