We propose MoDA, a domain adaptation method that adapts a pretrained embodied agent to a new, noisy environment without ground-truth supervision. Map-based memory provides important contextual information for visual navigation and exhibits a distinctive spatial structure, mainly composed of flat walls and rectangular obstacles. Our adaptation approach leverages these inherent regularities in the estimated maps to guide the agent through the prevalent domain discrepancies in a novel environment. Specifically, we propose an efficient learning curriculum that handles visual and dynamics corruptions in an online manner, self-supervised with pseudo-clean maps generated by style transfer networks. Because the map-based representation provides spatial knowledge for the agent's policy, our formulation allows policy networks pretrained in simulation to be deployed in the new setting. We evaluate MoDA in various practical scenarios and show that it quickly enhances the agent's performance in downstream tasks including localization, mapping, exploration, and point-goal navigation.
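To make the self-supervised curriculum concrete, below is a minimal sketch (assuming a PyTorch setup) of the online update: a mapping network is fine-tuned so that its noisy estimates match pseudo-clean maps produced by a frozen style transfer network. The tiny convolutional modules and random observations are placeholders for illustration, not the architectures or training details used in the paper.

# Minimal sketch, not the authors' implementation: fine-tune a mapping module
# with pseudo-clean maps as self-supervised targets. All modules below are
# hypothetical stand-ins so the snippet runs end to end.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder for the mapper pretrained in the noiseless simulator.
mapper = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
# Placeholder for the trained (and frozen) map style transfer network.
g_ego = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-4)

for step in range(100):                        # online adaptation loop
    rgb = torch.rand(1, 3, 64, 64)             # placeholder corrupted observation
    noisy_map = mapper(rgb)                    # map estimate under corruptions
    with torch.no_grad():
        pseudo_clean = g_ego(noisy_map)        # pseudo ground truth in the clean map style
    loss = F.binary_cross_entropy(noisy_map, pseudo_clean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()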
MoDA is an integrated domain adaptation method for visual and dynamics corruptions that fine-tunes the agent through an efficient learning curriculum. The agent first collects a new map dataset to learn the map style transfer networks: given the set of ground-truth maps Dgt (grey) obtained from the noiseless simulator, the set of noisy maps Dnoisy (green) is collected by the pretrained agent deployed in a novel environment amidst visual and dynamics corruptions. We then learn two map-to-map translation networks, one for the egocentric map Sego (yellow) and one for the global map Sglobal (blue), which translate the maps in Dnoisy into the style of the maps in Dgt.
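Since Dnoisy and Dgt are unpaired, the translation networks can be trained adversarially in the spirit of unpaired image-to-image translation. The sketch below shows one such training step; the generator, discriminator, and losses are simplified placeholders and may differ from the networks and objectives used in MoDA.

# Minimal sketch of learning a map-to-map translation network on the unpaired
# sets Dnoisy and Dgt. Architectures and losses are illustrative placeholders.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())   # noisy map -> clean-style map
D = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten())           # real/fake map critic
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for step in range(200):
    noisy = torch.rand(8, 1, 64, 64)    # placeholder batch from Dnoisy
    clean = torch.rand(8, 1, 64, 64)    # placeholder batch from Dgt
    fake = G(noisy)

    # Discriminator step: clean maps are real, translated maps are fake.
    d_loss = bce(D(clean), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator so translations match the clean map style.
    g_loss = bce(D(fake), torch.ones(8, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()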
This video demonstrates a navigation agent's exploration performance under visual and dynamics corruptions, with and without adaptation. We use the Habitat simulator and unseen scenes from Gibson and Matterport3D for evaluation.
Qualitative results of mapping (top) and localization (bottom), obtained from agents observing the identical sequence of RGB observations and odometry sensor readings. The reconstructed maps (blue) are aligned with the ground-truth maps (grey), and the estimated pose trajectories (blue line) are compared to the ground-truth trajectories (red line).
@inproceedings{lee2022moda,
  title={MoDA: Map style transfer for self-supervised Domain Adaptation of embodied agents},
  author={Lee, Eun Sun and Kim, Junho and Park, SangWon and Kim, Young Min},
  booktitle={European Conference on Computer Vision},
  pages={338--354},
  year={2022},
  organization={Springer}
}