Learning Long-Range Vision for an Offroad Robot
Candidate: Raia Hadsell
Advisor: Yann LeCunn

Abstract

Teaching a robot to perceive and navigate in an unstructured natural world is a difficult task. Without learning, navigation systems are short-range and extremely limited. With learning, the robot can be taught to classify terrain at longer distances, but these classifiers can be fragile as well, leading to extremely conservative planning. A robust, high-level learning-based perception system for a mobile robot needs to continually learn and adapt as it explores new environments. To do this, a strong feature representation is necessary that can encode meaningful, discriminative patterns as well as invariance to irrelevant transformations. A simple realtime classifier can then be trained on those features to predict the traversability of the current terrain.

One such method for learning a feature representation is discussed in detail in this work. Dimensionality reduction by learning an invariant mapping (DrLIM) is a weakly supervised method for learning a similarity measure over a domain. Given a set of training samples and their pairwise relationships, which can be arbitrarily defined, DrLIM can be used to learn a function that is invariant to complex transformations of the inputs such as shape distortion and rotation.

The main contribution of this work is a self-supervised learning process for long-range vision that is able to accurately classify complex terrain, permitting improved strategic planning. As a mobile robot moves through offroad environments, it learns traversability from a stereo obstacle detector. The learning architecture is composed of a static feature extractor, trained offline for a general yet discriminative feature representation, and an adaptive online classifier. This architecture reduces the effect of concept drift by allowing the online classifier to quickly adapt to very few training samples without overtraining. After experiments with several different learned feature extractors, we conclude that unsupervised or weakly supervised learning methods are necessary for training general feature representations for natural scenes.

The process was developed and tested on the LAGR mobile robot as part of a fully autonomous vision-based navigation system.