Depth from RGB
Winner | Score (Relative Error)
David Eigen, Christian Puhrsch and Rob Fergus | 0.33 m
Mohammad Haris Baig and Lorenzo Torresani | 0.39 m
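The exact evaluation protocol is defined by the challenge development kit; the table does not spell it out. A minimal sketch of one common definition of per-pixel absolute relative depth error is below (the function name and flat-list inputs are our own illustration, not the official scoring code):

```python
def mean_relative_error(pred, gt):
    """Mean absolute relative depth error: mean(|pred - gt| / gt).

    `pred` and `gt` are flat sequences of per-pixel depths;
    pixels with no ground-truth reading should be filtered out
    beforehand. The challenge may use a different variant
    (e.g. a scale-invariant error); this is only a sketch.
    """
    assert len(pred) == len(gt) and len(gt) > 0
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)
```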

Normals from RGB
Winner | Score (Mean Abs. Angular Distance)
David Eigen, Christian Puhrsch and Rob Fergus | 30.5°
Lubor Ladicky, Bernhard Zeisl and Marc Pollefeys | 36.1°
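Mean absolute angular distance measures, per pixel, the angle between the predicted and ground-truth surface normals. A minimal sketch, assuming normals are given as unit 3-vectors (the function name and input format are our own, not the official scoring code):

```python
import math

def mean_angular_error_deg(pred, gt):
    """Mean angle in degrees between predicted and ground-truth
    unit normal vectors, computed per pixel via the dot product."""
    assert len(pred) == len(gt) and len(gt) > 0
    total = 0.0
    for p, g in zip(pred, gt):
        dot = sum(a * b for a, b in zip(p, g))
        dot = max(-1.0, min(1.0, dot))  # clamp to acos domain
        total += math.degrees(math.acos(dot))
    return total / len(gt)
```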

RGBD Segmentation
Winner | Score (Jaccard Index)
Dan Banica and Cristian Sminchisescu | 0.32
Saurabh Gupta, Ross Girshick, Pablo Arbelaez and Jitendra Malik | 0.30
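The Jaccard index is the intersection-over-union of the predicted and ground-truth label masks; segmentation benchmarks typically average it over classes. A minimal single-class sketch (the function name and flat label arrays are our own illustration):

```python
def jaccard_index(pred, gt, cls):
    """Jaccard index (IoU) for one class over flat label arrays:
    |pred==cls AND gt==cls| / |pred==cls OR gt==cls|."""
    inter = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    union = sum(1 for p, g in zip(pred, gt) if p == cls or g == cls)
    return inter / union if union else 0.0
```

Averaging this quantity over all classes gives the kind of overall score reported above.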


Understanding the 3D world is one of the fundamental challenges in computer vision. A wide variety of approaches have been developed to either reconstruct the 3D world or recognize it. Until very recently, however, the interactions between these two tasks were mostly ignored. This is perhaps surprising, as knowing the 3D world greatly simplifies the recognition task. Conversely, knowing that we are looking at a particular object greatly constrains 3D reconstruction; for example, we expect a wall to be planar.

Inspired by the great success of the PASCAL VOC challenge, we propose a set of challenges to study how reconstruction and recognition algorithms can be jointly exploited to push forward the state of the art in visual perception. Towards this goal, we propose a set of benchmarks that cover both outdoor scenarios in the context of autonomous driving and indoor scenes for personal robotics. We take advantage of the KITTI, NYU and Sun3D datasets and extend them in a variety of ways to provide the community with challenges ranging from low-level to high-level vision. We envision this workshop as the first in a series that will help push forward the performance of the field.

Towards this goal, we have created two training sets, one in the outdoor setting and one in the indoor setting, which contain labelings for all reconstruction and recognition tasks. This way, participants can exploit semantics for reconstruction and reconstruction for semantic analysis. Participants are allowed to use as many sources of information as they want to solve each challenge. In the outdoor scenario, we will provide stereo imagery, point clouds from a laser scanner, and video. In the indoor case, we will provide RGB-D data captured by a set of different devices. The following tables list the tasks that compose our challenges.

Reconstruction Tasks | Recognition Tasks
Stereo | 2D Detection
Optical Flow | 2D Tracking
Visual Odometry | 3D Detection

Outdoor Challenges

Reconstruction Tasks | Recognition Tasks
Depth from RGB | Semantic Segmentation
Normals from RGB | Instance Segmentation

Indoor Challenges