Predicting depth is an essential component in understanding the 3D geometry of a scene. While local correspondences suffice for depth estimation from stereo images, finding depth relations from a single image requires integrating both global and local information. We address this by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. Our method achieves state-of-the-art results on both the NYU Depth and KITTI single-image depth prediction benchmarks, and matches detailed depth boundaries without the need for superpixelation.
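The coarse-to-fine idea above can be sketched in a few lines. The following is a toy NumPy illustration, not the paper's architecture: the layer sizes, the single fully connected "global" layer, the nearest-neighbour upsampling, and the single 3×3 "local" convolution are all made up for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_predict(image, W, b):
    """Global stack (sketch): flatten the whole image and map it through
    one fully connected layer to a low-resolution depth map."""
    features = image.reshape(-1)              # global view of the scene
    coarse = W @ features + b
    return coarse.reshape(image.shape[0] // 4, image.shape[1] // 4)

def upsample(depth, factor):
    """Nearest-neighbour upsampling to the fine stack's resolution."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)

def fine_refine(image, coarse_up, kernel):
    """Local stack (sketch): one 3x3 convolution over the image, added
    as a local correction on top of the upsampled coarse prediction."""
    H, W = image.shape
    refined = coarse_up.copy()
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = image[i - 1:i + 2, j - 1:j + 2]
            refined[i, j] += np.sum(patch * kernel)
    return refined

# Toy 16x16 grayscale image; real inputs are RGB and much larger.
img = rng.random((16, 16))
W_c = rng.normal(scale=0.01, size=(4 * 4, 16 * 16))
b_c = np.zeros(4 * 4)
kern = rng.normal(scale=0.01, size=(3, 3))

coarse = coarse_predict(img, W_c, b_c)                 # (4, 4) global estimate
refined = fine_refine(img, upsample(coarse, 4), kern)  # (16, 16) refined map
print(coarse.shape, refined.shape)
```

The key structural point survives even in this toy form: the coarse stack sees the entire image at once, while the fine stack only adjusts the prediction using local neighbourhoods.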
Please also see our newer work on predicting depth, surface normals, and semantic labels.
NIPS 2014 Paper: