Face Detection and Pose Estimation

  • Time Period: September 2003-June 2004.
  • Participants: Margarita Osadchy (Technion), Matt Miller (NEC Labs), Yann LeCun (Courant Institute/CBLL).
  • Video: watch a video of the system in action: [AVI, 4.9MB].
  • Talks/Posters:
    • Slides: Synergistic Face Detection and Pose Estimation. Slides of a talk delivered at the Object Recognition workshop, Taormina, Sicily, October 2004. [DjVu (466KB)]; [PDF (1.1MB)].
    • Poster: Synergistic Face Detection and Pose Estimation. Poster presented at NIPS 2004, Vancouver, December 2004. [DjVu (234KB)]; [PDF (648KB)].
  • Publications:
    • [Osadchy, Miller, and LeCun, 2004]. Synergistic Face Detection and Pose Estimation. Proc. NIPS 2004.
    • [LeCun and Huang, 2005]. Loss Functions for Discriminative Training of Energy-Based Models. Proc. AI Stats 2005. This paper is not specifically about face detection, but about the general concept of Energy-Based Models. The loss function used for the face detector derives from this concept.
    • [Vaillant, Monrocq, LeCun, 1994]. An original approach for the localisation of objects in images, IEE Proc. on Vision, Image, and Signal Processing 1994. This is an older paper about Yann LeCun's early work on face detection using convolutional networks.


Everyone is detected.

We developed a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold parameterized by pose, and non-face images to points away from that manifold. This network is trained by optimizing an energy function of three variables: image, pose, and face/non-face label.
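The inference step described above can be sketched in a few lines. This is a minimal illustration, not the system's actual code: it assumes a single pose angle (yaw), a hypothetical three-dimensional manifold built from shifted cosines, and an energy equal to the distance between the network output and the manifold for the "face" label and to a constant threshold for the "non-face" label. The names `manifold_point`, `energy`, and `infer` are invented for this sketch, and `net_output` stands in for the output of the convolutional network.

```python
import math

# Hypothetical manifold parameterization for one pose angle (yaw, in radians).
# The three cosine offsets are an illustrative assumption.
OFFSETS = (-math.pi / 3, 0.0, math.pi / 3)


def manifold_point(yaw):
    """Map a pose angle to a point on a 3-D pose manifold."""
    return [math.cos(yaw - a) for a in OFFSETS]


def energy(net_output, yaw, is_face, threshold):
    """Energy of (image, pose, label), with the image seen through net_output.

    For the 'face' label it is the distance from the network output to the
    manifold point for that pose; for 'non-face' it is a constant threshold.
    """
    if not is_face:
        return threshold
    target = manifold_point(yaw)
    return math.sqrt(sum((g - f) ** 2 for g, f in zip(net_output, target)))


def infer(net_output, threshold, n_candidates=181):
    """Minimize the energy over pose (grid search), then over the label."""
    step = math.pi / (n_candidates - 1)
    candidates = [-math.pi / 2 + i * step for i in range(n_candidates)]
    best_yaw = min(candidates, key=lambda z: energy(net_output, z, True, threshold))
    best_energy = energy(net_output, best_yaw, True, threshold)
    # The image is labeled 'face' when the face energy beats the threshold.
    return best_energy < threshold, best_yaw, best_energy
```

For example, `infer(manifold_point(0.5), threshold=0.5)` labels the input as a face with an estimated yaw close to 0.5 radians, while an output far from the manifold, such as `[5.0, 5.0, 5.0]`, is rejected as a non-face.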

The system was trained with 53,000 grayscale images of faces, manually annotated with the pose (position, size, pitch, yaw, roll), and 53,000 images of non-faces.

We tested the resulting system, in a single configuration, on three standard data sets -- one for frontal pose, one for rotated faces, and one for profiles -- and found that its performance on each set is comparable to that of previous multi-view face detectors, each of which can handle only one form of pose variation.

We also show experimentally that the system's accuracy on both face detection and pose estimation is improved by training for the two tasks together.
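One way to see why the two tasks can help each other is to look at the shape of a joint training loss. The sketch below is an assumption made for illustration, not the loss as published (see the NIPS 2004 paper for that): face examples are penalized by the squared energy at their annotated pose, and non-face examples by a decaying exponential of their lowest face energy over all poses. The function names and the constant `k` are hypothetical.

```python
import math


def face_loss(energy_at_true_pose):
    """Pull the energy of a face, taken at its annotated pose, toward zero."""
    return energy_at_true_pose ** 2


def nonface_loss(min_energy_over_poses, k=1.0):
    """Push up the lowest face energy that a non-face image can reach."""
    return k * math.exp(-min_energy_over_poses)


def total_loss(face_energies, nonface_min_energies, k=1.0):
    """Average the two terms over their respective training sets."""
    face_term = sum(face_loss(e) for e in face_energies) / len(face_energies)
    nonface_term = sum(nonface_loss(e, k) for e in nonface_min_energies)
    nonface_term /= len(nonface_min_energies)
    return face_term + nonface_term
```

Because the face term is evaluated at the annotated pose, lowering it simultaneously moves faces onto the manifold (detection) and toward the right place on it (pose estimation), so a single gradient step serves both tasks.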

The main conceptual difference between the convolutional net approach and the popular "Viola-Jones" approach is that convolutional networks have a fixed number of feature detectors that are highly optimized through gradient-based learning, while the V-J system generates a very large number of very simple (binary) features, and uses AdaBoost to select a good subset.
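To make the contrast concrete, here is a minimal sketch of the selection idea on the boosting side: textbook discrete AdaBoost choosing a few weak classifiers from a pool of precomputed binary feature votes. It is a simplification for illustration only; it omits the Haar-like features, per-feature thresholds, and cascade of the actual Viola-Jones detector, and the function names are invented here.

```python
import math


def adaboost_select(feature_outputs, labels, rounds):
    """Pick `rounds` weak classifiers from a pool of binary features.

    feature_outputs[j][i] is feature j's vote (+1 or -1) on example i;
    labels[i] is the true class (+1 or -1).
    Returns a list of (feature_index, weight) pairs.
    """
    n = len(labels)
    weights = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        # Weighted error of every feature in the pool.
        errors = [
            sum(w for w, out, y in zip(weights, outs, labels) if out != y)
            for outs in feature_outputs
        ]
        j = min(range(len(errors)), key=errors.__getitem__)
        err = min(max(errors[j], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((j, alpha))
        # Re-weight the examples: mistakes of feature j gain weight.
        weights = [
            w * math.exp(-alpha * y * out)
            for w, out, y in zip(weights, feature_outputs[j], labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


def strong_classify(chosen, votes):
    """Combine the selected features; votes[j] is feature j's vote on one example."""
    score = sum(alpha * votes[j] for j, alpha in chosen)
    return 1 if score >= 0 else -1
```

Here the features themselves are never modified; learning consists of choosing and weighting them. In a convolutional network the set of feature detectors is fixed in size, and it is their parameters that are adjusted by gradient descent.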

Our system runs in real time (a few frames per second) on a laptop, processing each full frame independently of the others. It operates on grayscale images and therefore does not rely on color information.

The usual suspects are rounded up.


Related Work

Members of our group have used convolutional nets for object spotting in the past:
  • [Matan, Burges, LeCun, Denker 1992]: Multi-Digit Recognition Using a Space Displacement Neural Network, NIPS 4, 1992. This paper describes our early work using convolutional nets for spotting and recognizing handwritten characters in an image.
  • Back in 1989/1990, we applied convolutional nets to the detection of the ID numbers that are painted on the side of railroad cars. Sadly, this work was never published.

Other groups have used convolutional networks for object detection.

  • Christophe Garcia and Manolis Delakis have a nice paper in the Nov 2004 issue of IEEE PAMI, describing a face detector based on convolutional nets. They also have an online demo.
  • In 1995, Steve Nowlan and John Platt used convolutional nets to do hand detection and tracking: [Steven Nowlan, John Platt 1995]: A Convolutional Neural Network Hand Tracker, Proc. NIPS 7, 1995. [DjVu].
  • In 1994, Wolf and Platt used convolutional nets to locate addresses on postal envelopes: [Wolf, Platt 1994]: Postal Address Block Location Using a Convolutional Locator Network, Proc. NIPS 6, 1994. [DjVu].


If you want to experiment with convolutional nets, a full implementation is included in the gblearn2 library distributed with the Lush language.

The Torch C++ Library by Ronan Collobert, Samy Bengio and Johnny Mariethoz also has an implementation of convolutional networks (somewhat inspired by the Lush version).

More Pics

Somewhat unusual suspects.

Seriously deviant stuff:

Superdupont by Lob, Gotlib and Sole; Salammbo by Philippe Druillet

Valerian et Laureline by Christin and Meziere (three pictures)