Participants: Graham Taylor, Rob Fergus, Chris Bregler, Yann LeCun (Courant Institute/CBLL).
Sponsors: DARPA, ONR.
Description: A trainable system was built to recognize actions in videos.
The first layer is a Convolutional Gated Restricted Boltzmann Machine (CGRBM), which is trained
in an unsupervised manner and automatically learns features that primarily encode motion.
The second layer uses sparse coding to learn mid-level features, also in an unsupervised manner.
The resulting feature vectors are pooled over time with a max-pooling operation and fed to a
Support Vector Machine (SVM). Excellent performance was obtained on the Hollywood-2 dataset.
A similar system was built to recognize actions in the KTH dataset. It also uses a CGRBM as its
first layer, but replaces the subsequent stages with a 3D (spatio-temporal) convolutional
network architecture.
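The gating idea behind the first layer can be illustrated with a minimal, non-convolutional sketch. It assumes the standard three-way gated RBM energy, in which each hidden unit receives a weighted sum of products of input-frame pixels x_i and output-frame pixels y_j, so that p(h_k = 1 | x, y) = sigmoid(sum_ij W_ijk x_i y_j + b_k). The CGRBM described above additionally shares these weights convolutionally across image positions, which this sketch omits. The function names and toy weights are hypothetical.

```python
import math
from typing import List


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gated_rbm_hidden_probs(x: List[float], y: List[float],
                           W: List[List[List[float]]], b: List[float]) -> List[float]:
    """Hypothetical helper: p(h_k = 1 | x, y) for a (non-convolutional) gated RBM.

    x: input frame (length I), y: next frame (length J),
    W[i][j][k]: three-way weights, b[k]: hidden biases (length K).
    """
    probs = []
    for k in range(len(b)):
        a = b[k]
        # Each hidden unit sees products x_i * y_j, i.e. how pixels relate across frames.
        for i, xi in enumerate(x):
            for j, yj in enumerate(y):
                a += W[i][j][k] * xi * yj
        probs.append(sigmoid(a))
    return probs


# Toy example (hypothetical weights): one hidden unit that pairs each input pixel
# with the same output pixel, so it responds when a 2-pixel pattern does not move.
W = [[[1.0], [0.0]],
     [[0.0], [1.0]]]
b = [0.0]
p_static = gated_rbm_hidden_probs([1.0, 0.0], [1.0, 0.0], W, b)  # pixel stayed put
p_moved = gated_rbm_hidden_probs([1.0, 0.0], [0.0, 1.0], W, b)   # pixel shifted right
```

Because the hidden units respond to products of pixels in consecutive frames, they encode the transformation between the frames rather than the appearance of either frame alone, which is consistent with features that primarily encode motion.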
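The second-layer sparse coding and the temporal max-pooling step can be sketched as follows. This assumes the common L1-regularized formulation, minimizing 0.5*||x - D z||^2 + lam*||z||_1 over the code z for a fixed dictionary D, and solves it with ISTA (iterative shrinkage-thresholding). ISTA is a swapped-in standard solver, not necessarily the one used in this system; dictionary learning is omitted, and the dictionary, lam, and function names are hypothetical. The pooled vector is what would then be passed to the SVM.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # D[i][k]: input dimension i, dictionary atom k


def ista(x: Vector, D: Matrix, lam: float, n_iter: int = 200) -> Vector:
    """Hypothetical helper: sparse code z for input x via ISTA."""
    n, K = len(D), len(D[0])
    # Step size 1/L with L = ||D||_F^2, an upper bound on ||D||_2^2,
    # the Lipschitz constant of the gradient of 0.5 * ||x - D z||^2.
    L = sum(d * d for row in D for d in row)
    z = [0.0] * K
    for _ in range(n_iter):
        r = [sum(D[i][k] * z[k] for k in range(K)) - x[i] for i in range(n)]  # D z - x
        grad = [sum(D[i][k] * r[i] for i in range(n)) for k in range(K)]      # D^T (D z - x)
        z_new = []
        for zk, g in zip(z, grad):
            u = zk - g / L                                                  # gradient step
            z_new.append(math.copysign(max(abs(u) - lam / L, 0.0), u))  # soft-threshold
        z = z_new
    return z


def temporal_max_pool(codes: List[Vector]) -> Vector:
    """Element-wise max over time of per-frame codes (one fixed-length vector per clip)."""
    return [max(col) for col in zip(*codes)]


# Usage (hypothetical): code each frame's features, then pool over the clip.
frame_features = [[3.0, 0.5], [0.5, 2.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
clip_vector = temporal_max_pool([ista(f, D, lam=1.0) for f in frame_features])
```

Max-pooling over time turns a variable-length sequence of per-frame codes into a single fixed-length vector, which is the input format an SVM expects.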
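The 3D (spatio-temporal) convolution used in the KTH variant extends 2D convolution with a time axis, so each filter spans several consecutive frames. Below is a minimal pure-Python sketch of a single "valid" 3D filter. Like most convolutional network code, it computes a cross-correlation (the kernel is not flipped). The function name and example kernel are hypothetical, and a real network would stack many such filters with nonlinearities and pooling.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [t][y][x]


def conv3d_valid(video: Volume, kernel: Volume) -> Volume:
    """Hypothetical helper: 'valid' 3D cross-correlation of video[t][y][x] with kernel[dt][dy][dx]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):          # slide over time
        plane = []
        for y in range(H - kh + 1):      # slide over rows
            row = []
            for x in range(W - kw + 1):  # slide over columns
                s = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            s += kernel[a][b][c] * video[t + a][y + b][x + c]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# Usage (hypothetical): a 2x1x1 kernel that takes frame-to-frame differences.
video = [[[0, 0], [0, 0]],
         [[1, 0], [0, 0]],
         [[1, 0], [0, 0]]]
motion = conv3d_valid(video, [[[-1.0]], [[1.0]]])
```

With this temporal-difference kernel the output is nonzero only where a pixel changes between consecutive frames, a simple example of a filter that responds to motion rather than static appearance.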
Latest Video
Watch the real-time demo of our action recognition system (August 2010):
Publications
Graham W. Taylor, Rob Fergus, Yann LeCun and Christoph Bregler: Convolutional Learning of Spatio-temporal Features, Proc. European Conference on Computer Vision (ECCV'10), 2010.