Machine Learning and Pattern Recognition: Final Projects

This is the list of final project topics.

Projects can be done individually, in groups of 2, or (with permission) in groups of 3.

You can propose your own project topic, subject to approval by the instructor.

You must send the TA a .tar or .tgz file containing your code and a PDF file that describes your project and the results you obtained.

Implement Standard Learning Algorithms with Eblearn

Eblearn is a C++ library that implements classes and functionalities similar to Lush's gblearn2 package for gradient-based and energy-based learning.

This series of projects consists of implementing a number of standard algorithms and applications using Eblearn. Suggested algorithms include Adaboost, PCA, K-Means, and Mixtures of Gaussians.

Students should contact Pierre Sermanet (pierre DOT sermanet AT gmail DOT com), the principal maintainer of Eblearn, for information about which modules and functionality should be implemented.

Predicting Financial Data

We have a dataset consisting of description vectors of various companies, together with a variable that indicates whether each company defaulted on its loans.

The project consists of predicting whether a company will default using various methods, including neural nets, logistic regression, SVM, and perhaps other methods (each project team should pick a good subset).

The complication is that some variables are missing, hence a latent variable inference model should be used.

Contact the instructor to obtain the dataset.

Implementing Learning Modules on NVidia GPUs with CUDA

CUDA is a C-like language for writing programs for NVidia GPU cards.

Lush has a framework for writing, compiling, and calling CUDA functions.

A small number of modules (multiple convolutions, etc.) have been implemented with this framework, but more are needed, for example fully-connected neural nets and sparse coding algorithms.

Students interested in this should contact Benoit Corda (cordaben AT gmail DOT com) for the CUDA/Lush interface code and information about which functions to implement.

You need access to an NVidia graphics card with a GPU that supports CUDA (GeForce 9xxx, GTX-2xx, GTX-3xx, GTX-4xx, etc.).

A couple of publicly accessible CIMS servers have CUDA GPU cards in them. A number of CBLL servers also have CUDA cards but they are heavily used.

Learning Sparse and Overcomplete Representations for Music and Speech

Learning algorithms that produce sparse and overcomplete representations have been popular for image recognition, image denoising, and other visual tasks. The project consists of applying these methods to musical signal analysis and speech.

The method can be used as a first step in a "blind source separation" system that could separate instrument tracks from a monophonic recording.

This would consist of adapting existing code (in Lush and/or C++) and running it on audio datasets.

Some information on this class of algorithms is available here.

Belief Propagation for Music Composition

The purpose of the project is to build a factor graph model that implements the rules of simple counterpoint.

Given a few notes, the model can be used to "fill in the blanks" by finding melodies that satisfy all the rules.

Each rule can be seen as a factor in a factor graph.

The best note sequence will be obtained with Belief Propagation or some other efficient inference procedure.
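
As a minimal sketch of the "fill in the blanks" idea, assume the melody is a simple chain of notes, so that each rule is a factor between two consecutive notes. On a chain, min-sum message passing (the Viterbi recursion) finds the best sequence exactly. The rule in `pair_cost`, the integer pitch encoding, and all function names are hypothetical and chosen only for illustration; they are not the actual rules of counterpoint.

```python
INF = float("inf")

def pair_cost(a, b):
    """Hypothetical pairwise factor: forbid repeats and leaps above 4,
    give stepwise motion zero cost, and charge other leaps by their size."""
    leap = abs(a - b)
    if leap == 0 or leap > 4:
        return INF
    return 0.0 if leap == 1 else float(leap)

def fill_in(observed, pitches):
    """Min-sum message passing on a chain. observed[t] is a pitch or None."""
    def unary(t, p):
        # Clamp the given notes; blanks accept any pitch at no cost.
        return 0.0 if observed[t] in (None, p) else INF

    # cost[t][p]: lowest cost of any partial melody ending with pitch p at step t
    cost = [{p: unary(0, p) for p in pitches}]
    back = []
    for t in range(1, len(observed)):
        cur, ptr = {}, {}
        for p in pitches:
            best_q = min(pitches, key=lambda q: cost[-1][q] + pair_cost(q, p))
            cur[p] = cost[-1][best_q] + pair_cost(best_q, p) + unary(t, p)
            ptr[p] = best_q
        cost.append(cur)
        back.append(ptr)

    last = min(pitches, key=lambda p: cost[-1][p])
    if cost[-1][last] == INF:
        return None  # no melody satisfies every hard rule
    melody = [last]
    for ptr in reversed(back):  # follow the back-pointers to the first note
        melody.append(ptr[melody[-1]])
    return melody[::-1]

# First and last notes are given; the three middle notes are blanks.
melody = fill_in([0, None, None, None, 4], list(range(8)))
print(melody)
```

A full counterpoint model would add factors that link several voices, which introduces loops in the graph; Belief Propagation is then only approximate.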

Lp pooling for Convolutional Nets

Convolutional networks have been trained with a number of different pooling functions, such as average, average followed by tanh, and max.

This project concerns the implementation and testing of other pooling functions, such as (SUM_i X_i^p)^(1/p), also known as the "Lp norm", as well as (1/b) log[ SUM_i exp(b*X_i) ], also known as log-sum pooling.

This will have to be implemented within Lush and/or with Eblearn, and tested on the MNIST or CIFAR-10 dataset.
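
The two pooling formulas above can be sketched as follows for a single pooling window given as a flat list of numbers. The function names are hypothetical, and taking the absolute value inside `lp_pool` is an added assumption so that odd values of p behave sensibly on negative inputs.

```python
import math

def lp_pool(values, p):
    """Lp pooling over one window: (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(x) ** p for x in values) ** (1.0 / p)

def log_sum_pool(values, b):
    """Log-sum pooling over one window: (1/b) * log(sum_i exp(b * x_i)).
    The maximum is subtracted before exponentiating to avoid overflow."""
    m = max(b * x for x in values)
    return (m + math.log(sum(math.exp(b * x - m) for x in values))) / b

window = [0.5, 2.0, 1.0, 0.1]
print(lp_pool(window, 1))        # p = 1: sum of absolute values
print(lp_pool(window, 50))       # large p: approaches the max of the window
print(log_sum_pool(window, 50))  # large b: approaches the max of the window
```

Both functions interpolate between sum-like and max-like behavior, which is why p and b are natural hyperparameters to explore in the experiments.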

Students should contact Koray Kavukcuoglu for the code and datasets (koray AT cs DOT nyu DOT edu).

Trainable Contrast Normalization

Convolutional nets and multilayer neural nets perform well when the state of each layer has zero mean, and when the state variables all have similar standard deviations.

The goal of this project is to implement a trainable subtractive and divisive contrast normalization layer for convolutional networks. Current modules of this kind are set up by hand and not trained.

The idea is to make a trainable convolutional module so that every variable in a 3D set of feature maps is replaced by itself minus a (trainable) linear combination of its neighbors, in such a way that the output is sparse (e.g. so that it minimizes the L1 norm of the state).
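
A minimal one-dimensional sketch of the subtractive part is given below, assuming a single signal instead of a 3D set of feature maps and only two neighbors per variable. The function names, the neighbor offsets, and the learning rate are hypothetical, and the divisive part is left out.

```python
def normalize(x, w, offsets=(-1, 1)):
    """Replace each interior sample by itself minus a weighted sum of its neighbors."""
    r = max(abs(o) for o in offsets)
    return [x[i] - sum(wk * x[i + o] for wk, o in zip(w, offsets))
            for i in range(r, len(x) - r)]

def l1(values):
    """L1 norm of the output, used here as the sparsity criterion."""
    return sum(abs(v) for v in values)

def sign(v):
    return (v > 0) - (v < 0)

def subgradient_step(x, w, lr, offsets=(-1, 1)):
    """One subgradient descent step on the L1 norm of the output w.r.t. the weights."""
    r = max(abs(o) for o in offsets)
    out = normalize(x, w, offsets)
    grad = [-sum(sign(out[i - r]) * x[i + o] for i in range(r, len(x) - r))
            for o in offsets]
    return [wk - lr * gk for wk, gk in zip(w, grad)]

ramp = [1.0, 2.0, 3.0, 4.0, 5.0]
w0 = [0.0, 0.0]
w1 = subgradient_step(ramp, w0, lr=0.01)
print(l1(normalize(ramp, w0)), l1(normalize(ramp, w1)))
```

On a linear ramp each sample is the average of its two neighbors, so the weights (0.5, 0.5) give an exactly zero output; a trained layer would be expected to move toward weights of that kind on smooth inputs.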

Students should contact Koray Kavukcuoglu for the code and datasets (koray AT cs DOT nyu DOT edu).

Recurrent ConvNet for Image segmentation

The idea is to train a convolutional network to perform image segmentation (for example, to label the edges in an image). This will be tested on the Berkeley image segmentation dataset.

Background information on using ConvNets for segmentation can be found here.

The convnet architecture would be recurrent, in that multiple successive layers would share the same convolution kernels.

Parallelizing Stochastic Gradient

Stochastic gradient is difficult to parallelize because each sample is run sequentially (unlike batch training where the samples can be run in parallel). The present project idea is to train multiple convnets in parallel with stochastic gradient, and to synchronize their weight parameters periodically, rarely enough so that the communication cost is minimal.
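
The synchronization scheme can be sketched on a toy problem as follows. This is only a single-process simulation under stated assumptions: the "replicas" are plain weight lists standing in for convnets, the model is a one-variable linear regression, and the function names, shard layout, and `sync_every` parameter are hypothetical. A real implementation would run each replica in its own process and exchange weights over the network.

```python
import random

def sgd_step(w, sample, lr):
    """One stochastic gradient step for squared error on y = w[0] * x + w[1]."""
    x, y = sample
    err = w[0] * x + w[1] - y
    return [w[0] - lr * err * x, w[1] - lr * err]

def average(replicas):
    """Element-wise mean of the replicas' weight vectors."""
    n = len(replicas)
    return [sum(ws) / n for ws in zip(*replicas)]

def parallel_sgd(shards, steps, sync_every, lr=0.05, seed=0):
    rng = random.Random(seed)
    replicas = [[0.0, 0.0] for _ in shards]
    for t in range(1, steps + 1):
        # Each replica takes an independent step on a sample from its own shard.
        replicas = [sgd_step(w, rng.choice(shard), lr)
                    for w, shard in zip(replicas, shards)]
        if t % sync_every == 0:
            # Rare synchronization: every replica restarts from the averaged weights.
            mean = average(replicas)
            replicas = [list(mean) for _ in replicas]
    return average(replicas)

# Noise-free samples of y = 2x - 1, split across 4 simulated workers.
data = [(x / 10.0, 2.0 * (x / 10.0) - 1.0) for x in range(-20, 21)]
shards = [data[i::4] for i in range(4)]
w = parallel_sgd(shards, steps=2000, sync_every=50)
print(w)
```

The interesting experimental question is how large `sync_every` can be made before the replicas drift too far apart for averaging to help.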

Lush provides an object serialization infrastructure that makes it easy to run multiple Lush processes that talk to each other through sockets.

Multi-step Denoising Autoencoder

(details to follow).

Fields of Experts

Implement the so-called "Fields of Experts" model in Lush (see reference below).

Reference: Stefan Roth and Michael J. Black: Fields of Experts: A Framework for Learning Image Priors. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 860-867, June 2005. Link to paper.

This model can be used for image restoration, denoising, etc.

Text/Image Segmenter for DjVuLibre

Implement a foreground/background segmenter for DjVuLibre. DjVuLibre is the open-source version of the DjVu system. It currently lacks a good segmenter that can separate the text and drawings from the backgrounds and continuous-tone images in a scanned document.

The project will consist of building a foreground/background segmenter by using various clustering methods.
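
As one very simple instance of a clustering-based segmenter, the sketch below runs 2-means on grayscale pixel intensities and treats the darker cluster as foreground. It assumes dark text on a light background and ignores spatial context and color; the function name and the flat list-of-pixels input are hypothetical and unrelated to the DjVuLibre code base.

```python
def two_means(pixels, iters=20):
    """Cluster grayscale intensities into a dark and a bright group.
    Returns (labels, (dark_center, bright_center)); label 0 = dark, 1 = bright."""
    lo, hi = float(min(pixels)), float(max(pixels))  # initial cluster centers
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: each pixel goes to the nearest center.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in pixels]
        dark = [v for v, l in zip(pixels, labels) if l == 0]
        bright = [v for v, l in zip(pixels, labels) if l == 1]
        if not dark or not bright:
            break  # degenerate image with a single intensity
        # Update step: move each center to the mean of its cluster.
        lo, hi = sum(dark) / len(dark), sum(bright) / len(bright)
    return labels, (lo, hi)

# A tiny "scan": three dark (ink) pixels and three light (paper) pixels.
pixels = [10, 200, 12, 210, 11, 205]
labels, centers = two_means(pixels)
foreground = [l == 0 for l in labels]
print(labels, centers)
```

A practical segmenter would cluster local color statistics per image block rather than global intensities, since scanned backgrounds are rarely uniform.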

Train a Full-Page OCR

Train the convolutional net implemented in Lush on snippets of images extracted from scanned pages. The snippets should include neighboring characters, so that the network output will be robust to "distracting" characters on each side of the character to be recognized.

The purpose is to demonstrate a segmentation-free "brute force" OCR by sweeping the convolutional network over an entire page of text.

Students can use the convolutional network implementations from the gblearn2 library in Lush (in lush/packages/gblearn2) or from the Eblearn C++ library.

Learning with Conjugate Gradient in Eblearn

Implement and test the Polak-Ribiere and Fletcher-Reeves Conjugate Gradient optimization methods within the Eblearn C++ library.

Compare them with stochastic gradient on regression problems.

This comes down to implementing a new subclass of the "parameter" class in which the "update" method will be redefined.
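
For orientation, the two update rules can be sketched on a least-squares regression problem, where the loss is quadratic and the line search has a closed form. This is a standalone illustration, not Eblearn code: the function names are hypothetical, and the Polak-Ribiere coefficient is clipped at zero (the common "PR+" variant), which is an added assumption.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(X, w):
    return [dot(row, w) for row in X]

def gradient(X, y, w):
    """Gradient of 0.5 * sum_i (w . x_i - y_i)^2 with respect to w."""
    residual = [p - t for p, t in zip(predict(X, w), y)]
    return [dot(col, residual) for col in zip(*X)]

def conjugate_gradient(X, y, w, rule="PR", iters=20, tol=1e-12):
    g = gradient(X, y, w)
    d = [-gi for gi in g]  # first direction: steepest descent
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        Xd = predict(X, d)
        alpha = -dot(g, d) / dot(Xd, Xd)  # exact line search for a quadratic loss
        w = [wi + alpha * di for wi, di in zip(w, d)]
        g_new = gradient(X, y, w)
        if rule == "FR":   # Fletcher-Reeves
            beta = dot(g_new, g_new) / dot(g, g)
        else:              # Polak-Ribiere, clipped at zero
            diff = [a - b for a, b in zip(g_new, g)]
            beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w

# Noise-free samples of y = 3x + 0.5; the second column of X is the bias input.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
X = [[x, 1.0] for x in xs]
y = [3.0 * x + 0.5 for x in xs]
w_pr = conjugate_gradient(X, y, [0.0, 0.0], rule="PR")
w_fr = conjugate_gradient(X, y, [0.0, 0.0], rule="FR")
print(w_pr, w_fr)
```

On a quadratic loss with exact line search both rules coincide with linear conjugate gradient; the differences between them only show up on non-quadratic losses such as those of neural nets, where an inexact line search is needed.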

Optional: implement quasi-Newton methods such as limited-memory BFGS.

