Homework #2: Backpropagation and multi-module gradient-based learning.

The purpose of this assignment is to implement and experiment with
multi-module gradient-based learning and backpropagation.

Your assignment should be sent by email to the TA:
- Include "homework 02" in the subject line.
- Send your email in plain text (no MS Word, no HTML, no PostScript, no PDF).
- Late submissions will be penalized according to the formula:
  corrected_grade = actual_grade / (1 + days_late/15).
- You must implement the code by yourself, but you are encouraged to discuss
  your results with other students.
- Include your source code as attachment(s).

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

You must implement various modules and learning machine architectures and
train them on several datasets. Much of the code is implemented and provided
in Lush: you merely have to fill in the blanks in the file "modules.lsh" and
modify homework-02.lsh to obtain the results. You may re-implement this in
another object-oriented language if you prefer (C++, Java, ...), but you
will probably create more work for yourself if you do so.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PART 1: Implement the necessary modules.
(Illustrative sketches of each module, and of the gradient check of 1.6,
follow this list.)

1.1: Implement the euclidean module.
  The euclidean module has two vector inputs and one scalar output,
  computed as
    output = 1/2 || input1 - input2 ||^2
  You must implement the fprop and bprop methods.

1.2: Implement the linear module.
  The linear module has one input vector and one output vector:
    output = w * input
  where w is a trainable weight matrix. You must implement the fprop and
  bprop methods. The bprop method must propagate the gradient back to the
  input and to the weights.

1.3: Implement the tanh module.
  The tanh module has a vector input and a vector output of the same size:
    output_i = tanh( input_i + bias_i )
  where bias is a vector of trainable internal parameters. You must
  implement the fprop and bprop methods. The bprop method must compute
  gradients with respect to the input and to the bias vector. Use the
  functions idx-tanh and idx-dtanh defined in sigmoid.lsh.

1.4: Implement nn-layer (one layer of a neural net).
  The nn-layer is a cascade of two modules: a linear module followed by a
  tanh module. You must implement the fprop and bprop methods and the
  constructor.

1.5: Implement nn-2layer (a 2-layer neural net, i.e. one with a single
  hidden layer).
  The nn-2layer is a cascade of two nn-layer modules. You must implement
  the fprop and bprop methods and the constructor.

1.6: Devise an automatic scheme for testing the correctness of the bprop
  method of any module. Explain how it would work.

1.7 (extra credit): Implement the above scheme.
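As a point of reference for 1.1, here is a minimal sketch in Python/NumPy
(the assignment permits another object-oriented language); the class and
method names are illustrative, not the ones in the provided modules.lsh. It
follows the usual convention that bprop receives the gradient of the final
loss with respect to the module's output:

    import numpy as np

    class EuclideanModule:
        def fprop(self, input1, input2):
            # output = 1/2 * || input1 - input2 ||^2  (a scalar)
            self.diff = input1 - input2
            return 0.5 * np.dot(self.diff, self.diff)

        def bprop(self, doutput=1.0):
            # d(output)/d(input1) =  (input1 - input2)
            # d(output)/d(input2) = -(input1 - input2)
            return doutput * self.diff, -doutput * self.diff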
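For 1.2, a sketch under the same assumptions: with output = w * input, the
gradient with respect to the input is w^T * doutput, and the gradient with
respect to the weights is the outer product of doutput and the input:

    class LinearModule:
        def __init__(self, ninputs, noutputs):
            # small random initial weights; one row of w per output
            self.w = np.random.randn(noutputs, ninputs) / np.sqrt(ninputs)

        def fprop(self, x):
            self.x = x             # cache the input for bprop
            return self.w @ x      # output = w * input

        def bprop(self, dout):
            self.dw = np.outer(dout, self.x)   # gradient wrt the weights
            return self.w.T @ dout             # gradient wrt the input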
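For 1.3, the derivative of tanh can be expressed through the cached output,
d tanh(u)/du = 1 - tanh(u)^2 (in the Lush version, idx-dtanh provides this
derivative), and the gradients with respect to the input and the bias are
identical:

    class TanhModule:
        def __init__(self, size):
            self.bias = np.zeros(size)         # trainable bias vector

        def fprop(self, x):
            # output_i = tanh(input_i + bias_i)
            self.out = np.tanh(x + self.bias)
            return self.out

        def bprop(self, dout):
            du = dout * (1.0 - self.out ** 2)  # chain rule through tanh
            self.dbias = du                    # gradient wrt the bias
            return du                          # gradient wrt the input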
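For 1.4 and 1.5, a cascade runs fprop front to back and bprop back to front.
Continuing the sketches above:

    class NNLayer:
        # one layer: a linear module followed by a tanh module
        def __init__(self, ninputs, noutputs):
            self.linear = LinearModule(ninputs, noutputs)
            self.tanh = TanhModule(noutputs)

        def fprop(self, x):
            return self.tanh.fprop(self.linear.fprop(x))

        def bprop(self, dout):
            return self.linear.bprop(self.tanh.bprop(dout))

    class NN2Layer:
        # a 2-layer net (one hidden layer): a cascade of two nn-layers
        def __init__(self, ninputs, nhidden, noutputs):
            self.layer1 = NNLayer(ninputs, nhidden)
            self.layer2 = NNLayer(nhidden, noutputs)

        def fprop(self, x):
            return self.layer2.fprop(self.layer1.fprop(x))

        def bprop(self, dout):
            return self.layer1.bprop(self.layer2.bprop(dout))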
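One standard scheme for 1.6 (and a possible implementation for the extra
credit of 1.7) is a central finite-difference check: perturb each coordinate
of an input or parameter by +/- eps, recompute the scalar loss, and compare
the resulting slope with the gradient returned by bprop. A sketch, again
with illustrative names:

    def check_bprop(loss_of, grad_of, x, eps=1e-5, tol=1e-6):
        # loss_of(x): runs fprop and returns the scalar loss
        # grad_of(x): runs fprop + bprop and returns dloss/dx
        analytic = grad_of(x)
        numeric = np.zeros_like(x)
        for i in range(x.size):
            xp, xm = x.copy(), x.copy()
            xp[i] += eps
            xm[i] -= eps
            numeric[i] = (loss_of(xp) - loss_of(xm)) / (2 * eps)
        return np.max(np.abs(analytic - numeric)) < tol

    # e.g. checking the euclidean module's gradient wrt its first input:
    cost, x2 = EuclideanModule(), np.random.randn(5)
    def grad1(x):
        cost.fprop(x, x2)
        return cost.bprop()[0]
    assert check_bprop(lambda x: cost.fprop(x, x2), grad1, np.random.randn(5))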
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PART 2: Train a 2-layer neural net on the spambase dataset.

The data is provided in spambase.ldat; this is the same set used in the
first homework.

2.1: Train a linear module with a euclidean cost on this set, with 2000
  training samples and 2000 test samples. Report the loss and the error
  rate on the training set and the test set. You should be able to get
  less than 10% error on the test set.

2.2: Train a 2-layer neural net with a euclidean cost on this set, with
  2000 training samples and 2000 test samples. By playing with the size of
  the hidden layer, the stopping criterion, and the regularizer, you should
  be able to get around 7% error on the test set. Report the loss and the
  error rate on the training set and the test set. (A sketch of a suitable
  training loop appears at the end of this assignment.)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PART 3: Train a 2-layer neural net on the isolet dataset.

This is a spoken letter recognition dataset. It is relatively large, with
over 600 input features, 26 categories, and over 6000 training samples.
The file isolet.names provides a description of the data.

The data is provided in four files in Lush matrix format:
  isolet-train.mat:        training set input vectors
  isolet-train-labels.mat: training set labels (0..25)
  isolet-test.mat:         test set input vectors
  isolet-test-labels.mat:  test set labels (0..25)

The data is also provided in the original comma-separated-value format, in
case you want to use a language other than Lush:
  isolet-train.data
  isolet-test.data

3.1: Train a linear module with a euclidean cost on this set, using 4000
  training samples and 1000 test samples. Report the loss and the error
  rate on the training set and the test set.

3.2: Train 2-layer networks (with one hidden layer) containing 5, 10, 20,
  40, and 80 hidden units, using 4000 training samples and 1000 test
  samples. For each network size, report the loss and the error rate on
  the training set and the test set. You should get less than 5% error on
  the test set with an appropriately sized network. (A sketch of this size
  sweep follows the training-loop sketch below.)

WARNING: a training run will take several minutes.
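To make the procedure in 2.1/2.2 and part 3 concrete, here is a hedged
sketch of a plain stochastic-gradient loop with the euclidean cost,
continuing the Python sketches from part 1. It assumes targets are coded as
one-of-N vectors (so classification is by the arg-max of the output); the
learning rate, epoch count, and update rule are illustrative choices, not
prescribed by the assignment:

    def sgd_step(net, eta):
        # hypothetical updater for the NN2Layer sketch above
        for layer in (net.layer1, net.layer2):
            layer.linear.w  -= eta * layer.linear.dw
            layer.tanh.bias -= eta * layer.tanh.dbias

    def train(net, cost, inputs, targets, eta=0.01, epochs=20):
        for epoch in range(epochs):
            for x, t in zip(inputs, targets):
                cost.fprop(net.fprop(x), t)
                dout, _ = cost.bprop()   # gradient wrt the net's output
                net.bprop(dout)
                sgd_step(net, eta)

    def error_rate(net, inputs, targets):
        errors = sum(np.argmax(net.fprop(x)) != np.argmax(t)
                     for x, t in zip(inputs, targets))
        return errors / len(inputs)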
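And a hypothetical driver for the size sweep of 3.2; load_isolet is a
placeholder for whatever loader you write for the .mat or .data files (it
is not provided by the assignment):

    train_x, train_t, test_x, test_t = load_isolet()  # hypothetical loader
    for nhidden in (5, 10, 20, 40, 80):
        net = NN2Layer(train_x.shape[1], nhidden, 26)
        train(net, EuclideanModule(), train_x[:4000], train_t[:4000])
        print(nhidden,
              error_rate(net, train_x[:4000], train_t[:4000]),
              error_rate(net, test_x[:1000], test_t[:1000]))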