Homework #2: Backpropagation and multi-module gradient-based learning.

The purpose of this assignment is to implement and experiment with
multi-module gradient-based learning and backpropagation.

Your assignment should be sent by email to the TA:
- Include "homework 02" in the subject line.
- Send your email in plain text (no MS Word, no HTML, no PostScript, no PDF).
- Late submissions will be penalized according to the formula:
  corrected_grade = actual_grade / (1 + days_late/15).
- You must implement the code by yourself, but you are encouraged to discuss
  your results with other students.
- Include your source code as attachment(s).

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

You must implement various modules and learning machine architectures and
train them on several datasets. Much of the code is implemented and provided
in Lush: you merely have to fill in the blanks in the file "modules.lsh" and
modify homework-02.lsh to obtain the results. You may re-implement this in
another object-oriented language if you prefer (C++, Java, ...), but you
will probably create more work for yourself if you do so.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PART 1: Implement the necessary modules.
(Illustrative sketches of each module, and of the gradient check of 1.6,
follow this list.)

1.1: Implement the euclidean module.
  The euclidean module has two vector inputs and one scalar output,
  computed as
    output = 1/2 || input1 - input2 ||^2
  You must implement the fprop and bprop methods.

1.2: Implement the linear module.
  The linear module has one input vector and one output vector:
    output = w * input
  where w is a trainable weight matrix. You must implement the fprop and
  bprop methods. The bprop method must propagate the gradient back to the
  input and to the weights.

1.3: Implement the tanh module.
  The tanh module has a vector input and a vector output of the same size:
    output_i = tanh( input_i + bias_i )
  where bias is a vector of trainable internal parameters. You must
  implement the fprop and bprop methods. The bprop method must compute
  gradients with respect to the input and to the bias vector. Use the
  functions idx-tanh and idx-dtanh defined in sigmoid.lsh.

1.4: Implement nn-layer (one layer of a neural net).
  The nn-layer is a cascade of two modules: a linear module followed by a
  tanh module. You must implement the fprop and bprop methods and the
  constructor.

1.5: Implement nn-2layer (a 2-layer neural net, i.e. one with a single
  hidden layer).
  The nn-2layer is a cascade of two nn-layer modules. You must implement
  the fprop and bprop methods and the constructor.

1.6: Devise an automatic scheme for testing the correctness of the bprop
  method of any module. Explain how it would work.

1.7 (extra credit): Implement the above scheme.
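As a point of reference for 1.1, here is a minimal sketch in Python/NumPy
(the assignment permits another object-oriented language); the class and
method names are illustrative, not the ones in the provided modules.lsh. It
follows the usual convention that bprop receives the gradient of the final
loss with respect to the module's output:

    import numpy as np

    class EuclideanModule:
        def fprop(self, input1, input2):
            # output = 1/2 * || input1 - input2 ||^2  (a scalar)
            self.diff = input1 - input2
            return 0.5 * np.dot(self.diff, self.diff)

        def bprop(self, doutput=1.0):
            # d(output)/d(input1) =  (input1 - input2)
            # d(output)/d(input2) = -(input1 - input2)
            return doutput * self.diff, -doutput * self.diff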
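For 1.2, a sketch under the same assumptions: with output = w * input, the
gradient with respect to the input is w^T * doutput, and the gradient with
respect to the weights is the outer product of doutput and the input:

    class LinearModule:
        def __init__(self, ninputs, noutputs):
            # small random initial weights; one row of w per output
            self.w = np.random.randn(noutputs, ninputs) / np.sqrt(ninputs)

        def fprop(self, x):
            self.x = x             # cache the input for bprop
            return self.w @ x      # output = w * input

        def bprop(self, dout):
            self.dw = np.outer(dout, self.x)   # gradient wrt the weights
            return self.w.T @ dout             # gradient wrt the input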
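For 1.3, the derivative of tanh can be expressed through the cached output,
d tanh(u)/du = 1 - tanh(u)^2 (in the Lush version, idx-dtanh provides this
derivative), and the gradients with respect to the input and the bias are
identical:

    class TanhModule:
        def __init__(self, size):
            self.bias = np.zeros(size)         # trainable bias vector

        def fprop(self, x):
            # output_i = tanh(input_i + bias_i)
            self.out = np.tanh(x + self.bias)
            return self.out

        def bprop(self, dout):
            du = dout * (1.0 - self.out ** 2)  # chain rule through tanh
            self.dbias = du                    # gradient wrt the bias
            return du                          # gradient wrt the input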
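For 1.4 and 1.5, a cascade runs fprop front to back and bprop back to front.
Continuing the sketches above:

    class NNLayer:
        # one layer: a linear module followed by a tanh module
        def __init__(self, ninputs, noutputs):
            self.linear = LinearModule(ninputs, noutputs)
            self.tanh = TanhModule(noutputs)

        def fprop(self, x):
            return self.tanh.fprop(self.linear.fprop(x))

        def bprop(self, dout):
            return self.linear.bprop(self.tanh.bprop(dout))

    class NN2Layer:
        # a 2-layer net (one hidden layer): a cascade of two nn-layers
        def __init__(self, ninputs, nhidden, noutputs):
            self.layer1 = NNLayer(ninputs, nhidden)
            self.layer2 = NNLayer(nhidden, noutputs)

        def fprop(self, x):
            return self.layer2.fprop(self.layer1.fprop(x))

        def bprop(self, dout):
            return self.layer1.bprop(self.layer2.bprop(dout))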
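One standard scheme for 1.6 (and a possible implementation for the extra
credit of 1.7) is a central finite-difference check: perturb each coordinate
of an input or parameter by +/- eps, recompute the scalar loss, and compare
the resulting slope with the gradient returned by bprop. A sketch, again
with illustrative names:

    def check_bprop(loss_of, grad_of, x, eps=1e-5, tol=1e-6):
        # loss_of(x): runs fprop and returns the scalar loss
        # grad_of(x): runs fprop + bprop and returns dloss/dx
        analytic = grad_of(x)
        numeric = np.zeros_like(x)
        for i in range(x.size):
            xp, xm = x.copy(), x.copy()
            xp[i] += eps
            xm[i] -= eps
            numeric[i] = (loss_of(xp) - loss_of(xm)) / (2 * eps)
        return np.max(np.abs(analytic - numeric)) < tol

    # e.g. checking the euclidean module's gradient wrt its first input:
    cost, x2 = EuclideanModule(), np.random.randn(5)
    def grad1(x):
        cost.fprop(x, x2)
        return cost.bprop()[0]
    assert check_bprop(lambda x: cost.fprop(x, x2), grad1, np.random.randn(5))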
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PART 2: Train a 2-layer neural net on the spambase dataset.

The data is provided in spambase.ldat; this is the same set used in the
first homework.

2.1: Train a linear module with a euclidean cost on this set, with 2000
  training samples and 2000 test samples. Report the loss and the error
  rate on the training set and the test set. You should be able to get
  less than 10% error on the test set.

2.2: Train a 2-layer neural net with a euclidean cost on this set, with
  2000 training samples and 2000 test samples. By playing with the size of
  the hidden layer, the stopping criterion, and the regularizer, you should
  be able to get around 7% error on the test set. Report the loss and the
  error rate on the training set and the test set. (A sketch of a suitable
  training loop appears at the end of this assignment.)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PART 3: Train a 2-layer neural net on the isolet dataset.

This is a spoken letter recognition dataset. It is relatively large, with
over 600 input features, 26 categories, and over 6000 training samples.
The file isolet.names provides a description of the data.

The data is provided in four files in Lush matrix format:
  isolet-train.mat:        training set input vectors
  isolet-train-labels.mat: training set labels (0..25)
  isolet-test.mat:         test set input vectors
  isolet-test-labels.mat:  test set labels (0..25)

The data is also provided in the original comma-separated-value format, in
case you want to use a language other than Lush:
  isolet-train.data
  isolet-test.data

3.1: Train a linear module with a euclidean cost on this set, using 4000
  training samples and 1000 test samples. Report the loss and the error
  rate on the training set and the test set.

3.2: Train 2-layer networks (with one hidden layer) containing 5, 10, 20,
  40, and 80 hidden units, using 4000 training samples and 1000 test
  samples. For each network size, report the loss and the error rate on
  the training set and the test set. You should get less than 5% error on
  the test set with an appropriately sized network. (A sketch of this size
  sweep follows the training-loop sketch below.)

WARNING: a training run will take several minutes.
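To make the procedure in 2.1/2.2 and part 3 concrete, here is a hedged
sketch of a plain stochastic-gradient loop with the euclidean cost,
continuing the Python sketches from part 1. It assumes targets are coded as
one-of-N vectors (so classification is by the arg-max of the output); the
learning rate, epoch count, and update rule are illustrative choices, not
prescribed by the assignment:

    def sgd_step(net, eta):
        # hypothetical updater for the NN2Layer sketch above
        for layer in (net.layer1, net.layer2):
            layer.linear.w  -= eta * layer.linear.dw
            layer.tanh.bias -= eta * layer.tanh.dbias

    def train(net, cost, inputs, targets, eta=0.01, epochs=20):
        for epoch in range(epochs):
            for x, t in zip(inputs, targets):
                cost.fprop(net.fprop(x), t)
                dout, _ = cost.bprop()   # gradient wrt the net's output
                net.bprop(dout)
                sgd_step(net, eta)

    def error_rate(net, inputs, targets):
        errors = sum(np.argmax(net.fprop(x)) != np.argmax(t)
                     for x, t in zip(inputs, targets))
        return errors / len(inputs)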
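And a hypothetical driver for the size sweep of 3.2; load_isolet is a
placeholder for whatever loader you write for the .mat or .data files (it
is not provided by the assignment):

    train_x, train_t, test_x, test_t = load_isolet()  # hypothetical loader
    for nhidden in (5, 10, 20, 40, 80):
        net = NN2Layer(train_x.shape[1], nhidden, 26)
        train(net, EuclideanModule(), train_x[:4000], train_t[:4000])
        print(nhidden,
              error_rate(net, train_x[:4000], train_t[:4000]),
              error_rate(net, test_x[:1000], test_t[:1000]))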