{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Computer Vision CSCI-GA.2272-001 Assignment 1\n", "\n", "## Introduction\n", "\n", "This assignment is an introduction to using PyTorch for training simple neural net models. Two different datasets will be used: \n", "- MNIST digits [handwritten digits]\n", "- CIFAR-10 [32x32 resolution color images of 10 object classes].\n", "\n", "## Requirements\n", "\n", "You should perform this assignment in PyTorch, modify this ipython notebook\n", "\n", "To install PyTorch, follow instructions at http://pytorch.org/\n", "\n", "Please submit your assignment by uploading this iPython notebook to NYU classes.\n", "\n", "## Warmup [10%]\n", "\n", "It is always good practice to visually inspect your data before trying to train a model, since it lets you check for problems and get a feel for the task at hand.\n", "\n", "MNIST is a dataset of 70,000 grayscale hand-written digits (0 through 9).\n", "60,000 of these are training images. 10,000 are a held out test set. \n", "\n", "CIFAR-10 is a dataset of 60,000 color images (32 by 32 resolution) across 10 classes\n", "(airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). \n", "The train/test split is 50k/10k.\n", "\n", "Use `matplotlib` and ipython notebook's visualization capabilities to display some of these images.\n", "[See this PyTorch tutorial page](http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py) for hints on how to achieve this.\n", "\n", "** Relevant Cell: \"Data Loading\" **\n", "\n", "## Training a Single Layer Network on MNIST [20%]\n", "\n", "Start by running the training on MNIST.\n", "By default if you run this notebook successfully, it will train on MNIST.\n", "\n", "This will initialize a single layer model train it on the 50,000 MNIST training images for 10 epochs (passes through the training data). \n", "\n", "The loss function [cross_entropy](http://pytorch.org/docs/master/nn.html?highlight=cross_entropy#torch.nn.functional.cross_entropy) computes a Logarithm of the Softmax on the output of the neural network, and then computes the negative log-likelihood w.r.t. the given `target`.\n", "\n", "The default values for the learning rate, batch size and number of epochs are given in the \"options\" cell of this notebook. \n", "Unless otherwise specified, use the default values throughout this assignment. \n", "\n", "Note the decrease in training loss and corresponding decrease in validation errors.\n", "\n", "Paste the output into your report.\n", "(a): Add code to plot out the network weights as images (one for each output, of size 28 by 28) after the last epoch. Grab a screenshot of the figure and include it in your report. (Hint threads: [#1](https://discuss.pytorch.org/t/understanding-deep-network-visualize-weights/2060/2?u=smth) [#2](https://github.com/pytorch/vision#utils) )\n", "\n", "(b): Reduce the number of training examples to just 50. [Hint: limit the iterator in the `train` function]. \n", "Paste the output into your report and explain what is happening to the model.\n", "\n", "## Training a Multi-Layer Network on MNIST [20%]\n", "\n", "- Add an extra layer to the network with 1000 hidden units and a `tanh` non-linearity. [Hint: modify the `Net` class]. Train the model for 10 epochs and save the output into your report.\n", "- Now set the learning rate to 10 and observe what happens during training. Save the output in your report and give a brief explanation\n", "\n", "## Training a Convolutional Network on CIFAR [50%]\n", "\n", "To change over to the CIFAR-10 dataset, change the `options` cell's `dataset` variable to `'cifar10'`.\n", "\n", "- Create a convolutional network with the following architecture:\n", " - Convolution with 5 by 5 filters, 16 feature maps + Tanh nonlinearity.\n", " - 2 by 2 max pooling.\n", " - Convolution with 5 by 5 filters, 128 feature maps + Tanh nonlinearity.\n", " - 2 by 2 max pooling.\n", " - Flatten to vector.\n", " - Linear layer with 64 hidden units + Tanh nonlinearity.\n", " - Linear layer to 10 output units.\n", "\n", "Train it for 20 epochs on the CIFAR-10 training set and copy the output\n", "into your report, along with a image of the first layer filters.\n", "\n", "Hints: [Follow the first PyTorch tutorial](http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py) or look at the [MNIST example](https://github.com/pytorch/examples/tree/master/mnist)\n", "\n", "- Give a breakdown of the parameters within the above model, and the overall number." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import print_function\n", "import argparse\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "from torchvision import datasets, transforms\n", "from torch.autograd import Variable" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# options\n", "dataset = 'cifar10' # options: 'mnist' | 'cifar10'\n", "batch_size = 64 # input batch size for training\n", "epochs = 10 # number of epochs to train\n", "lr = 0.01 # learning rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Data Loading\n", "# Warning: this cell might take some time when you run it for the first time, \n", "# because it will download the datasets from the internet\n", "if dataset == 'mnist':\n", " data_transform = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize((0.1307,), (0.3081,))\n", " ])\n", " trainset = datasets.MNIST(root='.', train=True, download=True, transform=data_transform)\n", " testset = datasets.MNIST(root='.', train=False, download=True, transform=data_transform)\n", "\n", "elif dataset == 'cifar10':\n", " data_transform = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n", " ])\n", " trainset = datasets.CIFAR10(root='.', train=True, download=True, transform=data_transform)\n", " testset = datasets.CIFAR10(root='.', train=True, download=True, transform=data_transform)\n", "\n", "train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=0)\n", "test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## network and optimizer\n", "if dataset == 'mnist':\n", " num_inputs = 784\n", "elif dataset == 'cifar10':\n", " num_inputs = 3072\n", "\n", "num_outputs = 10 # same for both CIFAR10 and MNIST, both have 10 classes as outputs\n", "\n", "class Net(nn.Module):\n", " def __init__(self, num_inputs, num_outputs):\n", " super(Net, self).__init__()\n", " self.linear = nn.Linear(num_inputs, num_outputs)\n", "\n", " def forward(self, input):\n", " input = input.view(-1, num_inputs) # reshape input to batch x num_inputs\n", " output = self.linear(input)\n", " return output\n", "\n", "network = Net(num_inputs, num_outputs)\n", "optimizer = optim.SGD(network.parameters(), lr=lr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def train(epoch):\n", " network.train()\n", " for batch_idx, (data, target) in enumerate(train_loader):\n", " data, target = Variable(data), Variable(target)\n", " optimizer.zero_grad()\n", " output = network(data)\n", " loss = F.cross_entropy(output, target)\n", " loss.backward()\n", " optimizer.step()\n", " if batch_idx % 100 == 0:\n", " print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n", " epoch, batch_idx * len(data), len(train_loader.dataset),\n", " 100. * batch_idx / len(train_loader), loss.data[0]))\n", "\n", "def test():\n", " network.eval()\n", " test_loss = 0\n", " correct = 0\n", " for data, target in test_loader:\n", " data, target = Variable(data, volatile=True), Variable(target)\n", " output = network(data)\n", " test_loss += F.cross_entropy(output, target, size_average=False).data[0] # sum up batch loss\n", " pred = output.data.max(1, keepdim=True)[1] # get the index of the max log-probability\n", " correct += pred.eq(target.data.view_as(pred)).cpu().sum()\n", "\n", " test_loss /= len(test_loader.dataset)\n", " print('\\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n", " test_loss, correct, len(test_loader.dataset),\n", " 100. * correct / len(test_loader.dataset)))\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for epoch in range(1, epochs + 1):\n", " train(epoch)\n", " test()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }