THE small NORB DATASET, V1.0

Fu Jie Huang, Yann LeCun
Courant Institute, New York University
October, 2005

This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 degrees every 20 degrees).

The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5).

TERMS / COPYRIGHT

This database is provided for research purposes. It cannot be sold. Publications that include results obtained with this database should reference the following paper:

Y. LeCun, F.J. Huang, L. Bottou, Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) 2004

CONTENT

The files are gzipped for download. Once uncompressed, they are in a simple binary matrix format, with the file suffix ".mat". The file format is explained in a later section.
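
As a minimal sketch of the decompression step, the Python snippet below opens a file for binary reading whether or not it is still gzipped. The helper name open_norb is hypothetical (it is not part of the dataset distribution), and the file name in the comment is simply one of the files listed in this readme.

```python
import gzip

def open_norb(path):
    """Open a small NORB file for binary reading.

    Hypothetical helper: files ending in ".gz" are decompressed on the fly,
    anything else is opened as a plain binary file.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")

# Usage (requires the downloaded file to be present):
# with open_norb("smallnorb-5x46789x9x18x6x2x96x96-training-dat.mat.gz") as f:
#     first_four_bytes = f.read(4)
```

Reading through gzip.open avoids keeping an uncompressed copy on disk, at the cost of slower random access.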

The "-dat" files store the image sequences. The "-cat" files store the corresponding category of the images. Each "-dat" file stores 24,300 image pairs (5 categories, 5 instances, 6 lightings, 9 elevations, and 18 azimuths). The corresponding "-cat" file contains 24,300 category labels (0 for animal, 1 for human, 2 for plane, 3 for truck, 4 for car).
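
The count of 24,300 follows directly from the factors listed above. The short Python sketch below checks the arithmetic and maps a label to its category name; the constant and function names are illustrative only.

```python
# Category labels as listed in the text: index in the tuple == label value.
CATEGORIES = ("animal", "human", "plane", "truck", "car")

# Factors per "-dat" file, as listed in the text.
N_INSTANCES, N_LIGHTINGS, N_ELEVATIONS, N_AZIMUTHS = 5, 6, 9, 18

n_pairs = len(CATEGORIES) * N_INSTANCES * N_LIGHTINGS * N_ELEVATIONS * N_AZIMUTHS

def category_name(label):
    """Return the category name for a label read from a "-cat" file."""
    return CATEGORIES[label]

print(n_pairs)            # 24300
print(category_name(3))   # truck
```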

Each "-info" file stores 24,300 4-dimensional vectors, which contain additional information about the corresponding images:
- 1. the instance in the category (0 to 9)
- 2. the elevation (0 to 8, meaning the cameras are 30, 35, 40, 45, 50, 55, 60, 65, 70 degrees from the horizontal, respectively)
- 3. the azimuth (0, 2, 4, ..., 34; multiply by 10 to get the azimuth in degrees)
- 4. the lighting condition (0 to 5)
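
The encoding of the four fields can be turned into physical units with a few lines of Python. This is only a sketch: the function name decode_info and the dictionary keys are made up for illustration, while the conversions (30 + 5 * elevation, 10 * azimuth) come from the list above.

```python
def decode_info(vec):
    """Convert one 4-element "-info" vector into readable values."""
    instance, elevation, azimuth, lighting = vec
    return {
        "instance": instance,               # 0 to 9
        "elevation_deg": 30 + 5 * elevation,  # index 0..8 -> 30..70 degrees
        "azimuth_deg": 10 * azimuth,          # 0, 2, ..., 34 -> 0..340 degrees
        "lighting": lighting,               # 0 to 5
    }

print(decode_info([8, 6, 4, 4]))
# {'instance': 8, 'elevation_deg': 60, 'azimuth_deg': 40, 'lighting': 4}
```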

For regular training and testing, "-dat" and "-cat" files are sufficient. "-info" files are provided in case some other forms of classification or preprocessing are needed.

FILE FORMAT

The files are stored in the so-called "binary matrix" file format, which is a simple format for vectors and multidimensional matrices of various element types. Binary matrix files begin with a file header which describes the type and size of the matrix, and then comes the binary image of the matrix.

The header is best described by a C structure:

struct header {
    int magic;   // 4 bytes, little endian
    int ndim;    // 4 bytes, little endian
    int dim[3];  // 3 x 4 bytes, little endian
};

Note that when the matrix has fewer than 3 dimensions, the unused entries of dim are set to 1; for a 1D vector, for example, dim[1] and dim[2] are both 1. When the matrix has more than 3 dimensions, the header is followed by the remaining dimension sizes. After the header comes the matrix data, which is stored with the index of the last dimension changing the fastest.
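
The header rules above can be sketched in Python with the standard struct module. The function name read_header is hypothetical, and the sketch assumes that every dimension size is a 4-byte little-endian integer, as in the C structure.

```python
import io
import struct

def read_header(f):
    """Return (magic, dims) and leave f positioned at the first data byte."""
    magic, ndim = struct.unpack("<ii", f.read(8))
    # The header always holds at least 3 dimension slots; unused ones are 1.
    # With more than 3 dimensions, the extra sizes follow immediately.
    n_slots = max(3, ndim)
    slots = struct.unpack("<%di" % n_slots, f.read(4 * n_slots))
    return magic, slots[:ndim]

# Usage on an in-memory header shaped like that of a "-dat" file:
raw = struct.pack("<6i", 0x1E3D4C55, 4, 24300, 2, 96, 96)
magic, dims = read_header(io.BytesIO(raw))
print(hex(magic), dims)  # 0x1e3d4c55 (24300, 2, 96, 96)
```

Returning only the first ndim entries drops the padding values, so a 1D vector reports a single dimension.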

The magic number encodes the element type of the matrix:
- 0x1E3D4C51 for a single precision matrix
- 0x1E3D4C52 for a packed matrix
- 0x1E3D4C53 for a double precision matrix
- 0x1E3D4C54 for an integer matrix
- 0x1E3D4C55 for a byte matrix
- 0x1E3D4C56 for a short matrix
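
A lookup table is one simple way to use the magic number when reading a file. In the Python sketch below, the struct format characters are assumptions: the text does not give element sizes or signedness, so the table assumes 4-byte signed integers, 2-byte signed shorts, and unsigned bytes, and leaves the packed type unmapped.

```python
import struct

# magic number -> (description, assumed struct format character or None)
ELEMENT_TYPES = {
    0x1E3D4C51: ("single precision", "f"),
    0x1E3D4C52: ("packed", None),  # layout is not described in the text
    0x1E3D4C53: ("double precision", "d"),
    0x1E3D4C54: ("integer", "i"),  # assumed 4-byte signed
    0x1E3D4C55: ("byte", "B"),     # assumed unsigned
    0x1E3D4C56: ("short", "h"),    # assumed 2-byte signed
}

def element_type(first_four_bytes):
    """Decode the little-endian magic number and look up the element type."""
    (magic,) = struct.unpack("<i", first_four_bytes)
    return ELEMENT_TYPES[magic]

print(element_type(bytes([85, 76, 61, 30])))  # ('byte', 'B')
```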

Since the files were generated on an Intel machine, the 4-byte integers are encoded in little-endian byte order. Take care when reading the files on big-endian machines.
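
To illustrate the byte-order issue, the Python sketch below decodes the four bytes that encode 24300 (236, 94, 0, 0) in both byte orders. Stating the byte order explicitly, as with the "<" prefix here, gives the same result on any machine.

```python
import struct

raw = bytes([236, 94, 0, 0])          # the bytes of 24300 as stored in the files

little = struct.unpack("<i", raw)[0]  # explicit little-endian: the intended value
big = struct.unpack(">i", raw)[0]     # the same bytes misread as big-endian

print(little)           # 24300
print(big == little)    # False
```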

The "-dat" files store a 4D tensor of dimensions 24300x2x96x96: each file has 24,300 image pairs, each pair has 2 images (one per camera), and each image is 96x96 pixels. The "-cat" files store a 1D vector of 24,300 elements (ndim = 1, so the two unused dimension entries in the header are 1). The "-info" files store a 2D matrix of dimensions 24300x4.
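
Because the data is stored with the last dimension changing fastest and the "-dat" files hold one byte per pixel, the position of any image can be computed directly. The Python sketch below assumes a 24-byte header for the 4D "-dat" files (magic, ndim, and four dimension sizes of 4 bytes each); the function name image_offset is illustrative.

```python
HEADER_BYTES = 4 + 4 + 4 * 4   # magic, ndim, four dimension sizes
N_PAIRS, N_CAMERAS, HEIGHT, WIDTH = 24300, 2, 96, 96

def image_offset(pair, camera):
    """Byte offset of one 96x96 image inside an uncompressed "-dat" file."""
    flat_index = (pair * N_CAMERAS + camera) * HEIGHT * WIDTH
    return HEADER_BYTES + flat_index   # one byte per pixel

def expected_file_size():
    """Total size in bytes of an uncompressed "-dat" file under these assumptions."""
    return HEADER_BYTES + N_PAIRS * N_CAMERAS * HEIGHT * WIDTH

print(image_offset(0, 0))  # 24
print(image_offset(0, 1))  # 9240
```

Seeking to image_offset(pair, camera) and reading 96*96 bytes yields one image in row-major order.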

Here is some Matlab code showing how to read the example files, the first of which stores a 24300x2x96x96 tensor. (To avoid endian confusion, we read the header bytes and assemble the integers by hand here.)

>> fid=fopen('smallnorb-5x46789x9x18x6x2x96x96-training-dat.mat','r');
>> fread(fid,4,'uchar'); % result = [85 76 61 30], byte matrix(in base 16: [55 4C 3D 1E])
>> fread(fid,4,'uchar'); % result = [4 0 0 0], ndim = 4
>> fread(fid,4,'uchar'); % result = [236 94 0 0], dim0 = 24300 (=94*256+236)
>> fread(fid,4,'uchar'); % result = [2 0 0 0], dim1 = 2
>> fread(fid,4,'uchar'); % result = [96 0 0 0], dim2 = 96
>> fread(fid,4,'uchar'); % result = [96 0 0 0], dim3 = 96
>> imshow(transpose(reshape(fread(fid,96*96),96,96)),[0 255]); % show the first image

>> fid=fopen('smallnorb-5x46789x9x18x6x2x96x96-training-cat.mat','r');
>> fread(fid,4,'uchar'); % result = [84 76 61 30], int matrix (54 4C 3D 1E)
>> fread(fid,4,'uchar'); % result = [1 0 0 0], ndim = 1
>> fread(fid,4,'uchar'); % result = [236 94 0 0], dim0 = 24300
>> fread(fid,4,'uchar'); % result = [1 0 0 0] (ignore this integer)
>> fread(fid,4,'uchar'); % result = [1 0 0 0] (ignore this integer)
>> fread(fid,10,'int'); % result = [0 1 2 3 4 0 1 2 3 4] (only on little-endian)

>> fid=fopen('smallnorb-5x46789x9x18x6x2x96x96-training-info.mat','r');
>> fread(fid,4,'uchar'); % result = [84 76 61 30] (integer matrix)
>> fread(fid,4,'uchar'); % result = [2 0 0 0] ndim = 2
>> fread(fid,4,'uchar'); % result = [236 94 0 0] dim0 = 24300
>> fread(fid,4,'uchar'); % result = [4 0 0 0] dim1 = 4
>> fread(fid,4,'uchar'); % result = [1 0 0 0] (ignore this integer)
>> fread(fid,4,'int'); % result = [8 6 4 4] (on little-endian CPU)

FILES

smallnorb-5x46789x9x18x6x2x96x96-training-dat.mat.gz
smallnorb-5x46789x9x18x6x2x96x96-training-cat.mat.gz
smallnorb-5x46789x9x18x6x2x96x96-training-info.mat.gz
smallnorb-5x01235x9x18x6x2x96x96-testing-dat.mat.gz
smallnorb-5x01235x9x18x6x2x96x96-testing-cat.mat.gz
smallnorb-5x01235x9x18x6x2x96x96-testing-info.mat.gz