Data for MATLAB hackers
Here are some datasets in MATLAB format.
I'm working on better documentation, but
if you decide to use one of these and don't have
enough info, send me a note and I'll try to help.
Also, if you discover something, let me know
and I'll try to include it for others.
There is a Matlab Tutorial here.
Handwritten Digits
- MNIST Handwritten Digits
[data/mnist_all.mat]
[training pictures: 0 1 2 3 4 5 6 7 8 9]
[testing pictures: 0 1 2 3 4 5 6 7 8 9]
8-bit grayscale images of "0" through "9";
about 6K training examples and 1K test examples of each class.
- USPS Handwritten Digits
[data/usps_all.mat]
[pictures: 0 1 2 3 4 5 6 7 8 9]
8-bit grayscale images of "0" through "9";
1100 examples of each class.
- Binary Alphadigits
[data/binaryalphadigs.mat]
[picture]
Binary 20x16 digits of "0" through "9" and capital "A" through "Z".
39 examples of each class.
From Simon Lucas's (sml@essex.ac.uk)
Algoval system.
Faces
- If you want a real face dataset, I strongly recommend the UMass
project: Labelled Faces in
the Wild.
- Frey Face
[data/frey_rawface.mat]
[picture]
From Brendan Frey. Almost 2000 images of Brendan's face,
taken from sequential frames of a small video. Size: 20x28.
- Olivetti Faces
[data/olivettifaces.mat]
[picture]
Grayscale faces, 8-bit [0-255], ten images each of 40 different people.
400 total images, 64x64 size.
From the Olivetti database at AT&T.
- UMist Faces
[data/umist_cropped.mat]
[picture]
Grayscale faces, 8-bit [0-255], several images (views)
of each of 20 different people.
575 total images, 112x92 size, manually cropped by Daniel
Graham at UMist.
Citation: Daniel B. Graham and Nigel M. Allinson, "Characterizing Virtual Eigensignatures
for General Purpose Face Recognition." In Face Recognition:
From Theory to Applications; NATO ASI Series F, Computer and Systems
Sciences, Vol. 163; H. Wechsler, P. J. Phillips, V. Bruce, F.
Fogelman-Soulié and T. S. Huang (eds), pp. 446-456, 1998.
[original uncropped data]
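The face files above are described only by an image size (20x28, 64x64, 112x92) and an 8-bit [0-255] range. If you load one of them outside MATLAB, keep in mind that MATLAB stores arrays in column-major order, so a flattened image vector has to be rebuilt column by column rather than row by row. The Python sketch below assumes each image is stored as one flat vector of n_rows*n_cols pixels; which number in "20x28" is the height, and whether images sit in the columns or the rows of the data matrix, are assumptions to check against the actual file. The helper names are hypothetical.

```python
def unflatten_column_major(pixels, n_rows, n_cols):
    """Rebuild an n_rows x n_cols image (a list of rows) from a flat pixel
    vector laid out in MATLAB's column-major order.

    Assumption (not stated on this page): each image is stored as a single
    flat vector of n_rows * n_cols pixels.
    """
    if len(pixels) != n_rows * n_cols:
        raise ValueError("pixel count does not match n_rows * n_cols")
    # In column-major order, element (r, c) lives at index c * n_rows + r.
    return [[pixels[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]


def to_unit_range(image):
    """Scale 8-bit grayscale values in [0, 255] to floats in [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]


if __name__ == "__main__":
    # A toy 2x3 "image" stored column by column: columns are (0,1), (2,3), (4,5).
    flat = [0, 1, 2, 3, 4, 5]
    for row in unflatten_column_major(flat, 2, 3):
        print(row)
```

For a Frey face, under the assumption that the images are 28 pixels tall and 20 wide, you would call `unflatten_column_major(vec, 28, 20)` on one 560-pixel vector; if the result looks transposed, swap the two sizes.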
Text
- Word Counts from Encyclopedia Articles
Here's a tiny subset of word counts from some Grolier encyclopedia articles.
Only the 15K most common words are used in the vocabulary, and only
about 31K articles are represented.
The data is represented as a sparse matrix of counts.
In the csv file, for each article there is one line of the form:
article_number,word_id,word_count,word_id,word_count,...
In the matlab sparse matrix, each row is a word and each column is an
article and the entries are the counts.
[the word list | csv ascii data | matlab sparse matrix data]
- PNAS Titles
The titles of every paper that appeared in the Proceedings of the
National Academy of Sciences up to March 2005, along with
the date of publication of each paper. The data was obtained
by crawling the PNAS website and downloading the table of
contents from every issue of every volume, which yielded about 80,000
papers over the years 1915-2005.
[raw html | ascii data | matlab raw data]
- NIPS Conference Papers Vols 0-12
[matlab or
raw data]
A whole lot of fun! I massaged the OCR'd data
from NIPS 1-12 (the pre-electronic-submission era)
that Yann made available.
I've included a tarball of the massaged raw data, as well as a
matlab package in which the data is already read in and preprocessed.
See the readme file for the raw data massaging notes and the
matlab notes file for explanations of the matlab data.
There are also a couple of extra matlab files: one containing
conference and page number info, which you can't make yourself
but which seems boring to me, and one with word counts by author,
which are cool but which you could easily have made yourself.
NEW: Check out Gal's
page with updated data for NIPS 1-17.
- 20 Newsgroups
[data/20news_w100.mat]
A tiny version of the 20 Newsgroups data, with binary occurrence
data for 100 words across 16242 postings.
I've also tagged each posting with its highest-level
domain in the array "newsgroups".
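The CSV layout given for the Grolier encyclopedia word counts (one line per article of the form article_number,word_id,word_count,word_id,word_count,...) is easy to read without MATLAB. The Python sketch below builds a dict-of-keys sparse matrix oriented like the MATLAB version, with words as rows and articles as columns. It assumes every field is an integer and that the file has no header line; whether word ids are 0- or 1-based is not stated, so ids are kept exactly as they appear. The function names are hypothetical.

```python
def parse_counts_line(line):
    """Parse one line of the form
    article_number,word_id,word_count,word_id,word_count,...
    and return (article_number, {word_id: word_count}).

    Assumption: all fields are integers; empty fields (e.g. from a
    trailing comma) are ignored.
    """
    fields = [int(x) for x in line.strip().split(",") if x.strip()]
    article, pairs = fields[0], fields[1:]
    if len(pairs) % 2 != 0:
        raise ValueError("expected word_id,word_count pairs after the article number")
    counts = {}
    for word_id, word_count in zip(pairs[0::2], pairs[1::2]):
        # Sum in case a word id is repeated within one line.
        counts[word_id] = counts.get(word_id, 0) + word_count
    return article, counts


def build_word_by_article(lines):
    """Return a sparse matrix as a dict {(word_id, article): count},
    with rows = words and columns = articles, like the MATLAB matrix."""
    matrix = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        article, counts = parse_counts_line(line)
        for word_id, count in counts.items():
            key = (word_id, article)
            matrix[key] = matrix.get(key, 0) + count
    return matrix


if __name__ == "__main__":
    sample = ["1,5,2,9,1", "2,5,3"]
    print(build_word_by_article(sample))
```

A dict keyed by (row, column) keeps only the nonzero entries, which matters here because each article uses only a small fraction of the 15K-word vocabulary; it can be converted to any real sparse-matrix type later.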
Articulatory Speech Data
- See this page for resources
relating to the University of Wisconsin X-ray Microbeam Database
(UBDB). Hopefully coming soon.
Sam Roweis, Vision, Learning and Graphics Group,
NYU, www.cs.nyu.edu/~roweis