We are working on audio fingerprinting techniques that enable computers
to recognize songs. Our algorithm represents songs in terms of "music
phonemes", elementary units of music sound that are learned from data,
and uses weighted finite-state transducers to construct a compact and
efficient index of a large database of songs. The image above depicts an
example of such a transducer. With this index, songs can be recognized
quickly and accurately even when only a short recorded "audio snippet"
is available, and even when that recording is distorted.
We have built a working system with a database of 15,000 songs. We
have also proven new bounds on the size of the finite automata used for
indexing. These bounds guarantee that the representation remains compact
as the number of indexed songs increases, and they suggest that our
techniques scale to much larger song data sets.
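
To make the indexing idea concrete, here is a minimal sketch in Python. It is not the system described above. It assumes each song has already been transcribed into a sequence of "music phoneme" labels, and it uses a plain suffix trie as a stand-in for the weighted finite-state transducers in our work. The function names (build_factor_trie, identify) and the label strings are hypothetical. Unlike the compact automata our bounds apply to, this trie can grow quadratically with song length, and it matches snippets exactly, so it does not model robustness to distortion.

```python
def new_node():
    # Each node records which songs contain the factor spelled by the path to it.
    return {"ids": set(), "next": {}}


def build_factor_trie(songs):
    """Index every factor (contiguous subsequence) of every song.

    songs: dict mapping a song id to a list of phoneme labels.
    Returns the root node of a suffix trie over all songs.
    """
    root = new_node()
    for song_id, phonemes in songs.items():
        # Insert every suffix; every prefix of a suffix is a factor.
        for start in range(len(phonemes)):
            node = root
            for symbol in phonemes[start:]:
                node = node["next"].setdefault(symbol, new_node())
                node["ids"].add(song_id)
    return root


def identify(root, snippet):
    """Return the ids of all songs that contain the snippet exactly."""
    node = root
    for symbol in snippet:
        node = node["next"].get(symbol)
        if node is None:
            return set()
    return set(node["ids"])


if __name__ == "__main__":
    # Hypothetical phoneme transcriptions of two songs.
    songs = {
        "song_a": ["p1", "p2", "p3", "p4"],
        "song_b": ["p3", "p4", "p5"],
    }
    trie = build_factor_trie(songs)
    print(identify(trie, ["p2", "p3"]))
```

Lookup time in this sketch depends only on the length of the snippet, not on the number of songs indexed, which is the property that makes a factor index attractive for recognition from short snippets.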
- Mehryar Mohri, Pedro Moreno, and Eugene Weinstein. Efficient and robust music identification with weighted finite-state transducers. To appear in the IEEE Transactions on Audio, Speech, and Language Processing.
- Mehryar Mohri, Pedro Moreno, and Eugene Weinstein. Robust music identification, detection, and analysis. International Conference on Music Information Retrieval (ISMIR), September 2007, Vienna, Austria. [PDF] [Poster PDF].
- Mehryar Mohri, Pedro Moreno, and Eugene Weinstein. Factor Automata of Automata and Applications. International Conference on Implementation and Application of Automata (CIAA), July 2007, Prague, Czech Republic. [PDF] [Slides PDF].
- Eugene Weinstein and Pedro Moreno. Music Identification with Weighted
Finite-State Transducers. International Conference on Acoustics,
Speech, and Signal Processing (ICASSP) 2007, April 2007, Honolulu,