Course#: CSCI-GA.3033-009

Instructor: Mehryar Mohri

Grader/TA: TBD.

Mailing List

Description

This course gives a computer science presentation of automatic speech
recognition, the problem of accurately transcribing spoken
utterances. It covers the essential algorithms for
creating large-scale speech recognition systems. The algorithms and
techniques presented are now used in most research and industrial
systems.

Many of the learning and search algorithms and techniques currently
used in natural language processing, computational biology, and other
areas of application of machine learning were originally designed for
tackling speech recognition problems. Speech recognition continues to
feed computer science with challenging problems, in particular because
of the size of the learning and search problems it generates.

The objective of the course is thus not just to familiarize students with particular algorithms used in speech recognition, but also to use them as a basis for exploring general text, speech, and machine learning algorithms relevant to a variety of other areas in computer science. The course will make use of several software libraries and will study recent research and publications in this area.

This course is also open to undergraduate students.

Lectures

Here are some of the topics covered by this course.

- Lecture 01: introduction to speech recognition, statistical formulation.
- Lecture 02: finite automata and transducers.
- Lecture 03: weighted transducer algorithms.
- Lecture 04: weighted transducer software library.
- Lecture 05: n-gram language models.
- Lecture 06: language modeling software library.
- Lecture 07: maximum entropy (Maxent) models.
- Lecture 08: expectation-maximization (EM) algorithm, hidden Markov models (HMMs).
- Lecture 09: acoustic models, Gaussian mixture models.
- Lecture 10: pronunciation models, decision trees, context-dependent models.
- Lecture 11: search algorithms, transducer optimizations, Viterbi decoder.
- Lecture 12: n-best algorithms, lattice generation, rescoring.
- Lecture 13: discriminative training (**invited lecture: Murat Saraclar**).
- Lecture 14: structured prediction algorithms.
- Lecture 15: adaptation.
- Lecture 16: active learning.
- Lecture 17: semi-supervised learning.

Reading and Software Material

There is no single textbook covering the material presented in this course. The following are some recommended books or papers. An extensive list of recommended papers for further reading is provided in the lecture slides.

Books

- Frederick Jelinek. *Statistical Methods for Speech Recognition*. MIT Press, Cambridge, MA, 1998.
- Lawrence Rabiner and Biing-Hwang Juang. *Fundamentals of Speech Recognition*. Prentice Hall, 1993.

Papers

- B. H. Juang and L. R. Rabiner. *Automatic Speech Recognition - A Brief History of the Technology*. Elsevier Encyclopedia of Language and Linguistics, Second Edition, 2005.
- Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.
- Mehryar Mohri. Weighted automata algorithms. In Manfred Droste, Werner Kuich, and Heiko Vogler, editors, Handbook of Weighted Automata. Monographs in Theoretical Computer Science, pages 213-254. Springer, 2009.
- Lawrence Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

Software

- FSM Library (Finite-State Machine Library).
- OpenFst Library (Finite-State Transducer Library).
- GRM Library (Grammar Library).
- DCD Library (Decoder Library).

Location and Time

Room 109 Warren Weaver Hall, 251 Mercer Street.

Mondays 5:00 PM - 6:50 PM.

Prerequisite

Familiarity with the basics of linear algebra, probability, and analysis of algorithms. No specific knowledge of signal processing or other engineering material is required.

An interest in theoretical and applied machine learning will be helpful, as will prior acquaintance with natural language processing or with machine learning concepts as presented in "Foundations of Machine Learning" or the Ph.D. seminar in machine learning.

Coursework

Three assignments.

The standard high level of integrity is expected from all students, as with all CS courses.

Homework assignments

Previous years