G22.3033-011
Voice Networking Applications
Graduate Division, Computer Science
Zebadiah Kimmel (kimmel@cs.nyu.edu)

Classes meet weekly Monday 5-7 pm (WWH 102)
First class: Monday, Jan 28, 2002
Last class: Monday, May 6, 2002
No classes on: Feb 18 (President's Day) or March 11 (Spring Break)

Course Schedule
Mailing List


Goals

Advances in networking in recent years, culminating in the spread of the Internet, have made it possible for machines to communicate or "talk to" each other in a myriad of specialized ways: HTTP, NFS, WebDAV, SOAP, HTML/XML, SSL, CIFS, etc. For decades, however, human communication with machines has been largely restricted to visual communication (e.g. text and graphical user interfaces) and finger motions (e.g. typing). Think about it: do you say much when you are sitting at your computer? Humans have "heard" machines through their eyes and "talked to" machines through their fingers... until now.

A fundamental change in human-computer interaction is beginning to take place, in the form of speech recognition and synthesis technologies. These technologies have improved tremendously over the past few years, to the point where it is now possible to build software applications that successfully incorporate voice input (from humans) and voice output (from machines). Some simple but impressive examples can be heard by calling 800-555-1212 (public information directory) or 650-847-0000 (corporate name directory).

In this course, we will look at how to use off-the-shelf tools and equipment to build applications that integrate voice technologies. We will concentrate on software running on top of Windows 2000, because the world of voice technologies is currently most accessible and most advanced on Windows. If you have a Win2K (or possibly another flavor of Windows) machine at home, you'll be able to run all of the software and do most of the projects at home. We will also be using some VoIP phones on loan from Cisco; please do not let anything happen to these phones. We will primarily use software from Nuance, Cisco, Microsoft, and IBM for implementations. A small sketch of what application-level voice code looks like appears below.
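To give a flavor of that application-level code, here is a minimal "hello, world" sketch using the Java Speech API (JSAPI). It is only a sketch, not an assignment: it assumes a JSAPI-compliant synthesis engine (for example, IBM's Speech for Java implementation) is installed and registered on your machine, since the JDK does not ship with one.

    import java.util.Locale;
    import javax.speech.Central;
    import javax.speech.synthesis.Synthesizer;
    import javax.speech.synthesis.SynthesizerModeDesc;

    public class HelloVoice {
        public static void main(String[] args) throws Exception {
            // Ask the installed JSAPI engine for a synthesizer that speaks US English.
            Synthesizer synth =
                Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));

            // Acquire the engine's resources and start it running.
            synth.allocate();
            synth.resume();

            // Queue some plain text and wait until the engine has finished speaking it.
            synth.speakPlainText("Hello, world.", null);
            synth.waitEngineState(Synthesizer.QUEUE_EMPTY);

            // Release the engine.
            synth.deallocate();
        }
    }

Speech recognition under JSAPI follows the same allocate/use/deallocate pattern, with a grammar telling the recognizer what to listen for.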

This course is a survey of how to build networked applications that integrate off-the-shelf voice technologies. If you are interested in topics such as how to actually build a speech processing system, signal processing, or natural language processing, then take a look at this great course being taught by Prof. Dan Melamed: Empirical Natural Language Processing.

Texts and Notes

Prerequisites

Required: knowledge of object-oriented programming. If you know Java or C#, you are good to go.

Helpful but not required: knowledge of XML; knowledge of finite automata, regular expressions, and grammars; knowledge of HTTP; knowledge of TCP/UDP/IP.

Side requirement: you must not be afraid to talk to inanimate objects.

Grading

Mini-projects: 30%
Final project and oral questions: 70%

We will have mini-projects periodically, and a larger final project towards the end of the semester. There are no exams.

On all projects (mini and final), you may work singly or in groups of up to four. All group members receive the same score for each project.

Grading on each final project will be based on:

Some final project ideas: