Caltech Distinguished Graduate Student Lecture (Everhart Lecture Series)
Monday, May 18, 1998, 22 Gates
4:45pm Reception; 5-6pm Lecture and Discussion

-------------------------------------------------------------------------
Machines that Talk and Listen: Speech Processing by Humans and Machines

Sam Roweis, Hopfield Group, Caltech
-------------------------------------------------------------------------

Imagine reading the New York Times if you were blind. Or having a conversation with someone who speaks a different language. How about sending a fax or accessing your email while driving? Speaking is the most natural method of communication between humans. It would be great if our machines could learn to speak and listen. For decades engineers have promised us that this technology is just around the corner. But so far, truly conversant computers have remained merely a promise.

What would we want them to do? Ideally, machines should be able to read aloud (convert text into an acoustic waveform), transcribe conversations (convert an acoustic waveform into text), and recognize the sex, identity, or even mood of a speaker (voiceprint analysis). Unfortunately, even the best speech synthesizers sound like a talking moose, speech recognizers get about half the words wrong, and speaker identification systems are slow and unreliable. In comparison, the best humans read eloquently, transcribe flawlessly, and recognize people they know after only a few words. Moreover, when these tasks are made more difficult by adding noise and distortions, machine performance degrades dramatically while human performance remains almost unaffected. We must be using a very different strategy from our current machines to solve these problems.

This persistent gap between the abilities of computers and people motivates speech researchers to search for new approaches and ideas. In this lecture I will give an overview of the speech technology field.
I will show how far we have come by demonstrating today's state-of-the-art systems and explaining how they work. Finally, I will discuss the future of the field by outlining exciting areas for new research and briefly describing my own work, which focuses on applying models of human speech production to develop a new representation of speech information.