The principal goal of this paper is to describe the considerations that go into building a model of the timbre of the human voice for musical use. In the course of the paper I will compare a number of existing models and present some of my own research.
The American Standards Association defines timbre as "that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar." This definition is broad compared to modern thinking about timbre. Today, timbre is understood to describe a number of features of sound and the way they combine. It is not my purpose here to discuss every aspect of timbre, only those relevant to the perception of vocal timbre: attack, steady-state color, note transition, and vibrato.
This paper begins with a description of the issues involved in modeling physical sound systems. The next section describes how modeling the human voice resembles modeling other sound systems, and where it differs and becomes more difficult. I will then describe my research on issues important to modeling the human voice, which covers three areas: the physical system of the voice, basic vocal timbre perception, and vocal pedagogy. Finally, I will present and evaluate some current models of vocal timbre.