©American Speech-Language-Hearing Association. Hillenbrand & Houde: Speech Synthesis Using Damped Sinusoids. 1092-43-xxxx. Journal of Speech, Language, and Hearing Research

Speech Synthesis Using Damped Sinusoids

James M. Hillenbrand
Department of Speech Pathology and Audiology
Western Michigan University
Kalamazoo

Houde
RIT Research Corporation
Rochester, NY

A speech synthesizer was developed that operates by summing exponentially damped sinusoids at frequencies and amplitudes corresponding to peaks derived from the spectrum envelope of the speech signal. The spectrum analysis begins with the calculation of a smoothed Fourier spectrum. A masking threshold is then computed for each frame as the running average of spectral amplitudes over an 800-Hz window. In a rough simulation of lateral suppression, the running average is then subtracted from the smoothed spectrum (with negative spectral values set to zero), producing a masked spectrum. The signal is resynthesized by summing exponentially damped sinusoids at frequencies corresponding to peaks in the masked spectra. If a periodicity measure indicates that a given analysis frame is voiced, the damped sinusoids are pulsed at a rate corresponding to the measured fundamental period. For unvoiced speech, the damped sinusoids are pulsed on and off at random intervals. A perceptual evaluation of speech produced by the damped sinewave synthesizer showed excellent sentence intelligibility, excellent intelligibility for vowels in /hVd/ syllables, and fair intelligibility for consonants in CV nonsense syllables.

KEY WORDS: speech synthesis, spectral peaks, speech perception, vocoder

The last several decades have seen a proliferation of methods for the synthesis of high-quality speech. Some of these techniques, such as the sinusoidal method introduced by McAuley and Quatieri (1986), produce speech that is of such high quality that it can be essentially indistinguishable from the original utterance upon which it is modeled. Despite these developments, formant synthesizers continue to be widely used in experiments that are aimed at shedding light on a variety of fundamental questions in speech perception. The continued reliance on formant synthesizers in speech research is due in part to the fact that the underlying control parameters (chiefly fundamental frequency, degree of periodicity, and formants) are widely assumed to have some level of psychological reality,¹ making the method suitable for exploring a wide range of issues in speech perception, such as context effects, cue trading, talker normalization, perceptual compensation for coarticulation, phonetic boundary effects, normalization for speaking rate, and a variety of related issues that have formed the core of speech perception research for several decades. Further, formant synthesizers are heavily used in studies investigating the neural representation of speech and in cochlear implant research, because they provide the ability to clearly and simply specify the ways in which various test stimuli differ from one another.

¹ The precise psychological status of formant frequencies in human speech perception remains a matter of some debate, with some investigators arguing in favor of pattern matching based on the gross shape of the spectrum (e.g., Bladon, 1982; Bladon & Lindblom, 1981; Zahorian & Jagharghi, 1993; but see also Hedlin, 1982; Klatt, 1982).

These synthesizers are not without their limitations, however.
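
The analysis and resynthesis steps summarized in the abstract (running-average masking threshold, masked spectrum, peak picking, and pulsed damped sinusoids) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 80-Hz bandwidth, the 20-ms pulse duration, and the 1 to 5 ms range of random gaps for unvoiced frames are all assumptions introduced here for illustration; the smoothed spectrum is taken as given, and the running average is zero-padded at the spectrum edges.

```python
import numpy as np

def masked_spectrum(smoothed, freq_step_hz, window_hz=800.0):
    """Subtract a running average (masking threshold) and clip negatives to zero."""
    width = max(1, int(round(window_hz / freq_step_hz)))
    kernel = np.ones(width) / width
    # mode="same" zero-pads, so the threshold is underestimated near the edges.
    threshold = np.convolve(smoothed, kernel, mode="same")
    return np.maximum(smoothed - threshold, 0.0)

def pick_peaks(spectrum):
    """Indices of interior local maxima with nonzero amplitude."""
    s = np.asarray(spectrum)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:]) & (s[1:-1] > 0)
    return np.flatnonzero(is_peak) + 1

def damped_pulse(freqs_hz, amps, fs, bandwidth_hz=80.0, duration_s=0.02):
    """Sum of exponentially damped sinusoids, one per spectral peak.

    bandwidth_hz and duration_s are illustrative values, not from the paper.
    """
    t = np.arange(int(round(duration_s * fs))) / fs
    envelope = np.exp(-np.pi * bandwidth_hz * t)
    pulse = np.zeros_like(t)
    for f, a in zip(freqs_hz, amps):
        pulse += a * envelope * np.sin(2 * np.pi * f * t)
    return pulse

def voiced_onsets(f0_hz, n_samples, fs):
    """One pulse onset per fundamental period."""
    period = max(1, int(round(fs / f0_hz)))
    return np.arange(0, n_samples, period)

def unvoiced_onsets(n_samples, fs, seed=None, min_gap_s=0.001, max_gap_s=0.005):
    """Pulse onsets separated by random gaps (gap range is an assumption)."""
    rng = np.random.default_rng(seed)
    lo = max(1, int(min_gap_s * fs))
    hi = max(lo, int(max_gap_s * fs))
    gaps = rng.integers(lo, hi + 1, size=n_samples // lo + 1)
    onsets = np.cumsum(gaps)
    return onsets[onsets < n_samples]

def pulse_train(pulse, onsets, n_samples):
    """Overlap-add one copy of the pulse at each onset."""
    out = np.zeros(n_samples + len(pulse))
    for start in onsets:
        out[start:start + len(pulse)] += pulse
    return out[:n_samples]
```

For a single voiced frame, one might call `masked_spectrum` on the smoothed spectrum, take frequencies and amplitudes at the indices returned by `pick_peaks`, build a pulse with `damped_pulse`, and pass it to `pulse_train` together with `voiced_onsets`. Swapping in `unvoiced_onsets` gives the randomly pulsed excitation described for unvoiced speech.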