Chapter 13: Vocoder Basics
Analyzing Speech Signals
The principles you’ve been introduced to thus far are insufficient for the transmission of
speech signals.
The reason is that human speech consists of a series of voiced sounds (tonal sounds)
and unvoiced sounds (noisy sounds). The main distinction between voiced and
unvoiced sounds is that voiced sounds are produced by an oscillation of the vocal
cords, while unvoiced sounds are produced by blocking and restricting the air flow
with lips, tongue, palate, throat, and larynx.
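The manual does not describe how this distinction is made algorithmically, but a common heuristic is that voiced frames show high energy and few zero crossings, while unvoiced frames show the opposite. The following Python sketch illustrates that idea only; its function name and threshold values are made up for this example and are not part of Logic or the EVOC 20 plug-ins.

    import numpy as np

    def classify_frames(signal, frame_len=512, zcr_thresh=0.15, energy_thresh=0.01):
        """Label each frame of a mono signal as 'voiced' or 'unvoiced'.

        Rough heuristic only: voiced speech (vibrating vocal cords) shows
        high energy and few zero crossings, while unvoiced speech
        (noise-like consonants) shows low energy and many zero crossings.
        Thresholds are arbitrary and would need tuning for real material.
        """
        labels = []
        for start in range(0, len(signal) - frame_len, frame_len):
            frame = signal[start:start + frame_len]
            energy = np.mean(frame ** 2)
            # fraction of sample pairs where the waveform changes sign
            zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
            if energy > energy_thresh and zcr < zcr_thresh:
                labels.append("voiced")
            else:
                labels.append("unvoiced")
        return labels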
If speech containing voiced and unvoiced sounds is used as a Vocoder's analysis
signal, but the synthesis engine doesn't differentiate between voiced and unvoiced
sounds, the result will sound rather toothless. To avoid this, the synthesis section of the
Vocoder must produce different sounds for the voiced and unvoiced parts of the signal.
In Logic's EVOC 20 PS and EVOC 20 TO Vocoder plug-ins, there is an Unvoiced/Voiced
(U/V) detector. This unit detects the unvoiced portions of the sound in the analysis
signal and then substitutes the corresponding portions of the synthesis signal with
noise (Noise), with a mixture of noise and the synthesis signal (Noise + Synth), or with
the original analysis signal (Blend). If the U/V Detector
detects voiced parts, it passes this information to the Synthesis section, which uses the
normal synthesis signal for these portions. Control over unvoiced/voiced sound
detection, type, and level is found in the U/V Detection section of Logic’s vocoder plug-
ins.
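To make the substitution modes mentioned above more concrete, here is a minimal conceptual sketch in Python of how a frame's excitation could be chosen from the three options (Noise, Noise + Synth, Blend). It assumes the voiced/unvoiced decision has already been made and is not how the EVOC 20 plug-ins are implemented; the function and parameter names are invented for this illustration.

    import numpy as np

    def excitation_for_frame(synth_frame, analysis_frame, is_voiced,
                             mode="noise", noise_level=0.3):
        """Choose the excitation used by the synthesis filter bank for one frame.

        Conceptual sketch of the three U/V substitution modes described above:
          - 'noise':       unvoiced frames are replaced by white noise
          - 'noise+synth': noise is mixed with the synthesis signal
          - 'blend':       the original analysis signal is passed through
        Voiced frames always use the normal synthesis signal.
        """
        if is_voiced:
            return synth_frame
        noise = noise_level * np.random.uniform(-1.0, 1.0, len(synth_frame))
        if mode == "noise":
            return noise
        elif mode == "noise+synth":
            return noise + synth_frame
        elif mode == "blend":
            return analysis_frame
        return synth_frame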
Tips for Better Speech Intelligibility
The classic vocoder effect is very demanding with regard to the quality of both the
analysis and synthesis signals. Furthermore, the vocoder parameters need to be set
carefully. Following are some tips on both topics.
Editing the Analysis and Synthesis Signals
Compressing the Side Chain
The less the level of the analysis signal changes, the better the intelligibility of the
vocoder. We therefore recommend compressing the side chain (analysis) signal in most cases.
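In Logic you would normally insert a compressor plug-in on the side chain channel; purely as an illustration of why this helps, the sketch below shows a very small feed-forward compressor in Python that evens out the analysis signal's level. It is not Logic's Compressor, and all parameter values are arbitrary examples.

    import numpy as np

    def compress(signal, threshold=0.25, ratio=4.0, attack=0.99, release=0.999):
        """Tiny feed-forward compressor sketch.

        Smoothing the level of the analysis (side chain) signal before it
        reaches the vocoder keeps every band's envelope more consistent,
        which is what improves intelligibility.
        """
        out = np.empty_like(signal)
        env = 0.0
        for i, x in enumerate(signal):
            rect = abs(x)
            # one-pole envelope follower with separate attack/release behavior
            coeff = attack if rect > env else release
            env = coeff * env + (1.0 - coeff) * rect
            gain = 1.0
            if env > threshold:
                # reduce the portion above the threshold by the given ratio
                gain = (threshold + (env - threshold) / ratio) / env
            out[i] = x * gain
        return out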
Enhancing High Frequency Energy
The vocoder, in a way, always generates the intersection of the analysis and synthesis
signals: if there's no treble in the analysis signal, the resulting vocoder output will also
lack treble, even when the synthesis signal features a lot of high frequency content. The
same is true of every frequency band. As such, the vocoder demands a stable level in all
frequency bands from both input signals in order to obtain the best results.
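This "intersection" behavior can be illustrated with a single vocoder band: the synthesis signal is band-pass filtered and then scaled by the envelope of the same band of the analysis signal, so a band that is silent in the analysis signal stays silent in the output no matter what the synthesis signal contains. The sketch below assumes SciPy is available and shows a simplified channel-vocoder band, not the EVOC 20's actual filter design; the band limits and filter orders are example values.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def band_envelope(signal, low, high, fs, order=4):
        """Band-pass one channel and return its smoothed amplitude envelope."""
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, signal)
        # rectify, then low-pass at 50 Hz to follow the band's level
        return sosfilt(butter(2, 50, fs=fs, output="sos"), np.abs(band))

    def vocode_band(analysis, synthesis, low, high, fs):
        """One vocoder band: the synthesis band is scaled by the analysis
        band's envelope, so if the analysis signal carries no energy in this
        band, the band's output is near silence regardless of the synthesis
        signal's content."""
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        synth_band = sosfilt(sos, synthesis)
        return synth_band * band_envelope(analysis, low, high, fs)

Summing many such bands across the spectrum yields the basic channel vocoder behavior, which is why both input signals need energy in every band that should be audible in the result.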