Speech Processing

The dream of having our technical inventions talk to us is older than the telephone
itself. Each new advance in technology spurs a new wave of eager experimentation.
Generally, results never quite meet expectations, possibly because as soon as a machine
says something that sounds intelligent, most people assume that it is intelligent.

People who program and maintain computers realize their limitations, and thus tend
to allow for their weaknesses. Everybody else just expects their computers and software
to work. The amount of thinking a user must do to interact with a computer is often
inversely proportional to the amount of thinking the design team did. Simple interfaces
belie complex design decisions.

The challenge, therefore, is to design a system that has anticipated the most common
desires of its users, and can also adroitly handle unexpected challenges.

Festival

The Festival text-to-speech server can transform text into spoken words. While this is
a whole lot of fun to play with, there are many challenges to overcome (for more on
integrating Festival with Asterisk, refer back to “Text-to-Speech Utilities”
on page 440).

For Asterisk, an obvious value of text-to-speech might be the ability to have your
telephone system read your emails back to you. If you’ve noticed the somewhat poor
grammar, punctuation, and spelling typically found in email messages these days, you
can perhaps appreciate the challenges this poses.
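
To make the problem concrete, here is a minimal Python sketch of the kind of cleanup an email body might need before it is handed to a text-to-speech engine. Everything in it is an assumption made for illustration: the function name `prepare_email_for_tts`, the tiny abbreviation table, and the choice to drop quoted replies and signatures are hypothetical, and none of it is part of Festival or Asterisk. Passing the result to a speech engine is deliberately left out.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {
    "btw": "by the way",
    "fyi": "for your information",
    "asap": "as soon as possible",
}

URL_PATTERN = re.compile(r"https?://\S+")
REPEATED_PUNCT = re.compile(r"([!?.])\1+")


def prepare_email_for_tts(body: str) -> str:
    """Return a cleaned-up email body that should read better aloud."""
    kept_lines = []
    for line in body.splitlines():
        stripped = line.strip()
        # Skip quoted reply text so the listener does not hear it again.
        if stripped.startswith(">"):
            continue
        # Stop at the conventional signature delimiter ("-- ").
        if stripped == "--":
            break
        kept_lines.append(stripped)

    text = " ".join(part for part in kept_lines if part)
    # Nobody wants a URL spelled out character by character.
    text = URL_PATTERN.sub("a web link", text)
    # Collapse runs such as "!!!" or "..." into a single mark.
    text = REPEATED_PUNCT.sub(r"\1", text)

    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)

    # Expand known abbreviations, leaving every other word untouched.
    text = re.sub(r"[A-Za-z]+", expand, text)
    return re.sub(r"\s+", " ", text).strip()
```

Even this toy version has to make judgment calls (is a quoted reply noise or context?), and it does nothing about misspellings or missing punctuation, which is where the real difficulty lies.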

One cannot help but wonder if the emergence of text-to-speech will inspire a new
generation of people dedicated to proper writing. Seeing spelling and punctuation
errors on the screen is frustrating enough; having to hear a computer speak such things
will require a level of Zazen that few possess.

Speech recognition

If text-to-speech is rocket science, speech recognition is science fiction.

Speech recognition can actually work very well, but unfortunately this is generally true
only if you provide it with the right conditions, and the right conditions are not those
found on a telephone network. Even a perfect PSTN connection is considered to be at
the lowest acceptable limit for accurate speech recognition. Add in compressed and
lossy VoIP connections, or a cell phone, and you will discover far more limitations
than uses.

Asterisk now has an entire speech API, so that outside companies (or even open source
projects) can tie their speech recognition engines into Asterisk. One company that has
done this is LumenVox. By using LumenVox’s speech recognition engine along with
