System information
Speech Processing
The dream of having our technical inventions talk to us is older than the telephone
itself. Each new advance in technology spurs a new wave of eager experimentation.
The results rarely meet expectations, possibly because as soon as a machine
says something that sounds intelligent, most people assume that it is intelligent.
People who program and maintain computers realize their limitations, and thus tend
to allow for their weaknesses. Everybody else just expects their computers and software
to work. The amount of thinking a user must do to interact with a computer is often
inversely proportional to the amount of thinking the design team did. Simple interfaces
belie complex design decisions.
The challenge, therefore, is to design a system that has anticipated the most common
desires of its users, and can also adroitly handle unexpected challenges.
Festival
The Festival text-to-speech server can transform text into spoken words. While this is
a whole lot of fun to play with, there are many challenges to overcome (for more on
integrating Festival with Asterisk, refer back to “Text-to-Speech Utilities”
on page 440).
For Asterisk, an obvious value of text-to-speech might be the ability to have your
telephone system read your emails back to you. If you’ve noticed the somewhat poor
grammar, punctuation, and spelling typically found in email messages these days, you
can perhaps appreciate the challenges this poses.
One cannot help but wonder if the emergence of text-to-speech will inspire a new
generation of people dedicated to proper writing. Seeing spelling and punctuation
errors on the screen is frustrating enough; having to hear a computer speak such things
will require a level of Zazen that few possess.
Speech recognition
If text-to-speech is rocket science, speech recognition is science fiction.
Speech recognition can actually work very well, but unfortunately this is generally true
only if you provide it with the right conditions—and the right conditions are not those
found on a telephone network. Even a perfect PSTN connection is considered to be at
the lowest acceptable limit for accurate speech recognition. Add in compressed and
lossy VoIP connections, or a cell phone, and you will discover far more limitations
than uses.
Asterisk now has an entire speech API, so that outside companies (or even open source
projects) can tie their speech recognition engines into Asterisk. One company that has
done this is LumenVox. By using LumenVox’s speech recognition engine along with