11.5

Chapter 7: Customizing and optimizing Vocabularies
Definition: The language model
In addition to a word list, a vocabulary has a language model that contains statistical information.
The statistics help predict which words are most likely to occur in the context of a user's speech.
This information includes:
• unigram probability: The likelihood that a word occurs in text compared to other words in the
same vocabulary. For example, if the verb "write" is more likely to occur in text compared with
the name "Wright", then "write" will have a higher unigram probability.
• bigram and trigram probabilities: The likelihood that a two-word or three-word sequence occurs
in text. For example, if the bigram "Mr. Wright" is more likely than "Mr. write", then the language
model should favor "Mr. Wright" even though "write" has a higher unigram probability than
"Wright". In this context the bigram/trigram probability outweighs the unigram probability.
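
The interplay between unigram and bigram probabilities can be illustrated with a small sketch. This is only an illustration under stated assumptions: the toy corpus, the count-based estimates, and the helper names (unigram_prob, bigram_prob, pick) are hypothetical and are not taken from the product, whose actual language model is not described in this section.

```python
from collections import Counter

# Hypothetical toy corpus; a real vocabulary is built from far more text.
corpus = (
    "please write a letter to mr. wright . "
    "i will write to mr. wright today . "
    "did you write the report ?"
).split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
total_words = len(corpus)


def unigram_prob(word):
    """Share of all words in the corpus that are `word`."""
    return unigram_counts[word] / total_words


def bigram_prob(prev, word):
    """Rough estimate of how often `word` follows `prev` (count ratio)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


def pick(prev, candidates):
    """Prefer the bigram evidence; fall back to the unigram probability on ties."""
    return max(candidates, key=lambda w: (bigram_prob(prev, w), unigram_prob(w)))


# "write" is the more frequent word overall...
print(unigram_prob("write") > unigram_prob("wright"))
# ...but after "mr." the bigram evidence favors "wright".
print(pick("mr.", ["write", "wright"]))
```

In this sketch the bigram estimate is consulted first and the unigram probability only breaks ties, which mirrors the statement that the bigram/trigram probability outweighs the unigram probability in context. A production model would typically blend these statistics rather than rank them strictly.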