Datasheet
USER'S GUIDE
guistic rules are applied to all other abbreviations, depending on their structure:
"NASA" is pronounced like any “normal” word, but abbreviations such as "Ph. D."
and "MBA" are pronounced letter by letter.
Abbreviations composed of the first letter of each word and pronounced as
normal words are called acronyms: IRIS, short for "Image Recognition Integrated
Systems", is an example. Initialisms such as "PC" and "VAT" are also
composed of the first letter of each word but are pronounced letter by letter. To
ensure that a word is read letter by letter, it suffices to insert dots between the
letters: "N.A.T.O.", for example, yields a different result than "NATO".
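The distinction between acronyms, initialisms and dotted forms can be sketched in a few lines of Python. This is only an illustration of the idea, not the rules the product actually applies: the function name `reading_mode`, the exception list `SPELLED_OUT` and the "no vowel means spell it" heuristic are all assumptions made for the example.

```python
import re

# Hypothetical exception list: initialisms that are spelled out even
# though their letters could be read as a word.
SPELLED_OUT = {"PC", "VAT", "MBA"}

VOWELS = set("AEIOUY")


def reading_mode(token: str) -> str:
    """Return 'spell' (letter by letter) or 'word' (read as a normal word)."""
    # Dots between single letters force spelling, e.g. "N.A.T.O."
    if re.fullmatch(r"(?:[A-Za-z]\.\s?)+", token):
        return "spell"
    upper = token.upper()
    # Known initialisms are spelled out.
    if upper in SPELLED_OUT:
        return "spell"
    # Assumed heuristic: a token without any vowel cannot be read as a word.
    if not any(ch in VOWELS for ch in upper):
        return "spell"
    return "word"


for token in ("NASA", "NATO", "N.A.T.O.", "IRIS", "VAT"):
    print(token, "->", reading_mode(token))
```

A real system would rely on far richer structural rules and lexicons; the sketch only shows why "N.A.T.O." and "NATO" can end up with different readings.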
Next, the conversion of spelling to phonetics takes place. Using a rule-based
method, the letters are replaced by sounds, words are divided into syllables
and the necessary stress accents are identified. In short, the orthography is replaced by
an accurate phonetic transcription.
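As a rough illustration of what "rule-based" means here, the following Python sketch replaces letter groups by sound symbols, always trying the longest matching rule first. The rule table `RULES`, the ASCII sound symbols and the function name `to_phonemes` are invented for this example and cover only a handful of English spellings; syllable division and stress assignment are left out entirely.

```python
# Toy rule table (hypothetical): letter group -> sound symbol.
RULES = {
    "sh": "S", "ch": "tS", "ph": "f", "ck": "k", "ee": "i:",
    "x": "ks", "q": "k", "c": "k",
}
MAX_LEN = max(len(group) for group in RULES)


def to_phonemes(word: str) -> list[str]:
    """Convert a word to a list of sound symbols using longest-match rules."""
    word = word.lower()
    sounds = []
    i = 0
    while i < len(word):
        # Try the longest letter group first, then shorter ones.
        for size in range(MAX_LEN, 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                sounds.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule applies: keep the letter as its own sound.
            sounds.append(word[i])
            i += 1
    return sounds


print(to_phonemes("sheep"))
print(to_phonemes("box"))
```

Trying the longest group first is what lets "sh" win over a separate "s" and "h"; a production rule set would also need context-dependent rules, which this sketch does not attempt.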
This is by no means an easy task, as the English language perfectly illustrates.
Some letters, such as x, c and q, do not correspond to a sound of their own. In the French
language, the /o/ sound can be spelled in ten different ways: au, aux, eau (bureau),
eaux (bureaux), os (dos), aut (haut), ault (Renault), o (zéro), ot (abricot)
and ôt (plutôt). The playwright G. B. Shaw is said to have jokingly proposed writing the
word "fish" as "ghoti": "gh" is pronounced /f/ as in "enough", "o" is pronounced
/i/ as in "women" and "ti" is pronounced /sh/ as in "nation".
As individual sounds, so-called “phonemes”, and not words, are the basic segments
of a phrase, new words and technical terms unknown to
the system pose no problem. Shakespearean monologues and brand-new words are
pronounced with equal ease.
To resolve ambiguous cases such as the noun "record" and the verb "record"
in phrases such as "Let’s record a beautiful record" or the verb "live" (/liv/) and
the adjective "live" (/laiv/) in "The host of the live show lives in New York", a
complete linguistic analysis of the recognized text at sentence and word level is
performed. (The technical term for such words is “homographs”: they are written
identically but pronounced differently.)
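The idea of choosing a pronunciation from context can be shown with a deliberately crude Python sketch. It does not perform a complete linguistic analysis as described above; it only looks at the preceding word. The table `HOMOGRAPHS`, the cue list `CUES`, the function name `pronounce_homographs` and the informal respellings "REC-ord" and "re-CORD" are assumptions made for the example.

```python
import re

# Hypothetical cue words that usually introduce a noun phrase.
CUES = {"a", "an", "the", "beautiful"}

# Hypothetical table: reading after a cue word vs. reading elsewhere.
HOMOGRAPHS = {
    "record": {"after_cue": "REC-ord", "otherwise": "re-CORD"},  # noun vs. verb
    "live": {"after_cue": "laiv", "otherwise": "liv"},           # adjective vs. verb
}


def pronounce_homographs(sentence: str) -> list[tuple[str, str]]:
    """Return (word, chosen pronunciation) for each homograph in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    chosen = []
    previous = ""
    for word in words:
        if word in HOMOGRAPHS:
            key = "after_cue" if previous in CUES else "otherwise"
            chosen.append((word, HOMOGRAPHS[word][key]))
        previous = word
    return chosen


print(pronounce_homographs("Let's record a beautiful record"))
print(pronounce_homographs("The host of the live show lives in New York"))
```

Looking at a single neighbouring word breaks down quickly on real text, which is why an analysis at both sentence and word level is needed in practice.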