IEEE SIGNAL PROCESSING MAGAZINE [48] MARCH 2015
SIGNAL PROCESSING APPROACHES
In this section, the focus is on creating practical enhancement
systems. We start with a discussion of various modifications that
can be made and then discuss three approaches to enhancement and
their performance. Specific applications are described in “Making
Mobile Phones More Intelligible” and “Making It Work for Hearing
Instruments.”
SPEECH MODIFICATIONS
The basic paradigm of intelligibility enhancement discussed in
this article is to select a modification operation to be used for
preprocessing the signal and a measure of intelligibility, and
then to adjust the parameters of the modification operation to
maximize the measure. We discuss the classes of modifications
that have been used or can be used and report on current
knowledge about their effectiveness.
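The paradigm above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the modification operator (a power-preserving redistribution of energy across bands, controlled by a hypothetical parameter `alpha`), the toy measure (a mean of clipped per-band SNRs, loosely in the spirit of band-audibility measures), and the example band powers. It is not one of the systems discussed in this article.

```python
import math

def band_audibility(speech_pow, noise_pow):
    """Map a band SNR in dB onto [0, 1] (toy stand-in for an intelligibility measure)."""
    snr_db = 10.0 * math.log10(speech_pow / noise_pow)
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

def toy_measure(speech, noise):
    """Average band audibility over all bands."""
    return sum(band_audibility(s, n) for s, n in zip(speech, noise)) / len(speech)

def modify(speech, noise, alpha):
    """Hypothetical operator: move speech power toward noisy bands.

    alpha = 0 leaves the speech unchanged; alpha = 1 gives equal SNR in all bands.
    The total speech power is kept equal to the original total.
    """
    raw = [s ** (1.0 - alpha) * n ** alpha for s, n in zip(speech, noise)]
    scale = sum(speech) / sum(raw)
    return [r * scale for r in raw]

def optimize(speech, noise, grid):
    """Pick the parameter value on the grid that maximizes the measure."""
    best = max(grid, key=lambda a: toy_measure(modify(speech, noise, a), noise))
    return best, toy_measure(modify(speech, noise, best), noise)

# Illustrative band powers (arbitrary linear units, not measured data).
speech = [1.0, 0.5, 0.1, 0.05]
noise = [0.1, 0.5, 1.0, 0.2]
grid = [i / 10.0 for i in range(11)]

baseline = toy_measure(speech, noise)
best_alpha, best_score = optimize(speech, noise, grid)
```

Because the grid contains `alpha = 0`, the optimized score can never fall below the unmodified baseline. A practical system would replace the grid search and the toy measure with the optimization method and intelligibility measure of its choice.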
Enhancement operators can be classified according to a number
of criteria. Generically, operators are either time-varying or
time-invariant and either linear or nonlinear. Most
intelligibility enhancement operators are time-varying and
nonlinear. However, low-level operators that apply linear
filtering to the signal [8] have been used and perform well (if
the filter is adapted, the operator is nonlinear).
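The parenthetical remark about adapted filters can be checked numerically. The sketch below is illustrative only: it compares a fixed FIR filter with a hypothetical gain that adapts to the input power (the function `power_normalizing_gain` and its adaptation rule are assumptions, not taken from [8]). Doubling the input doubles the output of the fixed filter, whereas the adapted gain produces the same output for both inputs and therefore violates homogeneity.

```python
def fir_filter(x, h):
    """Fixed FIR filter (linear, time-invariant): y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def power_normalizing_gain(x, target_power=1.0):
    """Gain adapted to the input: scale x to a target mean power (hypothetical rule)."""
    power = sum(v * v for v in x) / len(x)
    gain = (target_power / power) ** 0.5
    return [gain * v for v in x]

x = [1.0, -0.5, 0.25, 0.0]
h = [0.5, 0.5]
doubled = [2.0 * v for v in x]

# Fixed filter: scaling the input by 2 scales the output by 2.
fixed_is_homogeneous = all(
    abs(a - 2.0 * b) < 1e-12
    for a, b in zip(fir_filter(doubled, h), fir_filter(x, h))
)

# Adapted gain: scaling the input by 2 leaves the output unchanged.
adapted_ignores_scale = all(
    abs(a - b) < 1e-12
    for a, b in zip(power_normalizing_gain(doubled), power_normalizing_gain(x))
)
```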
Additional classifications can be made based on the specific
processing performed on the message. Depending on the
abstraction level where a modification takes place, we identify
lexical (high level), prosodic (midlevel), and spectral and
temporal (low level) modifications.
In accordance with the Markov chain model of the communication
process, presented in the section “Defining Intelligibility,” a
high-level modification affects the message representation at
the lower levels. The operator can be independent of or
dependent on the environmental disturbance, i.e., it can be
nonadaptive or adaptive.
Finally, depending on the origin of a modification, there are 1)
mimicking strategies, i.e., modifications that attempt to mimic
those used consciously or subconsciously by humans producing
speech in adverse conditions, and 2) rational strategies based
on, e.g., expert insight into the human auditory periphery and
cognition [3] or into the sound field [16], [29].
In unpublished work of the Listening Talker (LISTA) project
(http://listeningtalker.org), 44 possible modifications were
identified. These include the modification strategies used in
essentially all existing intelligibility enhancement systems.
The effectiveness of some of the listed modifications on
intelligibility in noisy environments is reviewed in [7] and [9].
As is discussed in [7] and [9], mimicking strategies such as
pitch modification, vowel space adjustment, and uniform
speaking rate reduction do not improve intelligibility
consistently when applied to natural speech. This outcome
suggests that such modifications may have an auxiliary role or
may be the result of physical limitations in the speech
production mechanism. Other mimicking candidate modifications
include changing the relative duration of phonetic units and
shortening units that are more sensitive to energetic masking in
favor of more robust units. As of now, no conclusions can be
drawn about the benefit from such modifications. In the
remainder of this section we focus on rational strategies.
Lexical speech modifications include, among others, 1)
repetition to provide additional cues and 2) rephrasing to
increase the probability of correct recognition through better
noise robustness and/or higher predictability. While repetition
does not facilitate intelligibility optimization, rephrasing
provides an intuitive and attractive modification class. The
section “Measures of Intelligibility” discussed high-level
intelligibility measures that can, at least in principle, be
used for this purpose. A practical rephrasing approach is
presented in [19]: rather than comparing the measures directly,
the method compares the sensitivity of each formulation to noise
addition, according to the probability of correct recognition.
The approach does not consider the predictability of the
formulation, which is a major factor in intelligibility. An
indirect indication of the expected gain from increasing the
predictability of a formulation, e.g., by vocabulary size
reduction, can be obtained by comparing the outcomes of
intelligibility evaluations using closed-set [14] and open-set
vocabulary bases [9]. The considerably higher intelligibility
gain for closed-set evaluation suggests that it is feasible to
design a modification system achieving intelligibility gain by
improving the predictability of the formulation.
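The idea of choosing between formulations by their sensitivity to noise can be sketched as follows. This is a loose illustration rather than the method of [19]: sensitivity is assumed here to be the drop in the probability of correct recognition when noise is added, and the candidate names and probabilities are placeholder values, not measured results.

```python
def noise_sensitivity(p_clean, p_noisy):
    """Assumed sensitivity: loss in correct-recognition probability due to added noise."""
    return p_clean - p_noisy

def least_sensitive(candidates):
    """Return the formulation whose recognition probability drops least in noise.

    candidates maps a formulation label to (p_clean, p_noisy).
    """
    return min(candidates, key=lambda f: noise_sensitivity(*candidates[f]))

# Placeholder probabilities for two hypothetical formulations of one message.
candidates = {
    "formulation A": (0.95, 0.60),
    "formulation B": (0.93, 0.80),
}
choice = least_sensitive(candidates)
```

In this toy example, formulation B is selected because its recognition probability degrades less under noise, even though formulation A scores slightly higher in the clean condition. As noted above, a criterion of this kind ignores predictability.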
MAKING MOBILE PHONES MORE INTELLIGIBLE
Mobile telephony is often conducted in the presence of
acoustical background noise such as traffic or babble noise. In
this situation, the listener perceives a mixture of the clean
far-end speech and environmental noise from the near-end side,
which generally leads to an increased listening effort and
possibly to reduced speech intelligibility. As the noise signal
generally cannot be changed, the manipulation of the far-end
signal is the only way to effectively improve speech
intelligibility and to ease listening effort for the near-end
listener.
In the mobile phone application, the algorithmic delay of the
processing is crucial since the allowed round-trip delay of the
communication system is limited. This places a severe constraint
on the modification operator. Furthermore, the restrictions of
the microloudspeakers of mobile phones need to be considered.
The maximum thermal load of the microloudspeaker constitutes a
major limitation, which can be taken into account with a
constraint on the total audio power. Finally, the ear of the
near-end listener is usually next to the loudspeaker and must be
protected from damage and pain. This can be ensured by power
limitations for the critical bands.
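The two power constraints described above can be combined in a simple sketch. It assumes that the modified speech is represented by one power value per critical band and that the limits are given in the same (arbitrary) linear units; the function name, the clip-then-rescale order, and the numbers are illustrative assumptions, not part of any system described here.

```python
def apply_power_constraints(band_power, band_limit, total_limit):
    """Enforce per-band power limits, then a limit on the total power.

    Each band is first clipped to its own limit. If the total still exceeds
    total_limit, all bands are scaled down by a common factor, which cannot
    push any band back above its per-band limit.
    """
    limited = [min(p, m) for p, m in zip(band_power, band_limit)]
    total = sum(limited)
    if total > total_limit:
        scale = total_limit / total
        limited = [p * scale for p in limited]
    return limited

# Illustrative values only.
requested = [4.0, 2.0, 1.0, 0.5]
band_limit = [2.0, 2.0, 2.0, 2.0]
constrained = apply_power_constraints(requested, band_limit, total_limit=4.0)
```

Clipping followed by uniform scaling is only one way to satisfy both constraints. An optimization-based system would more likely include them directly as constraints when maximizing the intelligibility measure.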
THE CLASSIC SII HAS PROVEN
TO BE HIGHLY CORRELATED
WITH SPEECH INTELLIGIBILITY
IN MANY CONDITIONS AND HAS
BEEN USED AS A BASIS FOR SPEECH
INTELLIGIBILITY ENHANCEMENT.