IEEE SIGNAL PROCESSING MAGAZINE [49] MARCH 2015
Low-level modifications do not require knowledge of the intended message transcription. These can be subdivided into spectral, temporal, and spatial signal modifications as well as combinations thereof.
Straightforward spectral shaping is employed in [8] and [12]. This modification combines low complexity with a high intelligibility gain (see, e.g., [9]), making these approaches particularly suitable for application in mobile telephony.
Spectrotemporal energy redistribution is considered in [6], where the glimpse proportion is optimized. The use of a genetic algorithm to perform the optimization makes this method interesting primarily from a theoretical perspective. A low-complexity approach with high intelligibility gain that performs spectrotemporal energy redistribution by optimizing a perceptual distortion measure is presented in [14].
A particular class of spectrotemporal energy redistribution is obtained with dynamic range compression. This approach can either be nonadaptive or adaptive. In a large-scale subjective evaluation of proposed speech modification systems [9], most of the entries that incorporated dynamic range compression, including those related to the descriptions in [5], [23], and [28], performed well.
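To make the idea of dynamic range compression as energy redistribution concrete, the following is a minimal sketch of a nonadaptive (fixed-parameter) compressor. It is not the method of [5], [23], or [28]; the function name, the one-pole envelope tracker, and the threshold, ratio, and smoothing values are all assumptions chosen for illustration. The output is rescaled to the input energy, so the compressor only moves energy from loud segments to quiet ones.

```python
import math


def compress_dynamic_range(samples, threshold_db=-20.0, ratio=4.0, smoothing=0.9):
    """Illustrative static compressor; all parameter values are assumptions."""
    eps = 1e-12  # avoids log10(0) for silent input
    envelope = 0.0
    compressed = []
    for x in samples:
        # One-pole smoother tracks the magnitude envelope of the signal.
        envelope = smoothing * envelope + (1.0 - smoothing) * abs(x)
        level_db = 20.0 * math.log10(envelope + eps)
        # Above the threshold, the output level grows by 1/ratio dB per input dB.
        excess_db = max(0.0, level_db - threshold_db)
        gain_db = -excess_db * (1.0 - 1.0 / ratio)
        compressed.append(x * 10.0 ** (gain_db / 20.0))

    # Rescale so that the total energy is unchanged: energy is redistributed,
    # not added.
    in_energy = sum(x * x for x in samples)
    out_energy = sum(y * y for y in compressed)
    if out_energy == 0.0:
        return compressed
    scale = math.sqrt(in_energy / out_energy)
    return [y * scale for y in compressed]


# A loud segment followed by a quiet one (a crude stand-in for speech).
speech_like = [1.0] * 200 + [0.01] * 200
processed = compress_dynamic_range(speech_like)
```

In this toy example the quiet segment ends up at a higher level than in the input while the total energy stays the same. An adaptive variant would adjust the threshold or ratio over time, for instance as a function of the noise level.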
Intelligibility can also be enhanced by controlling the spatial
sound field near the ear with a multitude of remote loudspeakers.
As discussed in more detail in the section “Enhancement over
Multiple Spatial Points,” if users are wearing microphones near
their ears, reverberation and cross-talk between different messages
can be reduced by feedback [16]. The goal is that only the desired
signal is present at the ear of a user. If microphones are further
from the ears of the listeners, the emerging field of multizone
audio rendering becomes relevant, e.g., [29].
INTELLIGIBILITY ENHANCEMENT SYSTEMS
This section describes three practical approaches to intelligibility optimization. The described approaches are based on different principles.
SII-BASED ENHANCEMENT
State-of-the-art systems have been developed based on the decomposition into band-importance and band-audibility functions [4], [8], [28]. We provided a recent perspective on this decomposition in the section “Measures Operating on Spectral Band Powers.” This section describes implementations that closely follow the SII standard.
The computation of the SII [18] uses a carefully calibrated specification of the speech spectrum $\sigma^2_{a_T,i}$ and the noise spectrum $\sigma^2_{v_E,i}$ (where $i$ is a critical or third-octave band index) as measured over an entire utterance, including minor pauses. The approach accounts for both the hearing threshold and the loss of intelligibility at very high presentation (loudness) levels, using information stored in tables. For an acoustic time-domain speech signal $a_T$, the equivalent speech spectrum level in dB, commonly denoted as $E_i$, is computed as
$$E_i = 10 \log_{10}\left(\frac{\sigma^2_{a_T,i}}{\sigma_0^2}\right) - 10 \log_{10}\left(\Delta f_i\right), \qquad (10)$$
where $\sigma_0^2$ denotes the digital reference power per hertz corresponding to the reference sound pressure of 20 μPa and $\Delta f_i$ is the frequency bandwidth of the $i$th subband in hertz. The equivalent disturbance spectrum level, $D_i$, is computed in three steps: first the calibration (10) is applied, and then the threshold of hearing and instantaneous masking are accounted for. In [4] the threshold of hearing is neglected, and in [8] both the threshold of hearing and instantaneous masking are neglected.
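The calibration step can be sketched in a few lines. The sketch below reads (10) as the band power relative to the digital reference power, expressed in dB, minus the band's bandwidth in dB. The function names and the numerical values are hypothetical and are not taken from the SII tables. The later steps (threshold of hearing and instantaneous masking) are deliberately left out.

```python
import math


def equivalent_spectrum_level_db(band_power, bandwidth_hz, reference_power):
    """Spectrum level in dB for one band, following the reading of (10) above.

    band_power      -- signal power measured in the band (digital units)
    bandwidth_hz    -- bandwidth of the band in hertz
    reference_power -- digital reference power (assumed known from calibration)
    """
    if band_power <= 0.0 or bandwidth_hz <= 0.0 or reference_power <= 0.0:
        raise ValueError("powers and bandwidths must be positive")
    return (10.0 * math.log10(band_power / reference_power)
            - 10.0 * math.log10(bandwidth_hz))


def spectrum_levels_db(band_powers, bandwidths_hz, reference_power):
    """Apply the calibration to every band of a critical or third-octave bank."""
    return [equivalent_spectrum_level_db(p, b, reference_power)
            for p, b in zip(band_powers, bandwidths_hz)]


# Hypothetical example: two bands with equal power per hertz.
levels = spectrum_levels_db([1000.0, 4000.0], [100.0, 400.0], reference_power=1.0)
```

Both bands in this example yield the same level because the wider band carries proportionally more power; subtracting the bandwidth term is what turns a band power into a per-hertz spectrum level.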
The band-audibility function of the SII also accounts for the
decrease in intelligibility at high presentation (loudness) levels,
which is not accounted for in (9). Consequently, it depends on both the SNR in the band and the absolute presentation level $E_i$.
The band-audibility function is identical for different bands and
MAKING IT WORK FOR HEARING INSTRUMENTS
Hearing instruments aim to compensate for a hearing loss. Typically, this is done by amplifying a sound recording, followed by dynamic range compression to ensure the signal remains within the audible and comfortable range. Environmental noise degrades intelligibility for hearing instrument users in two ways. A first degradation is due to noise recorded by the microphones. To decrease the impact of this noise, noise reduction is applied to the recorded signal prior to amplification for hearing loss compensation.
A second degradation depends on the fitting: the user may experience direct environmental noise leaking through the hearing instrument vent. This leakage degrades the intelligibility and can be overcome by processing the signal with a speech intelligibility enhancement algorithm before play-out, as discussed in the article.
Adopting the concept of interpretation noise, the patient’s hearing loss can be measured and modeled by the noise process $v_L$. The environmental noise that reaches the ear through the hearing instrument vent can be modeled by the process $v_E$ of (4). Dynamic range compression can be taken into account by expressing the desired output range in terms of (frequency-dependent) absolute power constraints. Given this model, the hearing instrument can be optimized using one of the measures discussed in the section “Measures Operating on Spectral Band Powers” in a constrained fashion. The resulting integrated solution compares favorably with an ad hoc concatenation of processing steps, facilitates a conceptual understanding of the hearing impairment, and is likely to lead to an effective control of the instrument.
THE MAXIMUM PROBABILITY
OF CORRECT PHONEME
RECOGNITION IS AN EXAMPLE
OF A CRITERION THAT FAVORS
SIGNAL FEATURES THAT RESEMBLE
THOSE OF CLEAN SPEECH.