IEEE SIGNAL PROCESSING MAGAZINE [115] MARCH 2015
impairment, these subjects can become candidates for HA or CI
devices. Recently, a number of factors, such as an aging population,
enlargement of candidacy criteria, and technological advances,
have drawn great attention to HA and CI research and development.
For users of such assistive listening devices, however, environmental
distortions, such as reverberation and additive noise
(and their combined effects), significantly degrade speech intelligibility
and reduce perceived quality to unacceptable levels [1]. As
such, current research has focused on the development of speech
enhancement techniques (e.g., noise suppression, feedback can-
cellation) to meet this demand. To assure that the developed algo-
rithms are behaving as expected, quality and intelligibility
monitoring must be performed.
Traditionally, subjective tests have been used to assure that
acceptable levels of speech quality and intelligibility are attained.
For CI devices, two approaches are commonly taken. The first
makes use of vocoded speech to simulate CI hearing and presents
vocoded speech to NH listeners for identification. The second
approach is more direct and presents degraded (or enhanced)
speech stimuli directly to HI CI users for analysis (e.g., [1]). For
HA users, this latter approach has been commonly used to investi-
gate the effects of various HA signal processing techniques, such
as noise suppression and feedback cancellation, on the perceived
speech quality. Subjective testing, however, is laborious, time-con-
suming, and expensive. As such, automated, repeatable, fast, and
cost-effective objective quality/intelligibility monitoring tools need
to be developed, thus replacing the listeners with an auditory-
inspired computational algorithm.
Reliable objective quality/intelligibility measurement tools
can play key roles in the development, fitting, and online pro-
cessing of different assistive listening devices. In the develop-
ment stage, for example, different processing algorithms can be
optimized to improve the final perceived speech quality/intelli-
gibility. Wide dynamic-range compression algorithms have been
developed to improve the audibility of low-intensity speech
sounds. It is well known, however, that the time-varying gain
changes can introduce unwanted nonlinear distortions. As such,
objective tools provide a means of evaluating the tradeoffs
between audibility and distortion, thus allowing for optimal
parameters to be set. Moreover, for HA fitting, objective meas-
ures can be used to provide presettings tailored to the individual
hearing loss, thus providing more effective starting points for
the adjustment of the HA. Furthermore, the settings that provide
optimum intelligibility may not be the ones that result in
maximum quality; toggling between settings based on an
intelligibility index and on a quality index can thus provide a
meaningful comparison for the HA user. Finally, objective tools can be used
in the real-time adaptation of, e.g., speech enhancement algo-
rithms (i.e., model-in-the-loop), such that the processing guar-
antees optimal quality/intelligibility as the user moves from one
(noisy/reverberant) environment to another.
Signal-based objective metrics can be classified as intrusive
or nonintrusive, depending on whether or not they require a
reference signal. While significant research and standardization
efforts have been devoted to developing objective measures
for telephone speech with NH listeners [2], only a small num-
ber of objective measurement tools targeted toward CI/HA
users have been developed. Given the rapidly aging population
and the projected increase of hearing loss that comes with
growing older, it is of great importance that the advantages and
drawbacks of existing tools be characterized, as well as com-
pared to each other on data sets collected under different prac-
tical experimental conditions.
In this article, we present several existing tools that have
been recently developed for users of assistive listening devices;
seven of the investigated tools belong to the intrusive class and
five are nonintrusive. All the metrics were evaluated on the
same data sets comprising speech processed under different
complex listening conditions, such as noise, reverberation,
noise-plus-reverberation, as well as under different nonlinear
effects, such as frequency compression and speech enhancement
(i.e., noise suppression and dereverberation). Advantages
and limitations of the investigated tools are presented, and
suggestions are given as to which metrics to use under specific
scenarios, thus serving as a useful guide for
researchers and developers of assistive listening devices.
OBJECTIVE SPEECH QUALITY AND INTELLIGIBILITY PREDICTION
Over the last two decades, significant efforts
have been made by the International Telecommunication
Union (ITU-T) to standardize both intrusive and nonintrusive
algorithms for telephone speech using NH listeners [2]. On the
other hand, only a handful of algorithms have been proposed
that are specifically tuned to assistive listening devices. To
overcome this limitation, recent studies have explored the use
of NH-optimized tools, as well as proposed modifications to
such tools to tailor them to assistive listening devices (e.g.,
[3]). In the following sections, several such measures, both
intrusive and nonintrusive, are described. The choice of measures
used in this study was guided not only by their applicability
to the task at hand, but also by the availability of
public source code (or code that could be licensed at a
reasonable cost).
INTRUSIVE METRICS
NORMALIZED COVARIANCE METRIC
The normalized covariance metric (NCM) estimates
speech intelligibility based on the covariance between the
envelopes of the time-aligned reference and processed speech
signals [4]–[6]. Computation of NCM values depends on deriving
speech temporal envelopes, via the Hilbert transform, from
outputs of a gammatone filter bank used to emulate cochlear
processing. The normalized correlation between the reference
and processed speech envelopes produces an estimate of the so-called
apparent signal-to-noise ratio (SNR), $\mathrm{SNR}_{\mathrm{app}}$, given by

$$\mathrm{SNR}_{\mathrm{app}}(k) = \left[ 10 \log_{10} \left( \frac{r_k^2}{1 - r_k^2} \right) \right]_{[-15,\,15]} \qquad (1)$$
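
The per-band computation in (1) can be illustrated with a minimal Python sketch. It is not the reference NCM implementation from [4]–[6]. It assumes that r_k is the correlation coefficient between the mean-removed envelopes in band k, and it replaces the gammatone filter bank with a brick-wall FFT band-pass purely to stay self-contained. The function names (`hilbert_envelope`, `band_signal`, `apparent_snr`) and the `band_edges` parameter are hypothetical, and any envelope smoothing or downsampling used in practice is omitted.

```python
import numpy as np


def hilbert_envelope(x):
    """Temporal envelope: magnitude of the analytic signal, built with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep Nyquist bin
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))


def band_signal(x, fs, lo, hi):
    """Brick-wall band-pass (assumption: stand-in for one gammatone channel)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def apparent_snr(reference, processed, fs, band_edges, eps=1e-12):
    """Apparent SNR in dB for each band, limited to [-15, 15] as in (1).

    Both signals are assumed to be time aligned and of equal length.
    """
    snr_db = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        env_ref = hilbert_envelope(band_signal(reference, fs, lo, hi))
        env_proc = hilbert_envelope(band_signal(processed, fs, lo, hi))
        env_ref = env_ref - env_ref.mean()
        env_proc = env_proc - env_proc.mean()
        denom = np.sum(env_ref ** 2) * np.sum(env_proc ** 2)
        # Squared normalized covariance between the two envelopes.
        r2 = np.sum(env_ref * env_proc) ** 2 / denom if denom > 0 else 0.0
        # Keep the logarithm finite when r2 reaches 0 or 1.
        r2 = float(np.clip(r2, eps, 1.0 - eps))
        snr_db.append(10.0 * np.log10(r2 / (1.0 - r2)))
    return np.clip(np.array(snr_db), -15.0, 15.0)
```

Calling `apparent_snr(ref, proc, fs, [100.0, 500.0, 1500.0, 4000.0])` returns one value per band. Identical inputs saturate at the upper limit of 15 dB, while added noise lowers the envelope correlation and therefore the apparent SNR.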