
IEEE SIGNAL PROCESSING MAGAZINE [115] MARCH 2015

impairment, these subjects can become candidates for HA or CI devices. Recently, a number of factors, such as an aging population, the enlargement of candidacy criteria, and technological advances, have drawn great attention to HA and CI research and development. For users of such assistive listening devices, however, environmental distortions such as reverberation and additive noise (and their combined effects) significantly degrade speech intelligibility and reduce perceived quality to unacceptable levels [1]. As such, current research has focused on the development of speech enhancement techniques (e.g., noise suppression, feedback cancellation) to meet this demand. To ensure that the developed algorithms behave as expected, quality and intelligibility monitoring must be performed.

Traditionally, subjective tests have been used to ensure that acceptable levels of speech quality and intelligibility are attained. For CI devices, two approaches are commonly taken. The first uses vocoded speech to simulate CI hearing and presents the vocoded speech to NH listeners for identification. The second approach is more direct and presents degraded (or enhanced) speech stimuli directly to HI CI users for analysis (e.g., [1]). For HA users, this latter approach has been commonly used to investigate the effects of various HA signal processing techniques, such as noise suppression and feedback cancellation, on the perceived speech quality. Subjective testing, however, is laborious, time-consuming, and expensive. As such, automated, repeatable, fast, and cost-effective objective quality/intelligibility monitoring tools need to be developed, thus replacing the listeners with an auditory-inspired computational algorithm.

Reliable objective quality/intelligibility measurement tools can play key roles in the development, fitting, and online processing of different assistive listening devices. In the development stage, for example, different processing algorithms can be optimized to improve the final perceived speech quality/intelligibility. Wide dynamic-range compression algorithms, for instance, have been developed to improve the audibility of low-intensity speech sounds. It is well known, however, that the time-varying gain changes can introduce unwanted nonlinear distortions. As such, objective tools provide a means of evaluating the tradeoffs between audibility and distortion, thus allowing for optimal parameters to be set. Moreover, for HA fitting, objective measures can be used to provide presettings tailored to the individual hearing loss, thus providing more effective starting points for the adjustment of the HA. Furthermore, the settings that provide optimum intelligibility may not be the ones that result in maximum quality; toggling between settings based on an intelligibility index and on a quality index can thus provide a meaningful comparison for the HA user. Finally, objective tools can be used in the real-time adaptation of, e.g., speech enhancement algorithms (i.e., model-in-the-loop), such that the processing guarantees optimal quality/intelligibility as the user moves from one (noisy/reverberant) environment to another.
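The audibility/distortion tradeoff mentioned above for wide dynamic-range compression can be illustrated with a minimal sketch. It is not taken from any of the cited algorithms: the function names, the single-knee gain rule, and all parameter values (threshold, compression ratio, linear gain, smoothing coefficients) are hypothetical and chosen only for illustration. The first function gives soft inputs more gain than loud ones; the second shows how the applied gain then varies over time, which is the source of the nonlinear distortions the text refers to.

```python
def wdrc_gain_db(input_level_db, threshold_db=45.0, ratio=2.0, linear_gain_db=30.0):
    """Static gain (dB) of a toy single-knee compressor (all values hypothetical).

    Below the threshold the gain is constant; above it, the output level rises
    by only 1/ratio dB per input dB, so the gain shrinks as the input gets louder.
    """
    if ratio < 1.0:
        raise ValueError("compression ratio must be >= 1")
    if input_level_db <= threshold_db:
        return linear_gain_db
    excess_db = input_level_db - threshold_db
    return linear_gain_db - excess_db * (1.0 - 1.0 / ratio)


def smooth_gain(target_gains_db, attack=0.5, release=0.05):
    """One-pole smoothing of a per-frame gain track (coefficients hypothetical).

    The gain follows a falling target quickly (attack) and a rising target
    slowly (release), so the gain applied to the signal changes over time.
    """
    if not target_gains_db:
        return []
    smoothed = []
    gain = target_gains_db[0]
    for target in target_gains_db:
        coeff = attack if target < gain else release
        gain += coeff * (target - gain)
        smoothed.append(gain)
    return smoothed


# A soft (40 dB) and a loud (75 dB) input: the soft one receives more gain.
soft_gain = wdrc_gain_db(40.0)   # 30.0 dB
loud_gain = wdrc_gain_db(75.0)   # 15.0 dB
# Gain track when the input jumps from soft to loud after two frames.
gain_track = smooth_gain([soft_gain, soft_gain, loud_gain, loud_gain])
```

With these illustrative settings, a 35-dB input range (40 to 75 dB) is mapped to a 20-dB output range (70 to 90 dB). An objective quality or intelligibility metric could then be evaluated over candidate thresholds, ratios, and time constants to select a setting, in the spirit of the parameter optimization described above.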

Signal-based objective metrics can be classified as intrusive or nonintrusive, depending on whether or not they require a reference signal. While significant research and standardization efforts have been devoted to developing objective measures for telephone speech with NH listeners [2], only a small number of objective measurement tools targeted toward CI/HA users have been developed. Given the rapidly aging population and the projected increase in hearing loss that comes with growing older, it is of great importance that the advantages and drawbacks of existing tools be characterized and that the tools be compared to each other on data sets collected under different practical experimental conditions.

In this article, we present several existing tools that have been recently developed for users of assistive listening devices; seven of the investigated tools belong to the intrusive class and five are nonintrusive. All the metrics were evaluated on the same data sets comprising speech processed under different complex listening conditions, such as noise, reverberation, and noise-plus-reverberation, as well as under different nonlinear effects, such as frequency compression and speech enhancement (i.e., noise suppression and dereverberation). Advantages and limitations of the investigated tools are presented, and suggestions as to which metrics are to be used under different specific scenarios are given, thus serving as a useful guide for researchers and developers of assistive listening devices.

OBJECTIVE SPEECH QUALITY AND INTELLIGIBILITY PREDICTION

Over the last two decades, significant efforts have been made by the International Telecommunication Union's Telecommunication Standardization Sector (ITU-T) to standardize both intrusive and nonintrusive algorithms for telephone speech using NH listeners [2]. On the other hand, only a handful of algorithms have been proposed that are specifically tuned to assistive listening devices. To overcome this limitation, recent studies have explored the use of NH-optimized tools and have proposed modifications that tailor such tools to assistive listening devices (e.g., [3]). In the following sections, several such measures, both intrusive and nonintrusive, are described. The choice of measures used in this study was guided not only by their applicability to the task at hand, but also by the public availability of source code (or of code that could be licensed at a reasonable cost).

INTRUSIVE METRICS

NORMALIZED COVARIANCE METRIC

The normalized covariance metric (NCM) estimates speech intelligibility based on the covariance between the envelopes of the time-aligned reference and processed speech signals [4]–[6]. Computation of NCM values depends on deriving speech temporal envelopes, via the Hilbert transform, from the outputs of a gammatone filter bank used to emulate cochlear processing. The normalized correlation between the reference and processed speech envelopes produces an estimate of the so-called apparent signal-to-noise ratio (SNR), $\mathrm{SNR}_{\mathrm{app}}$, given by

$$
\mathrm{SNR}_{\mathrm{app}}(k) = \left[ 10 \log_{10} \left( \frac{r^{2}(k)}{1 - r^{2}(k)} \right) \right]_{[-15,\,15]}, \tag{1}
$$
