IEEE SIGNAL PROCESSING MAGAZINE [118] MARCH 2015
bank was replaced by the filter bank used in the speech coding
strategy of the CI devices used by the listeners in the subjective
test. Second, speech content variability was reduced by means
of a modulation spectrum thresholding scheme [27]. Finally, to
model the reduced sensitivity of the HI listeners, the 4–128 Hz
range of the eight modulation filter bank center frequencies of
the original SRMR metric was reduced to 4–30 Hz. The SRMR-
CI has been tested as a correlate of intelligibility for CI users
under clean, noisy, reverberant, noise-plus-reverberation, and
speech-enhanced conditions [26], [27].
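As a rough illustration of the narrowed modulation range, the sketch below generates eight modulation filter center frequencies over 4–128 Hz and over 4–30 Hz. Two points are assumptions, not statements from the text: the geometric (logarithmic) spacing, and the SRMR-CI variant keeping eight filters. The function name is hypothetical.

```python
def modulation_center_frequencies(f_min_hz, f_max_hz, n_filters=8):
    """Return n_filters center frequencies spaced geometrically
    between f_min_hz and f_max_hz (spacing rule is an assumption)."""
    ratio = (f_max_hz / f_min_hz) ** (1.0 / (n_filters - 1))
    return [f_min_hz * ratio ** k for k in range(n_filters)]

# Range stated for the original SRMR metric.
original_cfs = modulation_center_frequencies(4.0, 128.0)
# Reduced range stated for SRMR-CI (eight filters assumed to be kept).
srmr_ci_cfs = modulation_center_frequencies(4.0, 30.0)

print([round(f, 1) for f in original_cfs])
print([round(f, 1) for f in srmr_ci_cfs])
```

Under these assumptions the first filter stays at 4 Hz in both cases, while the upper filters of the reduced bank are packed into a much narrower modulation range.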
Similar to the SRMR-CI metric described previously, the
original SRMR metric has also been modified to tailor it to HA
devices [28]. First, the
gammatone filter bank in the original SRMR implementation
was modified to take into account the listener’s individual
hearing loss thresholds obtained via an audiogram. More spe-
cifically, the Q-factors of each of the filters were adjusted to
simulate the hearing loss due to outer hair cell damage. Hence,
as hearing loss increased, so did the filter bandwidths (i.e.,
Q-factors decreased). Additionally, the temporal Hilbert enve-
lopes were compressed using a nonlinear compression func-
tion, similar to that used in the HASQI metric, to further
model outer hair cell losses. For HA devices, it was found that
the original 4–128 Hz range of modulation filter bank center
frequencies was optimal, thus no changes were implemented in
the modulation filter bank. The SRMR-HA was tested as a cor-
relate of subjective quality for HA users in noisy, reverberant,
and speech-enhanced conditions [28].
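The inverse relation between Q-factor and bandwidth mentioned above can be made concrete with the textbook definition Q = center frequency / bandwidth. The sketch below is only that definition. It does not reproduce the audiogram-to-Q mapping of [28], which is not given here, and the numeric Q values are hypothetical.

```python
def filter_bandwidth_hz(center_hz, q_factor):
    """Bandwidth implied by the standard definition Q = fc / bandwidth."""
    if q_factor <= 0:
        raise ValueError("q_factor must be positive")
    return center_hz / q_factor

center_hz = 1000.0
# Hypothetical Q values: a lower Q stands in for greater hearing loss.
normal_bw = filter_bandwidth_hz(center_hz, q_factor=8.0)
impaired_bw = filter_bandwidth_hz(center_hz, q_factor=4.0)

print(normal_bw, impaired_bw)
```

Halving Q doubles the bandwidth at a fixed center frequency, which is the sense in which broader filters model reduced frequency selectivity.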
EXPERIMENTAL SETUP
This section describes the data sets used in the experiments and
the evaluation criteria used to characterize the performance of
the investigated metrics.
CI SPEECH INTELLIGIBILITY DATA SET
This database is described in full detail in [1] and in the references
therein. The material comprises speech data presented to CI users
within the framework of an intelligibility subjective test. The
speech sentences presented to the CI users were taken from the
well-known IEEE sentence corpus. Four recorded room impulse
responses were convolved with the clean speech data to simulate
reverberant speech with reverberation times (RT60) of 0.3, 0.6,
0.8, and 1 s. Speech-shaped noise was also added to the anechoic
and the reverberant signals to generate noise-only and noise-plus-
reverberation degradation conditions, respectively. Noise was
added at SNRs of -5, 0, 5, and 10 dB for the anechoic samples and
5 and 10 dB for the reverberant samples. For the noise-plus-rever-
beration condition, the reverberant signals served as reference for
SNR computation. Additionally, the database includes sentences
enhanced using an ideal reverberant masking (IRM) strategy [29].
These sentences were under reverberant conditions with RT60s of
0.6, 0.8, and 1 s, and all of the noise-plus-reverberation conditions
previously described. The IRM algorithm was configured to use
two to three different threshold values for each condition. Speech
files were sampled at 16 kHz with 16-bit resolution.
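The degradation procedure described above (convolution with a room impulse response, then additive noise scaled against the reverberant signal) can be sketched as follows. This is a minimal illustration, not the procedure used to build the database: the impulse response is a toy exponential decay, white Gaussian noise stands in for speech-shaped noise, and all function names are hypothetical.

```python
import math
import random

def convolve(signal, impulse_response):
    """Direct-form linear convolution (adequate for short toy signals)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def mean_power(x):
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(reference, noise, snr_db):
    """Scale noise so that power(reference) / power(noise) equals snr_db,
    then add it to the reference. Returns (mixture, scaled_noise)."""
    noise = noise[:len(reference)]
    gain = math.sqrt(mean_power(reference)
                     / (mean_power(noise) * 10.0 ** (snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    mixture = [r + n for r, n in zip(reference, scaled_noise)]
    return mixture, scaled_noise

rng = random.Random(0)
clean = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # stand-in for speech
toy_rir = [0.6 ** k for k in range(20)]                # toy decaying response
reverberant = convolve(clean, toy_rir)

# White noise as a placeholder for speech-shaped noise (assumption).
noise = [rng.gauss(0.0, 1.0) for _ in range(len(reverberant))]

# Noise-plus-reverberation: the reverberant signal is the SNR reference.
mixture, scaled_noise = add_noise_at_snr(reverberant, noise, snr_db=5.0)
achieved_snr_db = 10.0 * math.log10(
    mean_power(reverberant) / mean_power(scaled_noise))
print(round(achieved_snr_db, 3))
```

Using the reverberant signal rather than the clean signal as the reference means the stated SNR describes the ratio the listener actually receives after reverberation.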
Eleven adult CI users were recruited to participate in the sub-
jective intelligibility experiments. The participants were all native
speakers of American English with postlingual deafness and had
an average age of 64 years. All participants had a minimum of one
year of experience using their device routinely, with some being
bilaterally implanted for over six years. For consistency, all partici-
pants were temporarily fitted with a SPEAR3 research processor
(22 filter bank channels with Mel-like spacing) with parameters
matching the individual CI user’s clinical settings. Participants
were presented with 31 lists of 20 sentences randomly selected
from the IEEE database, each list being corrupted by the afore-
mentioned degradation conditions. Degraded stimuli were pre-
sented directly to the audio input of the research processor and
the level was adjusted individually for comfort at the beginning of
the experiment. Listeners were instructed to repeat aloud each
sentence after its presentation. A tester then marked the words
correctly identified by the subject according to the ground truth
transcript. Finally, the number of words correctly recognized by
the listener was divided by the total number of presented words
to obtain the per-participant intelligibility scores. More details about
the listening test can be found in [1].
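The scoring rule just described reduces to a ratio of word counts. The sketch below assumes counts are pooled over the sentences of one list before dividing; the per-sentence counts are made up for illustration and the function name is hypothetical.

```python
def intelligibility_score(words_correct, words_presented):
    """Fraction of presented words correctly repeated, pooled over sentences."""
    total_presented = sum(words_presented)
    if total_presented == 0:
        raise ValueError("no words were presented")
    return sum(words_correct) / total_presented

# Hypothetical counts for three sentences heard by one participant.
correct_per_sentence = [3, 5, 4]
presented_per_sentence = [5, 5, 5]

score = intelligibility_score(correct_per_sentence, presented_per_sentence)
print(score)
```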
HA SPEECH QUALITY DATA SETS
Two speech quality data sets collected with HA users were used in
the experiments described herein. The first database explores the
effects of frequency lowering, an amplification strategy for HI lis-
teners with severe to profound high frequency sensorineural hear-
ing loss that has gained renewed attention recently. Nonlinear
frequency compression (NFC) is a particular type of frequency
lowering algorithm, wherein the input spectral content beyond a
cutoff frequency (CF) is compressed by a factor determined by the
compression ratio (CR) before further processing by the HA. Thus,
NFC moves high frequency energy to lower frequency regions
(where there is better residual hearing acuity) increasing the
chances of audibility and potential benefit. We refer the interested
reader to [30] and the references therein for more details about
the database and NFC processing.
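To illustrate how CF and CR interact, the sketch below maps an input frequency to an output frequency. The text does not give the exact mapping; the form used here, which compresses frequencies above CF on a logarithmic scale, is an assumption, and the function name and example values are hypothetical.

```python
def nfc_map(freq_hz, cutoff_hz, compression_ratio):
    """Map an input frequency to its frequency-lowered output.

    Frequencies at or below the cutoff pass unchanged. Above it, the
    log-frequency distance from the cutoff is divided by the
    compression ratio (assumed form, not taken from the source).
    """
    if freq_hz <= cutoff_hz:
        return freq_hz
    return cutoff_hz * (freq_hz / cutoff_hz) ** (1.0 / compression_ratio)

# Illustrative settings: CF = 2 kHz, CR = 2:1.
low = nfc_map(1000.0, cutoff_hz=2000.0, compression_ratio=2.0)
high = nfc_map(8000.0, cutoff_hz=2000.0, compression_ratio=2.0)
print(low, high)
```

With this assumed mapping, lowering CF or raising CR both move more high-frequency energy into the lower region, at the cost of greater spectral distortion.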
The speech material presented to the listeners consisted of
IEEE sentences, spoken by two males and two females and
recorded through HAs with different NFC strategies; more specifically:
1) CF = 4 kHz and CR = 2:1; 2) CF = 2 kHz, CR = 2:1;
3) CF = 3 kHz, CR = 2:1; 4) CF = 3 kHz, CR = 6:1; and
5) CF = 3 kHz, CR = 10:1. In addition, two “anchor” stimuli were
created for each sentence: peak clipping at 25% of maximum sig-
nal amplitude and lowpass filtering at 2 kHz. In this study, the
anchor conditions are not used during metric performance com-
parison to place emphasis solely on the effects of NFC. As such, of
the available 32 stimuli [4 speakers × (5 NFC conditions + 2
anchors + 1 clean reference)], only 24 are used in the analysis
presented in the section “Experimental Results.”
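The stimulus counts can be checked by enumerating speaker and condition pairs. The labels below are hypothetical placeholders; only the counts (four speakers, five NFC settings, two anchors, one clean reference) come from the text.

```python
from itertools import product

speakers = ["male_1", "male_2", "female_1", "female_2"]   # hypothetical labels
nfc_conditions = [f"nfc_{k}" for k in range(1, 6)]        # five NFC settings
anchors = ["peak_clipping", "lowpass_2khz"]
reference = ["clean"]

all_stimuli = list(product(speakers, nfc_conditions + anchors + reference))
# The anchors are excluded from the metric comparison.
analysis_stimuli = [(s, c) for s, c in all_stimuli if c not in anchors]

print(len(all_stimuli), len(analysis_stimuli))
```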
Quality ratings of this database were obtained with 11 HI listen-
ers with severe to profound hearing loss. Each participant was fitted
with a Phonak Savia behind-the-ear (BTE) HA and seated in a dou-
ble-walled sound booth in front of a speaker and a computer moni-
tor. Ratings of speech quality were obtained using the [20–100]