IEEE SIGNAL PROCESSING MAGAZINE [118] MARCH 2015

bank was replaced by the filter bank used in the speech coding

strategy of the CI devices used by the listeners in the subjective

test. Second, speech content variability was reduced by means

of a modulation spectrum thresholding scheme [27]. Finally, to

model the reduced sensitivity of the HI listeners, the 4–128 Hz

range of the eight modulation filter bank center frequencies of

the original SRMR metric was reduced to 4–30 Hz. The SRMR-

CI has been tested as a correlate of intelligibility for CI users

under clean, noisy, reverberant, noise-plus-reverberation, and

speech-enhanced conditions [26], [27].
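
The narrowing of the modulation range described above can be illustrated with a short sketch. It assumes the eight center frequencies are logarithmically spaced between the stated end points; that spacing is not given in this section, and the function name is hypothetical.

```python
import numpy as np

def modulation_center_frequencies(f_min_hz, f_max_hz, n_filters=8):
    """Return n_filters center frequencies (Hz) between f_min_hz and f_max_hz.

    Assumption: logarithmic spacing; the text only gives the range
    and the number of filters.
    """
    return np.geomspace(f_min_hz, f_max_hz, n_filters)

# Original SRMR range (4-128 Hz) versus the reduced SRMR-CI range (4-30 Hz).
srmr_cfs = modulation_center_frequencies(4.0, 128.0)
srmr_ci_cfs = modulation_center_frequencies(4.0, 30.0)

print(np.round(srmr_cfs, 2))
print(np.round(srmr_ci_cfs, 2))
```

Under this assumption, both filter banks keep eight bands, and the reduced range simply packs them more densely below 30 Hz.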

Similar to the modified SRMR-CI metric previously

described, an alternate modification to the original SRMR met-

ric has been performed to tailor it to HA devices [28]. First, the

gammatone filter bank in the original SRMR implementation

was modified to take into account the listener’s individual

hearing loss thresholds obtained via an audiogram. More spe-

cifically, the Q-factors of each of the filters were adjusted to

simulate the hearing loss due to outer hair cell damage. Hence,

as hearing loss increased, so did the filter bandwidths (i.e.,

Q-factors decreased). Additionally, the temporal Hilbert enve-

lopes were compressed using a nonlinear compression func-

tion, similar to that used in the HASQI metric, to further

model outer hair cell losses. For HA devices, it was found that

the original 4–128 Hz range of modulation filter bank center

frequencies was optimal, thus no changes were implemented in

the modulation filter bank. The SRMR-HA was tested as a cor-

relate of subjective quality for HA users in noisy, reverberant,

and speech-enhanced conditions [28].
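
The relation between Q-factor and bandwidth mentioned above can be sketched as follows. The code uses only the standard definition Q = fc / bandwidth; the Q values are hypothetical placeholders, since this section does not give the mapping from audiogram thresholds to Q-factors used in [28].

```python
def bandwidth_hz(center_freq_hz, q_factor):
    """Bandwidth implied by the standard definition Q = fc / bandwidth."""
    if q_factor <= 0:
        raise ValueError("q_factor must be positive")
    return center_freq_hz / q_factor

center_freq_hz = 1000.0

# Hypothetical Q-factors, chosen only to show the trend:
# more hearing loss -> lower Q -> broader filter.
hypothetical_q = {"mild loss": 8.0, "moderate loss": 5.0, "severe loss": 2.5}

bandwidths = {
    label: bandwidth_hz(center_freq_hz, q) for label, q in hypothetical_q.items()
}

for label, bw in bandwidths.items():
    print(f"{label}: Q = {hypothetical_q[label]}, bandwidth = {bw} Hz")
```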

EXPERIMENTAL SETUP

This section describes the data sets used in the experiments and the

evaluation criteria used to characterize the performance of the

investigated metrics.

CI SPEECH INTELLIGIBILITY DATA SET

This database is described in full detail in [1] and in the references

therein. The material comprises speech data presented to CI users

within the framework of an intelligibility subjective test. The

speech sentences presented to the CI users were taken from the

well-known IEEE sentence corpus. Four recorded room impulse

responses were convolved with the clean speech data to simulate

reverberant speech with reverberation times (RT60) of 0.3, 0.6,

0.8, and 1 s. Speech-shaped noise was also added to the anechoic

and the reverberant signals to generate noise-only and noise-plus-

reverberation degradation conditions, respectively. Noise was

added at SNRs of -5, 0, 5, and 10 dB for the anechoic samples and

5 and 10 dB for the reverberant samples. For the noise-plus-rever-

beration condition, the reverberant signals served as reference for

SNR computation. Additionally, the database includes sentences

enhanced using an ideal reverberant masking (IRM) strategy [29].

These sentences covered reverberant conditions with RT60 values of

0.6, 0.8, and 1 s, and all of the noise-plus-reverberation conditions

previously described. The IRM algorithm was configured to use

two to three different threshold values for each condition. Speech

files were sampled at 16 kHz with 16-bit resolution.
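
The degradation procedure described above (convolution with a room impulse response, then noise added at a target SNR computed against the reverberant signal) might be sketched as below. This is not the authors' code: the signals are synthetic stand-ins (white noise in place of speech and of speech-shaped noise, an exponentially decaying noise burst in place of a recorded impulse response), and all function names are hypothetical.

```python
import numpy as np

def reverberate(clean, rir):
    """Convolve clean speech with a room impulse response (full tail kept)."""
    return np.convolve(clean, rir)

def add_noise_at_snr(reference, noise, snr_db):
    """Add noise scaled so the reference-to-noise power ratio equals snr_db."""
    if len(noise) < len(reference):
        raise ValueError("noise must be at least as long as the reference")
    noise = noise[: len(reference)]
    ref_power = np.mean(reference ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(ref_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reference + gain * noise

def measured_snr_db(reference, noisy):
    """SNR in dB of `noisy` with `reference` taken as the signal part."""
    residual = noisy - reference
    return 10.0 * np.log10(np.mean(reference ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(0)
fs = 16000                                   # sampling rate stated in the text
clean = rng.standard_normal(fs)              # stand-in for one second of speech

# Synthetic impulse response whose amplitude decays by 60 dB at t = 0.6 s.
rt60 = 0.6
t = np.arange(int(rt60 * fs)) / fs
rir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / rt60)

reverberant = reverberate(clean, rir)
noise = rng.standard_normal(reverberant.size)  # stand-in for speech-shaped noise

# Noise-plus-reverberation: the reverberant signal is the SNR reference.
noisy_reverberant = add_noise_at_snr(reverberant, noise, snr_db=5.0)
print(round(measured_snr_db(reverberant, noisy_reverberant), 2))
```

Using the reverberant signal as the reference means the reverberation tail counts as signal rather than noise when the SNR is set.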

Eleven adult CI users were recruited to participate in the sub-

jective intelligibility experiments. The participants were all native

speakers of American English with postlingual deafness and had

an average age of 64 years. All participants had at least one

year of experience using their device routinely, with some being

bilaterally implanted for over six years. For consistency, all partici-

pants were temporarily fitted with a SPEAR3 research processor

(22 filter bank channels with Mel-like spacing) with parameters

matching the individual CI user’s clinical settings. Participants

were presented with 31 lists of 20 sentences randomly selected

from the IEEE database, each list being corrupted by the afore-

mentioned degradation conditions. Degraded stimuli were pre-

sented directly to the audio input of the research processor and

the level was adjusted individually for comfort at the beginning of

the experiment. Listeners were instructed to repeat aloud each

sentence after its presentation. A tester then marked the words

correctly identified by the subject according to the ground truth

transcript. Finally, the number of words correctly recognized by

the listener was divided by the total number of presented words

to find the per-participant intelligibility scores. More details about

the listening test can be found in [1].
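
The scoring rule above amounts to a ratio of word counts. A minimal sketch, with made-up counts and a hypothetical function name, could look like this:

```python
def intelligibility_score(correct_per_sentence, total_per_sentence):
    """Fraction of presented words that the listener repeated correctly."""
    if len(correct_per_sentence) != len(total_per_sentence):
        raise ValueError("both lists must cover the same sentences")
    total_words = sum(total_per_sentence)
    if total_words == 0:
        raise ValueError("no words were presented")
    return sum(correct_per_sentence) / total_words

# Illustrative counts for three sentences (not data from the study).
correct = [5, 3, 8]
presented = [8, 7, 9]
score = intelligibility_score(correct, presented)
print(f"{100.0 * score:.1f}% words correct")
```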

HA SPEECH QUALITY DATA SETS

Two speech quality data sets collected with HA users were used in

the experiments described herein. The first database explores the

effects of frequency lowering, an amplification strategy for HI lis-

teners with severe to profound high frequency sensorineural hear-

ing loss that has gained renewed attention recently. Nonlinear

frequency compression (NFC) is a particular type of frequency

lowering algorithm, wherein the input spectral content beyond a

cutoff frequency (CF) is compressed by a factor determined by the

compression ratio (CR) before further processing by the HA. Thus,

NFC moves high frequency energy to lower frequency regions

(where there is better residual hearing acuity) increasing the

chances of audibility and potential benefit. We refer the interested

reader to [30] and the references therein for more details about

the database and NFC processing.
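
To make the roles of CF and CR concrete, the sketch below maps input frequencies to output frequencies. It assumes compression on a logarithmic frequency scale, which is one commonly described form of NFC; the text only states that content above CF is compressed by a factor set by CR, so the exact mapping in the HAs used here may differ. The function name and example values are illustrative.

```python
import numpy as np

def nfc_map(freq_hz, cutoff_hz, compression_ratio):
    """Map input frequencies to output frequencies under a log-domain NFC model.

    Assumption: above the cutoff, log-frequency distance from the cutoff is
    divided by the compression ratio; below the cutoff nothing changes.
    """
    f = np.asarray(freq_hz, dtype=float)
    compressed = cutoff_hz * (f / cutoff_hz) ** (1.0 / compression_ratio)
    return np.where(f > cutoff_hz, compressed, f)

# Illustrative setting: CF = 2 kHz, CR = 2:1.
inputs_hz = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
outputs_hz = nfc_map(inputs_hz, cutoff_hz=2000.0, compression_ratio=2.0)

for f_in, f_out in zip(inputs_hz, outputs_hz):
    print(f"{f_in:7.0f} Hz -> {f_out:7.1f} Hz")
```

Under this model, content below the cutoff is left untouched, while the two octaves between 2 and 8 kHz are squeezed into the single octave between 2 and 4 kHz.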

The speech material presented to the listeners consisted of

IEEE sentences, spoken by two males and two females and

recorded through HAs with different NFC strategies; more specifically:

1) CF = 4 kHz and CR = 2:1; 2) CF = 2 kHz, CR = 2:1; 3) CF = 3 kHz, CR = 2:1;

4) CF = 3 kHz, CR = 6:1; and 5) CF = 3 kHz, CR = 10:1. In addition, two “anchor” stimuli were

created for each sentence: peak clipping at 25% of maximum sig-

nal amplitude and lowpass filtering at 2 kHz. In this study, the

anchor conditions are not used during metric performance com-

parison to place emphasis solely on the effects of NFC. As such, of

the available 32 stimuli [4 speakers × (5 NFC conditions + 2

anchors + 1 clean reference)], only 24 are used in the analysis

presented in the section “Experimental Results.”
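
The stimulus counts above can be checked with a few lines of arithmetic. The breakdown of the 24 retained stimuli (NFC conditions plus the clean reference, per speaker) is an inference that matches the stated totals, not something the text spells out.

```python
n_speakers = 4
n_nfc_conditions = 5
n_anchors = 2
n_clean_reference = 1

# All available stimuli: every speaker in every condition.
n_available = n_speakers * (n_nfc_conditions + n_anchors + n_clean_reference)

# Stimuli kept once the two anchor conditions are dropped (inferred breakdown).
n_used = n_speakers * (n_nfc_conditions + n_clean_reference)

print(n_available, n_used)
```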

Quality ratings of this database were obtained with 11 HI listen-

ers with severe to profound hearing loss. Each participant was fitted

with a Phonak Savia behind-the-ear (BTE) HA and seated in a dou-

ble-walled sound booth in front of a speaker and a computer moni-

tor. Ratings of speech quality were obtained using the [20–100]
