
IEEE SIGNAL PROCESSING MAGAZINE [49] MARCH 2015

Low-level modifications do not require knowledge of the intended message transcription. These can be subdivided into spectral, temporal, and spatial signal modifications as well as combinations thereof.

Straightforward spectral shaping is employed in [8] and [12]. This modification combines low complexity with a high intelligibility gain (see, e.g., [9]), making these approaches particularly suitable for application in mobile telephony.

Spectrotemporal energy redistribution is considered in [6], where the glimpse proportion is optimized. The use of a genetic algorithm to perform the optimization makes this method interesting primarily from a theoretical perspective. A low-complexity approach with high intelligibility gain that performs spectrotemporal energy redistribution by optimizing a perceptual distortion measure is presented in [14].

A particular class of spectrotemporal energy redistribution is obtained with dynamic range compression. This approach can be either nonadaptive or adaptive. In a large-scale subjective evaluation of proposed speech modification systems [9], most of the entries that incorporated dynamic range compression, including those related to the descriptions in [5], [23], and [28], performed well.
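To make the idea of nonadaptive dynamic range compression concrete, the following is a minimal sketch of a memoryless compressor. It is not the method of [5], [23], or [28]; the function names, the threshold of -20 dB, and the 4:1 ratio are illustrative assumptions, and practical systems typically operate on a smoothed signal envelope rather than on individual samples.

```python
import math


def compress_sample(x, threshold_db=-20.0, ratio=4.0):
    """Apply a static compression curve to one sample in [-1, 1].

    threshold_db and ratio are illustrative assumptions, not values
    taken from the article.
    """
    if x == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(abs(x))
    if level_db <= threshold_db:
        # Below the threshold the sample passes through unchanged.
        return x
    # Above the threshold the level excess is divided by the ratio.
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return math.copysign(10.0 ** (out_db / 20.0), x)


def compress(signal, threshold_db=-20.0, ratio=4.0):
    """Compress every sample of a sequence with the same static curve."""
    return [compress_sample(x, threshold_db, ratio) for x in signal]
```

Because loud samples are attenuated while quiet ones are left alone, renormalizing the output to the original power shifts energy toward the low-level parts of the signal, which is the redistribution effect described above.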

Intelligibility can also be enhanced by controlling the spatial sound field near the ear with a multitude of remote loudspeakers. As discussed in more detail in the section “Enhancement over Multiple Spatial Points,” if users are wearing microphones near their ears, reverberation and cross-talk between different messages can be reduced by feedback [16]. The goal is that only the desired signal is present at the ear of a user. If microphones are farther from the ears of the listeners, the emerging field of multizone audio rendering becomes relevant, e.g., [29].

INTELLIGIBILITY ENHANCEMENT SYSTEMS

This section describes three practical approaches to intelligibility optimization, each based on a different principle.

SII-BASED ENHANCEMENT

State-of-the-art systems have been developed based on the decomposition into band-importance and band-audibility functions [4], [8], [28]. We provided a recent perspective on this decomposition in the section “Measures Operating on Spectral Band Powers.” This section describes implementations that closely follow the SII standard.

The computation of the SII [18] uses a carefully calibrated specification of the speech spectrum $\sigma^2_{T,i}$ and the noise spectrum $\sigma^2_{E,i}$ (where $i$ is a critical or third-octave band index) as measured over an entire utterance, including minor pauses. The approach accounts for both the hearing threshold and the loss of intelligibility at very high presentation (loudness) levels, using information stored in tables. For an acoustic time-domain speech signal, the equivalent speech spectrum level in dB, commonly denoted as $E_i$, is computed as

$$E_i = 10 \log_{10}\left(\frac{\sigma^2_{T,i}}{\sigma_0^2}\right) - 10 \log_{10} \Delta f_i, \qquad (10)$$

where $\sigma_0^2$ denotes the digital reference power per hertz corresponding to the reference sound pressure of 20 $\mu$Pa and $\Delta f_i$ the frequency bandwidth of the $i$th subband in hertz. The equivalent disturbance spectrum level, $D_i$, is computed in three steps: first the calibration (10) is applied, and then the threshold of hearing and instantaneous masking are accounted for. In [4], the threshold of hearing is neglected; in [8], both the threshold of hearing and instantaneous masking are neglected.
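The calibration step can be sketched in a few lines of Python, assuming that (10) expresses the band power in dB relative to the reference power per hertz, normalized by the bandwidth of the band. The function name and the default reference value of 1.0 are hypothetical; in practice the reference would come from calibrating the digital signal against 20 μPa.

```python
import math


def speech_spectrum_level_db(band_power, bandwidth_hz, ref_power_per_hz=1.0):
    """Equivalent spectrum level of one band in dB.

    band_power       : signal power measured in the band (digital units)
    bandwidth_hz     : bandwidth of the band in hertz
    ref_power_per_hz : digital reference power per hertz (assumed value;
                       it must be obtained by calibration in a real system)
    """
    if band_power <= 0.0 or bandwidth_hz <= 0.0 or ref_power_per_hz <= 0.0:
        raise ValueError("powers and bandwidth must be positive")
    # Power per hertz in the band, relative to the reference, in dB.
    return 10.0 * math.log10(band_power / (ref_power_per_hz * bandwidth_hz))


# Example: a band power of 1000 spread over 100 Hz with a unit reference.
level = speech_spectrum_level_db(1000.0, 100.0)
```

The same function can be applied to the noise band powers as the first of the three steps for the disturbance level; the threshold-of-hearing and masking corrections are table-driven in the standard and are not reproduced here.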

The band-audibility function of the SII also accounts for the decrease in intelligibility at high presentation (loudness) levels, which is not accounted for in (9). Consequently, it depends on both the SNR in the band and the absolute presentation level. The band-audibility function is identical for different bands and

MAKING IT WORK FOR HEARING INSTRUMENTS

Hearing instruments aim to compensate for a hearing loss. Typically, this is done by amplifying a sound recording, followed by dynamic range compression to ensure the signal remains within the audible and comfortable range. Environmental noise degrades intelligibility for hearing instrument users in two ways. A first degradation is due to noise recorded by the microphones. To decrease the impact of this noise, noise reduction is applied to the recorded signal prior to amplification for hearing loss compensation.

A second degradation depends on the fitting: the user may experience direct environmental noise leaking through the hearing instrument vent. This leakage degrades the intelligibility and can be overcome by processing the signal with a speech intelligibility enhancement algorithm before play-out, as discussed in the article.

Adopting the concept of interpretation noise, the patient’s hearing loss can be measured and modeled by a noise process. The environmental noise that reaches the ear through the hearing instrument vent can be modeled by the noise process of (4). Dynamic range compression can be taken into account by expressing the desired output range in terms of (frequency-dependent) absolute power constraints. Given this model, the hearing instrument can be optimized in a constrained fashion using one of the measures discussed in the section “Measures Operating on Spectral Band Powers.” The resulting integrated solution compares favorably with an ad hoc concatenation of processing steps, facilitates a conceptual understanding of the hearing impairment, and is likely to lead to an effective control of the instrument.
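As a rough illustration of frequency-dependent absolute power constraints, the sketch below clamps per-band output levels to a permitted range. It shows only the feasibility step (a projection onto the constraints), not the constrained optimization of an intelligibility measure. The function name and all numerical values are hypothetical.

```python
def constrain_band_levels(levels_db, min_db, max_db):
    """Clamp each band level to its own [min, max] range in dB.

    levels_db : desired output level per band
    min_db    : lowest useful (audible) level per band, assumed known
    max_db    : highest comfortable level per band, assumed known
    """
    if not (len(levels_db) == len(min_db) == len(max_db)):
        raise ValueError("all inputs must have one entry per band")
    return [
        min(max(level, low), high)
        for level, low, high in zip(levels_db, min_db, max_db)
    ]


# Example with three bands and made-up limits.
constrained = constrain_band_levels(
    [30.0, 70.0, 95.0],   # desired levels
    [40.0, 40.0, 45.0],   # audibility limits
    [90.0, 85.0, 80.0],   # comfort limits
)
```

In an integrated design, limits of this kind would enter the optimization as constraints rather than being applied afterward.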

THE MAXIMUM PROBABILITY

OF CORRECT PHONEME

RECOGNITION IS AN EXAMPLE

OF A CRITERION THAT FAVORS

SIGNAL FEATURES THAT RESEMBLE

THOSE OF CLEAN SPEECH.
