
IEEE SIGNAL PROCESSING MAGAZINE [52] MARCH 2015

ENHANCEMENT OVER MULTIPLE SPATIAL POINTS

So far, we have considered preprocessing techniques that ignore the spatial aspects of the rendering scenario. In this section, we show that spatial aspects can also be exploited to enhance intelligibility. In announcement scenarios in public spaces such as airports, train stations, or shopping malls, environmental noise and reverberation reduce intelligibility for the listeners. If different messages are communicated to different spatial regions, acoustic leakage between regions [16] exacerbates the problem. The impact on intelligibility is particularly large for hearing-impaired persons.

Consider a scenario in a public environment where $N$ messages are conveyed via the public address (PA) system to $N$ listeners, each wearing a hearing instrument. One possibility is to transmit the corresponding signals directly to the listeners over a downlink, but listeners often wear an open-fit (nonoccluded) hearing instrument, in which case the direct acoustic signal is also mixed in at the eardrum. Instead of using direct downlink connections, it is possible to preprocess all speech signals jointly at the PA system so as to minimize the expected distortion at the eardrums of the listeners. The distortion measure can be based on any (mathematically well-behaved) model for speech quality or intelligibility, such as some of the models discussed in the section “Practical Measures of Intelligibility.”

Let $a_T = [a_{T,1}, a_{T,2}, \ldots, a_{T,N}]$, $\tilde{a}_T$, and $a_L$ (defined similarly) be the (complex-valued) short-time DFT coefficients of the source speech signals, enhanced signals (at the PA system), and received signals at the listeners, respectively. The signals $a_L$ are captured by the microphones of the hearing instruments. For simplicity, we neglect production and interpretation noises of the section “Defining Intelligibility” and assume that degradations are purely acoustical and consist of noise, reverberation, and cross-talk between messages. It is easy to see that if we use stacked-vector notation for the signals $a_{T,i}$, $\tilde{a}_{T,i}$, and $a_{L,i}$, $i = 1, 2, \ldots, N$, upon preprocessing, all effects can be included in the affine signal model given by [16]

$$a_L = H_E \tilde{a}_T + v_E, \qquad (17)$$

where the channel matrix $H_E$ collects all reverberation and cross-talk transfer coefficients between production and reception points, and $v_E$ is additive noise in the environment.
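As a minimal sketch of the affine model (17), the following Python fragment forms the received coefficients as channel matrix times enhanced coefficients plus additive noise. It assumes a single DFT bin with one coefficient per listener, circular complex Gaussian noise, and a randomly drawn cross-talk matrix; the function name `received_coefficients` and all numerical values are hypothetical and serve only as illustration.

```python
import numpy as np

def received_coefficients(H_E, a_enh, noise_std, rng):
    """Return a_L = H_E @ a_enh + v_E for a single DFT bin."""
    n_points = H_E.shape[0]
    # Assumption: circular complex Gaussian noise, variance noise_std**2 per point.
    v_E = noise_std / np.sqrt(2.0) * (
        rng.standard_normal(n_points) + 1j * rng.standard_normal(n_points)
    )
    return H_E @ a_enh + v_E

rng = np.random.default_rng(0)
N = 3  # number of messages and listeners

# Hypothetical channel: unit direct paths plus weak random cross-talk.
cross_talk = 0.1 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
np.fill_diagonal(cross_talk, 0.0)
H_E = np.eye(N, dtype=complex) + cross_talk

# Enhanced (preprocessed) coefficients emitted by the PA system.
a_enh = rng.standard_normal(N) + 1j * rng.standard_normal(N)

a_L_noisy = received_coefficients(H_E, a_enh, noise_std=0.05, rng=rng)
a_L_clean = received_coefficients(H_E, a_enh, noise_std=0.0, rng=rng)
```

The off-diagonal entries of the channel matrix play the role of the acoustic leakage between regions: each listener receives a weighted mixture of all messages, not only the intended one.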

Consider also a distortion measure $d(a_T, a_L)$, smooth (continuously differentiable) as a function of $a_L$, which quantifies the distortion between the reference produced coefficients $a_T$ and what is eventually listened to, $a_L$. Our aim is to find the modification $a_T \to \tilde{a}_T$ that minimizes the expected distortion according to $d$, jointly for all talker-listener points, i.e., we want to solve the optimization problem

$$\underset{\tilde{a}_T}{\text{minimize}} \;\; E\left[ d(a_T, H_E \tilde{a}_T + v_E) \right], \qquad (18)$$

where the expectation is taken only over the acoustic disturbances, since we have direct access to the speech of the talker $a_T$ and therefore take it to be deterministic.
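The objective in (18) can be approximated by averaging the distortion over random draws of the environmental noise. The sketch below does this under stated assumptions: the distortion is a plain squared error (a stand-in, not one of the intelligibility models discussed in the article), the noise is circular complex Gaussian, and the channel and talker coefficients are made-up values. The names `squared_error` and `expected_distortion` are hypothetical.

```python
import numpy as np

def squared_error(a_T, a_L):
    """Stand-in distortion d(a_T, a_L): squared Euclidean error (assumption)."""
    return float(np.sum(np.abs(a_T - a_L) ** 2))

def expected_distortion(a_T, a_enh, H_E, noise_std, distortion,
                        n_draws=2000, seed=0):
    """Monte Carlo estimate of E[d(a_T, H_E @ a_enh + v_E)] over the noise only."""
    rng = np.random.default_rng(seed)
    n_points = H_E.shape[0]
    total = 0.0
    for _ in range(n_draws):
        v_E = noise_std / np.sqrt(2.0) * (
            rng.standard_normal(n_points) + 1j * rng.standard_normal(n_points)
        )
        total += distortion(a_T, H_E @ a_enh + v_E)
    return total / n_draws

# Hypothetical two-zone example with cross-talk between the zones.
H_E = np.array([[1.0, 0.3], [0.2j, 1.0]], dtype=complex)
a_T = np.array([1.0 + 1.0j, -0.5j])
noise_std = 0.1

# No preprocessing: the PA system plays the talker coefficients unchanged.
J_raw = expected_distortion(a_T, a_T, H_E, noise_std, squared_error)

# Joint preprocessing by channel inversion, which removes the cross-talk term.
a_inv = np.linalg.solve(H_E, a_T)
J_inv = expected_distortion(a_T, a_inv, H_E, noise_std, squared_error)
```

Because the talker coefficients are treated as deterministic, only the noise is sampled. With the squared-error stand-in, channel inversion leaves just the noise energy, so `J_inv` comes out below `J_raw`; for a perceptual distortion measure the best modification is generally not a plain inversion.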

Generic necessary conditions can be derived for solving (18) in terms of a functional description of the distortion measure $d$. The conditions are [16]

$$E\left[ H_E^{H} \left. \frac{\partial}{\partial v} d(a_T, v) \right|_{v = H_E \tilde{a}_T + v_E} \right] = 0, \qquad (19)$$

where $(\cdot)^{H}$ is the Hermitian transpose, and $\partial/\partial v \triangleq \tfrac{1}{2}\left( \partial/\partial \Re\{v\} - j\, \partial/\partial \Im\{v\} \right)$ is a complex differential operator, expressed in terms of the real differential operators $\partial/\partial \Re\{v\}$ and $\partial/\partial \Im\{v\}$, in Hessian (vertical) notation, with respect to the real and imaginary components of the variable $v$, respectively. The meaning of (19) is that, for optimality, the preprocessed speech $\tilde{a}_T$ must be chosen such that the complex gradient of the distortion measure with respect to the listener DFT bins in all zones is orthogonal to all columns of the channel matrix $H_E$.
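The orthogonality reading of (19) can be checked numerically in a simple special case. The sketch below assumes a squared-error distortion and zero-mean noise, so that the expected gradient is, up to a constant factor and the conjugation convention of the complex derivative, the mean residual between received and reference coefficients. It also assumes more reception points than production points, so that the residual cannot be driven to zero; these dimensions and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 2  # hypothetical: 4 reception points, 2 production points

# Channel from production points to reception points (made-up values).
H_E = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
# Reference coefficients desired at each reception point.
a_T = rng.standard_normal(M) + 1j * rng.standard_normal(M)

def mean_distortion(a_enh):
    """Expected squared error, omitting the constant noise-variance term."""
    return float(np.sum(np.abs(a_T - H_E @ a_enh) ** 2))

# Minimizer of the expected squared error: the least-squares solution.
a_opt, *_ = np.linalg.lstsq(H_E, a_T, rcond=None)

# Mean residual at the reception points (the noise averages out).
residual = H_E @ a_opt - a_T

# Inner product of every column of H_E with the residual.
orthogonality = H_E.conj().T @ residual

# Any perturbation away from a_opt should increase the expected distortion.
delta = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
J_opt = mean_distortion(a_opt)
J_perturbed = mean_distortion(a_opt + delta)
```

Here `orthogonality` vanishes to numerical precision, which is the familiar set of normal equations. For a general smooth distortion measure the condition has the same structure, with the residual replaced by the expected gradient of the measure, and it usually has to be solved iteratively.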


[FIG3] The intelligibility enhancement using a phoneme-level measure. (Block diagram. Simulated World: Far-End Speech, Modify Speech, Environmental Noise, Maximize p, Optimized Parameters. Real World: Modify Speech, Loudspeaker Rendering, Environmental Noise, Listener.)
