IEEE SIGNAL PROCESSING MAGAZINE [52] MARCH 2015
ENHANCEMENT OVER MULTIPLE SPATIAL POINTS
So far, we have considered preprocessing techniques that do not take the spatial aspects of the rendering scenario into account. In this section, we show that spatial aspects can also be exploited to enhance intelligibility. In announcement scenarios in public spaces such as airports, train stations, or shopping malls, environmental noise and reverberation contribute to a reduced intelligibility for the listeners. If different messages are communicated to different spatial regions, acoustic leakage between regions [16] exacerbates the problem. The impact on intelligibility is particularly large for hearing-impaired persons.
Consider a scenario in a public environment where $N$ messages are conveyed via the public address (PA) system to $N$ listeners, each wearing a hearing instrument. One possibility is to stream the corresponding signals directly to the listeners, but listeners often wear an open-fit (nonoccluded) hearing instrument, so the direct acoustic signal is also mixed in at the eardrum. Instead of using direct downlink connections, it is possible to preprocess all speech signals jointly at the PA system so as to minimize the expected distortion at the eardrums of the listeners. The distortion measure can be based on any (mathematically well-behaved) model for speech quality or intelligibility, such as some of the models discussed in the section “Practical Measures of Intelligibility.”
Let $a_T = [a_{T,1}, a_{T,2}, \ldots, a_{T,N}]^{\mathsf{T}}$, $\tilde{a}_T$, and $a_L$ (defined similarly) be the (complex-valued) short-time DFT coefficients of the source speech signals, enhanced signals (at the PA system), and received signals at the listeners, respectively. The signals $a_L$ are captured by the microphones of the hearing instruments. For simplicity, we neglect production and interpretation noises of the section “Defining Intelligibility” and assume that degradations are purely acoustical and consist of noise, reverberation, and cross-talk between messages. It is easy to see that if we use stacked-vector notation for the signals $a_{T,i}$ and $a_{L,i}$, $i = 1, 2, \ldots, N$, upon preprocessing, all effects can be included in the affine signal model given by [16]

$$a_L = H_E \tilde{a}_T + v_E, \tag{17}$$
where the channel matrix $H_E$ collects all reverberation and cross-talk transfer coefficients between production and reception points, and $v_E$ is additive noise in the environment.
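As a concrete illustration of the stacked affine model (17), the following NumPy sketch builds a small example. The sizes `N` and `K`, the helper `crandn`, the per-bin diagonal structure of the blocks of `H_E`, and the leakage and noise levels are all assumptions made for illustration; the article does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 3  # number of messages / listeners (illustrative)
K = 4  # DFT bins kept per message (illustrative)


def crandn(*shape):
    """Unit-variance circular complex Gaussian samples (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Enhanced coefficients for each message, stacked into one long vector.
a_T_tilde_parts = [crandn(K) for _ in range(N)]
a_T_tilde = np.concatenate(a_T_tilde_parts)  # shape (N*K,)

# Channel matrix H_E: block (i, j) carries message j to listener i.
# Diagonal blocks are the intended paths, off-diagonal blocks the cross-talk.
# Assumption: each block acts independently per DFT bin (diagonal block).
H_E = np.zeros((N * K, N * K), dtype=complex)
for i in range(N):
    for j in range(N):
        gain = 1.0 if i == j else 0.2  # assumed leakage level
        H_E[i * K:(i + 1) * K, j * K:(j + 1) * K] = gain * np.diag(crandn(K))

v_E = 0.1 * crandn(N * K)  # additive environmental noise (assumed level)

# Equation (17): received coefficients at all listeners.
a_L = H_E @ a_T_tilde + v_E

# The i-th block of a_L is what listener i receives.
a_L_parts = np.split(a_L, N)
```

Each listener's block of `a_L` is the sum of the intended message, the leaked contributions of the other messages, and noise, which is why the messages have to be treated jointly rather than one at a time.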
Consider also a distortion measure $d(a_T, a_L)$, smooth (continuously differentiable) as a function of $a_L$, which quantifies the distortion between the reference produced coefficients $a_T$ and what is eventually listened to, $a_L$. Our aim is to find the modification $a_T \mapsto \tilde{a}_T$ that minimizes the expected distortion according to $d$, jointly for all talker-listener points, i.e., we want to solve the optimization problem

$$\underset{\tilde{a}_T}{\text{minimize}} \;\; \mathrm{E}\left[ d(a_T, H_E \tilde{a}_T + v_E) \right], \tag{18}$$
where the expectation is taken only over the acoustic disturbances $H_E$, $v_E$, since we have direct access to the speech of the talker $a_T$ and therefore take it to be deterministic.
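The objective in (18) can be approximated numerically by averaging the distortion over random draws of the acoustic disturbances. The sketch below is a minimal Monte Carlo estimate under stated assumptions: the squared-error `squared_error` is only a stand-in for $d$ (it is not one of the intelligibility models discussed in the article), and the disturbance model in `make_disturbance_sampler` (identity channel plus random leakage, Gaussian noise) is hypothetical.

```python
import numpy as np


def squared_error(a_T, a_L):
    """Stand-in distortion d(a_T, a_L); an assumption, not an intelligibility model."""
    return float(np.sum(np.abs(a_L - a_T) ** 2))


def expected_distortion(a_T, a_T_tilde, draw_disturbance, d=squared_error, n_draws=2000):
    """Monte Carlo estimate of E[d(a_T, H_E a_T_tilde + v_E)] over (H_E, v_E)."""
    total = 0.0
    for _ in range(n_draws):
        H_E, v_E = draw_disturbance()
        total += d(a_T, H_E @ a_T_tilde + v_E)
    return total / n_draws


def make_disturbance_sampler(dim, rng, leak=0.3, noise_std=0.1):
    """Hypothetical disturbance model: identity path, random leakage, Gaussian noise."""
    def draw():
        leakage = leak * (rng.standard_normal((dim, dim))
                          + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
        np.fill_diagonal(leakage, 0)  # leakage only between different entries
        H_E = np.eye(dim) + leakage
        v_E = noise_std * (rng.standard_normal(dim)
                           + 1j * rng.standard_normal(dim)) / np.sqrt(2)
        return H_E, v_E
    return draw


rng = np.random.default_rng(1)
dim = 6
a_T = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)  # deterministic talker speech
draw = make_disturbance_sampler(dim, rng)

# Two candidate modifications: no preprocessing, and playing nothing at all.
cost_unprocessed = expected_distortion(a_T, a_T, draw)
cost_silent = expected_distortion(a_T, np.zeros(dim, dtype=complex), draw)
```

Any candidate $\tilde{a}_T$ can be scored this way; only `H_E` and `v_E` are redrawn, while `a_T` stays fixed, mirroring the fact that the expectation is taken over the disturbances alone.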
Generic necessary conditions can be derived for solving (18) in terms of a functional description of the distortion measure $d$. The conditions are [16]

$$\mathrm{E}\left[ H_E^{H} \, \frac{\partial d}{\partial a_L^{*}}(a_T, H_E \tilde{a}_T + v_E) \right] = 0, \tag{19}$$
where $(\cdot)^{H}$ is the Hermitian transpose, and $\partial/\partial v^{*} \triangleq \tfrac{1}{2}\left( \partial/\partial v_0 - (1/j)\,\partial/\partial v_1 \right)$ is a complex differential operator, expressed in terms of the real differential operators $\partial/\partial v_0$ and $\partial/\partial v_1$, in Hessian (vertical) notation, with respect to the real and imaginary components of the variable $v$, respectively. The meaning of (19) is that, for optimality, the preprocessed speech $\tilde{a}_T$ must be chosen so as to make the complex gradient of the distortion measure with respect to the listener DFT bins in all zones orthogonal, in expectation over the acoustic disturbances, to all columns of the channel matrix $H_E$.
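To make condition (19) tangible, the sketch below checks it numerically for one simple case. It assumes a squared-error distortion $d(a_T, a_L) = \lVert a_L - a_T \rVert^2$, which is only a stand-in and not one of the article's intelligibility models; for this choice the conjugate gradient is $a_L - a_T$ and (19) becomes linear in $\tilde{a}_T$. Using $-1/j = j$, the complex differential operator is evaluated as $\tfrac{1}{2}(\partial/\partial v_0 + j\,\partial/\partial v_1)$ by finite differences. The helper names, the disturbance statistics, and the replacement of the expectation by an average over a fixed set of draws are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)


def crandn(*shape):
    """Unit-variance circular complex Gaussian samples (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def d(a_T, a_L):
    """Assumed stand-in distortion: squared error."""
    return float(np.sum(np.abs(a_L - a_T) ** 2))


def wirtinger_conj_grad(f, v, eps=1e-6):
    """Finite-difference estimate of 0.5 * (df/dv_0 + 1j * df/dv_1) per entry of v."""
    g = np.zeros(v.shape, dtype=complex)
    for k in range(v.size):
        e = np.zeros(v.shape, dtype=complex)
        e[k] = eps
        d_re = (f(v + e) - f(v - e)) / (2 * eps)            # derivative w.r.t. real part
        d_im = (f(v + 1j * e) - f(v - 1j * e)) / (2 * eps)  # derivative w.r.t. imaginary part
        g[k] = 0.5 * (d_re + 1j * d_im)
    return g


dim, n_draws = 5, 200
a_T = crandn(dim)

# 1) For squared error, the conjugate gradient w.r.t. a_L equals a_L - a_T.
a_L_test = crandn(dim)
g_num = wirtinger_conj_grad(lambda x: d(a_T, x), a_L_test)
grad_error = np.max(np.abs(g_num - (a_L_test - a_T)))

# 2) Sample-average version of (19) under a hypothetical disturbance model.
H = np.eye(dim) + 0.3 * crandn(n_draws, dim, dim)  # channel draws H_E
v = 0.1 * crandn(n_draws, dim)                      # noise draws v_E
HH = np.conj(np.swapaxes(H, 1, 2))                  # Hermitian transposes H_E^H

A = np.mean(HH @ H, axis=0)                                 # average of H^H H
b = np.mean(np.einsum('mij,mj->mi', HH, a_T - v), axis=0)   # average of H^H (a_T - v)
a_T_tilde = np.linalg.solve(A, b)                           # solves the linearized (19)

# Left-hand side of (19), averaged over the same draws; should vanish.
residual = np.mean(np.einsum('mij,mj->mi', HH, H @ a_T_tilde + v - a_T), axis=0)
```

For a general smooth $d$ the condition is nonlinear in $\tilde{a}_T$ and would need an iterative solver; the closed-form solve here is specific to the squared-error assumption.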
[FIG3] The intelligibility enhancement using a phoneme-level measure. (Block diagram; labels: Simulated World, Far-End Speech, Modify Speech, Environmental Noise, Maximize p_LT(M_T | M_T), Optimized Parameters, Real World, Loudspeaker Rendering, Listener.)