IEEE SIGNAL PROCESSING MAGAZINE [24] MARCH 2015
MVDR USING RTFs
By constraining the desired component in the output signal to
be equal to the speech component at an arbitrarily chosen refer-
ence microphone
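$r$ [24], the constraint in (6) becomes

$$y_{s_0} = \mathbf{w}^H \mathbf{h}_0 s_0 \overset{!}{=} h_{0,r}\, s_0, \tag{11}$$

which is equivalent to $\mathbf{w}^H \tilde{\mathbf{h}}_0 = 1$, where the RTF $\tilde{\mathbf{h}}_0$ is defined as

$$\tilde{\mathbf{h}}_0 \triangleq \frac{\mathbf{h}_0}{h_{0,r}} = \left[ \frac{h_{0,1}}{h_{0,r}}, \frac{h_{0,2}}{h_{0,r}}, \ldots, 1, \ldots, \frac{h_{0,M}}{h_{0,r}} \right]^T. \tag{12}$$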
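By substituting the ATFs $\mathbf{h}_0$ with the RTFs $\tilde{\mathbf{h}}_0$ in (8) and (10),
the modified MVDR filter is obtained as

$$\tilde{\mathbf{w}}_{\mathrm{MVDR}} = \frac{\boldsymbol{\Phi}_{xx}^{-1} \tilde{\mathbf{h}}_0}{\tilde{\mathbf{h}}_0^H \boldsymbol{\Phi}_{xx}^{-1} \tilde{\mathbf{h}}_0} = \frac{\boldsymbol{\Phi}_{vv}^{-1} \tilde{\mathbf{h}}_0}{\tilde{\mathbf{h}}_0^H \boldsymbol{\Phi}_{vv}^{-1} \tilde{\mathbf{h}}_0} = h_{0,r}^{*}\, \mathbf{w}_{\mathrm{MVDR}}. \tag{13}$$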
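The relation in (13) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the array size, the reference index, the target PSD, the random ATF vector, and the random noise covariance matrix are all assumptions chosen for illustration, and `mvdr_weights` is a hypothetical helper name.

```python
import numpy as np

def mvdr_weights(phi, steering):
    """Return Phi^{-1} d / (d^H Phi^{-1} d) for a steering vector d."""
    phi_inv_d = np.linalg.solve(phi, steering)
    return phi_inv_d / (steering.conj() @ phi_inv_d)

rng = np.random.default_rng(0)
M, r = 4, 1      # assumed: 4 microphones, reference microphone index 1 (0-based)
phi_ss = 2.0     # assumed target source PSD

# Hypothetical ATF vector and Hermitian positive-definite noise covariance.
h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi_vv = A @ A.conj().T + M * np.eye(M)
phi_xx = phi_ss * np.outer(h0, h0.conj()) + phi_vv

h0_rtf = h0 / h0[r]                       # RTF vector as in (12)
w_atf = mvdr_weights(phi_vv, h0)          # MVDR filter using ATFs
w_rtf_vv = mvdr_weights(phi_vv, h0_rtf)   # MVDR filter using RTFs and Phi_vv
w_rtf_xx = mvdr_weights(phi_xx, h0_rtf)   # MVDR filter using RTFs and Phi_xx

# np.vdot conjugates its first argument, so np.vdot(w, h) equals w^H h.
print(np.allclose(np.vdot(w_rtf_vv, h0_rtf), 1.0))         # constraint w^H h_rtf = 1
print(np.allclose(np.vdot(w_rtf_vv, h0), h0[r]))           # target gain equals h_{0,r}
print(np.allclose(w_rtf_vv, w_rtf_xx))                     # Phi_xx and Phi_vv forms agree
print(np.allclose(w_rtf_vv, np.conj(h0[r]) * w_atf))       # scaling by conj(h_{0,r})
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a design choice for numerical stability; it does not change the result.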
Note that blind identification of RTFs is significantly easier than
blind identification of ATFs. When noise and interference are
absent, this can simply be achieved by dividing the crosspower
spectral densities of the microphone signals. When noise and/or
interference are present, methods exploiting the nonstationarity
of speech or based on the generalized eigenvalue decomposition
have been proposed, e.g., [24] and [25].
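For the noise-free case, the division of crosspower spectral densities can be sketched as follows. This is one possible reading, assuming a single frequency bin, a multiplicative signal model (each microphone signal equals its ATF times the source signal), and sample averages over frames as PSD estimates. All variable names and values are illustrative, and the noisy case treated in [24] and [25] is not covered.

```python
import numpy as np

rng = np.random.default_rng(1)
M, r, num_frames = 4, 0, 256   # assumed array size, reference index, frame count

# Hypothetical ATF vector and source spectrum for one frequency bin.
h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
s0 = rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)

# Noise-free microphone signals: row m holds h_{0,m} * s0 over all frames.
X = np.outer(h0, s0)

# Cross-PSD estimate between each microphone and the reference microphone.
cpsd_to_ref = (X * X[r].conj()).mean(axis=1)

# Dividing by the reference auto-PSD cancels the source PSD and |h_{0,r}|^2.
rtf_est = cpsd_to_ref / cpsd_to_ref[r]

print(np.allclose(rtf_est, h0 / h0[r]))
```

Once noise or interference is added to `X`, this ratio becomes biased, which is why the more elaborate estimators cited above are needed.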
GSC
The constrained optimization problem of the MVDR beamformer
in (7) can be transformed into an unconstrained optimization
problem, leading to the highly popular GSC structure [22]–[25],
consisting of three main blocks (see Figure 5): 1) a fixed beam-
former (FB), ensuring the fulfillment of the constraint in (6) or
(11), 2) a blocking matrix (BM), creating so-called noise references
$u_m$, and 3) a multichannel interference canceler $g_m$,
minimizing the residual interference and noise in the output of
the FB that is correlated with the noise references. If the target
signal leaks into the noise references due to a mismatched BM
(e.g., caused by RTF estimation errors or by DOA errors, micro-
phone mismatch, and reverberation when using free-field
HRIRs), the target signal will be partially canceled as well. To
mitigate this target signal cancellation, the interference
canceler is typically adapted only during periods when the tar-
get source is inactive; see, e.g., [23]. Moreover, several tech-
niques have been proposed to reduce the speech leakage
components in the noise references, e.g., [24], [25], and/or limit
the distorting effect of the remaining speech leakage [23], [26],
[27], e.g., by imposing a quadratic inequality constraint or by
using the so-called speech-distortion-regularized GSC [27].
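The three GSC blocks can be illustrated with a closed-form (non-adaptive) sketch for one frequency bin. The choices below are assumptions made for illustration and are not taken from the article: a matched-filter FB, a BM built from an orthonormal basis of the null space of the RTF vector, and a Wiener solution for the interference canceler computed from a known noise covariance matrix. With a perfectly matched BM, the resulting filter coincides with the MVDR filter.

```python
import numpy as np

rng = np.random.default_rng(2)
M, r = 4, 0   # assumed array size and reference microphone index

# Hypothetical RTF vector and Hermitian positive-definite noise covariance.
h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
h_rtf = h0 / h0[r]
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi_vv = A @ A.conj().T + M * np.eye(M)

# 1) Fixed beamformer: matched filter satisfying f^H h_rtf = 1.
f = h_rtf / np.vdot(h_rtf, h_rtf)

# 2) Blocking matrix: columns orthogonal to h_rtf, so that B^H h_rtf = 0.
Q, _ = np.linalg.qr(h_rtf.reshape(-1, 1), mode="complete")
B = Q[:, 1:]

# 3) Interference canceler: Wiener solution minimizing the noise power in
#    y = f^H x - g^H u, where u = B^H x are the noise references.
g = np.linalg.solve(B.conj().T @ phi_vv @ B, B.conj().T @ phi_vv @ f)

# Equivalent overall filter and the direct MVDR solution for comparison.
w_gsc = f - B @ g
phi_inv_h = np.linalg.solve(phi_vv, h_rtf)
w_mvdr = phi_inv_h / np.vdot(h_rtf, phi_inv_h)

print(np.allclose(B.conj().T @ h_rtf, 0.0))   # no target leakage into references
print(np.allclose(w_gsc, w_mvdr))             # GSC equals MVDR in this ideal case
```

If `B` were built from an erroneous RTF estimate, `B.conj().T @ h_rtf` would no longer vanish, which corresponds to the target signal leakage discussed above.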
APPLICATION IN ALDs
The GSC or one of its more robust variants can be considered as
the current state-of-the-art solution for monaural hearing devices
with an end-fire microphone array configuration, e.g., [28]–[30].
A very popular variant is the adaptive directional microphone
(ADM) [15], [28], [29], where the fixed beamformer and the BM
are differential beamformers forming a front- and back-oriented
cardioid pattern, and an adaptive scalar minimizes the energy
arriving from the back hemisphere. A two-microphone imple-
mentation was indeed shown to achieve a considerable speech
intelligibility improvement for hearing aid users (about 3.4 dB
improvement for three babble noise sources) [29].
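A narrowband sketch of the ADM idea is given below. It assumes a common delay-and-subtract form for the two cardioids, a plane-wave model, and illustrative values for the microphone spacing, frequency, and interferer angle; none of these details are specified in the text above, and the function names are hypothetical. The adaptive scalar is computed in closed form as a least-squares solution on interferer-only frames.

```python
import numpy as np

# Assumed values: speed of sound (m/s), microphone spacing (m), frequency (Hz).
c, d, freq = 343.0, 0.01, 1000.0
omega_tau = 2 * np.pi * freq * d / c   # angular frequency times acoustic delay d/c

def cardioids(theta):
    """Front/back cardioid responses to a unit plane wave from angle theta (0 = front)."""
    x_front = np.exp(+0.5j * omega_tau * np.cos(theta))   # front microphone
    x_rear = np.exp(-0.5j * omega_tau * np.cos(theta))    # rear microphone
    delay = np.exp(-1j * omega_tau)
    c_front = x_front - delay * x_rear   # null towards 180 degrees
    c_back = x_rear - delay * x_front    # null towards 0 degrees
    return c_front, c_back

# Interferer-only frames from an assumed direction in the back hemisphere.
rng = np.random.default_rng(3)
theta_n = np.deg2rad(130.0)
noise = rng.standard_normal(200) + 1j * rng.standard_normal(200)
cf_n, cb_n = cardioids(theta_n)
front_sig, back_sig = cf_n * noise, cb_n * noise

# Adaptive scalar: least-squares minimizer of sum |front_sig - beta * back_sig|^2.
beta = np.vdot(back_sig, front_sig) / np.vdot(back_sig, back_sig)

def adm_response(theta):
    """Response of the ADM output c_front - beta * c_back to a plane wave."""
    cf, cb = cardioids(theta)
    return cf - beta * cb

print(abs(adm_response(theta_n)))   # interferer direction is suppressed
print(abs(adm_response(0.0)))       # frontal target passes through the front cardioid
```

Because the back-oriented cardioid has a null towards the front, the scalar does not affect a frontal target in this idealized model.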
MULTICHANNEL WIENER FILTER
The second popular class of multichannel signal enhancement
techniques is associated with the multichannel Wiener filter
(MWF), e.g., [2, ch. 3, 6, 14], [27], [31]. It produces a minimum
mean square error (MMSE) estimate of either the target source [2,
ch. 3], the speech component at an arbitrarily chosen microphone
[2, ch. 6, 14], [31], or a reference speech signal [2, ch. 14], [27]. To
trade off speech distortion and noise reduction, the so-called
speech-distortion-weighted MWF was introduced [27], [31].
Similarly to the MVDR using RTFs, the MWF requires a priori
information about neither the microphone configuration nor the
position of the target source, making it an appealing approach
from a robustness point of view. On the other
hand, relying on the second-order statistics of the desired and
undesired signal components implies that, for the assumed
nonstationary processes, these statistics must be estimated with
sufficient accuracy at all times; cf. the section “Estimation of
Interference and Noise Statistics.”
MMSE ESTIMATION FOR THE MWF
The MWF aims to extract the target source by minimizing the
mean square error (MSE) between the (unknown) source signal
$s_0$ and the beamformer output, i.e.,

$$\mathbf{w}_{\mathrm{MWF}} = \arg\min_{\mathbf{w}} E\{ |s_0 - y|^2 \} = \arg\min_{\mathbf{w}} E\{ |s_0 - \mathbf{w}^H \mathbf{x}|^2 \}. \tag{14}$$
Assuming the target source and the interfering sources and
noise to be uncorrelated, the solution of (14) is given by
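
$$\mathbf{w}_{\mathrm{MWF}} = \boldsymbol{\Phi}_{xx}^{-1} E\{ \mathbf{x}\, s_0^{*} \} = \phi_{s_0 s_0} \boldsymbol{\Phi}_{xx}^{-1} \mathbf{h}_0, \tag{15}$$
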
requiring the ATFs $\mathbf{h}_0$ and the target source PSD $\phi_{s_0 s_0}$ to be
estimated, which is a nontrivial task. However, similarly to the
MVDR using RTFs, we can also design an MWF aiming at
extracting the speech component at an arbitrarily chosen reference
microphone $r$ by

$$\tilde{\mathbf{w}}_{\mathrm{MWF}} = \arg\min_{\mathbf{w}} E\{ |h_{0,r}\, s_0 - \mathbf{w}^H \mathbf{x}|^2 \}, \tag{16}$$

which yields
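
$$\tilde{\mathbf{w}}_{\mathrm{MWF}} = \boldsymbol{\Phi}_{xx}^{-1} E\{ \mathbf{x}\, h_{0,r}^{*} s_0^{*} \} = \phi_{s_0 s_0} \left( \phi_{s_0 s_0} \mathbf{h}_0 \mathbf{h}_0^H + \boldsymbol{\Phi}_{vv} \right)^{-1} \mathbf{h}_0\, h_{0,r}^{*}. \tag{17}$$
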
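The closed-form solutions (15) and (17) can be checked against the MSE cost functions (14) and (16). The sketch below assumes the model implied by the equations above (microphone signals equal to the ATF vector times the target source, plus an uncorrelated undesired component with covariance matrix $\boldsymbol{\Phi}_{vv}$); the numerical values and the helper name `mse` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
M, r = 4, 2      # assumed array size and reference microphone index
phi_ss = 1.5     # assumed target source PSD

# Hypothetical ATF vector and Hermitian positive-definite noise covariance.
h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi_vv = A @ A.conj().T + M * np.eye(M)
phi_xx = phi_ss * np.outer(h0, h0.conj()) + phi_vv

# MWF estimating the source signal s0, as in (15).
w_mwf = phi_ss * np.linalg.solve(phi_xx, h0)
# MWF estimating the speech component at microphone r, as in (17).
w_mwf_ref = phi_ss * np.conj(h0[r]) * np.linalg.solve(phi_xx, h0)

def mse(w, target_gain):
    """E{|target_gain * s0 - w^H x|^2} for uncorrelated target and noise."""
    residual_gain = target_gain - np.vdot(w, h0)
    return phi_ss * abs(residual_gain) ** 2 + np.real(np.vdot(w, phi_vv @ w))

# Any small perturbation of the optimal filters should increase the MSE.
delta = 1e-3 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
print(mse(w_mwf, 1.0) < mse(w_mwf + delta, 1.0))
print(mse(w_mwf_ref, h0[r]) < mse(w_mwf_ref + delta, h0[r]))
```

In this sketch the two filters differ only by the complex factor `np.conj(h0[r])`, mirroring the relation between the two MVDR filters in (13).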
Although it appears that the ATFs and the target source PSD are
required to compute (17), the (rank-1) crosspower spectral