IEEE SIGNAL PROCESSING MAGAZINE [25] MARCH 2015
density matrix $\phi_{s_0 s_0} \mathbf{h}_0 \mathbf{h}_0^H$ can be estimated from the second-order statistics of the microphone signals; cf. the section “Estimation of Interference and Noise Statistics.”
SPEECH-DISTORTION-WEIGHTED MWF
The MMSE criterion in (16) can be easily generalized to allow for a tradeoff between noise reduction and speech distortion [27], [31] by introducing a weighting factor $\mu \in [0, \infty]$:
$$\tilde{\mathbf{w}}_{\mathrm{SDW}} = \arg\min_{\mathbf{w}} \; E\{|h_{0,r} s_0 - \mathbf{w}^H \mathbf{h}_0 s_0|^2\} + \mu E\{|\mathbf{w}^H \mathbf{v}|^2\}, \tag{18}$$
which is referred to as the speech-distortion-weighted MWF
(SDW-MWF). The solution of (18) is given by
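$$\tilde{\mathbf{w}}_{\mathrm{SDW}} = \phi_{s_0 s_0} \left( \phi_{s_0 s_0} \mathbf{h}_0 \mathbf{h}_0^H + \mu \boldsymbol{\Phi}_{\mathbf{vv}} \right)^{-1} \mathbf{h}_0 h_{0,r}^{*}. \tag{19}$$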
The smaller the factor $\mu$ is chosen, the smaller the resulting speech distortion. If $\mu = 1$, the MMSE criterion (16) is obtained. If $\mu > 1$, the residual noise level will be reduced at the expense of increased speech distortion.
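The tradeoff controlled by $\mu$ can be checked numerically. The following minimal Python sketch evaluates (19) for a single frequency bin and reports the two terms of the criterion (18), i.e., speech distortion and residual noise. The steering vector, noise PSD matrix, source PSD, and the helper names `sdw_mwf` and `cost_terms` are illustrative assumptions, not values or code from the cited studies.

```python
import numpy as np

def sdw_mwf(phi_s, h0, Phi_vv, mu, ref=0):
    """Evaluate (19) for a single frequency bin (hypothetical helper)."""
    A = phi_s * np.outer(h0, h0.conj()) + mu * Phi_vv
    return phi_s * np.linalg.solve(A, h0) * np.conj(h0[ref])

def cost_terms(w, phi_s, h0, Phi_vv, ref=0):
    """Return the two expectations in (18): speech distortion and residual noise."""
    # np.vdot conjugates its first argument, so np.vdot(w, h0) equals w^H h0.
    distortion = phi_s * abs(h0[ref] - np.vdot(w, h0)) ** 2
    noise = np.real(np.vdot(w, Phi_vv @ w))  # w^H Phi_vv w
    return distortion, noise

# Synthetic single-bin scenario (assumed values, M = 3 microphones).
rng = np.random.default_rng(0)
M = 3
h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_vv = B @ B.conj().T + 0.1 * np.eye(M)  # Hermitian positive definite
phi_s = 1.0

mus = [0.1, 1.0, 10.0]
results = [cost_terms(sdw_mwf(phi_s, h0, Phi_vv, mu), phi_s, h0, Phi_vv)
           for mu in mus]
for mu, (d, n) in zip(mus, results):
    print(f"mu = {mu:5.1f}  distortion = {d:.4e}  residual noise = {n:.4e}")
```

In such a scenario the distortion term grows and the residual noise term shrinks as $\mu$ increases, in line with the statement above.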
RELATIONSHIP BETWEEN MWF AND MVDR
It is interesting to note that the MWF can be decomposed as an
MVDR beamformer, exploiting the spatial information of the
target and interfering sources, followed by a single-channel
Wiener filter (SWF) [2, ch. 3], [32], i.e.,
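$$\tilde{\mathbf{w}}_{\mathrm{SDW}} = \underbrace{\frac{\boldsymbol{\Phi}_{\mathbf{vv}}^{-1} \tilde{\mathbf{h}}_0}{\tilde{\mathbf{h}}_0^H \boldsymbol{\Phi}_{\mathbf{vv}}^{-1} \tilde{\mathbf{h}}_0}}_{\text{MVDR beamformer}} \times \underbrace{\frac{\phi_{y_s y_s}}{\phi_{y_s y_s} + \mu \phi_{y_v y_v}}}_{\text{SDW-SWF postfilter}}, \tag{20}$$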
where $\phi_{y_s y_s}$ and $\phi_{y_v y_v}$ denote the PSDs of the desired and undesired components at the output of the MVDR beamformer $\tilde{\mathbf{w}}_{\mathrm{MVDR}}$ using RTFs.
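The decomposition in (20) can be verified numerically against the direct solution (19). The sketch below assumes that the RTF vector is the steering vector normalized by its reference-microphone entry, $\tilde{\mathbf{h}}_0 = \mathbf{h}_0 / h_{0,r}$, and uses randomly drawn, purely illustrative quantities for a single frequency bin; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, ref, mu, phi_s = 4, 0, 2.5, 0.7  # assumed sizes and values

h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_vv = B @ B.conj().T + 0.1 * np.eye(M)  # Hermitian positive definite

# Direct solution (19).
A = phi_s * np.outer(h0, h0.conj()) + mu * Phi_vv
w_direct = phi_s * np.linalg.solve(A, h0) * np.conj(h0[ref])

# MVDR beamformer using the RTF vector (assumed: h0 normalized by its reference entry).
h_rtf = h0 / h0[ref]
Pinv_h = np.linalg.solve(Phi_vv, h_rtf)
w_mvdr = Pinv_h / np.vdot(h_rtf, Pinv_h)

# PSDs of the desired and undesired components at the MVDR output.
Phi_ss = phi_s * np.outer(h0, h0.conj())
phi_ys = np.real(np.vdot(w_mvdr, Phi_ss @ w_mvdr))
phi_yv = np.real(np.vdot(w_mvdr, Phi_vv @ w_mvdr))

# Single-channel postfilter and the cascaded filter of (20).
postfilter = phi_ys / (phi_ys + mu * phi_yv)
w_decomp = w_mvdr * postfilter

print("max abs difference:", np.max(np.abs(w_direct - w_decomp)))
```

The two filters agree up to numerical precision, which follows from applying the matrix inversion lemma to (19).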
APPLICATION IN ALDs
In [1], a three-microphone MWF implementation for a monaural hearing device was evaluated at different test sites and compared with other single- and multimicrophone noise reduction techniques. That study showed that, overall, the MWF achieved the largest speech intelligibility improvements (up to 7 dB), even in highly reverberant environments.
BLIND SOURCE SEPARATION
Generalizing the approach of extracting a single desired source, BSS algorithms aim at extracting multiple sources from observed mixtures without requiring prior knowledge of the positions of the sources and the microphones, spatiotemporal signal statistics, or the mixing system. Moreover, they do not need any reference information on the activity of the sources in the spectrotemporal domain. On the other hand, they do require knowledge of the total number of sources and can only separate sources that can be modeled as point sources. Considering time-varying mixing systems, we disregard approaches that perform BSS based on learning from a large amount of data and focus on independent component analysis (ICA)-based methods that are, similar to adaptive filtering approaches, suited to time-varying acoustic scenes [4], [33]–[35].
For the following, we rewrite the STFT signal model in (3) as
$$\mathbf{x} = \sum_{p=0}^{P-1} \mathbf{h}_p s_p + \mathbf{n} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{21}$$
describing $M$ noisy observations $\mathbf{x}$ of the convolutive mixture of $P$ point sources $s_p$. To obtain estimates of the original sources $s_p$, a linear demixing/separation system $\mathbf{W}$ is applied, consisting of $M \times P$ filters with frequency response $w_{mp}$, $m = 0, \ldots, M-1$, $p = 0, \ldots, P-1$. The $P$ separated signals $y_q$, stacked in the vector $\mathbf{y}$, are then obtained as
$$\mathbf{y} = \mathbf{W}^H \mathbf{x} = \mathbf{W}^H \mathbf{H}\mathbf{s} + \mathbf{W}^H \mathbf{n}. \tag{22}$$
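The signal model (21) and the demixing operation (22) can be illustrated for one frequency bin with a few lines of Python. Note the assumptions: the demixing matrix below is an oracle solution computed from the known mixing matrix via the pseudoinverse, which a blind algorithm cannot do since it only observes $\mathbf{x}$. The sizes, the random mixing matrix, and the noise-free setting are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
M, P, T = 3, 2, 5  # microphones, sources, STFT frames (assumed sizes)

# Mixing matrix H with columns h_p, source signals s, and (here zero) noise n.
H = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))
s = rng.standard_normal((P, T)) + 1j * rng.standard_normal((P, T))
n = np.zeros((M, T), dtype=complex)

# Microphone signals according to (21), one column per frame.
x = H @ s + n

# Oracle demixing matrix (assumption: H is known), chosen so that W^H H = I.
W = np.linalg.pinv(H).conj().T  # shape (M, P): M x P filter coefficients
y = W.conj().T @ x              # separated signals according to (22)

print("W^H H =\n", np.round(W.conj().T @ H, 6))
print("max |y - s| =", np.max(np.abs(y - s)))
```

With nonzero noise, the term $\mathbf{W}^H \mathbf{n}$ in (22) remains in the outputs even for this oracle choice.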
Known methods for identifying optimum demixing filters $\mathbf{W}$ are based on the assumption that the signals to be separated are mutually statistically independent and that enforcing statistically independent outputs $y_q$ of the demixing system yields good estimates of the desired separated source signals $s_p$. For the mostly assumed case where the number of microphones is larger than or equal to the number of sources ($M \geq P$), an appropriate generic cost function $J_{\mathrm{ICA}}(\ell)$, for frame $\ell$, describing an estimate of the Kullback–Leibler divergence between the joint probability density function (pdf) of the output signals $y_q$ and the desired independent outputs, can be formulated as [4, ch. 4]:
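$$J_{\mathrm{ICA}}(\ell) = \sum_{\lambda=0}^{\infty} \beta(\ell, \lambda) \, \frac{1}{K} \sum_{\kappa=0}^{K-1} \log \frac{\hat{p}_{\mathbf{y},PL}(\mathbf{y}(\kappa, \lambda))}{\prod_{q=0}^{P-1} \hat{p}_{y_q,L}(\mathbf{y}_q(\kappa, \lambda))}, \tag{23}$$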
where $\hat{p}_{y_q,L}(\mathbf{y}_q(\kappa, \lambda))$ denotes an estimate for the $L$-variate pdf of a segment of length $L$ of the $q$th output signal $y_q$, and $\hat{p}_{\mathbf{y},PL}(\mathbf{y}(\kappa, \lambda))$ denotes an estimate for the $PL$-variate joint pdf for all $P$ output signals. Averaging over $K$ frames accounts for the nonstationarity of the data, while the windowing function $\beta(\ell, \lambda)$ describes the weight of a block average at time $\lambda$ for the cost function at time $\ell$, in a similar way as for recursive least squares adaptation. Forming gradients of this cost function, or simplified versions, with respect to the demixing matrix $\mathbf{W}$ allows for maximization of statistical independence with respect to individual data frames (online adaptation, $K = 1$, $\beta(\ell, \lambda) = 0$ for $\lambda \neq \ell$) as well as for an entire recording (offline adaptation, $K > 1$, $\beta = \text{constant}$) [35].
It should be noted that using the statistical independence assumption only, the separation system $\mathbf{W}$ can at best be obtained up to a linear filtering uncertainty and a permutation of the outputs, and thus cannot itself identify the inverse mixing system, which would solve the deconvolution problem and perfectly dereverberate the source signals [36].
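Both the structure of the cost function (23) and the ambiguities just described can be illustrated with a strongly simplified special case. The sketch below assumes zero-mean complex Gaussian pdf models with $L = 1$, a single block of $K$ frames, and an instantaneous mixture in one frequency bin; under these assumptions the log-ratio in (23) reduces to $\sum_q \log R_{qq} - \log\det\mathbf{R}$, where $\mathbf{R}$ is the sample covariance matrix of the outputs. With this model the cost only measures output correlation, so it shows the form of the criterion rather than a complete separation algorithm. The function name, the mixing matrix, and the source model are hypothetical.

```python
import numpy as np

def gaussian_independence_cost(Y):
    """Simplified (23) for Gaussian pdfs and L = 1: sum_q log R_qq - log det R.

    Y has shape (P, K): P output signals observed over K frames.
    """
    R = Y @ Y.conj().T / Y.shape[1]    # sample covariance of the outputs
    _, logdet = np.linalg.slogdet(R)   # log|det R|, real-valued
    return float(np.sum(np.log(np.real(np.diag(R)))) - logdet)

rng = np.random.default_rng(3)
P, K = 2, 5000
# Independent sources (assumed Laplacian real and imaginary parts).
S = rng.laplace(size=(P, K)) + 1j * rng.laplace(size=(P, K))
H = np.array([[1.0, 0.6], [0.5, 1.0]], dtype=complex)  # assumed mixing matrix
X = H @ S

# Oracle demixing (W^H = H^{-1}), used only to show where the cost is small.
Y = np.linalg.inv(H) @ X

# Scale each output by a complex factor and swap the two outputs.
D = np.diag([2.0 - 1.0j, 0.3j])
Pm = np.array([[0, 1], [1, 0]])
Y_amb = Pm @ D @ Y

J_mixed = gaussian_independence_cost(X)
J_sep = gaussian_independence_cost(Y)
J_amb = gaussian_independence_cost(Y_amb)
print(f"J(mixtures) = {J_mixed:.4f}")
print(f"J(separated) = {J_sep:.6f}")
print(f"J(scaled and permuted) = {J_amb:.6f}")
```

The cost is clearly larger for the mixtures than for the separated signals, and it does not change when the outputs are scaled or permuted. In a single frequency bin, a complex scaling per output corresponds to the linear filtering uncertainty, so an independence criterion of this kind cannot resolve either ambiguity.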
Numerous algorithms have been proposed for ICA of convolutive mixtures, which are often categorized as either time-domain or frequency-domain algorithms. Time-domain algorithms estimate the demixing system $\mathbf{W}$ as finite impulse response filters [35], whereas frequency-domain algorithms