
IEEE SIGNAL PROCESSING MAGAZINE [25] MARCH 2015

density matrix hh

can be estimated from the second-order statistics of the microphone signals; cf. the section “Estimation of Interference and Noise Statistics.”

SPEECH-DISTORTION-WEIGHTED MWF

The MMSE criterion in (16) can be easily generalized to allow for a tradeoff between noise reduction and speech distortion [27], [31] by introducing a weighting factor $\mu \in [0, \infty]$:

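$$\mathbf{w}_{\mathrm{SDW}} = \operatorname*{argmin}_{\mathbf{w}} \; E\{|h_{0,0}\,s_0 - \mathbf{w}^{H}\mathbf{h}_0 s_0|^2\} + \mu\,E\{|\mathbf{w}^{H}\mathbf{v}|^2\}, \tag{18}$$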

which is referred to as the speech-distortion-weighted MWF (SDW-MWF). The solution of (18) is given by

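$$\mathbf{w}_{\mathrm{SDW}} = \left(\phi_{s_0}\mathbf{h}_0\mathbf{h}_0^{H} + \mu\,\mathbf{\Phi}_{\mathbf{v}\mathbf{v}}\right)^{-1}\phi_{s_0}\mathbf{h}_0 h_{0,0}^{*}. \tag{19}$$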

The smaller the factor $\mu$ is chosen, the smaller the resulting speech distortion. If $\mu = 1$, the MMSE criterion (16) is obtained. If $\mu > 1$, the residual noise level will be reduced at the expense of increased speech distortion.
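The tradeoff controlled by the weighting factor can be checked numerically for a single frequency bin. The following is a minimal sketch, not the authors' implementation: it assumes the solution has the standard SDW-MWF form $\mathbf{w} = (\phi_s \mathbf{h}\mathbf{h}^{H} + \mu\mathbf{\Phi}_{\mathbf{v}\mathbf{v}})^{-1}\phi_s \mathbf{h} h_{\mathrm{ref}}^{*}$, and the function names, the random transfer vector, and the random undesired-component PSD matrix are all hypothetical.

```python
import numpy as np

def sdw_mwf(h0, phi_s, Phi_vv, mu, ref=0):
    """SDW-MWF weights for one frequency bin (assumed standard form)."""
    Phi_ss = phi_s * np.outer(h0, h0.conj())      # rank-1 desired PSD matrix
    rhs = phi_s * h0 * np.conj(h0[ref])           # cross-PSD with the reference mic
    return np.linalg.solve(Phi_ss + mu * Phi_vv, rhs)

# Hypothetical test scenario: M = 3 microphones, random complex quantities.
rng = np.random.default_rng(1)
M = 3
h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_vv = A @ A.conj().T + 0.1 * np.eye(M)         # Hermitian positive definite
phi_s = 1.0

def residual_noise_and_distortion(mu):
    w = sdw_mwf(h0, phi_s, Phi_vv, mu)
    noise = np.real(w.conj() @ Phi_vv @ w)                  # E{|w^H v|^2}
    distortion = phi_s * abs(h0[0] - w.conj() @ h0) ** 2    # E{|h_ref s - w^H h s|^2}
    return noise, distortion

results = {mu: residual_noise_and_distortion(mu) for mu in (0.1, 1.0, 10.0)}
for mu, (noise, distortion) in results.items():
    print(f"mu={mu:5.1f}  residual noise={noise:.4f}  speech distortion={distortion:.4f}")
```

Under these assumptions the printed residual noise should fall and the speech distortion should rise as $\mu$ grows, in line with the behavior described above.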

RELATIONSHIP BETWEEN MWF AND MVDR

It is interesting to note that the MWF can be decomposed as an MVDR beamformer, exploiting the spatial information of the target and interfering sources, followed by a single-channel Wiener filter (SWF) [2, ch. 3], [32], i.e.,

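$$\mathbf{w}_{\mathrm{SDW}} = \underbrace{\mathbf{w}_{\mathrm{MVDR}}}_{\text{MVDR beamformer}} \; \underbrace{\frac{\phi_{y_s y_s}}{\phi_{y_s y_s} + \mu\,\phi_{y_v y_v}}}_{\text{SDW-SWF postfilter}}, \tag{20}$$

where $\phi_{y_s y_s}$ and $\phi_{y_v y_v}$ denote the PSDs of the desired and undesired components at the output of the MVDR beamformer $\mathbf{w}_{\mathrm{MVDR}}$ using RTFs.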
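The decomposition into an MVDR beamformer and a single-channel postfilter can be verified numerically. The sketch below is an illustration under stated assumptions rather than the article's implementation: it assumes the SDW-MWF has the standard form $(\phi_s \mathbf{h}\mathbf{h}^{H} + \mu\mathbf{\Phi}_{\mathbf{v}\mathbf{v}})^{-1}\phi_s \mathbf{h} h_{\mathrm{ref}}^{*}$, that the MVDR beamformer is built from the RTF vector $\mathbf{h}/h_{\mathrm{ref}}$, and that the postfilter is $\phi_{y_s y_s}/(\phi_{y_s y_s} + \mu\,\phi_{y_v y_v})$. All variable names and the random test data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
M, mu, phi_s = 3, 2.0, 1.5

# Hypothetical acoustic transfer vector and undesired-component PSD matrix.
h0 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_vv = A @ A.conj().T + 0.1 * np.eye(M)

# Direct SDW-MWF solution (assumed standard form, reference microphone 0).
w_sdw = np.linalg.solve(phi_s * np.outer(h0, h0.conj()) + mu * Phi_vv,
                        phi_s * h0 * np.conj(h0[0]))

# MVDR beamformer using the RTF vector (transfer functions relative to mic 0).
rtf = h0 / h0[0]
Phi_inv_rtf = np.linalg.solve(Phi_vv, rtf)
w_mvdr = Phi_inv_rtf / (rtf.conj() @ Phi_inv_rtf)

# PSDs of the desired and undesired components at the MVDR output.
phi_ys = phi_s * abs(w_mvdr.conj() @ h0) ** 2
phi_yv = np.real(w_mvdr.conj() @ Phi_vv @ w_mvdr)

# Single-channel SDW Wiener postfilter applied to the MVDR output.
postfilter = phi_ys / (phi_ys + mu * phi_yv)
w_decomposed = w_mvdr * postfilter

match = np.allclose(w_sdw, w_decomposed)
print("decomposition matches direct solution:", match)
```

The agreement follows from the matrix inversion lemma applied to the rank-1 desired-component term, so it holds for any positive definite $\mathbf{\Phi}_{\mathbf{v}\mathbf{v}}$ and any $\mu > 0$ in this model.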

APPLICATION IN ALDs

In [1], a three-microphone MWF implementation for a monaural hearing device was evaluated at different test sites and compared with other single- and multimicrophone noise reduction techniques. In this study, it was shown that overall the MWF achieved the largest speech intelligibility improvements (up to 7 dB), even in highly reverberant environments.

BLIND SOURCE SEPARATION

Generalizing the approach of extracting a single desired source, BSS algorithms aim at extracting multiple sources from observed mixtures without requiring prior knowledge on the positions of the sources and the microphones, spatiotemporal signal statistics, or the mixing system. Moreover, they do not need any reference information on the activity of the sources in the spectrotemporal domain. On the other hand, they do require knowledge on the total number of sources and can only separate sources that can be modeled as point sources. Considering time-varying mixing systems, we disregard approaches that perform BSS based on learning from a large amount of data and focus on independent component analysis (ICA)-based methods that are, similar to adaptive filtering approaches, suited to time-varying acoustic scenes [4], [33]–[35].

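For the following, we rewrite the STFT signal model in (3) as

$$\mathbf{x} = \sum_{p=0}^{P-1} \mathbf{h}_p s_p + \mathbf{n} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{21}$$

describing $M$ noisy observations $\mathbf{x}$ of the convolutive mixture of $P$ point sources $s_p$. To obtain estimates of the original sources $s_p$, a linear demixing/separation system $\mathbf{W}$ is applied, consisting of $M \times P$ filters with frequency response $w_{m,p}$, $m = 0, \ldots, M-1$, $p = 0, \ldots, P-1$. The $P$ separated signals $y_p$, stacked in the vector $\mathbf{y}$, are then obtained as

$$\mathbf{y} = \mathbf{W}^{H}\mathbf{x} = \mathbf{W}^{H}\mathbf{H}\mathbf{s} + \mathbf{W}^{H}\mathbf{n}. \tag{22}$$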
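To make the dimensions in (21) and (22) concrete, the following sketch builds one frequency bin with $M = 3$ microphones and $P = 2$ sources. It is purely illustrative: the mixing matrix is random, and the demixing matrix is chosen as the pseudoinverse of the mixing matrix, which presumes knowledge of the mixing system that a BSS algorithm does not have. That choice is an assumption made only to show how the source and noise terms in (22) separate.

```python
import numpy as np

rng = np.random.default_rng(3)
M, P = 3, 2  # microphones, point sources

def crandn(*shape):
    """Complex Gaussian samples (hypothetical helper for STFT-domain values)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H = crandn(M, P)          # mixing matrix, one column h_p per source
s = crandn(P)             # source STFT coefficients for one time-frequency bin
n = 0.01 * crandn(M)      # additive noise at the microphones

x = H @ s + n             # signal model (21)

# Illustrative demixing system with W^H = pinv(H); W itself is M x P.
W = np.linalg.pinv(H).conj().T
y = W.conj().T @ x        # separated signals (22)

signal_part = W.conj().T @ H @ s   # W^H H s, equals s for this choice of W
noise_part = W.conj().T @ n        # W^H n, filtered noise at the outputs
print("W shape:", W.shape, " y shape:", y.shape)
print("max source estimation error:", np.max(np.abs(y - s)))
```

With this oracle choice the product $\mathbf{W}^{H}\mathbf{H}$ is the identity, so the only remaining error in $\mathbf{y}$ is the filtered noise term.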

Known methods for identifying optimum demixing filters $\mathbf{W}$ are based on the assumption that the signals to be separated are mutually statistically independent and that enforcing statistically independent outputs $y_q$ of the demixing system yields good estimates of the desired separated source signals $s_p$. For the mostly assumed case where the number of microphones is larger than or equal to the number of sources ($M \geq P$), an appropriate generic cost function $J_{\mathrm{ICA}}(\ell)$ for frame $\ell$, describing an estimate of the Kullback–Leibler divergence between the joint probability density function (pdf) of the output signals and the desired independent outputs, can be formulated as [4, ch. 4]:

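$$J_{\mathrm{ICA}}(\ell) = \sum_{m=0}^{\infty} \beta(\ell, m)\, \frac{1}{K} \sum_{l=0}^{K-1} \log \frac{\hat{p}_{\mathbf{y},PL}(\mathbf{y}(m, l))}{\prod_{q=0}^{P-1} \hat{p}_{y_q,L}(y_q(m, l))}, \tag{23}$$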

where $\hat{p}_{y_q,L}(y_q(\cdot,\cdot))$ denotes an estimate for the $L$-variate pdf of a segment of length $L$ of the $q$th output signal $y_q$ and $\hat{p}_{\mathbf{y},PL}(\mathbf{y}(\cdot,\cdot))$ denotes an estimate for the $PL$-variate joint pdf for all $P$ output signals. Averaging over $K$ frames accounts for the nonstationarity of the data, while the windowing function $\beta(\ell, m)$ describes the weight of a block average at time $m$ for the cost function at time $\ell$, in a similar way as for recursive least squares adaptation. Forming gradients of this cost function, or simplified versions, with respect to the demixing matrix $\mathbf{W}$ allows for maximization of statistical independence with respect to individual data frames (online adaptation, $K = 1$, $\beta(\ell, m) = 0$ for $m \neq \ell$) as well as for an entire recording (offline adaptation, $K > 1$, $\beta = \text{constant}$) [35].
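As a rough illustration of gradient-based independence maximization, the sketch below runs an offline natural-gradient ICA update on a toy problem. It is a strongly simplified stand-in, not the generic cost function (23): it assumes a real-valued instantaneous 2 x 2 mixture (as might be seen in a single frequency bin), univariate source models ($L = 1$), a fixed `tanh` score function suited to super-Gaussian sources, and a constant weighting over the whole recording. The function name `natural_gradient_ica`, the mixing matrix, and the step size are hypothetical choices.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, step=0.1):
    """Offline natural-gradient ICA for a real-valued instantaneous mixture.

    x has shape (P, N): N observations of P mixtures.
    Returns WH, the demixing matrix applied as y = WH @ x.
    """
    n_src, n_obs = x.shape
    WH = np.eye(n_src)
    for _ in range(n_iter):
        y = WH @ x
        # Assumed score function tanh(y); the average runs over all N samples,
        # which corresponds to offline adaptation with constant weighting.
        corr = np.tanh(y) @ y.T / n_obs
        # Natural-gradient step: vanishes when E{tanh(y) y^T} equals the identity.
        WH += step * (np.eye(n_src) - corr) @ WH
    return WH

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))            # independent super-Gaussian sources
H = np.array([[1.0, 0.6], [0.5, 1.0]])     # hypothetical mixing matrix
x = H @ s                                  # noise-free mixtures

WH = natural_gradient_ica(x)
G = np.abs(WH @ H)                         # global system: demixing times mixing
print("global system magnitude:\n", np.round(G, 3))
```

If separation succeeds, each row of the global system has one dominant entry, i.e., it approximates a scaled permutation matrix. The remaining scaling and ordering ambiguity cannot be resolved from the independence assumption alone.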

It should be noted that using the statistical independence assumption only, the separation system $\mathbf{W}$ can at best be obtained up to a linear filtering uncertainty and a permutation of the outputs, and thus cannot itself identify the inverse mixing system which would solve the deconvolution problem and perfectly dereverberate the source signals [36].

Numerous algorithms have been proposed for ICA of convolutive mixtures, which are often categorized as either time-domain or frequency-domain algorithms. Time-domain algorithms estimate the demixing system $\mathbf{W}$ as finite impulse response filters [35], whereas frequency-domain algorithms
