
IEEE SIGNAL PROCESSING MAGAZINE [24] MARCH 2015

MVDR USING RTFs

By constraining the desired component in the output signal to

be equal to the speech component at an arbitrarily chosen refer-

ence microphone

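$r$ [24], the constraint in (6) becomes

$$y_s = \mathbf{w}^H \mathbf{h}\, s_0 = h_r s_0, \tag{11}$$

which is equivalent to $\mathbf{w}^H \tilde{\mathbf{h}} = 1$, where the RTF vector $\tilde{\mathbf{h}}$ is defined as

$$\tilde{\mathbf{h}} = \frac{\mathbf{h}}{h_r}. \tag{12}$$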

By substituting the ATFs $\mathbf{h}$ with the RTFs $\tilde{\mathbf{h}}$ in (8) and (10),

the modified MVDR filter is obtained as

$$\tilde{\mathbf{w}}_{\mathrm{MVDR}} = h_r^{*}\, \mathbf{w}_{\mathrm{MVDR}}. \tag{13}$$
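The relation in (13) can be checked numerically. The following is a minimal sketch that assumes the standard closed-form MVDR solution $\boldsymbol{\Phi}^{-1}\mathbf{d} / (\mathbf{d}^H \boldsymbol{\Phi}^{-1} \mathbf{d})$ for a steering vector $\mathbf{d}$, which is not reproduced in this section. The function name `mvdr_weights`, the number of microphones, the random ATFs, and the PSD matrix are illustrative assumptions.

```python
import numpy as np

def mvdr_weights(phi, steering):
    """Closed-form MVDR filter: phi^{-1} d / (d^H phi^{-1} d)."""
    phi_inv_d = np.linalg.solve(phi, steering)
    return phi_inv_d / (steering.conj() @ phi_inv_d)

rng = np.random.default_rng(0)
M, r = 4, 0  # assumed number of microphones and reference microphone index

# Hypothetical ATFs and a Hermitian positive-definite PSD matrix.
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi = A @ A.conj().T + M * np.eye(M)

h_rtf = h / h[r]                  # RTF vector, cf. (12)
w_atf = mvdr_weights(phi, h)      # satisfies w^H h = 1
w_rtf = mvdr_weights(phi, h_rtf)  # satisfies w^H h_rtf = 1

# The RTF-based filter is the ATF-based filter scaled by conj(h_r), cf. (13).
print(np.allclose(w_rtf, np.conj(h[r]) * w_atf))
# The target component at the output equals h_r * s_0, cf. (11).
print(np.allclose(w_rtf.conj() @ h, h[r]))
```

Only the ratio $\mathbf{h}/h_r$ enters `w_rtf`, which is why the RTF-based design does not need the absolute ATFs.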

Note that blind identification of RTFs is significantly easier than

blind identification of ATFs. When noise and interference are

absent, this can simply be achieved by dividing the crosspower

spectral densities of the microphone signals. When noise and/or

interference are present, methods exploiting the nonstationarity

of speech or based on the generalized eigenvalue decomposition

have been proposed, e.g., [24] and [25].
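For the case without noise and interference, the cross-PSD division can be sketched for a single frequency bin as follows. This is an illustration under assumptions, not the method of [24] or [25]: the frame-averaged PSD estimates, the function name `estimate_rtf`, and the synthetic data are all hypothetical.

```python
import numpy as np

def estimate_rtf(X, r=0):
    """Estimate the RTF vector for one frequency bin.

    X has shape (M, L): M microphones, L STFT frames of the same bin.
    """
    # Frame-averaged cross-PSDs E{x_m x_r^*} for all microphones m.
    cross_psd = X @ X[r].conj() / X.shape[1]
    # Dividing by the reference auto-PSD E{|x_r|^2} cancels the source PSD.
    return cross_psd / cross_psd[r]

rng = np.random.default_rng(0)
M, L = 3, 200
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # hypothetical ATFs
s = rng.standard_normal(L) + 1j * rng.standard_normal(L)  # target source frames
X = np.outer(h, s)                                        # noise-free microphone signals

rtf_est = estimate_rtf(X, r=0)
print(np.allclose(rtf_est, h / h[0]))
```

With additive noise the cross-PSDs contain a bias term, so this simple ratio no longer yields the RTF, which motivates the methods cited above.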

GSC

The constrained optimization problem of the MVDR beamformer

in (7) can be transformed into an unconstrained optimization

problem, leading to the highly popular GSC structure [22]–[25],

consisting of three main blocks (see Figure 5): 1) a fixed beam-

former (FB), ensuring the fulfillment of the constraint in (6) or

(11), 2) a blocking matrix (BM), creating so-called noise references,

and 3) a multichannel interference canceler $\mathbf{g}$,

minimizing the residual interference and noise in the output of

the FB that is correlated with the noise references. If the target

signal leaks into the noise references due to a mismatched BM

(e.g., caused by RTF estimation errors or by DOA errors, micro-

phone mismatch, and reverberation when using free-field

HRIRs), the target signal will be partially canceled as well. To

mitigate this target signal cancellation, the interference

canceler is typically adapted only during periods when the tar-

get source is inactive; see, e.g., [23]. Moreover, several tech-

niques have been proposed to reduce the speech leakage

components in the noise references, e.g., [24], [25], and/or limit

the distorting effect of the remaining speech leakage [23], [26],

[27], e.g., by imposing a quadratic inequality constraint or by

using the so-called speech-distortion-regularized GSC [27].
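The three GSC blocks can be sketched for a single frequency bin as below. The sketch assumes a known RTF vector with the first microphone as reference, one interferer, no sensor noise, and adaptation only on frames in which the target is inactive. The specific BM construction, the NLMS update, and all names (`gsc_blocks`, `adapt_canceler`, `crandn`) are illustrative choices rather than the implementations of [22]–[25].

```python
import numpy as np

def gsc_blocks(h_rtf):
    """Fixed beamformer and blocking matrix for an RTF vector with h_rtf[0] == 1."""
    M = h_rtf.size
    w_fb = h_rtf / np.vdot(h_rtf, h_rtf).real          # fulfills w_fb^H h_rtf = 1
    B = np.vstack([-h_rtf[1:].conj(), np.eye(M - 1)])  # fulfills B^H h_rtf = 0
    return w_fb, B

def adapt_canceler(X, w_fb, B, mu=0.5, eps=1e-12):
    """NLMS adaptation of the interference canceler g over the frames in X (M x L)."""
    g = np.zeros(B.shape[1], dtype=complex)
    for x in X.T:
        y_fb = np.vdot(w_fb, x)       # fixed beamformer output w_fb^H x
        u = B.conj().T @ x            # noise references B^H x
        e = y_fb - np.vdot(g, u)      # GSC output for this frame
        g += mu * u * np.conj(e) / (np.vdot(u, u).real + eps)
    return g

rng = np.random.default_rng(1)

def crandn(*shape):
    """Complex Gaussian samples (helper for the synthetic example)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M, L = 4, 500
h = crandn(M)
h_rtf = h / h[0]                     # RTFs of the target
a = crandn(M)                        # hypothetical ATFs of the interferer
X_noise = np.outer(a, crandn(L))     # frames with the target inactive

w_fb, B = gsc_blocks(h_rtf)
g = adapt_canceler(X_noise, w_fb, B)
w_gsc = w_fb - B @ g                 # equivalent overall filter

print(abs(np.vdot(w_gsc, h_rtf)))                      # target response
print(abs(np.vdot(w_fb, a)), abs(np.vdot(w_gsc, a)))   # interferer: FB vs. GSC
```

Because the BM is orthogonal to the assumed RTF vector, the constraint holds for any `g`. If `h_rtf` were estimated with errors, target components would appear in `u` and adapting `g` while the target is active would cancel part of the target.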

APPLICATION IN ALDs

The GSC or one of its more robust variants can be considered as

the current state-of-the-art solution for monaural hearing devices

with an end-fire microphone array configuration, e.g., [28]–[30].

A very popular variant is the adaptive directional microphone

(ADM) [15], [28], [29], where the fixed beamformer and the BM

are differential beamformers forming a front- and back-oriented

cardioid pattern, and an adaptive scalar minimizes the energy

arriving from the back hemisphere. A two-microphone imple-

mentation was indeed shown to achieve a considerable speech

intelligibility improvement for hearing aid users (about 3.4 dB

improvement for three babble noise sources) [29].
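The ADM principle can be illustrated with a time-domain sketch. It assumes two omnidirectional microphones whose spacing corresponds to an acoustic delay of exactly one sample, free-field propagation, and a block-wise (rather than sample-wise adaptive) computation of the scalar. The function names `adm` and `delayed`, and the limitation of the scalar to [0, 1], which is commonly used to keep the null in the back hemisphere, are assumptions of this example and not details taken from [15], [28], or [29].

```python
import numpy as np

def adm(x_front, x_back, delay=1):
    """Two-microphone ADM with a block-optimal scalar beta."""
    d = delay
    # Delay-and-subtract differential beamformers.
    c_f = x_front[d:] - x_back[:-d]   # front-oriented cardioid (null towards the back)
    c_b = x_back[d:] - x_front[:-d]   # back-oriented cardioid (null towards the front)
    # Scalar minimizing the output energy, limited to [0, 1].
    denom = np.dot(c_b, c_b)
    beta = np.dot(c_f, c_b) / denom if denom > 0 else 0.0
    beta = float(np.clip(beta, 0.0, 1.0))
    return c_f - beta * c_b, beta

def delayed(x, d):
    """Delay a signal by d samples (zero-padded at the start)."""
    return np.concatenate([np.zeros(d), x[:-d]])

rng = np.random.default_rng(3)
L, d = 4000, 1
s = rng.standard_normal(L)            # target from the front
v = rng.standard_normal(L)            # interferer from the side (same at both microphones)

x_front = s + v
x_back = delayed(s, d) + v
y, beta = adm(x_front, x_back, d)
print(beta)
```

In this setup the frontal target is absent from the back-oriented cardioid, so the scalar only acts on the interferer and steers the null of the combined pattern towards it.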

MULTICHANNEL WIENER FILTER

The second popular class of multichannel signal enhancement

techniques is associated with the multichannel Wiener filter

(MWF), e.g., [2, ch. 3, 6, 14], [27], [31]. It produces a minimum

mean square error (MMSE) estimate of either the target source [2,

ch. 3], the speech component at an arbitrarily chosen microphone

[2, ch. 6, 14], [31], or a reference speech signal [2, ch. 14], [27]. To

trade off speech distortion and noise reduction, the so-called

speech-distortion-weighted MWF was introduced [27], [31].

Similarly to the MVDR using RTFs, the MWF requires a priori

information about neither the microphone configuration nor the

position of the target source, making it an appealing approach

from a robustness point of view. On the other

hand, relying on the second-order statistics of the desired and

undesired signal components implies that, for the assumed

nonstationary processes, these statistics must be estimated with

sufficient accuracy at all times; cf. the section “Estimation of

Interference and Noise Statistics.”

MMSE ESTIMATION FOR THE MWF

The MWF aims to extract the target source by minimizing the

mean square error (MSE) between the (unknown) source signal

and the beamformer output, i.e.,

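$$\mathbf{w}_{\mathrm{MWF}} = \arg\min_{\mathbf{w}} E\{|s_0 - y|^2\} = \arg\min_{\mathbf{w}} E\{|s_0 - \mathbf{w}^H \mathbf{x}|^2\}. \tag{14}$$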

Assuming the target source and the interfering sources and

noise to be uncorrelated, the solution of (14) is given by

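$$\mathbf{w}_{\mathrm{MWF}} = \boldsymbol{\Phi}_{xx}^{-1} E\{\mathbf{x}\, s_0^{*}\} = \phi_{s_0} \boldsymbol{\Phi}_{xx}^{-1} \mathbf{h}, \tag{15}$$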

requiring the ATFs $\mathbf{h}$ and the target source PSD $\phi_{s_0}$ to be

estimated, which is a nontrivial task. However, similarly to the

MVDR using RTFs, we can also design an MWF aiming at

extracting the speech component at an arbitrarily chosen refer-

ence microphone

$r$ by

$$\tilde{\mathbf{w}}_{\mathrm{MWF}} = \arg\min_{\mathbf{w}} E\{|h_r s_0 - \mathbf{w}^H \mathbf{x}|^2\}, \tag{16}$$

which yields

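$$\tilde{\mathbf{w}}_{\mathrm{MWF}} = \boldsymbol{\Phi}_{xx}^{-1} E\{\mathbf{x}\, h_r^{*} s_0^{*}\} = \left(\phi_{s_0} \mathbf{h}\mathbf{h}^H + \boldsymbol{\Phi}_{uu}\right)^{-1} \phi_{s_0} \mathbf{h}\, h_r^{*}. \tag{17}$$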
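The two MWF designs can be compared numerically. The sketch below assumes that the microphone PSD matrix is the sum of a rank-1 target term and a PSD matrix of the undesired components (named `phi_uu` here), which follows from the uncorrelatedness assumption. The helper `mse`, the random ATFs, and the PSD values are hypothetical. The sketch also evaluates the MSE of an RTF-based MVDR filter computed with `phi_uu`, for comparison.

```python
import numpy as np

def mse(w, h, r, phi_s, phi_xx):
    """E{|h_r s_0 - w^H x|^2} for uncorrelated target and undesired components."""
    cross = phi_s * h * np.conj(h[r])              # E{x (h_r s_0)^*}
    return (abs(h[r]) ** 2 * phi_s
            - 2 * np.real(np.vdot(w, cross))
            + np.real(np.vdot(w, phi_xx @ w)))

rng = np.random.default_rng(2)
M, r = 4, 0                                        # assumed array size and reference index
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
phi_s = 2.0                                        # assumed target source PSD
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi_uu = A @ A.conj().T + np.eye(M)                # assumed undesired-component PSD matrix
phi_xx = phi_s * np.outer(h, h.conj()) + phi_uu    # microphone PSD matrix

w_src = phi_s * np.linalg.solve(phi_xx, h)                    # estimates s_0, cf. (15)
w_ref = np.linalg.solve(phi_xx, phi_s * h * np.conj(h[r]))    # estimates h_r s_0, cf. (17)

# RTF-based MVDR filter for comparison (distortionless, not MMSE-optimal).
h_rtf = h / h[r]
tmp = np.linalg.solve(phi_uu, h_rtf)
w_mvdr = tmp / np.vdot(h_rtf, tmp)

print(np.allclose(w_ref, np.conj(h[r]) * w_src))
print(mse(w_ref, h, r, phi_s, phi_xx), mse(w_mvdr, h, r, phi_s, phi_xx))
```

The MWF attains the lower MSE because it is allowed to distort the speech component, whereas the MVDR filter keeps the speech component at the reference microphone undistorted.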

Although it appears that the ATFs and the target source PSD are

required to compute (17), the (rank-1) crosspower spectral
