
IEEE SIGNAL PROCESSING MAGAZINE [22] MARCH 2015

geometric inference of the source position. The latter class com-

prises cross-correlation-based [8] and cross-relation-based algo-

rithms, e.g., [9] and [10].

The main difference between using these algorithms for ALDs and
their conventional use is that the
microphones are typically mounted close to the user’s head.

Therefore, the propagation paths of a point source to the different

microphones cannot simply be modeled by the free-field TDOA,

but the filtering effects of the head should be taken into account.

As HRTFs vary between individuals, the results produced by

source localization algorithms will always suffer from some

uncertainty if the individual HRTFs and the microphone topology

are not exactly known. This is especially true for binaural sys-

tems, where the relative microphone positions are user depend-

ent and not fixed. However, useful approximations can be

employed, which are, e.g., based on

spherical head models [11] or meas-

ured HRTFs. The TDOAs for different

source directions based on the free-

field assumption, measured HRTFs,

and typical head models are depicted

in Figure 4. Alternatively, for bin-

aural systems, computational audi-

tory scene analysis (CASA)

algorithms [12] can be used for local-

izing multiple sources, e.g., incorporating a probabilistic model of

the binaural ILD and ITD cues [13].
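The gap between the free-field assumption and a head model can be illustrated numerically. The following is a minimal sketch, assuming a far-field source in the horizontal plane and using the classical Woodworth ray-tracing formula for a rigid sphere as a stand-in for a spherical head model; it is not necessarily the model of [11], and the head radius and speed of sound are assumed values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def tdoa_free_field(azimuth_rad: float, mic_distance_m: float) -> float:
    """Far-field TDOA between two microphones with nothing in between."""
    return mic_distance_m * math.sin(azimuth_rad) / SPEED_OF_SOUND


def tdoa_spherical_head(azimuth_rad: float, head_radius_m: float) -> float:
    """Woodworth-style TDOA for ear-level microphones on a rigid sphere.

    Intended for azimuths in [-pi/2, pi/2] measured from the front:
    the wave reaches the far ear by travelling around the sphere.
    """
    return head_radius_m * (azimuth_rad + math.sin(azimuth_rad)) / SPEED_OF_SOUND


if __name__ == "__main__":
    radius = 0.0875  # m, hypothetical head radius
    for deg in (0, 30, 60, 90):
        az = math.radians(deg)
        ff = tdoa_free_field(az, 2 * radius)
        sh = tdoa_spherical_head(az, radius)
        print(f"{deg:3d} deg  free-field {ff * 1e6:6.1f} us  sphere {sh * 1e6:6.1f} us")
```

Under these assumptions the spherical model yields larger TDOAs than the free-field model for lateral sources, which is why a localizer that ignores the head tends to misjudge the source direction.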

Given the microphone topology, cross-correlation-based algo-

rithms such as the generalized cross-correlation with phase trans-

form (GCC-PHAT) [8] can be used to localize a single source for

ALDs when the head filtering effects are taken into account. How-

ever, when multiple sound sources are present, identifying the

correct source-specific TDOAs typically becomes very difficult [14].

Generalizations of the GCC, such as SRP-PHAT [2, ch. 8], coher-

ently add up signals originating from a certain point in space to

estimate the source likelihood at this position. While conceptually

suited for an arbitrary number of microphones and sources, they

involve considerable computational complexity for sufficient spa-

tial resolution and are inherently sensitive to reverberation.
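For a single source and one microphone pair, the GCC-PHAT idea can be sketched as follows. This is an illustrative implementation under simplifying assumptions (one block of samples, no head model, integer-sample resolution); the function name `gcc_phat` and its parameters are chosen here for illustration and do not come from [8].

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 in seconds (positive if x2 lags)."""
    n = len(x1) + len(x2)            # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)         # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep the phase only
    cc = np.fft.irfft(cross, n=n)    # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    source = rng.standard_normal(1024)
    delayed = np.concatenate((np.zeros(5), source[:-5]))  # 5-sample delay
    print(gcc_phat(source, delayed, fs) * fs)
```

The `max_tau` argument restricts the search to physically plausible delays, which for ALDs would be bounded by the largest TDOA the head and microphone topology can produce.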

More general cross-relation-based algorithms, e.g., [9] and

[10], aim at system identification via cross-relation and are natur-

ally suited for identifying relative head-related impulse responses

(HRIRs) from the source to the different microphones, delivering

TDOA information as long as the direct path can be detected in the

identified relative impulse responses. While the adaptive eigen-

value decomposition method in [9] is able to identify relative

HRIRs only for a single source while exploiting nonstationarity,

the BSS-based method in [10] can robustly localize multiple

sources even in noisy and moderately reverberant environments.

Finally, subspace-based source localization algorithms such

as MUSIC [7] are in principle also suitable for arbitrary numbers

of microphones and sources (assuming the number of sources

is known). As they essentially estimate the source positions

using the eigenvectors corresponding to the largest eigenvalues

of a spatial covariance matrix, the estimates for this covariance

matrix must be sufficiently reliable for every frequency bin.

Since subspace-based algorithms separate the signal and
noise subspaces, where the noise needs to be white or whitened,

this is typically difficult to achieve for wideband nonstationary

sources in time-varying environments where only short obser-

vation intervals can be considered.
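The subspace idea can be sketched for a single frequency bin. The example below assumes a free-field uniform linear array (no head filtering), spatially white noise, and a known number of sources; it scans the noise subspace, i.e., the orthogonal complement of the subspace spanned by the dominant eigenvectors. The function names and array geometry are illustrative assumptions, not taken from [7].

```python
import numpy as np


def steering_vector(theta, n_mics, spacing, freq, c=343.0):
    """Free-field far-field steering vector of a uniform linear array."""
    delays = np.arange(n_mics) * spacing * np.sin(theta) / c
    return np.exp(-2j * np.pi * freq * delays)


def music_spectrum(R, n_sources, thetas, spacing, freq):
    """MUSIC pseudo-spectrum from a spatial covariance matrix R (one bin)."""
    n_mics = R.shape[0]
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    noise_subspace = eigvecs[:, : n_mics - n_sources]
    spectrum = np.empty(len(thetas))
    for i, theta in enumerate(thetas):
        a = steering_vector(theta, n_mics, spacing, freq)
        proj = noise_subspace.conj().T @ a  # leakage into the noise subspace
        spectrum[i] = 1.0 / (np.real(np.vdot(proj, proj)) + 1e-12)
    return spectrum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_mics, spacing, freq, n_snap = 4, 0.04, 2000.0, 200  # hypothetical setup
    a = steering_vector(np.deg2rad(20.0), n_mics, spacing, freq)
    s = (rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)) / np.sqrt(2)
    noise = 0.1 * (rng.standard_normal((n_mics, n_snap))
                   + 1j * rng.standard_normal((n_mics, n_snap))) / np.sqrt(2)
    X = np.outer(a, s) + noise
    R = X @ X.conj().T / n_snap  # sample covariance estimate
    grid = np.deg2rad(np.arange(-90, 91))
    p = music_spectrum(R, 1, grid, spacing, freq)
    print(np.rad2deg(grid[np.argmax(p)]))
```

The sample covariance here is averaged over 200 snapshots of a stationary source; with the short observation intervals mentioned above, far fewer snapshots are available per bin and the eigenvectors become correspondingly less reliable.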

DATA-INDEPENDENT BEAMFORMING

A simple but popular way of enhancing the target source in
ALDs is data-independent beamforming, where the filters w in
(4) are designed to enhance sources arriving from the (estimated

or assumed) target DOA and suppress sources not arriving from

this DOA, but do not account for the statistics of the microphone

signals. Examples of data-independent beamformers include
delay-and-sum beamformers and superdirective or differential
beamformers [2, ch. 2], [15]. For the

design of such beamformers, the

target DOA and the complete micro-

phone topology need to be known.
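As an illustration of how the target DOA and the microphone topology enter the design, the following sketch computes narrowband delay-and-sum weights for a linear array. It assumes free-field plane-wave propagation (no head filtering) and that the beamformer output is formed as w^H y; both are assumptions made here, as is every function name.

```python
import numpy as np


def plane_wave_steering(mic_positions, doa, freq, c=343.0):
    """Relative phases of a far-field plane wave at microphones on a line.

    mic_positions: 1-D coordinates along the array axis in meters.
    doa: direction of arrival in radians, measured from broadside.
    """
    delays = np.asarray(mic_positions) * np.sin(doa) / c
    return np.exp(-2j * np.pi * freq * delays)


def delay_and_sum_weights(mic_positions, target_doa, freq, c=343.0):
    """Weights that time-align the target direction and average the channels."""
    a = plane_wave_steering(mic_positions, target_doa, freq, c)
    return a / len(a)


def beam_response(w, mic_positions, doas, freq, c=343.0):
    """Magnitude of w^H a(doa) for each candidate direction."""
    return np.array([
        np.abs(np.vdot(w, plane_wave_steering(mic_positions, d, freq, c)))
        for d in doas
    ])


if __name__ == "__main__":
    positions = np.array([0.0, 0.01, 0.02])  # hypothetical 1 cm spacing
    w = delay_and_sum_weights(positions, target_doa=0.0, freq=4000.0)
    doas = np.deg2rad(np.array([0.0, 45.0, 90.0]))
    print(beam_response(w, positions, doas, 4000.0))
```

The weights depend only on the geometry and the target DOA, not on the recorded signals. With such closely spaced microphones the response off the target direction stays close to one, which is one motivation for the superdirective and differential designs mentioned above.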

Data-independent beamformers

have mainly been used for monaural

devices [16], where robustness

against microphone mismatch is

crucial due to the closely spaced

microphones [17], [18]. Data-independent beamformers have also
been proposed for binaural devices, e.g., [19]; however, they suffer
from spatial aliasing due to the distance between the microphones
and require consideration of the head filtering effects.

STATISTICALLY OPTIMUM SIGNAL EXTRACTION

In contrast to data-independent beamformers, data-dependent signal
enhancement methods exploit both the spectrotemporal and the
spatial information of the microphone signals to extract
the target source (or a filtered version of it) from all interferers
and noise [20], possibly equalizing the reverberation effect caused
by the ATFs. Since the filters adapt to the current statistics of

the typically nonstationary signals, this will be treated as an opti-

mum multichannel filtering problem in the sequel.

Relying on estimates of either the interference and noise statis-

tics or the target source statistics, two main classes of supervised

optimum multichannel filtering will be discussed in the sections

“Minimum Variance Distortionless Response Beamformer” and

“Multichannel Wiener Filtering.” In addition, BSS algorithms, in

particular the variants exploiting target-related prior information

for constraining the optimization problem to explicitly separate

the target source, will be considered in the section “Blind Source

Separation.” Techniques for estimating the required second-order

statistics will be presented in the section “Estimation of Interfer-

ence and Noise Statistics.”

MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER

The minimum variance distortionless response (MVDR) beam-

former is a special case of a linearly constrained minimum

