
IEEE SIGNAL PROCESSING MAGAZINE [22] MARCH 2015
geometric inference of the source position. The latter class com-
prises cross-correlation-based [8] and cross-relation-based algo-
rithms, e.g., [9] and [10].
The main difference of using these algorithms for ALDs com-
pared to their conventional use results from the fact that the
microphones are typically mounted close to the user’s head.
Therefore, the propagation paths of a point source to the different
microphones cannot be modeled simply by the free-field TDOA;
the filtering effects of the head should also be taken into account.
As HRTFs vary between individuals, the results produced by
source localization algorithms will always suffer from some
uncertainty if the individual HRTFs and the microphone topology
are not exactly known. This is especially true for binaural sys-
tems, where the relative microphone positions are user depend-
ent and not fixed. However, useful approximations can be
employed, which are, e.g., based on spherical head models [11] or
measured HRTFs. The TDOAs for different source directions based on
the free-field assumption, measured HRTFs, and typical head models
are depicted in Figure 4. Alternatively, for binaural systems,
computational auditory scene analysis (CASA) algorithms [12] can be
used for localizing multiple sources, e.g., incorporating a
probabilistic model of the binaural ILD and ITD cues [13].
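To illustrate how a head model changes the TDOA relative to the free-field assumption, the following minimal sketch compares the far-field free-field TDOA with Woodworth's ray-tracing approximation for a rigid sphere. Woodworth's formula is used here only as one familiar example of a spherical head model; it is not necessarily the model of [11] or the one underlying Figure 4. The head radius, the speed of sound, and all function names are assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def tdoa_free_field(azimuth_rad, mic_distance):
    """Far-field, free-field TDOA (s) for two microphones mic_distance (m) apart."""
    return mic_distance * np.sin(azimuth_rad) / SPEED_OF_SOUND


def tdoa_spherical_head(azimuth_rad, head_radius):
    """Woodworth approximation for a rigid sphere, valid for |azimuth| <= 90 degrees."""
    return head_radius * (azimuth_rad + np.sin(azimuth_rad)) / SPEED_OF_SOUND


head_radius = 0.0875  # m, assumed typical head radius
azimuths = np.deg2rad(np.arange(-90, 91, 15))
free = tdoa_free_field(azimuths, 2 * head_radius)    # microphones at both ears
sphere = tdoa_spherical_head(azimuths, head_radius)  # path bends around the head

for az, t_free, t_sphere in zip(np.rad2deg(azimuths), free, sphere):
    print(f"{az:6.0f} deg  free-field {t_free * 1e6:7.1f} us  sphere {t_sphere * 1e6:7.1f} us")
```

Under these assumptions the spherical model never yields a smaller absolute TDOA than the free-field model, because the wave has to travel around the head instead of through it.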
Given the microphone topology, cross-correlation-based algo-
rithms such as the generalized cross-correlation with phase trans-
form (GCC-PHAT) [8] can be used to localize a single source for
ALDs when the head filtering effects are taken into account. How-
ever, when multiple sound sources are present, identifying the
correct source-specific TDOAs typically becomes very difficult [14].
Generalizations of the GCC, such as the steered response power with
phase transform (SRP-PHAT) [2, ch. 8], coherently add up signals
originating from a certain point in space to
estimate the source likelihood at this position. While conceptually
suited for an arbitrary number of microphones and sources, they
involve considerable computational complexity for sufficient spa-
tial resolution and are inherently sensitive to reverberation.
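The single-source GCC-PHAT estimate mentioned above can be sketched as follows. This is a generic free-field illustration with two channels, not the head-aware variant discussed in the text; the function name `gcc_phat`, the regularization constant, and the synthetic test signal are assumptions.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay (s) of x2 relative to x1 using GCC-PHAT."""
    n = len(x1) + len(x2)  # zero-pad so the circular correlation does not wrap
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep the phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(round(fs * max_tau)), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


# Synthetic check: white noise delayed by 7 samples in the second channel
fs = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
delay_samples = 7
x1 = source
x2 = np.concatenate((np.zeros(delay_samples), source[:-delay_samples]))
estimated_tau = gcc_phat(x1, x2, fs, max_tau=1e-3)
print(f"estimated delay: {estimated_tau * fs:.0f} samples")
```

Restricting the search to `max_tau` reflects that physically plausible TDOAs are bounded by the microphone distance (or, for ALDs, by the head geometry).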
More general cross-relation-based algorithms, e.g., [9] and
[10], aim at system identification via cross-relation and are natur-
ally suited for identifying relative head-related impulse responses
(HRIRs) from the source to the different microphones, delivering
TDOA information as long as the direct path can be detected in the
identified relative impulse responses. While the adaptive eigen-
value decomposition method in [9] is able to identify relative
HRIRs only for a single source while exploiting nonstationarity,
the BSS-based method in [10] can robustly localize multiple
sources even in noisy and moderately reverberant environments.
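As a reminder of what the cross-relation refers to, consider the idealized case of a single source $s(n)$ and noise-free microphone signals $x_1(n) = h_1(n) * s(n)$ and $x_2(n) = h_2(n) * s(n)$, where $*$ denotes convolution. The symbols $h_1$, $h_2$, $x_1$, and $x_2$ are introduced here for illustration only and are not taken from [9] or [10]. Commutativity of convolution then gives:

```latex
x_1(n) * h_2(n) = s(n) * h_1(n) * h_2(n) = x_2(n) * h_1(n)
```

Finding filters that satisfy this relation identifies the two impulse responses up to a common factor, and the TDOA follows from the lag between their direct-path peaks.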
Finally, subspace-based source localization algorithms such
as MUSIC [7] are in principle also suitable for arbitrary numbers
of microphones and sources (assuming the number of sources
is known). As they essentially estimate the source positions
using the eigenvectors corresponding to the largest eigenvalues
of a spatial covariance matrix, the estimates for this covariance
matrix must be sufficiently reliable for every frequency bin.
Since subspace-based algorithms separate the signal and noise
subspaces, where the noise needs to be white or whitened, such
reliable estimates are typically difficult to obtain for wideband
nonstationary sources in time-varying environments, where only short
observation intervals can be considered.
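The subspace idea can be sketched for a single frequency bin as follows. The example assumes a free-field uniform linear array, spatially white noise, and a known number of sources; head filtering is ignored, and all names and parameter values are hypothetical. It scans for steering vectors orthogonal to the noise subspace, which is equivalent to using the eigenvectors of the largest eigenvalues as the signal subspace.

```python
import numpy as np


def music_spectrum(R, num_sources, steering, eps=1e-12):
    """MUSIC pseudo-spectrum for one frequency bin.

    R: (M, M) spatial covariance estimate.
    steering: (M, G) candidate steering vectors, one column per grid point.
    """
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    noise_subspace = eigvecs[:, : R.shape[0] - num_sources]
    proj = noise_subspace.conj().T @ steering
    return 1.0 / (np.sum(np.abs(proj) ** 2, axis=0) + eps)


# Hypothetical setup: 4 microphones, 1 cm spacing, one bin at 2 kHz
rng = np.random.default_rng(0)
num_mics, spacing, freq, c = 4, 0.01, 2000.0, 343.0
grid_deg = np.arange(-90, 91)
mic_idx = np.arange(num_mics)[:, None]
phase = 2 * np.pi * freq * spacing / c * mic_idx * np.sin(np.deg2rad(grid_deg))[None, :]
steering = np.exp(-1j * phase)

# Simulate 200 snapshots of one source at 30 degrees plus white noise
num_snapshots = 200
a_true = steering[:, grid_deg == 30][:, 0]
s = (rng.standard_normal(num_snapshots) + 1j * rng.standard_normal(num_snapshots)) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((num_mics, num_snapshots))
               + 1j * rng.standard_normal((num_mics, num_snapshots))) / np.sqrt(2)
X = a_true[:, None] * s[None, :] + noise
R = X @ X.conj().T / num_snapshots  # sample covariance for this bin

spectrum = music_spectrum(R, num_sources=1, steering=steering)
estimated_doa_deg = grid_deg[np.argmax(spectrum)]
print("estimated DOA (deg):", estimated_doa_deg)
```

The sample covariance here is averaged over 200 stationary snapshots; with the short observation intervals mentioned above, far fewer snapshots are available and the subspace estimate degrades accordingly.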
DATA-INDEPENDENT BEAMFORMING
A simple but popular way for enhancing the target source in
ALDs is data-independent beamforming, where the filters $\mathbf{w}$ in
(4) are designed to enhance sources arriving from the (estimated
or assumed) target DOA and suppress sources not arriving from
this DOA, but do not account for the statistics of the microphone
signals. Various data-independent beamformers include delay-
and-sum beamformers and superdirective or differential beam-
formers [2, ch. 2], [15]. For the
design of such beamformers, the
target DOA and the complete micro-
phone topology need to be known.
Data-independent beamformers
have mainly been used for monaural
devices [16], where robustness
against microphone mismatch is
crucial due to the closely spaced
microphones [17], [18]. For bin-
aural devices, data-independent beamformers have also been pro-
posed, which, however, suffer from spatial aliasing due to the
distance between the microphones and require consideration of
the head filtering effects, e.g., [19].
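As a minimal sketch of a data-independent design, the following computes delay-and-sum weights for one frequency and evaluates the resulting spatial response. It assumes a free-field, far-field linear array and a beamformer output of the form $\mathbf{w}^H \mathbf{x}$; equation (4) is not reproduced here, so this form is an assumption, as are the function names and the two-microphone endfire geometry.

```python
import numpy as np


def steering_vector(mic_positions, doa_rad, freq, c=343.0):
    """Free-field far-field steering vector for microphones on a line (positions in m)."""
    delays = mic_positions * np.sin(doa_rad) / c
    return np.exp(-2j * np.pi * freq * delays)


def delay_and_sum_weights(mic_positions, target_doa_rad, freq, c=343.0):
    """Weights that align the target direction and average the channels."""
    return steering_vector(mic_positions, target_doa_rad, freq, c) / len(mic_positions)


def response(weights, mic_positions, doa_rad, freq, c=343.0):
    """Beamformer response w^H a for a plane wave arriving from doa_rad."""
    return np.vdot(weights, steering_vector(mic_positions, doa_rad, freq, c))


# Hypothetical monaural device: two microphones 1 cm apart, target at 90 degrees
mics = np.array([0.0, 0.01])
freq = 4000.0
w = delay_and_sum_weights(mics, np.deg2rad(90), freq)

for doa in (-90, 0, 90):
    gain = abs(response(w, mics, np.deg2rad(doa), freq))
    print(f"DOA {doa:4d} deg: |response| = {gain:.3f}")
```

The weights depend only on the geometry and the target DOA, not on the microphone signals, which is what makes the design data-independent. With such closely spaced microphones the delay-and-sum response is only weakly directional, which is one motivation for the superdirective and differential designs mentioned above.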
STATISTICALLY OPTIMUM SIGNAL EXTRACTION
In contrast to data-independent beamformers, data-dependent signal
enhancement methods exploit both the spectrotemporal and the spatial
information of the microphone signals to extract the target source
$s_0$ (or a filtered version of it) from all interferers and noise
[20], possibly equalizing the reverberation effect caused by the ATFs
$\mathbf{h}_0$. Since the filters adapt to the current statistics of
the typically nonstationary signals, this will be treated as an
optimum multichannel filtering problem in the sequel.
Relying on estimates of either the interference and noise statis-
tics or the target source statistics, two main classes of supervised
optimum multichannel filtering will be discussed in the sections
“Minimum Variance Distortionless Response Beamformer” and
“Multichannel Wiener Filtering.” In addition, BSS algorithms, in
particular the variants exploiting target-related prior information
for constraining the optimization problem to explicitly separate
the target source, will be considered in the section “Blind Source
Separation.” Techniques for estimating the required second-order
statistics will be presented in the section “Estimation of Interfer-
ence and Noise Statistics.”
MINIMUM VARIANCE DISTORTIONLESS
RESPONSE BEAMFORMER
The minimum variance distortionless response (MVDR) beam-
former is a special case of a linearly constrained minimum
THE FUNDAMENTAL CONCEPT OF ALL CONSIDERED MULTIMICROPHONE ALGORITHMS RELIES ON SPATIAL AND/OR SPECTROTEMPORAL DIVERSITY.