IEEE SIGNAL PROCESSING MAGAZINE [21] MARCH 2015
spacings ranging from 7 mm to 15 mm. Since the positions of the microphones do not coincide with the eardrum, and the acoustic path between the loudspeaker and the eardrum differs from the HRTF, the overall response of the device should be equalized to match the open-ear HRTF [3].
SOURCE LOCALIZATION
The objective of source localization is to estimate the position or
the direction of arrival (DOA) of the target source (and possibly the
interfering sources), be it for supporting signal extraction or for
furnishing signal presentation algorithms with spatial
information.
SIGNAL EXTRACTION
The main task is to extract from the given recordings an undistorted version of the target source while all undesired components are suppressed. Two generic approaches can be used to achieve this:
■ One can aim at separating all point sources and then pick
the target source based on additional knowledge.
■ One can directly use the additional knowledge to extract
the target source only.
Intuitively, the second approach promises a lower overall algorithmic complexity for a desired performance, as it essentially only requires separating the target source from all other sources, and obviously avoids the cost of estimating the potentially large number of irrelevant sources in a given acoustic scene. In addition, the first approach may be limited to setups where the number of microphones is larger than the number of point sources.
Signal extraction is typically achieved using a filter-and-sum
structure, depicted in Figure 3, where each microphone signal
$x_m$ is passed through a linear filter $w_m^*$ and the outputs are summed. The output signal $y$ is then given in the STFT domain by

$$y = \sum_{m=1}^{M} w_m^* x_m = \mathbf{w}^H \mathbf{x} \qquad (4)$$

with $\mathbf{w}^H = [w_1^* \; w_2^* \; \cdots \; w_M^*]$. The time-domain output signal may then be computed using the inverse STFT.
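The per-bin filter-and-sum operation of (4) can be sketched as follows; the array shapes, random data, and variable names are illustrative assumptions, not from the article:

```python
import numpy as np

# Sketch of the filter-and-sum structure in the STFT domain.
rng = np.random.default_rng(0)
M, K, T = 4, 257, 50                      # microphones, frequency bins, frames
X = rng.standard_normal((M, K, T)) + 1j * rng.standard_normal((M, K, T))
w = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

# y(k, t) = sum_m w_m(k)^* x_m(k, t) = w(k)^H x(k, t), per bin k and frame t
Y = np.einsum('mk,mkt->kt', w.conj(), X)

# Equivalent explicit per-bin formulation
Y_loop = np.stack([w[:, k].conj() @ X[:, k, :] for k in range(K)])
assert np.allclose(Y, Y_loop)
```

Note that a separate weight vector is applied in each frequency bin, which is what makes the beamformer frequency dependent.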
While, in principle, additional knowledge may describe source characteristics in either the time-frequency domain or the spatial domain, in this article we will mainly consider additional knowledge in the spatial domain, assuming that the sources are physically located at different positions. Typical prior spatial knowledge is then given by, e.g., the estimated or assumed DOA of the target source relative to the head. With this spatial information, we can support signal extraction algorithms, e.g., a beamformer pointing toward a given DOA or a BSS algorithm exploiting the target DOA. These algorithms will be covered in more detail in the sections “Data-Independent Beamforming” and “Statistically Optimum Signal Extraction.”
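As a minimal sketch of a beamformer steered toward a given DOA, the following builds free-field delay-and-sum weights for a uniform linear array; the function name, geometry, and parameter values are assumptions for illustration only:

```python
import numpy as np

def delay_and_sum_weights(doa_deg, mic_spacing, n_mics, freqs, c=343.0):
    """Free-field delay-and-sum weights for a uniform linear array.

    Returns an (n_mics, n_freqs) array w; the beamformer output is w^H x.
    """
    # Relative propagation delays tau_m for a plane wave from doa_deg
    tau = np.arange(n_mics) * mic_spacing * np.sin(np.deg2rad(doa_deg)) / c
    steering = np.exp(-2j * np.pi * np.outer(tau, freqs))   # d_m(f)
    return steering / n_mics                                 # w = d / M

freqs = np.linspace(0, 8000, 257)
w = delay_and_sum_weights(30.0, 0.01, 4, freqs)   # 10-mm spacing, 4 mics

# A plane wave from 30 degrees (steering vector d = M * w) passes with
# unit gain at every frequency bin: w^H d = 1.
d = 4 * w
assert np.allclose(np.sum(w.conj() * d, axis=0), 1.0)
```

The 1/M normalization gives a distortionless response in the steered direction, which is the usual delay-and-sum design choice.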
SIGNAL PRESENTATION
After extracting the target source, the enhanced signal is to be presented to the listener, where we need to distinguish between monaural and binaural systems. For a monaural ALD, i.e., a single device on one ear, it seems obvious to just feed the enhanced signal to the loudspeaker of this device. For a binaural ALD, i.e., a system jointly considering and processing the microphone signals of both ears, different signals can be presented to the left and the right ear. This can generate an important binaural advantage since the auditory system can exploit binaural cues and the signal processing algorithms can use information from all microphones on both devices [6, ch. 14]. On the other hand, in a bilateral system where both devices work independently, this potential is not fully exploited since not all microphone signals from both devices are combined. To exploit the full potential of binaural processing, both devices need to cooperate with each other and exchange information or signals, e.g., through a wireless link.
Besides signal extraction, a second major task should be
achieved in binaural ALDs: the auditory impression of the acoustic
scene, i.e., the spatial perception of the target source, the residual
interfering sources and noise, should be preserved. This can be
achieved either by so-called binaural rendering of the monaural
output signal of the signal extraction algorithm, or by directly
incorporating the desired binaural cues into the spatial filter
design. These algorithms will be covered in more detail in the sec-
tion “Presentation of the Enhanced Signals.”
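One common way to preserve the binaural cues of the scene is to apply a single real-valued spectral gain to a left and a right reference microphone signal; the following minimal sketch works under that assumption, and all names are illustrative:

```python
import numpy as np

def render_binaural(x_left, x_right, gain):
    """Apply one common real-valued gain per time-frequency bin to the
    left and right reference microphone signals.  A shared real gain
    leaves the interaural phase and level differences (the binaural
    cues) of each bin unchanged, so the spatial impression is preserved.
    """
    g = np.asarray(gain).real
    return g * x_left, g * x_right

# The interaural transfer function x_left / x_right is unchanged:
xl = np.array([1.0 + 1.0j])
xr = np.array([0.5 - 0.5j])
yl, yr = render_binaural(xl, xr, np.array([0.3]))
assert np.allclose(yl / yr, xl / xr)
```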
SOURCE LOCALIZATION
In principle, any source localization algorithm that can handle
multiple nonstationary wideband sources can be used for ALDs [6,
ch. 6]. This includes direct methods based on steered-response
power (SRP) [2, ch. 8] or subspace methods [Multiple Signal Classification (MUSIC)] [7] and the large and popular class of indirect two-step methods based on TDOA estimation and a subsequent
[FIG4] TDOAs τ (in μs) versus azimuthal direction φ (0° = front, 180° = back), based on the free-field assumption, measured HRTFs, and two head models (Woodworth-Schlosberg; Duda-Martens), respectively.
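As a rough sketch of two of the curves compared in Figure 4, the free-field TDOA and the Woodworth-Schlosberg spherical-head approximation can be computed as follows; the head radius and microphone spacing are illustrative assumptions:

```python
import numpy as np

C = 343.0          # speed of sound (m/s)

def tdoa_free_field(phi_deg, d=0.16):
    """Free-field TDOA between two points spaced d metres apart."""
    return d * np.sin(np.deg2rad(phi_deg)) / C

def tdoa_woodworth(phi_deg, a=0.0875):
    """Woodworth-Schlosberg spherical-head model with head radius a (m),
    valid for frontal azimuths up to 90 deg: tau = (a/c)(phi + sin phi)."""
    phi = np.deg2rad(phi_deg)
    return a * (phi + np.sin(phi)) / C

# At 90 deg the head model predicts a larger delay than free field at the
# same aperture (2a), reflecting the extra path around the head.
assert tdoa_woodworth(90.0) > tdoa_free_field(90.0, d=2 * 0.0875)
```

This extra around-the-head path is exactly why the measured-HRTF and head-model curves in Figure 4 lie above the free-field curve at lateral directions.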