
IEEE SIGNAL PROCESSING MAGAZINE [20] MARCH 2015
(often termed beamforming) and filtering in the time-frequency domain, respectively. In addition to exploiting the statistics of the available observations, the optimum filter design should also use available prior knowledge, e.g., the estimated or assumed position of the target source. This implies that, in this article, blind source separation (BSS) algorithms [4] are only considered in forms that allow the inclusion of such prior knowledge. Aside from some target-related knowledge, we assume natural unpredictable scenarios that may be arbitrarily complex and time-varying. This implies that the filters must be estimated from currently available observations and cannot be learned in advance; thus, algorithms that are based on trained models (e.g., using nonnegative matrix factorization) are not considered in this article. In addition, in time-varying environments, the estimation of the spatial and spectrotemporal information from short observation intervals is of crucial importance, so we will focus on techniques exploiting second-order statistics, keeping the variance of the estimated quantities small.
SIGNAL MODEL
According to the acoustic scenario in Figure 2, we consider $P$ point sources $s_p(t)$, with $t$ as the discrete time index, in a noise field of unknown coherence, which are recorded by an array of $M$ microphones. The target source is denoted by $s_0(t)$. Assuming the acoustic paths between the sources and the microphones to be linear and time-invariant, the $m$th microphone signal $x_m(t)$ is given by the convolutive mixing model

$$x_m(t) = \sum_{p=0}^{P-1} h_{p,m}(t) * s_p(t) + n_m(t), \quad m = 1, \ldots, M, \qquad (1)$$
where $n_m(t)$ denotes the noise component in the $m$th microphone signal, $h_{p,m}(t)$ is the room impulse response (RIR) between the $p$th source and the $m$th microphone, and $*$ denotes convolution. Typically, the signals are processed in the short-time Fourier transform (STFT) domain, i.e.,
$$x_m(k,\ell) = \sum_{p=0}^{P-1} h_{p,m}(k)\, s_p(k,\ell) + n_m(k,\ell), \quad m = 1, \ldots, M, \qquad (2)$$
where $x_m(k,\ell)$, $s_p(k,\ell)$, and $n_m(k,\ell)$ denote the STFTs of the respective time-domain signals, with $\ell$ representing the frame index and $k$ representing the frequency bin index, and where $h_{p,m}(k)$ denotes the acoustic transfer function (ATF) between the $p$th source and the $m$th microphone. Note that (2) is strictly speaking only valid for frames that are significantly longer than the RIR length. When this is not the case, a convolutive transfer function model should be used. For conciseness, we omit the dependency on the indices $k$ and $\ell$ in the remainder of this article. In vector form, the equation set (2) can be written as
$$\mathbf{x} = \mathbf{h}_0 s_0 + \sum_{p=1}^{P-1} \mathbf{h}_p s_p + \mathbf{n} = \mathbf{h}_0 s_0 + \mathbf{v}, \qquad (3)$$
with $\mathbf{x} = [x_1 \; \cdots \; x_M]^T$, and $\mathbf{n}$ and $\mathbf{h}_p$ defined similarly, and $\mathbf{h}_0$
denoting the ATF of the target source. This signal model will
form the basis for the subsequent description of the main signal
processing tasks with ALDs, i.e., source localization, signal
enhancement, and signal presentation.
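As an illustration, the convolutive mixing model (1) can be simulated directly. The following is a minimal sketch; the signal length, RIR shape, and noise level are arbitrary toy choices and not taken from the article:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

fs = 16000          # sampling rate in Hz (toy choice)
M, P = 3, 2         # microphones, point sources
L_rir = 64          # RIR length in samples (kept short so (2) holds approximately)
N = fs              # one second of signal

# Toy RIRs h_{p,m}(t): white noise with an exponential decay envelope.
h = rng.standard_normal((P, M, L_rir)) * np.exp(-np.arange(L_rir) / 10)
# Source signals s_p(t); s_0 plays the role of the target source.
s = rng.standard_normal((P, N))

# Convolutive mixing model (1): x_m(t) = sum_p h_{p,m}(t) * s_p(t) + n_m(t)
x = np.zeros((M, N))
for p in range(P):
    for m in range(M):
        x[m] += fftconvolve(s[p], h[p, m])[:N]
x += 0.01 * rng.standard_normal((M, N))   # sensor noise n_m(t)

print(x.shape)  # (3, 16000)
```

Applying an STFT to each row of `x` then yields the narrowband representation of (2) and, stacking microphones per time-frequency bin, the vector form (3).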
SIGNAL ACQUISITION
For ALDs in realistic acoustic environments, the ATFs include
the microphone characteristics, room acoustics, and filtering
effects due to the user’s head. The diffraction and reflection
properties of the user’s head, pinna, and torso are described by
the so-called head-related transfer function (HRTF), which is
the frequency- and angle-dependent transfer function between a
sound source and the user’s ear drum in an anechoic environ-
ment [5]. The pair of left and right HRTFs contain the so-called
binaural cues of a sound source: the interaural time difference
(ITD) and the interaural level difference (ILD), which are result-
ing from the time difference of arrival (TDOA) between both
ears and the acoustic head shadow, respectively. In contrast to
point sources, the spatial characteristics of incoherent noise can
not be properly described by the ITD and ILD, but rather by the
interaural coherence (IC) [5]. Binaural cues play a major role in
spatial awareness, i.e., for source localization and for determin-
ing the spaciousness of auditory objects, and are important for
speech intelligibility due to binaural unmasking, e.g., [5].
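The two point-source cues can be estimated from a pair of ear signals with elementary means. The following toy sketch (a delayed, attenuated sinusoid standing in for a head-filtered right-ear signal; all parameters are hypothetical) estimates the ITD as the lag of the interaural cross-correlation maximum and the ILD as the interaural level ratio in decibels:

```python
import numpy as np

fs = 16000
n = np.arange(1024)
src = np.sin(2 * np.pi * 500 * n / fs)       # simple tonal source

# Simulated ear signals: the right ear receives the source delayed
# (modeling the TDOA) and attenuated (modeling the head shadow).
delay = 8                                     # samples, i.e., 0.5 ms
x_left = src
x_right = 0.5 * np.roll(src, delay)

# ITD estimate: lag of the maximum of the interaural cross-correlation.
xcorr = np.correlate(x_right, x_left, mode="full")
itd_samples = np.argmax(xcorr) - (len(src) - 1)

# ILD estimate: interaural level ratio in dB.
ild_db = 10 * np.log10(np.sum(x_left**2) / np.sum(x_right**2))

print(itd_samples, round(ild_db, 1))  # 8 6.0
```

For incoherent noise, the same statistics carry little information, and one would instead estimate the interaural coherence, e.g., from the cross- and auto-power spectral densities of the two ear signals.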
To capture the relevant spatial information and binaural cues of the sound sources, in principle at least two microphones are required, which are preferably mounted on both sides of the head. Ideally, the microphones are placed as close as possible to the corresponding loudspeakers that present the signals to the ear drums to allow the recreation of the authentic spatial impression for the listener. In typical ALDs today, two or three microphones are available on each side of the head, with
[FIG3] The filter-and-sum structure: microphone signals $x_1(k,\ell), x_2(k,\ell), \ldots, x_M(k,\ell)$ are weighted by filters $w_1(k,\ell), w_2(k,\ell), \ldots, w_M(k,\ell)$ and summed to produce the output $y(k,\ell)$.
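The filter-and-sum structure of Figure 3 amounts to one complex weight per microphone and time-frequency bin. A minimal sketch, assuming STFT-domain signals stored as numpy arrays (array shapes and random data are illustrative only; whether the weights enter conjugated, as in the common $y = \mathbf{w}^H \mathbf{x}$ convention used here, is a matter of convention):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, L = 4, 257, 50     # microphones, frequency bins, frames

# STFT-domain microphone signals x_m(k, l) and filter weights w_m(k, l)
x = rng.standard_normal((M, K, L)) + 1j * rng.standard_normal((M, K, L))
w = rng.standard_normal((M, K, L)) + 1j * rng.standard_normal((M, K, L))

# Filter-and-sum output: y(k, l) = sum_m w_m*(k, l) x_m(k, l)
y = np.sum(np.conj(w) * x, axis=0)

print(y.shape)  # (257, 50)
```

The subsequent enhancement techniques differ only in how the weights $w_m(k,\ell)$ are chosen; the summation structure itself stays fixed.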
FOR HEARING-IMPAIRED INDIVIDUALS, AS THE MOST PROMINENT USER GROUP SO FAR, FURTHER PROGRESS OF ASSISTED LISTENING TECHNOLOGY IS CRUCIAL FOR BETTER INCLUSION INTO OUR WORLD OF PERVASIVE ACOUSTIC COMMUNICATION.