
IEEE SIGNAL PROCESSING MAGAZINE [20] MARCH 2015
(often termed beamforming) and filtering in the time-frequency domain, respectively. In addition to exploiting the statistics of the available observations, the optimum filter design should also use available prior knowledge, e.g., the estimated or assumed position of the target source. This implies that, in this article, blind source separation (BSS) algorithms [4] are only considered in forms that allow the inclusion of such prior knowledge. Aside from some target-related knowledge, we assume natural unpredictable scenarios that may be arbitrarily complex and time-varying. This implies that the filters must be estimated from currently available observations and cannot be learned in advance; thus, algorithms that are based on trained models (e.g., using nonnegative matrix factorization) are not considered in this article. In addition, in time-varying environments, the estimation of the spatial and spectrotemporal information from short observation intervals is of crucial importance, so we will focus on techniques exploiting second-order statistics, keeping the variance of the estimated quantities small.
SIGNAL MODEL
According to the acoustic scenario in Figure 2, we consider $P$ point sources $s_p(t)$, with $t$ as the discrete time index, in a noise field of unknown coherence, which are recorded by an array of $M$ microphones. The target source is denoted by $s_0(t)$. Assuming the acoustic paths between the sources and the microphones to be linear and time-invariant, the $m$th microphone signal $x_m(t)$ is given by the convolutive mixing model

$$x_m(t) = \sum_{p=0}^{P-1} h_{p,m}(t) * s_p(t) + n_m(t), \quad m = 1, \ldots, M, \qquad (1)$$
where $n_m(t)$ denotes the noise component in the $m$th microphone signal, $h_{p,m}(t)$ is the room impulse response (RIR) between the $p$th source and the $m$th microphone, and $*$ denotes convolution. Typically, the signals are processed in the short-time Fourier transform (STFT) domain, i.e.,
$$x_m(k,\ell) = \sum_{p=0}^{P-1} h_{p,m}(k)\, s_p(k,\ell) + n_m(k,\ell), \quad m = 1, \ldots, M, \qquad (2)$$
where $x_m(k,\ell)$, $s_p(k,\ell)$, and $n_m(k,\ell)$ denote the STFTs of the respective time-domain signals, with $\ell$ representing the frame index and $k$ representing the frequency bin index, and where $h_{p,m}(k)$ denotes the acoustic transfer function (ATF) between the $p$th source and the $m$th microphone. Note that (2) is strictly speaking only valid for frames that are significantly longer than the RIR length. When this is not the case, a convolutive transfer function model should be used. For conciseness, we omit the dependency on the indices $k$ and $\ell$ in the remainder of this article. In vector form, the equation set (2) can be written as
$$\mathbf{x} = \mathbf{h}_0 s_0 + \sum_{p=1}^{P-1} \mathbf{h}_p s_p + \mathbf{n} = \mathbf{h}_0 s_0 + \mathbf{v}, \qquad (3)$$
with $\mathbf{x} = [x_1 \; \cdots \; x_M]^T$, and $\mathbf{n}$ and $\mathbf{h}_p$ defined similarly, and $\mathbf{h}_0$
denoting the ATF of the target source. This signal model will
form the basis for the subsequent description of the main signal
processing tasks with ALDs, i.e., source localization, signal
enhancement, and signal presentation.
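As an illustration, the convolutive mixing model (1) can be simulated directly. The following is a minimal sketch; the signal length, RIR shape, and noise level are arbitrary toy choices and not taken from the article:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

fs = 16000          # sampling rate in Hz (toy choice)
M, P = 3, 2         # microphones, point sources
L_rir = 64          # RIR length in samples (kept short so (2) holds approximately)
N = fs              # one second of signal

# Toy RIRs h_{p,m}(t): white noise with an exponential decay envelope.
h = rng.standard_normal((P, M, L_rir)) * np.exp(-np.arange(L_rir) / 10)
# Source signals s_p(t); s_0 plays the role of the target source.
s = rng.standard_normal((P, N))

# Convolutive mixing model (1): x_m(t) = sum_p h_{p,m}(t) * s_p(t) + n_m(t)
x = np.zeros((M, N))
for p in range(P):
    for m in range(M):
        x[m] += fftconvolve(s[p], h[p, m])[:N]
x += 0.01 * rng.standard_normal((M, N))   # sensor noise n_m(t)

print(x.shape)  # (3, 16000)
```

Applying an STFT to each row of `x` then yields the narrowband representation of (2) and, stacking microphones per time-frequency bin, the vector form (3).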
SIGNAL ACQUISITION
For ALDs in realistic acoustic environments, the ATFs include
the microphone characteristics, room acoustics, and filtering
effects due to the user’s head. The diffraction and reflection
properties of the user’s head, pinna, and torso are described by
the so-called head-related transfer function (HRTF), which is
the frequency- and angle-dependent transfer function between a
sound source and the user’s ear drum in an anechoic environ-
ment [5]. The pair of left and right HRTFs contain the so-called
binaural cues of a sound source: the interaural time difference
(ITD) and the interaural level difference (ILD), which are result-
ing from the time difference of arrival (TDOA) between both
ears and the acoustic head shadow, respectively. In contrast to
point sources, the spatial characteristics of incoherent noise can
not be properly described by the ITD and ILD, but rather by the
interaural coherence (IC) [5]. Binaural cues play a major role in
spatial awareness, i.e., for source localization and for determin-
ing the spaciousness of auditory objects, and are important for
speech intelligibility due to binaural unmasking, e.g., [5].
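The two point-source cues can be estimated from a pair of ear signals with elementary means. The following toy sketch (a delayed, attenuated sinusoid standing in for a head-filtered right-ear signal; all parameters are hypothetical) estimates the ITD as the lag of the interaural cross-correlation maximum and the ILD as the interaural level ratio in decibels:

```python
import numpy as np

fs = 16000
n = np.arange(1024)
src = np.sin(2 * np.pi * 500 * n / fs)       # simple tonal source

# Simulated ear signals: the right ear receives the source delayed
# (modeling the TDOA) and attenuated (modeling the head shadow).
delay = 8                                     # samples, i.e., 0.5 ms
x_left = src
x_right = 0.5 * np.roll(src, delay)

# ITD estimate: lag of the maximum of the interaural cross-correlation.
xcorr = np.correlate(x_right, x_left, mode="full")
itd_samples = np.argmax(xcorr) - (len(src) - 1)

# ILD estimate: interaural level ratio in dB.
ild_db = 10 * np.log10(np.sum(x_left**2) / np.sum(x_right**2))

print(itd_samples, round(ild_db, 1))  # 8 6.0
```

For incoherent noise, the same statistics carry little information, and one would instead estimate the interaural coherence, e.g., from the cross- and auto-power spectral densities of the two ear signals.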
To capture the relevant spatial information and binaural cues of the sound sources, in principle at least two microphones are required, which are preferably mounted on both sides of the head. Ideally, the microphones are placed as close as possible to the corresponding loudspeakers that present the signals to the ear drums to allow the recreation of the authentic spatial impression for the listener. In typical ALDs today, two or three microphones are available on each side of the head, with
[FIG3] The filter-and-sum structure: microphone signals $x_1(k,\ell), x_2(k,\ell), \ldots, x_M(k,\ell)$ are weighted by filters $w_1(k,\ell), w_2(k,\ell), \ldots, w_M(k,\ell)$ and summed to produce the output $y(k,\ell)$.
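The filter-and-sum structure of Figure 3 amounts to one complex weight per microphone and time-frequency bin. A minimal sketch, assuming STFT-domain signals stored as numpy arrays (array shapes and random data are illustrative only; whether the weights enter conjugated, as in the common $y = \mathbf{w}^H \mathbf{x}$ convention used here, is a matter of convention):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, L = 4, 257, 50     # microphones, frequency bins, frames

# STFT-domain microphone signals x_m(k, l) and filter weights w_m(k, l)
x = rng.standard_normal((M, K, L)) + 1j * rng.standard_normal((M, K, L))
w = rng.standard_normal((M, K, L)) + 1j * rng.standard_normal((M, K, L))

# Filter-and-sum output: y(k, l) = sum_m w_m*(k, l) x_m(k, l)
y = np.sum(np.conj(w) * x, axis=0)

print(y.shape)  # (257, 50)
```

The subsequent enhancement techniques differ only in how the weights $w_m(k,\ell)$ are chosen; the summation structure itself stays fixed.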
FOR HEARING-IMPAIRED INDIVIDUALS, AS THE MOST PROMINENT USER GROUP SO FAR, FURTHER PROGRESS OF ASSISTED LISTENING TECHNOLOGY IS CRUCIAL FOR BETTER INCLUSION INTO OUR WORLD OF PERVASIVE ACOUSTIC COMMUNICATION.