IEEE SIGNAL PROCESSING MAGAZINE [21] MARCH 2015
spacings ranging from 7 mm to 15 mm. Since the positions of the microphones do not coincide with the eardrum, and the acoustic path between the loudspeaker and the eardrum differs from the HRTF, the overall response of the device should be equalized to match the open-ear HRTF [3].
SOURCE LOCALIZATION
The objective of source localization is to estimate the position or
the direction of arrival (DOA) of the target source (and possibly the
interfering sources), be it for supporting signal extraction or for
furnishing signal presentation algorithms with spatial
information.
SIGNAL EXTRACTION
The main task is to extract from the given recordings an undistorted version of the target source while all undesired components are suppressed. Two generic approaches can be used to achieve this:
■ One can aim at separating all point sources and then pick
the target source based on additional knowledge.
■ One can directly use the additional knowledge to extract
the target source only.
Intuitively, the second approach promises a lower overall algorithmic complexity for a desired performance, as it essentially only requires separating the target source from all other sources, and obviously avoids the cost of estimating the potentially large number of irrelevant sources in a given acoustic scene. In addition, the first approach may be limited to setups where the number of microphones is larger than the number of point sources.
Signal extraction is typically achieved using a filter-and-sum
structure, depicted in Figure 3, where each microphone signal
$x_m$ is passed through a linear filter $w_m^*$ and the outputs are summed. The output signal $y$ is then given in the STFT domain by

$$y = \sum_{m=1}^{M} w_m^* x_m = \mathbf{w}^H \mathbf{x} \qquad (4)$$

with $\mathbf{w}^H = [w_1^* \; w_2^* \; \cdots \; w_M^*]$. The time-domain output signal may then be computed using the inverse STFT.
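The per-bin filter-and-sum operation of (4) can be sketched as follows; the array shapes, random data, and variable names are illustrative assumptions, not from the article:

```python
import numpy as np

# Sketch of the filter-and-sum structure in the STFT domain.
rng = np.random.default_rng(0)
M, K, T = 4, 257, 50                      # microphones, frequency bins, frames
X = rng.standard_normal((M, K, T)) + 1j * rng.standard_normal((M, K, T))
w = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

# y(k, t) = sum_m w_m(k)^* x_m(k, t) = w(k)^H x(k, t), per bin k and frame t
Y = np.einsum('mk,mkt->kt', w.conj(), X)

# Equivalent explicit per-bin formulation
Y_loop = np.stack([w[:, k].conj() @ X[:, k, :] for k in range(K)])
assert np.allclose(Y, Y_loop)
```

Note that a separate weight vector is applied in each frequency bin, which is what makes the beamformer frequency dependent.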
While, in principle, additional knowledge may describe source characteristics in either the time-frequency domain or the spatial domain, in this article we will mainly consider additional knowledge in the spatial domain, assuming that the sources are physically located at different positions. Typical prior spatial knowledge is then given by, e.g., the estimated or assumed DOA of the target source relative to the head. With this spatial information, we can support signal extraction algorithms, e.g., a beamformer pointing toward a given DOA or a BSS algorithm exploiting the target DOA. These algorithms will be covered in more detail in the sections “Data-Independent Beamforming” and “Statistically Optimum Signal Extraction.”
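As a minimal sketch of a beamformer steered toward a given DOA, the following builds free-field delay-and-sum weights for a uniform linear array; the function name, geometry, and parameter values are assumptions for illustration only:

```python
import numpy as np

def delay_and_sum_weights(doa_deg, mic_spacing, n_mics, freqs, c=343.0):
    """Free-field delay-and-sum weights for a uniform linear array.

    Returns an (n_mics, n_freqs) array w; the beamformer output is w^H x.
    """
    # Relative propagation delays tau_m for a plane wave from doa_deg
    tau = np.arange(n_mics) * mic_spacing * np.sin(np.deg2rad(doa_deg)) / c
    steering = np.exp(-2j * np.pi * np.outer(tau, freqs))   # d_m(f)
    return steering / n_mics                                 # w = d / M

freqs = np.linspace(0, 8000, 257)
w = delay_and_sum_weights(30.0, 0.01, 4, freqs)   # 10-mm spacing, 4 mics

# A plane wave from 30 degrees (steering vector d = M * w) passes with
# unit gain at every frequency bin: w^H d = 1.
d = 4 * w
assert np.allclose(np.sum(w.conj() * d, axis=0), 1.0)
```

The 1/M normalization gives a distortionless response in the steered direction, which is the usual delay-and-sum design choice.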
SIGNAL PRESENTATION
After extracting the target source, the enhanced signal is to be presented to the listener, where we need to distinguish between monaural and binaural systems. For a monaural ALD, i.e., a single device on one ear, it seems obvious to just feed the enhanced signal to the loudspeaker of this device. For a binaural ALD, i.e., a system jointly considering and processing the microphone signals of both ears, different signals can be presented to the left and the right ear. This can generate an important binaural advantage since the auditory system can exploit binaural cues and the signal processing algorithms can use information from all microphones on both devices [6, ch. 14]. On the other hand, in a bilateral system where both devices work independently, this potential is not fully exploited since not all microphone signals from both devices are combined. To exploit the full potential of binaural processing, both devices need to cooperate with each other and exchange information or signals, e.g., through a wireless link.
Besides signal extraction, a second major task should be
achieved in binaural ALDs: the auditory impression of the acoustic
scene, i.e., the spatial perception of the target source, the residual
interfering sources and noise, should be preserved. This can be
achieved either by so-called binaural rendering of the monaural
output signal of the signal extraction algorithm, or by directly
incorporating the desired binaural cues into the spatial filter
design. These algorithms will be covered in more detail in the sec-
tion “Presentation of the Enhanced Signals.”
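One common way to preserve the binaural cues of the scene is to apply a single real-valued spectral gain to a left and a right reference microphone signal; the following minimal sketch works under that assumption, and all names are illustrative:

```python
import numpy as np

def render_binaural(x_left, x_right, gain):
    """Apply one common real-valued gain per time-frequency bin to the
    left and right reference microphone signals.  A shared real gain
    leaves the interaural phase and level differences (the binaural
    cues) of each bin unchanged, so the spatial impression is preserved.
    """
    g = np.asarray(gain).real
    return g * x_left, g * x_right

# The interaural transfer function x_left / x_right is unchanged:
xl = np.array([1.0 + 1.0j])
xr = np.array([0.5 - 0.5j])
yl, yr = render_binaural(xl, xr, np.array([0.3]))
assert np.allclose(yl / yr, xl / xr)
```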
SOURCE LOCALIZATION
In principle, any source localization algorithm that can handle
multiple nonstationary wideband sources can be used for ALDs [6,
ch. 6]. This includes direct methods based on steered-response
power (SRP) [2, ch. 8] or subspace methods [Multiple Signal Classification (MUSIC)] [7] and the large and popular class of indirect two-step methods based on TDOA estimation and a subsequent
[FIG4] TDOAs τ (in μs) versus azimuthal direction φ (0° = front, 180° = back), based on the free-field assumption, measured HRTFs, and two head models (Woodworth-Schlosberg; Duda-Martens), respectively.
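As a rough sketch of two of the curves compared in Figure 4, the free-field TDOA and the Woodworth-Schlosberg spherical-head approximation can be computed as follows; the head radius and microphone spacing are illustrative assumptions:

```python
import numpy as np

C = 343.0          # speed of sound (m/s)

def tdoa_free_field(phi_deg, d=0.16):
    """Free-field TDOA between two points spaced d metres apart."""
    return d * np.sin(np.deg2rad(phi_deg)) / C

def tdoa_woodworth(phi_deg, a=0.0875):
    """Woodworth-Schlosberg spherical-head model with head radius a (m),
    valid for frontal azimuths up to 90 deg: tau = (a/c)(phi + sin phi)."""
    phi = np.deg2rad(phi_deg)
    return a * (phi + np.sin(phi)) / C

# At 90 deg the head model predicts a larger delay than free field at the
# same aperture (2a), reflecting the extra path around the head.
assert tdoa_woodworth(90.0) > tdoa_free_field(90.0, d=2 * 0.0875)
```

This extra around-the-head path is exactly why the measured-HRTF and head-model curves in Figure 4 lie above the free-field curve at lateral directions.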