sources, and also on the distance of the
source. Spectral cues (SCs), caused
by the reflection or diffraction by
the listener’s torso and pinnae, pro-
vide additional localization informa-
tion. The ITD, ILD, and SCs are
conveniently represented by head-
related transfer functions (HRTFs),
which model the free-field acoustic transfer functions between
the position of the source in space and the listener’s ears [9].
Thus, in its elementary form, binaural synthesis is performed by
filtering a source signal with the respective HRTFs for the left
and right ears.
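
As an illustration, the following is a minimal sketch of this elementary form in Python, assuming a mono source signal and a measured pair of head-related impulse responses (HRIRs, the time-domain counterparts of the HRTFs) for the desired source direction; the variable names are hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_synthesis(source, hrir_left, hrir_right):
    """Render a mono source to a two-channel binaural signal."""
    left = fftconvolve(source, hrir_left)    # left-ear signal
    right = fftconvolve(source, hrir_right)  # right-ear signal
    return np.stack([left, right], axis=-1)  # shape: (samples, 2)
```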
Adding room reflections and reverberation to binaural syn-
thesis improves the naturalness, source localization properties,
and externalization [6], [8], [10]. Discrete room reflections
improve the perception of both direction and distance of a sound
source. They can be incorporated into binaural synthesis by ren-
dering them as additional virtual sources obtained from a geo-
metric model, e.g., the mirror-image source model. This,
however, significantly increases reproduction complexity. Late
diffuse reverberation mainly contributes to the perception of
distance, because the ratio between
direct and reverberant energy
decreases with increasing source
distance [8]. A recent review paper
explains various artificial reverbera-
tion techniques [11].
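
To make the mirror-image source model concrete, here is a minimal sketch under the simplifying assumption of a rectangular ("shoebox") room spanning [0, L] along each axis; each of the six first-order image positions would then be rendered as an additional virtual source, attenuated according to wall absorption and propagation distance.

```python
import numpy as np

def first_order_images(src, room_dims):
    """Mirror the source position across each of the six walls of a shoebox room."""
    images = []
    for axis, L in enumerate(room_dims):
        for wall in (0.0, L):                   # the two walls along this axis
            img = np.array(src, dtype=float)
            img[axis] = 2.0 * wall - img[axis]  # reflect across the wall plane
            images.append(img)
    return images

# Example: a source at (2.0, 1.5, 1.2) m in a 5 m x 4 m x 3 m room.
images = first_order_images([2.0, 1.5, 1.2], room_dims=[5.0, 4.0, 3.0])
```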
In a natural listening environ-
ment, dynamic head movements
contribute significantly to source localization. To exploit these so-
called dynamic binaural cues, a binaural synthesis system for vir-
tual reality listening must provide several functionalities. First,
the position and orientation of the listener’s head have to be
determined continuously, e.g., by using a head tracker. Second,
the synthesis has to be adapted dynamically by updating the
HRTFs according to the relative position of the virtual sources.
Finally, the overall latency of the reproduction system must satisfy perceptual limits to provide the intended localization and a plausible listening experience. The situation becomes more complicated if the spatial audio content is to be broadcast, since the standards for spatial audio are still under development.
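
As a sketch of the dynamic adaptation step, assume HRIRs are available on a discrete grid of directions and the tracker delivers a new head orientation once per audio block. One simple way to avoid audible clicks when the selected HRIR pair changes is to crossfade, within one block, between the signals rendered with the old and the new pair (the crossfade and the block-wise update are illustrative choices, not a prescribed method; overlap-add of the convolution tails is omitted for brevity).

```python
import numpy as np
from scipy.signal import fftconvolve

def render_block(block, hrir_old, hrir_new):
    """Render one audio block, crossfading from the old to the new HRIR pair."""
    fade_in = np.linspace(0.0, 1.0, len(block))
    ears = []
    for ear in (0, 1):  # 0 = left, 1 = right
        old = fftconvolve(block, hrir_old[ear])[:len(block)]
        new = fftconvolve(block, hrir_new[ear])[:len(block)]
        ears.append((1.0 - fade_in) * old + fade_in * new)
    return np.stack(ears, axis=-1)  # shape: (samples, 2)
```

In practice, interpolation between neighboring measured HRTFs serves the same purpose, and the whole update must fit within the latency budget discussed above.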
HEAD TRACKING
In many applications, it is essential that the orientation of the
user’s head is known. This is achieved by head-tracking tech-
niques. Once the listener’s orientation is known, it is possible to
use HRTFs to project the sound sources in correct directions. This
is especially important in interactive applications in which the
soundscape should remain stable even when the user turns his/
her head. Without head tracking, the soundscape moves with
respect to the user’s head, breaking the illusion of virtual sources.
Head tracking also helps in reducing reversals in sound source
localization [10].
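
To illustrate the compensation, the following minimal sketch assumes the tracker reports head orientation as yaw, pitch, and roll angles; the world-fixed source direction is rotated into the head frame before the HRTF lookup, so the virtual source stays put as the head turns.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def head_relative_direction(source_dir_world, yaw_deg, pitch_deg, roll_deg):
    """Rotate a world-frame unit direction vector into the head frame."""
    head = Rotation.from_euler('zyx', [yaw_deg, pitch_deg, roll_deg], degrees=True)
    return head.inv().apply(source_dir_world)  # inverse rotation compensates the head turn

# Example: with the head turned 90 degrees in yaw, a world-fixed frontal
# source moves to the listener's side, as expected.
print(head_relative_direction(np.array([1.0, 0.0, 0.0]), 90.0, 0.0, 0.0))
```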
The same head-tracking techniques can be used both for visual
head-mounted displays and headphones for audio reproduction.
For the visual domain, one of the first head-tracking systems was
presented by Ivan Sutherland in a virtual reality installation in
1968 [12]. This early system utilized mechanical and ultrasonic tracking techniques. Jens Blauert, the pioneer of head tracking for
audio reproduction over headphones, presented in his patent in
1973 several alternative techniques for head tracking, including
the use of mechanical levers as well as magnetic and gyroscopic
control arrangements [13]. Even today, most of those techniques are used in practice: one can buy tracking systems based on electromagnetic, inertial, or computer-vision techniques, or on a combination of these, such as the new Oculus Rift (Crystal Cove prototype) virtual reality headset.
The category of computer-vision-based head tracking contains
two different approaches. The most common one uses external
infrared light emitters and reflective markers attached to the user.
A less intrusive technique is the use of regular cameras without
any special markers or light source, but it is more challenging to
implement reliably. While both of these techniques can provide accurate and wireless tracking, they require a line of sight to the user, which is not needed with electromagnetic or inertial
tracking. Smartphones, with their embedded video camera,
[FIG3] The unmasking of an audio signal disturbed by ambient noise can be implemented with adaptive equalization, which uses the external microphone and knowledge of the characteristics of the headphones [4]. (Block-diagram labels: music, ambient noise, masking estimation, equalizer, headphone.)
[FIG4] An example of the unmasking process in Bark bands, where the black line is the energy of the music, the green dash-dotted line is the energy of the ambient noise, the red dashed line is the estimated masking threshold, and the purple line is the spectrum of the unmasked music. The vertical dashed lines show the Bark band edges. (Axes: frequency from 100 Hz to 10 kHz; SPL from 0 to 80 dB.)
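
For illustration, a minimal sketch of the per-band gain computation behind [FIG3] and [FIG4], assuming the music energy and the masking threshold estimated from the ambient noise are already given in dB per Bark band; the 3-dB margin and 20-dB gain cap are illustrative assumptions, not values from [4].

```python
import numpy as np

def unmasking_gains_db(music_db, threshold_db, margin_db=3.0, max_gain_db=20.0):
    """Per-Bark-band equalizer gains that lift masked music bands above the threshold."""
    deficit = threshold_db + margin_db - music_db  # how far below the threshold each band sits
    return np.clip(deficit, 0.0, max_gain_db)      # boost only where needed, with a cap
```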