sources, and also on the distance of the
source. Spectral cues (SCs), caused
by the reflection or diffraction by
the listener’s torso and pinnae, pro-
vide additional localization informa-
tion. The ITD, ILD, and SCs are
conveniently represented by head-
related transfer functions (HRTFs),
which model the free-field acoustic transfer functions between
the position of the source in space and the listener’s ears [9].
Thus, in its elementary form, binaural synthesis is performed by
filtering a source signal with the respective HRTFs for the left
and right ears.
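
As an illustration, the following is a minimal sketch of this elementary form in Python, assuming a mono source signal and a measured pair of head-related impulse responses (HRIRs, the time-domain counterparts of the HRTFs) for the desired source direction; the variable names are hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_synthesis(source, hrir_left, hrir_right):
    """Render a mono source to a two-channel binaural signal."""
    left = fftconvolve(source, hrir_left)    # left-ear signal
    right = fftconvolve(source, hrir_right)  # right-ear signal
    return np.stack([left, right], axis=-1)  # shape: (samples, 2)
```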
Adding room reflections and reverberation to binaural syn-
thesis improves the naturalness, source localization properties,
and externalization [6], [8], [10]. Discrete room reflections
improve the perception of both direction and distance of a sound
source. They can be incorporated into binaural synthesis by ren-
dering them as additional virtual sources obtained from a geo-
metric model, e.g., the mirror-image source model. This,
however, significantly increases reproduction complexity. Late
diffuse reverberation mainly contributes to the perception of
distance, because the ratio between
direct and reverberant energy
decreases with increasing source
distance [8]. A recent review paper
explains various artificial reverbera-
tion techniques [11].
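
To make the mirror-image source model concrete, here is a minimal sketch under the simplifying assumption of a rectangular ("shoebox") room spanning [0, L] along each axis; each of the six first-order image positions would then be rendered as an additional virtual source, attenuated according to wall absorption and propagation distance.

```python
import numpy as np

def first_order_images(src, room_dims):
    """Mirror the source position across each of the six walls of a shoebox room."""
    images = []
    for axis, L in enumerate(room_dims):
        for wall in (0.0, L):                   # the two walls along this axis
            img = np.array(src, dtype=float)
            img[axis] = 2.0 * wall - img[axis]  # reflect across the wall plane
            images.append(img)
    return images

# Example: a source at (2.0, 1.5, 1.2) m in a 5 m x 4 m x 3 m room.
images = first_order_images([2.0, 1.5, 1.2], room_dims=[5.0, 4.0, 3.0])
```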
In a natural listening environ-
ment, dynamic head movements
contribute significantly to source localization. To exploit these so-
called dynamic binaural cues, a binaural synthesis system for vir-
tual reality listening must provide several functionalities. First,
the position and orientation of the listener’s head have to be
determined continuously, e.g., by using a head tracker. Second,
the synthesis has to be adapted dynamically by updating the
HRTFs according to the relative position of the virtual sources.
Finally, the overall latency of the reproduction system must satisfy perceptual limits to provide the intended localization and a plausible listening experience. The situation becomes more complicated if the spatial audio content is to be broadcast, since the standards for spatial audio are still under development.
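
As a sketch of the dynamic adaptation step, assume HRIRs are available on a discrete grid of directions and the tracker delivers a new head orientation once per audio block. One simple way to avoid audible clicks when the selected HRIR pair changes is to crossfade, within one block, between the signals rendered with the old and the new pair (the crossfade and the block-wise update are illustrative choices, not a prescribed method; overlap-add of the convolution tails is omitted for brevity).

```python
import numpy as np
from scipy.signal import fftconvolve

def render_block(block, hrir_old, hrir_new):
    """Render one audio block, crossfading from the old to the new HRIR pair."""
    fade_in = np.linspace(0.0, 1.0, len(block))
    ears = []
    for ear in (0, 1):  # 0 = left, 1 = right
        old = fftconvolve(block, hrir_old[ear])[:len(block)]
        new = fftconvolve(block, hrir_new[ear])[:len(block)]
        ears.append((1.0 - fade_in) * old + fade_in * new)
    return np.stack(ears, axis=-1)  # shape: (samples, 2)
```

In practice, interpolation between neighboring measured HRTFs serves the same purpose, and the whole update must fit within the latency budget discussed above.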
HEAD TRACKING
In many applications, it is essential that the orientation of the
user’s head is known. This is achieved by head-tracking tech-
niques. Once the listener’s orientation is known, it is possible to
use HRTFs to project the sound sources in correct directions. This
is especially important in interactive applications in which the
soundscape should remain stable even when the user turns his/
her head. Without head tracking, the soundscape moves with
respect to the user’s head, breaking the illusion of virtual sources.
Head tracking also helps in reducing reversals in sound source
localization [10].
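
To illustrate the compensation, the following minimal sketch assumes the tracker reports head orientation as yaw, pitch, and roll angles; the world-fixed source direction is rotated into the head frame before the HRTF lookup, so the virtual source stays put as the head turns.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def head_relative_direction(source_dir_world, yaw_deg, pitch_deg, roll_deg):
    """Rotate a world-frame unit direction vector into the head frame."""
    head = Rotation.from_euler('zyx', [yaw_deg, pitch_deg, roll_deg], degrees=True)
    return head.inv().apply(source_dir_world)  # inverse rotation compensates the head turn

# Example: with the head turned 90 degrees in yaw, a world-fixed frontal
# source moves to the listener's side, as expected.
print(head_relative_direction(np.array([1.0, 0.0, 0.0]), 90.0, 0.0, 0.0))
```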
The same head-tracking techniques can be used both for visual
head-mounted displays and headphones for audio reproduction.
For the visual domain, one of the first head-tracking systems was
presented by Ivan Sutherland in a virtual reality installation in
1968 [12]. This early system utilized mechanical and ultrasonic tracking techniques. Jens Blauert, the pioneer of head tracking for
audio reproduction over headphones, presented in his patent in
1973 several alternative techniques for head tracking, including
the use of mechanical levers as well as magnetic and gyroscopic
control arrangements [13]. Even today, most of those techniques are used in practice: one can buy tracking systems based on electromagnetic, inertial, or computer-vision techniques, or on a combination of these, such as the new Oculus Rift (Crystal Cove prototype) virtual reality headset.
The category of computer-vision-based head tracking contains
two different approaches. The most common one uses external
infrared light emitters and reflective markers attached to the user.
A less intrusive technique is the use of regular cameras without
any special markers or light source, but it is more challenging to
implement reliably. While both of these techniques can provide accurate and wireless tracking, they require a line of sight to the user, which is not needed with electromagnetic or inertial
tracking. Smartphones, with their embedded video camera,
[FIG3] The unmasking of an audio signal disturbed by ambient noise can be implemented with adaptive equalization, which uses the external microphone and knowledge of the characteristics of the headphones [4]. (Block-diagram labels: music, ambient noise, masking estimation, equalizer, headphone.)
[FIG4] An example of the unmasking process in Bark bands, where the black line is the energy of the music, the green dash-dotted line is the energy of the ambient noise, the red dashed line is the estimated masking threshold, and the purple line is the spectrum of the unmasked music. The vertical dashed lines show the Bark band edges. (Axes: frequency from 100 Hz to 10 kHz; SPL from 0 to 80 dB.)
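
For illustration, a minimal sketch of the per-band gain computation behind [FIG3] and [FIG4], assuming the music energy and the masking threshold estimated from the ambient noise are already given in dB per Bark band; the 3-dB margin and 20-dB gain cap are illustrative assumptions, not values from [4].

```python
import numpy as np

def unmasking_gains_db(music_db, threshold_db, margin_db=3.0, max_gain_db=20.0):
    """Per-Bark-band equalizer gains that lift masked music bands above the threshold."""
    deficit = threshold_db + margin_db - music_db  # how far below the threshold each band sits
    return np.clip(deficit, 0.0, max_gain_db)      # boost only where needed, with a cap
```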