■ Head tracking: to adapt to the dynamic head movements of the listener.
The following sections describe in detail the virtualization and
its interaction with head tracking, sound scene decomposition,
individualization, and equalization. These signal processing tech-
niques are integrated and evaluated using subjective tests.
VIRTUALIZATION
In digital media, sound is typically mixed for loudspeaker playback
rather than headphone playback. Spatial sound rendered over headphones should therefore emulate the propagation of the acoustic waves from the loudspeakers to the eardrums of the listener. To emulate stereo or surround sound loudspeaker rendering over headphones, virtualization techniques based on HRTFs corresponding to the loudspeaker positions are commonly
used. Given these acoustic transfer functions (i.e., HRTFs), the virtu-
alization technique is applicable to any multichannel loudspeaker
setup, be it stereo, 5.1, 7.1, 22.2, or even loudspeaker arrays in wave-
field synthesis. As shown in Figure 2(a), for every desired loudspeaker
position, the signal in the $m$th channel $x_m(n)$ is filtered with the corresponding HRTFs $h_{x_m L}(n)$, $h_{x_m R}(n)$ and summed before being routed to the left and right ears [1], [5], respectively, as

$$y_L(n) = \sum_{m=1}^{M} h_{x_m L}(n) * x_m(n), \qquad y_R(n) = \sum_{m=1}^{M} h_{x_m R}(n) * x_m(n), \tag{1}$$
where $*$ denotes convolution and $M$ is the total number of channels. When the HRTFs are directly applied to multichannel loud-
speaker signals, the rendered sound scenes in headphone playback
suffer from inaccurate virtual source directions, lack of depth, and
reduced image width [5], [6].
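To make the channel-based virtualization of (1) concrete, the following is a minimal Python/NumPy sketch; the function and array names (`virtualize_channels`, `x`, `hrir_L`, `hrir_R`) and the use of time-domain HRIRs are illustrative assumptions rather than details from the article.

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualize_channels(x, hrir_L, hrir_R):
    """Eq. (1): filter each loudspeaker channel with its HRTF pair and sum.

    x      : (M, N) array, one row per channel signal x_m(n)
    hrir_L : (M, T) array, left-ear HRIRs h_{x_m L}(n), one per channel
    hrir_R : (M, T) array, right-ear HRIRs h_{x_m R}(n), one per channel
    Returns (y_L, y_R), each of length N + T - 1.
    """
    y_L = sum(fftconvolve(x[m], hrir_L[m]) for m in range(x.shape[0]))
    y_R = sum(fftconvolve(x[m], hrir_R[m]) for m in range(x.shape[0]))
    return y_L, y_R

# Example with placeholder data: a six-channel (5.1-style) mix, 128-tap HRIRs.
x = np.random.randn(6, 48000)
hrir_L, hrir_R = np.random.randn(6, 128), np.random.randn(6, 128)
y_L, y_R = virtualize_channels(x, hrir_L, hrir_R)
```

As the article notes, the same routine applies unchanged to stereo, 5.1, 7.1, or 22.2 content; only the number of channels $M$ and the HRIR set change.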
To solve these problems in virtualization of multichannel loud-
speaker signals and achieve a faithful reproduction of the sound
scenes, the HRTFs should be applied to the individual source sig-
nals that are usually extracted (using BSS and PAE) from the loud-
speaker signals (i.e., mixtures). In this virtualization [as shown in
Figure 2(b)], the sources are rendered directly using the HRTFs of
the corresponding source directions
$h_{s_k L}(n)$, $h_{s_k R}(n)$:

$$y_L(n) = \sum_{k=1}^{K} h_{s_k L}(n) * s_k(n) + a_L(n), \qquad y_R(n) = \sum_{k=1}^{K} h_{s_k R}(n) * s_k(n) + a_R(n), \tag{2}$$
where $K$ is the total number of sources, $s_k(n)$ is the $k$th source in the multichannel signal, and the environment signals $a_L(n)$, $a_R(n)$ are the rendered signals representing the sound environment perceived by the two ears. To render the acoustics of the
environment, the environment signals can be either synthesized
according to the sound environment [7] or extracted from the
mixtures. Techniques like decorrelation [5], [8] and artificial rever-
beration [9] are commonly employed to render the environment
signals to create a more diffuse and natural sound environment.
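A matching sketch of (2), under the assumption that the $K$ sources and their direction HRIRs have already been obtained (e.g., by BSS/PAE) and that a single environment signal has been extracted or synthesized; the random-FIR decorrelation used here to produce $a_L(n)$, $a_R(n)$ is a simple stand-in for the decorrelation and artificial reverberation techniques of [5], [8], [9], not the article's specific method.

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualize_sources(s, hrir_L, hrir_R, ambience):
    """Eq. (2): render extracted sources at their directions, then add a
    decorrelated (diffuse) version of the environment signal to each ear.

    s        : (K, N) array, extracted source signals s_k(n)
    hrir_L/R : (K, T) arrays, HRIRs for the source directions
    ambience : (N,) array, extracted or synthesized environment signal
    """
    K = s.shape[0]
    y_L = sum(fftconvolve(s[k], hrir_L[k]) for k in range(K))
    y_R = sum(fftconvolve(s[k], hrir_R[k]) for k in range(K))

    # Two different short random FIR filters decorrelate the ambience into
    # a_L(n) and a_R(n) so it is perceived as diffuse (an illustrative
    # stand-in for the techniques in [5], [8], [9]).
    rng = np.random.default_rng(0)
    dec_L = rng.standard_normal(64)
    dec_R = rng.standard_normal(64)
    a_L = fftconvolve(ambience, dec_L / np.linalg.norm(dec_L))
    a_R = fftconvolve(ambience, dec_R / np.linalg.norm(dec_R))

    # Sum the direct and environment parts, padding to a common length.
    n = max(len(y_L), len(a_L))
    out_L, out_R = np.zeros(n), np.zeros(n)
    out_L[:len(y_L)] += y_L; out_L[:len(a_L)] += a_L
    out_R[:len(y_R)] += y_R; out_R[:len(a_R)] += a_R
    return out_L, out_R
```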
[FIG1] A summary of the differences between natural listening and headphone listening, and the corresponding signal processing techniques that address these differences for natural sound rendering. The main challenges and their corresponding signal processing techniques in each category (source, medium, and receiver) are highlighted; their interactions (not shown here) are further discussed in the article.
[Figure 1 box labels — Natural listening: physical sound sources and environment (source); free air (medium); individual filtering (torso, head, ear) and head movements (receiver). Headphone listening: materials for loudspeaker playback, recordings/mixtures (source); headphones (medium); individual filtering (partial ear) and nonadapted head movements (receiver). Signal processing techniques: sound scene decomposition and virtualization (source); equalization (medium); individualization and head tracking (receiver).]