To achieve natural sound rendering, the virtual sound rendered should exactly emulate all the spatial cues of the original sound scene, as well as the individual spectral characteristics of the listener’s ears. In this article, we mainly consider channel-based audio, the most widely used format, as the input signal for the natural sound rendering system, though some of the signal processing techniques discussed could also be applied to other audio formats, such as object-based formats and ambisonics [2], [3].
In recent years, the design criteria
for commercial headphones have undergone significant develop-
ment. At Harman International Industries, Olive et al. investigated
the best target responses for designing headphones based on the lis-
tener’s preference for the most natural sound [4]. Creating realistic surround sound in headphones has become a common pursuit of many headphone technologies, such as those from Dolby and DTS. Furthermore, a personalized listening experience and the incorporation of information about the listening environment have also become trends in the headphone industry. These trends share one common objective: to render natural sound in headphones.
CHALLENGES
The listening process in humans can generally be considered as a
source-medium-receiver model, as stated by Begault [1]. This
model is used in this article to highlight the differences between
natural listening in a real environment and listening via head-
phones. In natural listening, we listen to physical sound sources in a particular acoustic space, with the sound waves undergoing diffraction and interference with different parts of our morphology (torso, head, and pinna) before reaching the eardrums. The effect of this sound wave propagation can be encapsulated in spatial digital filters termed head-related transfer functions (HRTFs) [1]. Listeners also obtain valuable interaural cues for sound localization from head movements. However, headphone listening
is inherently different from natural listening as the sources we are
listening to are no longer physical sound sources but are recorded
and edited sound materials. These differences between natural and
headphone listening lead to various challenges in rendering natu-
ral sound over headphones, which can be broadly classified into
categories from the perspectives of source, medium, and receiver,
as described next.
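Before turning to these categories, the short Python sketch below illustrates how HRTFs act as spatial filters: a mono signal is rendered binaurally by convolving it with a pair of head-related impulse responses (HRIRs, the time-domain counterparts of HRTFs). This is only a minimal sketch; the HRIRs are random placeholders standing in for measured responses at a chosen direction, and the sampling rate and filter length are assumptions for illustration.

import numpy as np
from scipy.signal import fftconvolve

fs = 44100  # assumed sampling rate (Hz)

# Placeholder HRIRs for one direction; in practice these would come from
# a measured (ideally individualized) HRTF data set.
rng = np.random.default_rng(0)
hrir_left = rng.standard_normal(256) * 0.01
hrir_right = rng.standard_normal(256) * 0.01

def binauralize(mono, hrir_l, hrir_r):
    """Filter a mono source with left- and right-ear HRIRs so that, over
    headphones, it is perceived as coming from the HRIRs' direction."""
    return np.stack([fftconvolve(mono, hrir_l),
                     fftconvolve(mono, hrir_r)])

# Example: a 1-s, 440-Hz tone rendered as a virtual source
t = np.arange(fs) / fs
source = 0.5 * np.sin(2 * np.pi * 440 * t)
binaural = binauralize(source, hrir_left, hrir_right)  # shape (2, N)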
SOURCE
The sound scenes rendered for headphone listening should com-
prise not only the individual sound sources but also the features of
the sound environment. Listeners usually perceive these sound
sources to be directional, i.e., coming from certain directions.
Moreover, in most digital media content, the sound environment is usually perceived by the listener to be (at least partially) diffuse.
This perceptual difference between the sound sources and the
sound environment requires them to be considered separately in
natural sound rendering [2]. Though there are other formats that
can represent the sound scenes (e.g., object-based, ambisonics), the
convention for today’s digital media
is still primarily a channel-based for-
mat. Hence, the focus of this article
lies in the rendering of channel-based
audio, where sound source and envi-
ronment signals are mixed in each
channel [2]. In channel-based sig-
nals, where only the sound mixtures
are available (assuming one mixture
in every channel), it is necessary to
extract the source signals and envi-
ronment signals, which can be quite
challenging. Furthermore, most of the traditional recordings are
processed and mixed for optimal playback over loudspeakers
rather than headphones. Direct playback of such recordings over
headphones results in an unnatural listening experience, which is
mainly due to the loss of crosstalk and to localization issues.
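As a rough illustration of one simple way to split such channel mixtures into primary (directional) and ambient (diffuse) components, the Python sketch below treats time-frequency bins where the two channels of a stereo mixture are strongly coherent as primary sound and the remainder as ambience. This is a simplified sketch of the general idea, not any particular algorithm from [2]; the STFT parameters and smoothing constant are assumptions.

import numpy as np
from scipy.signal import stft, istft

def pae_coherence(x_left, x_right, fs, nperseg=1024, alpha=0.8, eps=1e-12):
    """Crude primary-ambient extraction: a soft mask based on the
    time-smoothed inter-channel coherence separates the coherent
    (primary) and incoherent (ambient) parts of a stereo mixture."""
    _, _, XL = stft(x_left, fs, nperseg=nperseg)
    _, _, XR = stft(x_right, fs, nperseg=nperseg)

    def smooth(p):
        # Exponential averaging of spectral statistics over time frames
        out, acc = np.empty_like(p), p[:, 0]
        for n in range(p.shape[1]):
            acc = alpha * acc + (1 - alpha) * p[:, n]
            out[:, n] = acc
        return out

    phi_ll = smooth(np.abs(XL) ** 2)
    phi_rr = smooth(np.abs(XR) ** 2)
    phi_lr = smooth(XL * np.conj(XR))
    coherence = np.abs(phi_lr) / (np.sqrt(phi_ll * phi_rr) + eps)

    primary = [istft(coherence * X, fs, nperseg=nperseg)[1] for X in (XL, XR)]
    ambient = [istft((1 - coherence) * X, fs, nperseg=nperseg)[1] for X in (XL, XR)]
    return primary, ambient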
MEDIUM
Headphone listening does not satisfy free-air listening conditions
as in natural listening. Since the headphone transfer function
(HPTF) is not flat, equalization of the headphone is necessary.
However, this equalization is tedious and challenging, as the headphone response depends strongly on the listener’s individual anthropometric features and also varies with repositioning of the headphones.
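One common way to equalize a measured headphone response is regularized inversion in the frequency domain. The Python sketch below is a minimal illustration under assumed parameter values (FFT size, regularization constant); in practice the response would be measured for the individual listener and averaged over several repositionings.

import numpy as np

def design_headphone_eq(hptf_ir, n_fft=4096, beta=1e-2):
    """Regularized inversion of a measured headphone impulse response.
    The regularization term beta limits the gain at deep notches of the
    response, which a plain inverse filter would over-boost."""
    H = np.fft.rfft(hptf_ir, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + beta)
    eq_ir = np.fft.irfft(H_inv, n_fft)
    eq_ir = np.roll(eq_ir, n_fft // 2)   # shift to obtain a causal filter
    eq_ir *= np.hanning(n_fft)           # taper to reduce truncation artifacts
    return eq_ir

The resulting filter is then placed in cascade with the program material, e.g., equalized = np.convolve(signal, design_headphone_eq(measured_ir)).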
RECEIVER
The omission of the listener’s individualized filtering by the outer ear in headphone listening often leads to coloration and localization inaccuracies. These individualized characteristics of the listener are lost when the sound content is recorded or synthesized nonindividually, i.e., when the listening subject differs from the subject used in the recording or synthesis. Furthermore, the sound
in headphone listening is not adapted to the listener’s head move-
ments, which departs from a natural listening experience.
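As a minimal illustration of the last point, a head-tracked renderer updates the rendering direction so that a virtual source stays fixed in the room when the head turns. The sketch below assumes the source azimuth and the tracked head yaw are expressed in the same angular convention.

def head_relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Azimuth at which the HRTF should be selected after head rotation:
    the source direction expressed relative to the current head orientation."""
    return (source_azimuth_deg - head_yaw_deg) % 360.0

# Example: a source at 30 degrees; after the head turns 20 degrees toward it,
# the source is rendered from 10 degrees relative to the head.
print(head_relative_azimuth(30.0, 20.0))  # 10.0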
SIGNAL PROCESSING TECHNIQUES
To tackle the aforementioned challenges and enhance natural
sound rendering over headphones, digital signal processing
techniques are commonly used. In Figure 1, we summarize the differences between natural listening and headphone listening and introduce the corresponding signal processing techniques to address these challenges:
■ Virtualization: to match the desired playback for the digital media content (a simple rendering sketch follows this list)
■ Sound scene decomposition using blind source separation
(BSS) and primary-ambient extraction (PAE): to optimally
facilitate the separate rendering of sound sources and sound
environment
■ Individualization of HRTF: to compensate for the lost or
altered individual filtering of the sound in headphone
listening
■ Equalization: to preserve the original timbral quality of
the source and alleviate the adverse effect of the inherent
headphone response
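As an example of the first of these techniques, virtualization of channel-based content can be sketched as rendering each loudspeaker-feed channel as a virtual source at its nominal direction and summing the resulting ear signals. In the sketch below, the HRIR loader is a placeholder for a real (ideally individualized) HRTF set, and the loudspeaker angles assume a standard stereo setup.

import numpy as np
from scipy.signal import fftconvolve

def load_hrir_pair(azimuth_deg, length=256):
    # Placeholder for looking up measured HRIRs at the given azimuth.
    rng = np.random.default_rng(int(azimuth_deg) % 360)
    return rng.standard_normal(length) * 0.01, rng.standard_normal(length) * 0.01

def virtualize(channels, azimuths_deg):
    """Render each loudspeaker-feed channel as a virtual source at its
    nominal azimuth and sum the contributions at the two ears."""
    length = max(len(ch) for ch in channels) + 255  # signal length + filter tail
    ear_l, ear_r = np.zeros(length), np.zeros(length)
    for ch, az in zip(channels, azimuths_deg):
        hl, hr = load_hrir_pair(az)
        yl, yr = fftconvolve(ch, hl), fftconvolve(ch, hr)
        ear_l[:len(yl)] += yl
        ear_r[:len(yr)] += yr
    return ear_l, ear_r

# Example: a stereo mix virtualized at the standard +/-30 degree positions
fs = 44100
t = np.arange(fs) / fs
left_ch = 0.5 * np.sin(2 * np.pi * 440 * t)
right_ch = 0.5 * np.sin(2 * np.pi * 554 * t)
ear_left, ear_right = virtualize([left_ch, right_ch], [30, -30])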