
IEEE SIGNAL PROCESSING MAGAZINE [106] MARCH 2015
ambient components also contribute significantly to the naturalness and immersiveness of the sound scenes. Subjective experiments revealed that BSS- and PAE-based headphone rendering can improve the externalization and enlarge the sound stage with minimal coloration [6].
Despite the recent advances in BSS and PAE, challenges arising from the complexity and uncertainty of sound scenes remain unresolved. One common challenge in both BSS and PAE is the growing number of audio sources in the sound scene, while only a limited number of mixtures (i.e., channels) are available. Sparse solutions in BSS and PAE require the sources to be sparse and disjoint in certain time-frequency representations [15]. Given the diversity of audio signals, finding a sparse representation that is robust across different types of audio signals is extremely difficult. The recorded or postprocessed source signals might even be filtered due to physical (or equivalently simulated) propagation and reflections. Moreover, adverse environmental conditions (including reverberation and strong ambient sound) usually degrade the performance of the decomposition. These difficulties can be addressed by studying the features of the resulting signals, by obtaining more prior information on the sources, the sound environment, and the mixing process [18], and by combining auditory with visual information of the scene.
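The sparsity and disjointness assumption mentioned above can be illustrated with a small synthetic example: when two sources rarely occupy the same time-frequency bin, an oracle binary mask recovers each source from a single mixture almost exactly, and the overlapping bins account for the residual error. The spectrogram grid and sources below are hypothetical, not drawn from this article.

```python
import numpy as np

# Toy illustration of the disjointness assumption behind time-frequency
# masking. The sources and grid are synthetic stand-ins.
rng = np.random.default_rng(0)

# Sparse synthetic magnitude "spectrograms" (freq bins x time frames):
# each bin is active with probability ~0.1.
s1 = rng.random((64, 100)) * (rng.random((64, 100)) > 0.9)
s2 = rng.random((64, 100)) * (rng.random((64, 100)) > 0.9)

mixture = s1 + s2

# Oracle binary mask: assign each bin to the dominant source.
mask = (s1 >= s2).astype(float)
est1 = mask * mixture
est2 = (1.0 - mask) * mixture

# Where the sources are truly disjoint, the estimates are exact; bins where
# both sources are active are the residual error the text warns about.
overlap = np.logical_and(s1 > 0, s2 > 0)
err = np.abs(est1 - s1)
print("fraction of overlapping bins:", overlap.mean())
print("max error outside overlap:", err[~overlap].max())
```

With real audio the active bins of different sources overlap far more often, which is why robust sparse representations are hard to find.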
INDIVIDUALIZATION OF HRTF
Binaural technology is the most promising solution for delivering spatial audio over headphones, as it is the closest to natural listening. Unlike conventional microphone recordings, which are meant for loudspeaker playback, binaural signals are recorded or synthesized at the ears of the listener. In a binaural audio system, the spatial encoding (i.e., the HRTFs) should encapsulate all the spectral features due to the interaction of the acoustic wave with the listener's morphology (torso, head, and pinna). The pinna, often regarded as an acoustic fingerprint, embeds the most idiosyncratic spectral features into the HRTFs, which are essential for accurate perception of the sound [Figure 3(a)]. Thus, HRTF features are highly individual, as shown in Figure 3(c). The HRTFs used for virtualization are often nonindividualized HRTFs, typically measured on a dummy head, since they are easily accessible.
However, the use of nonindividualized HRTFs leads to several artifacts, such as IHL, inaccurate perception of elevation, and front–back and up–down reversals. Additionally, subjects display poor angular resolution and sometimes find it difficult to pinpoint the exact location of the auditory image when nonindividualized HRTFs are used. Thus, individualization of the HRTFs [Figure 3(b)] plays a critical role in creating an immersive experience closest to natural listening. Individualized HRTFs can be obtained from acoustical measurements, from anthropometric features of the listener, or by customizing generic HRTFs with perceptual feedback or frontal projection of sound, as summarized in Table 3.
ACOUSTICAL MEASUREMENTS
The most straightforward individualization technique is to measure the individualized HRTFs for every listener at different sound positions [25], [26]. This is the ideal solution, but it is extremely tedious and involves highly precise measurements. These measurements also require the subjects to remain motionless for long periods, which may cause fatigue. Zotkin et al. developed a fast HRTF measurement system using the technique of reciprocity, where a microspeaker is placed into the ear and several microphones are placed around the listener [13]. Other researchers developed a continuous 3-D azimuth acquisition system that measures the HRTFs using a multichannel adaptive filtering technique [27]. However, all of these techniques for acoustically measuring individual HRTFs require a large amount of resources and expensive setups.
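As a rough illustration of how such acoustical measurements are typically turned into an HRTF, the sketch below deconvolves a hypothetical ear-microphone recording by a reference measurement via regularized spectral division; the excitation signal, the impulse response, and the regularization constant are all assumptions for this toy example, not details from the systems cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024

# Hypothetical "true" head-related impulse response (HRIR): a sparse FIR.
hrir_true = np.zeros(n)
hrir_true[[20, 35, 60]] = [1.0, -0.4, 0.15]

# Excitation: white noise stands in for the sweep/MLS used in practice.
excitation = rng.standard_normal(n)

# Ear recording = excitation filtered by the HRIR (circular for simplicity).
ear = np.fft.irfft(np.fft.rfft(excitation) * np.fft.rfft(hrir_true), n)

# Reference recording at the head position without the listener:
# here simply the raw excitation.
reference = excitation

# HRTF estimate by spectral division, lightly regularized to avoid
# dividing by near-zero bins.
eps = 1e-12
hrtf_est = np.fft.rfft(ear) / (np.fft.rfft(reference) + eps)
hrir_est = np.fft.irfft(hrtf_est, n)

print("max HRIR error:", np.max(np.abs(hrir_est - hrir_true)))
```

Real measurement systems additionally handle loudspeaker/microphone equalization, room reflections, and listener movement, which is exactly what makes them resource intensive.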
ANTHROPOMETRIC DATA
Individualized HRTFs can also be modeled as weighted sums of basis functions, either in the frequency domain or in the spatial domain. The basis functions are usually common to all individuals, and the individualization information is conveyed by the weights. The HRTFs are essentially expressed as weighted sums of a set of eigenvectors, which can be derived from PCA or ICA [26], [13]. The individual weights are derived from anthropometric parameters captured by optical descriptors, which can be obtained from direct measurements, pictures, or a 3-D mesh of the morphology [13]. Solving the problem of diffraction of an acoustic wave by the listener's body also results in individual HRTFs. This solution
[TABLE 2] COMPARISON BETWEEN BSS AND PAE IN SOUND SCENE DECOMPOSITION.

Objective (common to both): obtain useful information about the original
sound scene from the given mixtures and facilitate natural sound rendering.

Common characteristics: usually no prior information, only mixtures;
based on certain signal models; require objective as well as subjective
evaluation.

Basic mixing model:
  BSS: sums of multiple sources (independent, non-Gaussian, etc.)
  PAE: primary components (highly correlated) plus ambient components
       (uncorrelated)

Techniques:
  BSS: ICA [14], sparse solutions [15], time-frequency masking [16],
       NMF [17], [18], CASA [19], etc.
  PAE: PCA [20], LS [8], [21], time-frequency masking [7], [20],
       time/phase shifting [22], [23], etc.

Typical applications:
  BSS: speech, music
  PAE: movie, gaming

Related applications:
  BSS: speech enhancement, noise reduction, speech recognition,
       music classification
  PAE: sound reproduction, sound localization, coding

Limitations:
  BSS: small number of sources; sparseness/disjointness; no or simple
       environment
  PAE: small number of sources; sparseness/disjointness; low ambient
       power; primary and ambient components uncorrelated
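The weighted-sum-of-basis-functions model described under "Anthropometric Data" can be sketched with PCA on a synthetic HRTF database: a common set of eigenvectors is shared across the population, and each individual is summarized by a small weight vector. The population size, dimensions, and noise level below are assumptions standing in for a measured data set.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_freqs, n_basis = 50, 128, 8

# Hypothetical database: low-rank structure plus noise, mimicking the
# observation that a few components explain most inter-subject variance.
latent = rng.standard_normal((n_subjects, n_basis))
basis_true = rng.standard_normal((n_basis, n_freqs))
hrtfs = latent @ basis_true + 0.01 * rng.standard_normal((n_subjects, n_freqs))

# PCA via SVD of the mean-centered data.
mean = hrtfs.mean(axis=0)
u, s, vt = np.linalg.svd(hrtfs - mean, full_matrices=False)
eigvecs = vt[:n_basis]                 # common basis functions
weights = (hrtfs - mean) @ eigvecs.T   # individual weights, one row per subject

# Reconstruct one subject's HRTF from its low-dimensional weights.
recon = mean + weights[0] @ eigvecs
rel_err = np.linalg.norm(recon - hrtfs[0]) / np.linalg.norm(hrtfs[0])
print("relative reconstruction error:", rel_err)
```

In the individualization methods above, the weights would not come from measured HRTFs but would instead be regressed from anthropometric parameters, so that a new listener's HRTFs can be synthesized without an acoustic measurement.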