IEEE SIGNAL PROCESSING MAGAZINE [95] MARCH 2015
provide the required hardware for computer-vision-based head
tracking for mobile applications.
DYNAMIC BINAURAL SYNTHESIS
HRTFs can be obtained by measuring the acoustic path from a
point in space to the ear entrances of a test subject or dummy
head. Alternatively, HRTFs of an individual can be estimated from
anthropometric data or by using listening tests. These methods
are reviewed in the article “Natural Sound Rendering for Head-
phones” on page 98 of this special issue of IEEE Signal Process-
ing Magazine [14]. Existing and upcoming standards for
broadcasting spatial audio, e.g., MPEG Spatial Audio Object Coding
or MPEG-H 3D Audio, address the need for suitable HRTFs by
either providing interfaces to supply individualized data or by
transmitting predefined data sets in the encoded bit stream.
HRTF measurements are typically performed at discrete loca-
tions on a spherical grid centered around the test subject’s head.
Using such measurements directly for binaural synthesis would
impose the same discrete grid on the virtual source positions.
However, HRTFs at nonmeasured positions can be estimated from
available measurements via interpolation, which allows virtual sources to be
placed and moved freely inside the measurement grid. For hi-fi
rendering, the measurement grid should cover all or most of the
sphere surrounding the listener and have a spatial resolution of
5–15° in elevation and 4–5° in azimuth, with fewer measurement
points required toward extreme elevations [15].
Different types of preprocessing, in either the time or the frequency
domain, are typically applied to the measured HRTF data
[16]. Equalization techniques such as free-field or diffuse-field
equalization compensate for the response of the measurement or
reproduction system. Smoothing of HRTF data decreases perceptually
irrelevant fluctuations, which reduces the complexity of the
frequency responses and enables more efficient filtering and
smoother interpolation between HRTFs.
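As an illustration of the smoothing step, the following minimal sketch applies fractional-octave smoothing to a magnitude response. The article does not prescribe a particular smoothing method; fractional-octave averaging is assumed here as one common choice, and the function name `fractional_octave_smooth` and its `fraction` parameter are hypothetical.

```python
import math


def fractional_octave_smooth(mag, fraction=3):
    """Smooth a magnitude response with a window 1/fraction octave wide.

    mag[k] is the magnitude at linearly spaced frequency bin k (bin 0 = DC).
    Assumption: magnitudes are averaged linearly; averaging power or dB
    values would be equally plausible variants.
    """
    # Ratio between the centre frequency and each edge of the window.
    half = 2.0 ** (1.0 / (2.0 * fraction))
    out = [mag[0]]  # the DC bin has no octave neighbourhood, keep it as is
    for k in range(1, len(mag)):
        lo = max(1, math.ceil(k / half))
        hi = min(len(mag) - 1, math.floor(k * half))
        window = mag[lo:hi + 1]  # always contains bin k itself
        out.append(sum(window) / len(window))
    return out


# Toy response: a flat magnitude of 1.0 with a strong bin-to-bin ripple.
rippled = [1.0 + 0.5 * (-1) ** k for k in range(64)]
smoothed = fractional_octave_smooth(rippled, fraction=3)
```

Because the window width grows with frequency, the ripple is attenuated most strongly at high bins, while the coarse spectral shape is retained.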
Several approaches for HRTF interpolation have been pro-
posed in the literature, including linear interpolation of neigh-
boring HRTFs, spherical splines, and spherical harmonics. The
advantage of linear interpolation over more sophisticated
approaches is the reduced complexity in terms of implementation
and computation, which can be a decisive factor in real-time
applications. Linear interpolation is typically performed via a
weighted combination of a subset of measured HRTFs lying close
to the desired spatial location.
Publicly available HRTF databases are typically measured at
locations on the surface of a sphere, based on the assumption that
HRTFs are independent of distance beyond about 1 m from the
head of the listener [17]. For HRTFs measured on a sphere, the
measurement points can be grouped into nonoverlapping trian-
gles via triangulation. The interpolation is then performed by
combining the HRTFs forming the triangle enclosing the loca-
tion to be estimated. For measurement points obtained at various
distances, triangulation yields a mesh of nonoverlapping tetrahe-
dra. To estimate the HRTFs at a nonmeasured location, the
HRTFs forming a tetrahedron enclosing the location to be esti-
mated are interpolated. The weights for interpolating HRTFs
forming a triangle or tetrahedron can be calculated from bary-
centric coordinates [18].
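The weight computation for the tetrahedral case can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation of [18]; the helper names `det3`, `barycentric_weights`, and `interpolate`, as well as the toy magnitude values, are hypothetical. The triangle case is analogous with three weights.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def barycentric_weights(vertices, p):
    """Barycentric coordinates of point p with respect to a tetrahedron.

    vertices: four (x, y, z) measurement positions; p: target position.
    The weights sum to one, and p lies inside the tetrahedron if and only
    if all four weights are nonnegative.
    """
    v1, v2, v3, v4 = vertices

    def sub(u, w):
        return tuple(ui - wi for ui, wi in zip(u, w))

    e1, e2, e3, d = sub(v1, v4), sub(v2, v4), sub(v3, v4), sub(p, v4)
    den = det3(e1, e2, e3)
    if abs(den) < 1e-12:
        raise ValueError("degenerate tetrahedron")
    # Cramer's rule for g1*e1 + g2*e2 + g3*e3 = d.
    g1 = det3(d, e2, e3) / den
    g2 = det3(e1, d, e3) / den
    g3 = det3(e1, e2, d) / den
    return (g1, g2, g3, 1.0 - g1 - g2 - g3)


def interpolate(values, weights):
    """Weighted sum of per-vertex data, e.g., magnitude responses."""
    n = len(values[0])
    return [sum(w * v[k] for w, v in zip(weights, values)) for k in range(n)]


# Toy example: four measurement points and two-bin magnitude responses.
tetra = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
mags = [[1.0, 0.8], [0.6, 0.4], [0.2, 0.0], [0.2, 0.4]]
weights = barycentric_weights(tetra, (0.25, 0.25, 0.25))
estimate = interpolate(mags, weights)
```

A negative weight signals that the target lies outside the chosen tetrahedron, so the same routine can also be used to search the mesh for the enclosing element.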
Once a suitable subset has been determined and the interpo-
lation weights have been calculated, the actual interpolation is
performed. A direct weighted addition of the selected HRTFs,
which is equivalent to a linear combination of the correspond-
ing impulse responses due to the linearity of the Fourier trans-
form, typically leads to severe comb-filtering artifacts. This is
due to the combination of transfer functions with different
phases. Several approaches have been proposed to overcome this
problem. A typical signal flow for dynamic synthesis, which con-
tains the basic building blocks for interpolation and application
of HRTF filters, is depicted in Figure 5. The main functionalities
are the handling of time delays, interpolation of frequency
responses, convolution with the source signals, and crossfading
to enable smooth transitions between different HRTFs.
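The comb-filtering problem, and the benefit of handling time delays separately, can be reproduced with a toy example. In the sketch below, two impulse responses are modeled as pure delays of 10 and 14 samples, which is an assumption made purely for illustration; real HRIRs also differ in spectral shape. All function names are hypothetical.

```python
import cmath
import math


def magnitude(h, omega):
    """Magnitude of the DTFT of impulse response h at angular frequency omega."""
    return abs(sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(h)))


def delayed_impulse(delay, length=32):
    """Unit impulse delayed by an integer number of samples."""
    h = [0.0] * length
    h[delay] = 1.0
    return h


def mix(h_a, h_b, w):
    """Sample-wise linear combination (1 - w) * h_a + w * h_b."""
    return [(1.0 - w) * a + w * b for a, b in zip(h_a, h_b)]


d_a, d_b = 10, 14  # onset delays of the two toy responses, in samples

# Direct weighted addition: the two delayed copies interfere.
naive = mix(delayed_impulse(d_a), delayed_impulse(d_b), 0.5)

# Separate delay handling: interpolate the delay and the time-aligned
# responses independently, then reapply the interpolated delay.
d_mid = round(0.5 * d_a + 0.5 * d_b)
aligned = mix(delayed_impulse(0), delayed_impulse(0), 0.5)
separate = [0.0] * d_mid + aligned[:len(aligned) - d_mid]

omega_null = math.pi / 4  # first comb notch for a 4-sample delay difference
naive_gain = magnitude(naive, omega_null)        # close to 0: a deep notch
separate_gain = magnitude(separate, omega_null)  # close to 1: flat response
```

The interpolated delay is rounded to an integer here; a practical system would typically need a fractional delay line to avoid audible steps when a source moves.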
The separate handling of time delays, which are obtained either
from the HRTF data set in a preprocessing step or from
geometrical models, e.g., a spherical head model [19], yields
[FIG5] The signal flow of a dynamic binaural synthesis system for multiple sound sources. Blocks shown: acoustic scene (audio, position), delay computation, delay line, HRTF database, HRTF selection, HRTF interpolation, convolution, and crossfading.