
IEEE SIGNAL PROCESSING MAGAZINE [95] MARCH 2015

provide the required hardware for computer-vision-based head tracking for mobile applications.

DYNAMIC BINAURAL SYNTHESIS

HRTFs can be obtained by measuring the acoustic path from a point in space to the ear entrances of a test subject or dummy head. Alternatively, the HRTFs of an individual can be estimated from anthropometric data or by using listening tests. These methods are reviewed in the article “Natural Sound Rendering for Headphones” on page 98 of this special issue of IEEE Signal Processing Magazine [14]. Existing and upcoming standards for broadcasting spatial audio, e.g., MPEG Spatial Audio Object Coding or MPEG-H 3D Audio, address the need for suitable HRTFs either by providing interfaces to supply individualized data or by transmitting predefined data sets in the encoded bit stream.

HRTF measurements are typically performed at discrete locations on a spherical grid centered on the test subject’s head. Using such measurements directly for binaural synthesis would impose the same discrete grid on the virtual source positions. However, HRTFs at nonmeasured positions can be estimated from available measurements via interpolation, allowing virtual sources to be placed and moved freely inside the measurement grid. For hi-fi rendering, the measurement grid should cover all or most of the sphere surrounding the listener and have a spatial resolution of 5–15° in elevation and 4–5° in azimuth, with fewer measurement points required toward extreme elevations [15].

Different types of preprocessing, in either the time or the frequency domain, are typically applied to the measured HRTF data [16]. Equalization techniques such as free-field or diffuse-field equalization compensate for the response of the measurement or reproduction system. Smoothing of HRTF data decreases perceptually irrelevant fluctuations, thus reducing the complexity of the frequency responses and enabling more efficient filtering and smoother interpolation between HRTFs.
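The smoothing step can be illustrated with a small sketch. The text does not say which smoothing method is used, so the fractional-octave averaging below is only an assumption chosen for illustration; the function name `smooth_fractional_octave` and the synthetic rippled response are hypothetical.

```python
import math

def smooth_fractional_octave(mag, fraction=3):
    """Average each frequency bin over a window roughly 1/fraction octave wide.

    `mag` is a list of magnitude values on a linear frequency grid.
    The window grows with the bin index, so fine ripples at high
    frequencies are averaged out more strongly than at low frequencies.
    This is an assumed smoothing rule, not the one used in the article.
    """
    factor = 2.0 ** (1.0 / (2.0 * fraction))  # half-window ratio
    n = len(mag)
    smoothed = []
    for k in range(n):
        lo = int(math.floor(k / factor))
        hi = min(n - 1, int(math.ceil(k * factor)))
        window = mag[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Synthetic response: flat magnitude of 1.0 with a strong bin-to-bin ripple.
rippled = [1.0 + 0.5 * (-1) ** k for k in range(64)]
smoothed = smooth_fractional_octave(rippled, fraction=3)
```

In this toy case the ripple at higher bins is largely averaged away, while a flat response passes through unchanged.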

Several approaches for HRTF interpolation have been proposed in the literature, including linear interpolation of neighboring HRTFs, spherical splines, and spherical harmonics. The advantage of linear interpolation over more sophisticated approaches is the reduced complexity in terms of implementation and computation, which can be a decisive factor in real-time applications. Linear interpolation is typically performed via a weighted combination of a subset of measured HRTFs lying close to the desired spatial location.
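As a minimal sketch of the selection step, the following picks the measured directions closest to a target direction by great-circle angle. The function names, the (azimuth, elevation) convention in degrees, and the toy grid are assumptions made for illustration; they are not taken from the article or from any particular HRTF database.

```python
import math

def angular_distance(dir_a, dir_b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    az1, el1 = (math.radians(v) for v in dir_a)
    az2, el2 = (math.radians(v) for v in dir_b)
    cos_angle = (math.sin(el1) * math.sin(el2)
                 + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def nearest_measurements(target, grid, count=3):
    """Indices of the `count` grid directions closest to `target`."""
    order = sorted(range(len(grid)),
                   key=lambda i: angular_distance(target, grid[i]))
    return order[:count]

# Hypothetical measurement grid: (azimuth, elevation) in degrees.
grid = [(0, 0), (5, 0), (0, 10), (180, 0), (90, 45)]
subset = nearest_measurements((2, 3), grid, count=3)
```

A nearest-neighbor search like this does not guarantee that the selected points enclose the target, so it is only a stand-in for a proper geometric selection.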

Publicly available HRTF databases are typically measured at locations on the surface of a sphere, based on the assumption that HRTFs are distance-independent beyond about 1 m from the head of the listener [17]. For HRTFs measured on a sphere, the measurement points can be grouped into nonoverlapping triangles via triangulation. The interpolation is then performed by combining the HRTFs at the vertices of the triangle enclosing the location to be estimated. For measurement points obtained at various distances, triangulation yields a mesh of nonoverlapping tetrahedra. To estimate the HRTFs at a nonmeasured location, the HRTFs at the vertices of the tetrahedron enclosing that location are interpolated. The weights for interpolating the HRTFs forming a triangle or tetrahedron can be calculated from barycentric coordinates [18].
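The barycentric weighting for the triangle case can be sketched as follows. This is an illustration under simplifying assumptions: directions are treated as planar (azimuth, elevation) points, which is only reasonable for small triangles, and the weights are applied to toy magnitude responses. The function names are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c).

    All points are (azimuth, elevation) pairs in degrees, treated as
    planar coordinates (an assumed simplification for small triangles).
    The three weights sum to 1; all are non-negative when p lies inside.
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb

def interpolate_magnitudes(weights, responses):
    """Weighted sum of equally long magnitude responses, bin by bin."""
    n_bins = len(responses[0])
    return [sum(w * r[k] for w, r in zip(weights, responses))
            for k in range(n_bins)]

# Toy triangle and target direction.
weights = barycentric_weights((2, 3), (0, 0), (10, 0), (0, 10))
# Toy two-bin magnitude responses at the three vertices.
estimate = interpolate_magnitudes(weights, [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

For a tetrahedron the same idea applies with four weights obtained from a 3 × 3 linear system; a negative weight indicates that the target lies outside the chosen triangle or tetrahedron.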

Once a suitable subset has been determined and the interpolation weights have been calculated, the actual interpolation is performed. A direct weighted addition of the selected HRTFs, which is equivalent to a linear combination of the corresponding impulse responses due to the linearity of the Fourier transform, typically leads to severe comb-filtering artifacts. These artifacts arise because transfer functions with different phases are combined. Several approaches have been proposed to overcome this problem. A typical signal flow for dynamic synthesis, which contains the basic building blocks for interpolation and application of HRTF filters, is depicted in Figure 5. The main functionalities are the handling of time delays, interpolation of frequency responses, convolution with the source signals, and crossfading to enable smooth transitions between different HRTFs.
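The comb-filtering effect can be reproduced with a deliberately simplified example. Two "HRTFs" are modeled as pure delays of 10 and 14 samples with flat magnitude; these values and all function names are assumptions chosen for illustration. Averaging the impulse responses directly produces deep notches, whereas interpolating the delay and the (flat) magnitude separately does not.

```python
import cmath
import math

def delayed_impulse(delay, length):
    """Impulse response of a pure delay of `delay` samples."""
    return [1.0 if n == delay else 0.0 for n in range(length)]

def magnitude_response(h, n_bins):
    """Magnitude of the frequency response of h at n_bins points in [0, pi]."""
    mags = []
    for k in range(n_bins):
        w = math.pi * k / (n_bins - 1)
        mags.append(abs(sum(x * cmath.exp(-1j * w * n)
                            for n, x in enumerate(h))))
    return mags

h_a = delayed_impulse(10, 64)
h_b = delayed_impulse(14, 64)

# Direct weighted addition of the two impulse responses (weights 0.5 each).
h_direct = [0.5 * a + 0.5 * b for a, b in zip(h_a, h_b)]
mag_direct = magnitude_response(h_direct, 9)

# Separate handling: interpolate the delay, keep the flat magnitude.
# An integer result is assumed here; a fractional delay would need a
# fractional-delay line in practice.
interp_delay = round(0.5 * 10 + 0.5 * 14)
h_separate = delayed_impulse(interp_delay, 64)
mag_separate = magnitude_response(h_separate, 9)
```

In this toy case `mag_direct` follows |cos(2ω)| and drops to zero at ω = π/4 and 3π/4, while `mag_separate` stays flat at 1.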

The separate handling of time delays, which are either extracted from the HRTF data set in a preprocessing step or derived from geometrical models, e.g., a spherical head model [19], yields

[FIG5] The signal flow of a dynamic binaural synthesis system for multiple sound sources. (Labels in the figure: Acoustic Scene, Audio, Position, Delay Computation, Delay Line, HRTF Database, HRTF Selection, HRTF Interpolation, Convolution, Crossfading.)
