■ Head tracking: to adapt to the dynamic head movements of the listener.
The following sections describe in detail the virtualization and
its interaction with head tracking, sound scene decomposition,
individualization, and equalization. These signal processing tech-
niques are integrated and evaluated using subjective tests.
VIRTUALIZATION
In digital media, sound is typically mixed for loudspeaker playback
rather than headphone playback. Spatial sound rendered over headphones should therefore emulate the propagation of the acoustic waves from the loudspeakers to the eardrums of the listener. To emulate stereo or surround sound loudspeaker rendering over headphones, virtualization techniques based on HRTFs corresponding to the loudspeaker positions are commonly
used. Given these acoustic transfer functions (i.e., HRTFs), the virtu-
alization technique is applicable to any multichannel loudspeaker
setup, be it stereo, 5.1, 7.1, 22.2, or even loudspeaker arrays in wave-
field synthesis. As shown in Figure 2(a), for every desired loudspeaker
position, the signal in the $m$th channel $x_m(n)$ is filtered with the corresponding HRTFs $h_{x_m L}(n)$, $h_{x_m R}(n)$ and summed before being routed to the left and right ears [1], [5], respectively, as

$$y_L(n) = \sum_{m=1}^{M} h_{x_m L}(n) * x_m(n), \qquad y_R(n) = \sum_{m=1}^{M} h_{x_m R}(n) * x_m(n), \tag{1}$$
where $*$ denotes convolution and $M$ is the total number of channels. When the HRTFs are directly applied to multichannel loud-
speaker signals, the rendered sound scenes in headphone playback
suffer from inaccurate virtual source directions, lack of depth, and
reduced image width [5], [6].
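To make the channel-based virtualization of (1) concrete, the following is a minimal Python/NumPy sketch; the function and array names (`virtualize_channels`, `x`, `hrir_L`, `hrir_R`) and the use of time-domain HRIRs are illustrative assumptions rather than details from the article.

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualize_channels(x, hrir_L, hrir_R):
    """Eq. (1): filter each loudspeaker channel with its HRTF pair and sum.

    x      : (M, N) array, one row per channel signal x_m(n)
    hrir_L : (M, T) array, left-ear HRIRs h_{x_m L}(n), one per channel
    hrir_R : (M, T) array, right-ear HRIRs h_{x_m R}(n), one per channel
    Returns (y_L, y_R), each of length N + T - 1.
    """
    y_L = sum(fftconvolve(x[m], hrir_L[m]) for m in range(x.shape[0]))
    y_R = sum(fftconvolve(x[m], hrir_R[m]) for m in range(x.shape[0]))
    return y_L, y_R

# Example with placeholder data: a six-channel (5.1-style) mix, 128-tap HRIRs.
x = np.random.randn(6, 48000)
hrir_L, hrir_R = np.random.randn(6, 128), np.random.randn(6, 128)
y_L, y_R = virtualize_channels(x, hrir_L, hrir_R)
```

As the article notes, the same routine applies unchanged to stereo, 5.1, 7.1, or 22.2 content; only the number of channels $M$ and the HRIR set change.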
To solve these problems in virtualization of multichannel loud-
speaker signals and achieve a faithful reproduction of the sound
scenes, the HRTFs should be applied to the individual source sig-
nals that are usually extracted (using BSS and PAE) from the loud-
speaker signals (i.e., mixtures). In this virtualization [as shown in
Figure 2(b)], the sources are rendered directly using the HRTFs of
the corresponding source directions
$h_{s_k L}(n)$, $h_{s_k R}(n)$:

$$y_L(n) = \sum_{k=1}^{K} h_{s_k L}(n) * s_k(n) + a_L(n), \qquad y_R(n) = \sum_{k=1}^{K} h_{s_k R}(n) * s_k(n) + a_R(n), \tag{2}$$
where $K$ is the total number of sources, $s_k(n)$ is the $k$th source in the multichannel signal, and the environment signals $a_L(n)$, $a_R(n)$ are the rendered signals representing the sound environment perceived by the two ears. To render the acoustics of the
environment, the environment signals can be either synthesized
according to the sound environment [7] or extracted from the
mixtures. Techniques like decorrelation [5], [8] and artificial rever-
beration [9] are commonly employed to render the environment
signals to create a more diffuse and natural sound environment.
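A matching sketch of (2), under the assumption that the $K$ sources and their direction HRIRs have already been obtained (e.g., by BSS/PAE) and that a single environment signal has been extracted or synthesized; the random-FIR decorrelation used here to produce $a_L(n)$, $a_R(n)$ is a simple stand-in for the decorrelation and artificial reverberation techniques of [5], [8], [9], not the article's specific method.

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualize_sources(s, hrir_L, hrir_R, ambience):
    """Eq. (2): render extracted sources at their directions, then add a
    decorrelated (diffuse) version of the environment signal to each ear.

    s        : (K, N) array, extracted source signals s_k(n)
    hrir_L/R : (K, T) arrays, HRIRs for the source directions
    ambience : (N,) array, extracted or synthesized environment signal
    """
    K = s.shape[0]
    y_L = sum(fftconvolve(s[k], hrir_L[k]) for k in range(K))
    y_R = sum(fftconvolve(s[k], hrir_R[k]) for k in range(K))

    # Two different short random FIR filters decorrelate the ambience into
    # a_L(n) and a_R(n) so it is perceived as diffuse (an illustrative
    # stand-in for the techniques in [5], [8], [9]).
    rng = np.random.default_rng(0)
    dec_L = rng.standard_normal(64)
    dec_R = rng.standard_normal(64)
    a_L = fftconvolve(ambience, dec_L / np.linalg.norm(dec_L))
    a_R = fftconvolve(ambience, dec_R / np.linalg.norm(dec_R))

    # Sum the direct and environment parts, padding to a common length.
    n = max(len(y_L), len(a_L))
    out_L, out_R = np.zeros(n), np.zeros(n)
    out_L[:len(y_L)] += y_L; out_L[:len(a_L)] += a_L
    out_R[:len(y_R)] += y_R; out_R[:len(a_R)] += a_R
    return out_L, out_R
```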
[FIG1] A summary of the differences between natural listening and headphone listening, and the corresponding signal processing techniques that address these differences for natural sound rendering. The main challenges and their corresponding signal processing techniques in each category (source, medium, and receiver) are highlighted; their interactions (not shown here) are further discussed in the article.
[Figure 1 box labels — Natural listening: physical sound sources and environment (source); free air (medium); individual filtering (torso, head, ear) and head movements (receiver). Headphone listening: materials for loudspeaker playback, recordings/mixtures (source); headphones (medium); individual filtering (partial ear) and nonadapted head movements (receiver). Signal processing techniques: sound scene decomposition and virtualization (source); equalization (medium); individualization and head tracking (receiver).]