
IEEE SIGNAL PROCESSING MAGAZINE [96] MARCH 2015

several advantages. First, it lowers the required filter orders of the HRTFs, reducing computational and memory requirements. Second, the phase differences between neighboring HRTFs are significantly reduced, which alleviates comb-filtering artifacts during interpolation. Finally, because the perceptual limits for the ITD necessitate variable time delays with subsample accuracy, separating the delays allows them to be implemented with dedicated fractional-delay filtering techniques, e.g., [20].
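The subsample ITD mentioned above can be illustrated with a fractional-delay FIR filter. The sketch below assumes Lagrange interpolation, one common fractional-delay design that is not necessarily the one used in [20]; the function names and the third-order default are illustrative assumptions. An integer bulk delay is split off so that the fractional part stays near the filter center, where Lagrange interpolation is most accurate.

```python
import math


def lagrange_fd_coefficients(delay, order):
    """Lagrange-interpolation fractional-delay FIR taps h[0..order] for `delay` samples."""
    taps = []
    for n in range(order + 1):
        c = 1.0
        for k in range(order + 1):
            if k != n:
                c *= (delay - k) / (n - k)
        taps.append(c)
    return taps


def fir_filter(x, h):
    """Linear (direct-form) convolution, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fractional_delay(x, delay, order=3):
    """Hypothetical helper: delay x by a possibly non-integer number of samples."""
    centre = order // 2                      # keep the fractional part near the filter centre
    bulk = int(math.floor(delay)) - centre   # integer part realised as a plain shift
    if bulk < 0:
        raise ValueError("delay too small for this filter order")
    h = lagrange_fd_coefficients(delay - bulk, order)
    y = fir_filter(x, h)
    return [0.0] * bulk + y[:len(x) - bulk]
```

For example, `fractional_delay(signal, 5.3)` delays a signal by 5.3 samples. When the ITD changes over time, the taps can simply be recomputed for each new delay value, which keeps the delay path independent of the HRTF filters.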

Several strategies exist to interpolate the HRTF responses (without delays). Interpolating the magnitude and phase responses separately preserves the complex-valued responses of HRTF filters [6]. Other approaches make use of the physical properties of (delay-compensated) HRTFs, which closely resemble minimum-phase systems [21], or of the limited perceptual relevance of the phase [19], [22]. Interpolation of the magnitude responses followed by minimum-phase reconstruction is proposed in [6] and [16]. Another method is to interpolate only the HRTF magnitudes [19].
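Magnitude interpolation followed by minimum-phase reconstruction can be sketched with the folded real-cepstrum (homomorphic) method. This is an assumption about one standard way of performing the reconstruction, not necessarily the procedure used in [6] or [16]. The HRIRs are assumed to be delay-compensated, the magnitudes are interpolated linearly (not in dB), and a naive DFT keeps the example dependency-free; all function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def magnitude_spectrum(h, n_fft):
    """|H[k]| on an n_fft-point grid (h zero-padded)."""
    return [abs(v) for v in dft(list(h) + [0.0] * (n_fft - len(h)))]


def minimum_phase_from_magnitude(mag, floor=1e-8):
    """Minimum-phase impulse response with magnitude `mag` (even length) via the folded cepstrum."""
    n = len(mag)
    log_mag = [math.log(max(m, floor)) for m in mag]
    cep = [v.real for v in dft(log_mag, inverse=True)]   # real cepstrum
    folded = [0.0] * n
    folded[0] = cep[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]                          # fold anti-causal part onto causal part
    folded[n // 2] = cep[n // 2]
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spectrum, inverse=True)]


def interpolate_hrtf(h_a, h_b, weight, n_fft=32):
    """Linearly interpolate two delay-free HRIR magnitudes, then rebuild a minimum-phase HRIR."""
    mag_a = magnitude_spectrum(h_a, n_fft)
    mag_b = magnitude_spectrum(h_b, n_fft)
    mag = [(1.0 - weight) * a + weight * b for a, b in zip(mag_a, mag_b)]
    return minimum_phase_from_magnitude(mag), mag
```

With `weight = 0.25`, the result carries 75% of the first magnitude response and 25% of the second, with a phase implied by the minimum-phase assumption rather than interpolated. A practical implementation would replace the `dft` helper with an FFT.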

Filtering of HRTFs can be performed either by linear convolution in the time domain or by frequency-domain fast convolution techniques. While the latter is significantly more efficient than linear convolution for all but the lowest filter orders, it introduces an additional blocking latency on the order of the HRTF filter length, which can be critical for assisted listening, e.g., in hear-through applications. Partitioned convolution techniques [23], [24] enable advantageous tradeoffs between the efficiency of fast convolution and system latency.
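The latency tradeoff of partitioned convolution can be illustrated with a uniformly partitioned overlap-save scheme, one common variant and not necessarily the exact algorithm of [23] or [24]. The filter is split into partitions of `block` samples, each partition is transformed once, and past input spectra are kept in a frequency-domain delay line. In a real-time system the input/output latency is then one block rather than the full filter length. A naive DFT stands in for an FFT, and all names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def direct_convolve(x, h):
    """Reference linear convolution, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def partitioned_convolve(x, h, block):
    """Uniformly partitioned overlap-save convolution with partition/block size `block`."""
    num_parts = -(-len(h) // block)                       # ceil(len(h) / block)
    h = list(h) + [0.0] * (num_parts * block - len(h))
    # Spectra of the zero-padded filter partitions, computed once.
    parts = [dft(h[p * block:(p + 1) * block] + [0.0] * block) for p in range(num_parts)]
    fdl = [[0j] * (2 * block) for _ in range(num_parts)]  # frequency-domain delay line
    x = list(x) + [0.0] * (-len(x) % block)               # pad input to whole blocks
    prev = [0.0] * block
    y = []
    for start in range(0, len(x), block):
        cur = x[start:start + block]
        fdl = [dft(prev + cur)] + fdl[:-1]                # newest input spectrum first
        acc = [sum(fdl[p][k] * parts[p][k] for p in range(num_parts))
               for k in range(2 * block)]
        out = dft(acc, inverse=True)
        y.extend(v.real for v in out[block:])             # overlap-save: keep alias-free half
        prev = cur
    return y
```

With `block = 4` and a 10-tap filter, the output matches direct convolution, while a real-time implementation only has to wait for 4 new input samples per block instead of a block on the order of the whole filter length.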

HRTF crossfading, which is also denoted as commutation [6], refers to the gradual transition between interpolated HRTFs. It reduces audible artifacts caused by the exchange of filter coefficients. Thus, crossfading is typically performed at a much higher time resolution than HRTF interpolation. The choice of the crossfading algorithm depends strongly on the convolution method used for HRTF filtering. In the case of linear convolution, crossfading can be efficiently implemented by linearly interpolating the finite impulse response (FIR) filter coefficients. In contrast, integrating crossfading with frequency-domain convolution is more difficult due to its block-based operation. A typical solution is to perform two convolution processes in parallel and to crossfade the filtered signals in the time domain. A technique that combines crossfading with frequency-domain and partitioned convolution, avoiding the complexity of two separate filtering processes, is proposed in [24].
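For linear convolution, crossfading by coefficient interpolation can be sketched as a time-varying FIR filter whose taps move linearly from the old to the new HRIR. Because convolution is linear in the coefficients, this yields the same output as running two filters in parallel and crossfading their outputs, which the second function reproduces for comparison. The linear fade shape, the fade length, and the function names are illustrative assumptions.

```python
def _pad(h, taps):
    """Zero-pad a coefficient list to `taps` entries."""
    return list(h) + [0.0] * (taps - len(h))


def fir_filter(x, h):
    """Linear (direct-form) convolution, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def crossfade_fir(x, h_old, h_new, fade_len):
    """Time-varying FIR whose taps move linearly from h_old to h_new over fade_len samples."""
    taps = max(len(h_old), len(h_new))
    h_old, h_new = _pad(h_old, taps), _pad(h_new, taps)
    y = []
    for n in range(len(x)):
        g = min(n / fade_len, 1.0)                        # linear crossfade weight
        h = [(1.0 - g) * a + g * b for a, b in zip(h_old, h_new)]
        y.append(sum(h[k] * x[n - k] for k in range(taps) if n - k >= 0))
    return y


def crossfade_outputs(x, h_old, h_new, fade_len):
    """Two parallel convolutions whose outputs are crossfaded in the time domain."""
    y_old, y_new = fir_filter(x, h_old), fir_filter(x, h_new)
    return [(1.0 - min(n / fade_len, 1.0)) * a + min(n / fade_len, 1.0) * b
            for n, (a, b) in enumerate(zip(y_old, y_new))]
```

The coefficient-interpolation form costs one extra multiply-add per tap and sample, whereas the parallel form doubles the convolution work; only the parallel form carries over directly to block-based frequency-domain filtering.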

AUDIO-AUGMENTED REALITY

Audio-augmented reality refers to a system in which the user simultaneously hears both synthetic sounds and the ambient sounds around her or him. In addition to the requirements of regular headphone or virtual-reality listening, such a system needs a hear-through mode [25], [26].

The hear-through mode is trivial for open and bone-conduction headphones, which do not block the ear canal [27]: the user always hears the ambient sounds without extra attenuation. However, other types of headphones, such as closed and IE headphones, block the ear canal and suppress outside sounds. The hear-through mode must compensate for this attenuation so that environmental sounds are heard naturally. As seen in Figure 2, the attenuation of closed-back and IE headphones is modest at low frequencies, but above 1 kHz it can be substantial, exceeding 20 dB. This corresponds to a severe acoustic isolation of the headphone user, similar to that observed with hearing protectors.

A hear-through system is usually based on an external microphone [25]. The ambient sound signal captured by the microphone is filtered and sent to the earpiece with an appropriate gain. The aim of the filtering and amplification is to cancel the attenuation caused by the headphone itself. The filter is therefore usually of the high-pass type, because low frequencies leak to the ear without much damping.
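A minimal sketch of such a high-pass equalizer, assuming a first-order high-pass designed with the bilinear transform plus a broadband gain. The function names are hypothetical, and the 1 kHz cutoff and 20 dB gain used below are placeholders that loosely echo the isolation values quoted for Figure 2; a real design would be fitted to the measured isolation curve and would also account for how the processed sound sums with the leaked sound at the ear.

```python
import cmath
import math


def highpass_coefficients(cutoff_hz, fs_hz, gain_db=0.0):
    """First-order bilinear-transform high-pass (b0, b1, a1) with a broadband gain."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)   # prewarped cutoff
    g = 10 ** (gain_db / 20)
    return g / (1 + k), -g / (1 + k), (k - 1) / (k + 1)


def highpass_filter(x, coeffs):
    """Apply y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    b0, b1, a1 = coeffs
    y, x_prev, y_prev = [], 0.0, 0.0
    for v in x:
        out = b0 * v + b1 * x_prev - a1 * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y


def magnitude_at(freq_hz, fs_hz, coeffs):
    """|H(e^{jw})| of the first-order section at freq_hz."""
    b0, b1, a1 = coeffs
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs_hz)   # z^-1 on the unit circle
    return abs((b0 + b1 * z1) / (1 + a1 * z1))
```

The response is zero at DC, -3 dB (relative to the gain) at the cutoff, and equal to the broadband gain at the Nyquist frequency, so low frequencies that already leak through are left largely to the acoustic path.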

An additional constraint in a hear-through system is its latency, that is, the time delay between the leaked and the processed sound [25]. Some delay is inevitably introduced by the analog-to-digital and digital-to-analog conversions and by the processing that the microphone signal undergoes; this delay can be, e.g., 1 ms. When the delayed, processed sound is added to the leaked sound at the ear, a comb-filtering effect can color ambient sounds, which is disturbing. The disturbance is strongest when a notch of the comb filter occurs in the frequency range where the leaked and processed sounds are equally loud [25], which corresponds to both the direct sound and the processed sound being attenuated by 6 dB. For this reason, and slightly surprisingly, a colorless hear-through system is easiest to implement for headphones that attenuate outside sounds well, because then most of the ambient sound reaches the ear through the microphone and processing path, and the weak leaked component produces only shallow notches.
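The comb-filtering argument can be checked with a simple two-path model: the ear signal is the leaked sound with gain g_l plus the processed sound with gain g_p and delay τ, giving the magnitude |g_l + g_p e^{-j2πfτ}|. The frequency-independent gains are an illustrative assumption, and the function names are hypothetical. Notches occur where the paths are in antiphase, at f = (2k+1)/(2τ), with depth 20 log10(|g_l - g_p|/(g_l + g_p)) dB, which becomes a complete cancellation only when the gains are equal.

```python
import cmath
import math


def combined_response(freq_hz, leak_gain, proc_gain, delay_s):
    """|leak_gain + proc_gain * exp(-j*2*pi*f*delay)| for the leaked plus delayed path."""
    return abs(leak_gain + proc_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))


def notch_frequencies(delay_s, max_hz):
    """Frequencies up to max_hz where the delayed path is in antiphase: (2k+1)/(2*delay)."""
    freqs, k = [], 0
    while (2 * k + 1) / (2 * delay_s) <= max_hz:
        freqs.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return freqs


def notch_depth_db(leak_gain, proc_gain):
    """Notch level relative to the comb peaks; -inf for equal gains."""
    diff = abs(leak_gain - proc_gain)
    if diff == 0.0:
        return float("-inf")
    return 20 * math.log10(diff / (leak_gain + proc_gain))
```

With the 1-ms delay quoted above, the first notches fall at 500, 1500, and 2500 Hz. Equal gains of 0.5 (about -6 dB each) cancel completely at those frequencies, whereas a well-isolating headset with g_l = 0.1 and g_p = 0.9 limits the notch depth to about -1.9 dB.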

ALL-PASS HEAR-THROUGH DESIGN

We briefly describe a method to design a hear-through system based on the all-pass principle [28]. The method takes as its input the impulse response corresponding to the isolation transfer function of the headset. This response can be measured by placing the headphones on a dummy head and playing a sinusoidal sweep signal from a loudspeaker. Additionally, the latency of the acoustic signal processing system from the microphone input to the earpiece output must be known; it is easy to measure. It is also important to account for the magnitude and group delay of the earpiece response, but here we assume it to be flat and delay-free.
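The all-pass principle the method relies on can be checked numerically: a filter is all-pass when its magnitude response is flat, which for a real impulse response is equivalent to its autocorrelation being a unit impulse. The sketch below uses a first-order all-pass filter, H(z) = (a + z^{-1})/(1 + a z^{-1}), as a stand-in; it only illustrates the target property and is not the design method of [28]. It also shows that the beginning of an all-pass response alone is generally not flat. All names are hypothetical.

```python
import cmath


def first_order_allpass_ir(a, length):
    """Impulse response of H(z) = (a + z^-1) / (1 + a z^-1), truncated to `length` samples."""
    h = [a]
    for n in range(1, length):
        h.append((1 - a * a) * (-a) ** (n - 1))
    return h


def magnitude_response(h, omega):
    """|sum_n h[n] e^{-j omega n}| at normalized angular frequency omega."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h)))


def autocorrelation(h, lag):
    """Deterministic autocorrelation r[lag] = sum_n h[n] h[n + lag]."""
    return sum(h[n] * h[n + lag] for n in range(len(h) - lag))
```

For a = 0.5 and 60 samples, the magnitude is 1 at all frequencies to within rounding and the autocorrelation vanishes at nonzero lags, while the first two samples alone give 1.25 at DC and 0.25 at the Nyquist frequency.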

The beginning of the impulse response is given as the input to the all-pass filter design method, which completes it so that the overall system is all-pass [28]. Figure 6(a) shows an example where the given sequence is the beginning of the isolation impulse response, which corresponds to the low-pass filter response in Figure 6(b). When a truncated impulse response of an all-pass filter is combined with it, the overall magnitude response becomes flat, as shown in Figure 6(b). In practice, the headphone itself produces
