
IEEE SIGNAL PROCESSING MAGAZINE [33] MARCH 2015
reproduction with arbitrary setups or for binaural reproduction
[8]. Hence, instead of transmitting many microphone signals and
carrying out the entire processing at the receiving side, only two
signals (i.e., the direct and diffuse signals) need to be transmitted
together with the parametric information. These two signals
enable synthesis of the output signals on the receiving side for the
reproduction system at hand, and additionally allow the listener to
arbitrarily adjust the spatial responses. Note that in the considered
approach, the same audio and parametric side information is sent,
irrespective of the number of loudspeakers used for reproduction.
As an alternative to the classical filters used for signal enhancement, where an enhanced signal is created as a weighted sum of the available microphone signals, an enhanced signal can be created from the direct and diffuse sound components and the parametric information. This approach can be seen as a generalization of the parametric filters used in [9]–[12], where the filters are calculated based on instantaneous estimates of an underlying parametric sound field model. As will be discussed later in this article, these parameters are typically estimated in narrow frequency bands, and their accuracy depends on the resolution of the time-frequency transform and the geometry of the microphone array. If accurate parameter estimates with a sufficiently high time-frequency resolution are available, parametric filters can quickly adapt to changes in the acoustic scene. Parametric filters have been applied to various challenging acoustic signal processing problems related to assisted listening, such as directional filtering [10], dereverberation [11], and acoustic zooming [13]. Parametric filtering approaches have also been used in the context of binaural hearing aids [14], [15].
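To illustrate the idea of a parametric filter computed from instantaneous parameter estimates, the following sketch (illustrative only; function and parameter names are our own, not taken from [9]–[12]) applies a per-bin spectral gain to a reference microphone spectrum, passing time-frequency bins whose estimated DOA lies inside a target sector and attenuating the rest, with the estimated diffuseness limiting the gain in predominantly diffuse bins.

```python
import numpy as np

def directional_filter_gains(doa, diffuseness, target_doa, width,
                             floor_gain=0.1):
    """Per-bin spectral gains from instantaneous parameter estimates.

    doa, diffuseness: arrays of shape (K, N) holding one DOA estimate
    (radians) and one diffuseness value in [0, 1] per time-frequency bin.
    Direct sound arriving within +/- width of target_doa is passed;
    other direct sound and the diffuse part are attenuated to floor_gain.
    """
    # Angular distance to the target direction, wrapped to [-pi, pi]
    delta = np.angle(np.exp(1j * (doa - target_doa)))
    in_sector = np.abs(delta) <= width
    direct_gain = np.where(in_sector, 1.0, floor_gain)
    # Blend direct and diffuse gains by the per-bin diffuseness
    return (1.0 - diffuseness) * direct_gain + diffuseness * floor_gain

# Usage: multiply the gains with a reference microphone STFT
K, N = 257, 100
rng = np.random.default_rng(0)
doa = rng.uniform(-np.pi, np.pi, (K, N))      # stand-in DOA estimates
psi = rng.uniform(0.0, 1.0, (K, N))           # stand-in diffuseness
X_ref = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
G = directional_filter_gains(doa, psi, target_doa=0.0, width=np.pi / 6)
Y = G * X_ref  # enhanced output spectrum
```

Because the gains are recomputed from the parameter estimates in every bin, such a filter can adapt as quickly as the underlying time-frequency analysis allows.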
PARAMETRIC SOUND FIELD MODELS
BACKGROUND
Many parametric models were originally developed with the aim of capturing, transmitting, and reproducing high-quality spatial audio; examples include directional audio coding (DirAC) [1], microphone front ends for spatial audio coders [16], and high angular resolution plane wave expansion (HARPEX) [17]. These models were designed based on observations about human perception of spatial sound, aiming to recreate perceptually important spatial audio attributes for the listener. For example, in the basic form of DirAC [1], the model parameters are the DOA of the direct sound and the diffuseness, which is directly related to the ratio between the direct signal power and the diffuse signal power. Using a pressure signal and this parametric information, a direct signal and a diffuse signal can be reconstructed at the far-end side. The direct signal is attributed to a single plane wave at each frequency, whereas the diffuse signal is attributed to spatially extended sound sources, concurrent sound sources (e.g., applause from an audience or cafeteria noise), and the room reverberation that arises from multipath acoustic wave propagation when sound is captured in an enclosed environment. A similar sound field model consisting of direct and diffuse sound has been applied in spatial audio scene coding (SASC) [2] and in [3] for sound reproduction with arbitrary reproduction systems and for sound scene manipulations. In [16], on the other hand, the model parameters include the interchannel level difference and the interchannel coherence [18], which were estimated using two microphones and were previously used in various spatial audio coders [6]. These model parameters are sent to the far-end side together with a so-called downmix signal to generate multiple loudspeaker channels for sound reproduction. In this case, the downmix signal and parameters are compatible with those used in different spatial audio coders. In contrast to DirAC and SASC, HARPEX assumes that the direct signal at a particular frequency is composed of two plane waves rather than one.
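The relation between diffuseness and the direct-to-diffuse power split can be made concrete with a minimal sketch (our own illustration, not code from [1]; names are hypothetical): if the diffuseness psi of a bin equals the diffuse-to-total power ratio, scaling the pressure spectrum by sqrt(1 - psi) and sqrt(psi) yields direct and diffuse estimates whose powers sum to the input power.

```python
import numpy as np

def decompose(pressure_stft, diffuseness):
    """Split a pressure spectrum into direct and diffuse estimates.

    diffuseness (psi) in [0, 1] is taken as the diffuse-to-total power
    ratio per time-frequency bin, so the square-root weights preserve
    the total power: |direct|^2 + |diffuse|^2 = |pressure|^2.
    """
    direct = np.sqrt(1.0 - diffuseness) * pressure_stft
    diffuse = np.sqrt(diffuseness) * pressure_stft
    return direct, diffuse

# Toy 2x2 spectrum with one fully direct (psi = 0) and one fully
# diffuse (psi = 1) bin
P = np.array([[1.0 + 1.0j, 2.0], [0.5j, 1.0]])
psi = np.array([[0.0, 0.5], [1.0, 0.25]])
S_dir, S_diff = decompose(P, psi)
```

A psi of 0 routes all power to the direct signal and a psi of 1 to the diffuse signal, matching the attribution of the two components described above.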
Besides offering a compact and flexible way to transmit and reproduce high-quality spatial audio independently of the reproduction setup, parametric processing is highly attractive for sound scene manipulation and signal enhancement. The extracted model parameters can be used to compute parametric filters that achieve, for instance, directional filtering [10] and dereverberation [11]. These parametric filters represent spectral gains applied to a reference microphone signal and can in principle provide arbitrary directivity patterns that adapt quickly to the acoustic scene, provided that the sound field analysis is performed with a sufficiently high time-frequency resolution. For this purpose, the short-time Fourier transform (STFT) is considered a good choice, as it often offers a signal representation sparse enough to assume a single dominant directional wave in each time-frequency bin. The assumption that the source spectra are sufficiently sparse is commonly made in speech signal processing [19]. Sources that exhibit sufficiently small spectrotemporal overlap fulfill the so-called W-disjoint orthogonality condition. This assumption is, however, violated when concurrent sound sources with comparable powers are active in the same frequency band. Another family of parametric approaches emerged within the area of computational auditory scene analysis [20], where auditory cues are used, for instance, to derive time-frequency masks that separate different source signals from the captured sound.
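The W-disjoint orthogonality assumption can be illustrated with a toy example (our own sketch, not drawn from [19] or [20]): when two sources occupy disjoint sets of time-frequency bins, binary masks applied to the mixture spectrum recover each source exactly.

```python
import numpy as np

# Two toy "source spectra" with disjoint time-frequency support:
# under W-disjoint orthogonality, at most one source is active per bin.
K, N = 8, 6
S1 = np.zeros((K, N), dtype=complex)
S2 = np.zeros((K, N), dtype=complex)
S1[:4, :] = 1.0 + 0.5j   # source 1 occupies the lower frequency bands
S2[4:, :] = 2.0 - 1.0j   # source 2 occupies the upper frequency bands

X = S1 + S2  # observed mixture spectrum

# Binary masks select the bins attributed to each source. In practice
# the masks would be derived from estimated cues (e.g., per-bin DOA);
# here they are built from oracle magnitudes purely for illustration.
mask1 = np.abs(S1) > np.abs(S2)
Y1 = mask1 * X
Y2 = (~mask1) * X
```

With comparable source powers in the same bin, the disjointness breaks down and the masks introduce errors, which is exactly the limitation noted above.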
Clearly, the choice of an underlying parametric model depends on the specific application and on the way the extracted parameters and the available audio signals are used to generate the desired output. In this article, we focus on geometry-based parametric models that take both direct and diffuse sound components into account. Such models allow for high-quality spatial sound acquisition, which can subsequently serve transmission and reproduction purposes as well as the derivation of flexible parametric filters for sound scene manipulation and signal enhancement in assisted listening.
GEOMETRIC MODELS
In the following, we consider the time-frequency domain, with k and n denoting the frequency and time indices, respectively. For each (k, n), we assume that the sound field is a superposition of a single spherical wave and a diffuse sound field. The spherical wave models the direct sound of the point source in a reverberant environment, while the diffuse field models room reverberation and spatially extended sound sources. As shown in Figure 2, the