
IEEE SIGNAL PROCESSING MAGAZINE [33] MARCH 2015
reproduction with arbitrary setups or for binaural reproduction
[8]. Hence, instead of transmitting many microphone signals and
carrying out the entire processing at the receiving side, only two
signals (i.e., the direct and diffuse signals) need to be transmitted
together with the parametric information. These two signals
enable synthesis of the output signals on the receiving side for the
reproduction system at hand, and additionally allow the listener to
arbitrarily adjust the spatial responses. Note that in the considered
approach, the same audio and parametric side information is sent,
irrespective of the number of loudspeakers used for reproduction.
As an alternative to the classical filters used for signal enhancement, where an enhanced signal is created as a weighted sum of the available microphone signals, an enhanced signal can be created from the direct and diffuse sound components and the parametric information. This approach can be seen as a generalization of the parametric filters used in [9]–[12], where the filters are calculated based on instantaneous estimates of an underlying parametric sound field model. As will be discussed later in this article, these parameters are typically estimated in narrow frequency bands, and their accuracy depends on the resolution of the time-frequency transform and the geometry of the microphone array. If accurate parameter estimates with a sufficiently high time-frequency resolution are available, parametric filters can quickly adapt to changes in the acoustic scene. Parametric filters have been applied to various challenging acoustic signal processing problems related to assisted listening, such as directional filtering [10], dereverberation [11], and acoustic zooming [13]. Parametric filtering approaches have also been used in the context of binaural hearing aids [14], [15].
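To illustrate the idea of a parametric filter computed from instantaneous parameter estimates, the following sketch (illustrative only; function and parameter names are our own, not taken from [9]–[12]) applies a per-bin spectral gain to a reference microphone spectrum, passing time-frequency bins whose estimated DOA lies inside a target sector and attenuating the rest, with the estimated diffuseness limiting the gain in predominantly diffuse bins.

```python
import numpy as np

def directional_filter_gains(doa, diffuseness, target_doa, width,
                             floor_gain=0.1):
    """Per-bin spectral gains from instantaneous parameter estimates.

    doa, diffuseness: arrays of shape (K, N) holding one DOA estimate
    (radians) and one diffuseness value in [0, 1] per time-frequency bin.
    Direct sound arriving within +/- width of target_doa is passed;
    other direct sound and the diffuse part are attenuated to floor_gain.
    """
    # Angular distance to the target direction, wrapped to [-pi, pi]
    delta = np.angle(np.exp(1j * (doa - target_doa)))
    in_sector = np.abs(delta) <= width
    direct_gain = np.where(in_sector, 1.0, floor_gain)
    # Blend direct and diffuse gains by the per-bin diffuseness
    return (1.0 - diffuseness) * direct_gain + diffuseness * floor_gain

# Usage: multiply the gains with a reference microphone STFT
K, N = 257, 100
rng = np.random.default_rng(0)
doa = rng.uniform(-np.pi, np.pi, (K, N))      # stand-in DOA estimates
psi = rng.uniform(0.0, 1.0, (K, N))           # stand-in diffuseness
X_ref = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
G = directional_filter_gains(doa, psi, target_doa=0.0, width=np.pi / 6)
Y = G * X_ref  # enhanced output spectrum
```

Because the gains are recomputed from the parameter estimates in every bin, such a filter can adapt as quickly as the underlying time-frequency analysis allows.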
PARAMETRIC SOUND FIELD MODELS
BACKGROUND
Many parametric models were originally developed with the aim of capturing, transmitting, and reproducing high-quality spatial audio; examples include directional audio coding (DirAC) [1], microphone front ends for spatial audio coders [16], and high angular resolution plane wave expansion (HARPEX) [17]. These models were designed based on observations about human perception of spatial sound, aiming to recreate perceptually important spatial audio attributes for the listener. For example, in the basic form of DirAC [1], the model parameters are the DOA of the direct sound and the diffuseness, which is directly related to the ratio between the direct signal power and the diffuse signal power. Using a pressure signal and this parametric information, a direct signal and a diffuse signal can be reconstructed at the far-end side. The direct signal is attributed to a single plane wave at each frequency, whereas the diffuse signal is attributed to spatially extended sound sources, concurrent sound sources (e.g., applause from an audience or cafeteria noise), and the room reverberation that arises from multipath acoustic wave propagation when sound is captured in an enclosed environment. A similar sound field model consisting of direct and diffuse sound has been applied in spatial audio scene coding (SASC) [2] and in [3] for sound reproduction with arbitrary reproduction systems and for sound scene manipulations. In [16], on the other hand, the model parameters include the interchannel level difference and the interchannel coherence [18], which were estimated using two microphones and were previously used in various spatial audio coders [6]. These model parameters are sent to the far-end side together with a so-called downmix signal to generate multiple loudspeaker channels for sound reproduction. In this case, the downmix signal and parameters are compatible with those used in different spatial audio coders. In contrast to DirAC and SASC, HARPEX assumes that the direct signal at a particular frequency is composed of two plane waves rather than one.
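The relation between diffuseness and the direct-to-diffuse power split can be made concrete with a minimal sketch (our own illustration, not code from [1]; names are hypothetical): if the diffuseness psi of a bin equals the diffuse-to-total power ratio, scaling the pressure spectrum by sqrt(1 - psi) and sqrt(psi) yields direct and diffuse estimates whose powers sum to the input power.

```python
import numpy as np

def decompose(pressure_stft, diffuseness):
    """Split a pressure spectrum into direct and diffuse estimates.

    diffuseness (psi) in [0, 1] is taken as the diffuse-to-total power
    ratio per time-frequency bin, so the square-root weights preserve
    the total power: |direct|^2 + |diffuse|^2 = |pressure|^2.
    """
    direct = np.sqrt(1.0 - diffuseness) * pressure_stft
    diffuse = np.sqrt(diffuseness) * pressure_stft
    return direct, diffuse

# Toy 2x2 spectrum with one fully direct (psi = 0) and one fully
# diffuse (psi = 1) bin
P = np.array([[1.0 + 1.0j, 2.0], [0.5j, 1.0]])
psi = np.array([[0.0, 0.5], [1.0, 0.25]])
S_dir, S_diff = decompose(P, psi)
```

A psi of 0 routes all power to the direct signal and a psi of 1 to the diffuse signal, matching the attribution of the two components described above.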
Besides offering a compact and flexible way to transmit and reproduce high-quality spatial audio independently of the reproduction setup, parametric processing is highly attractive for sound scene manipulation and signal enhancement. The extracted model parameters can be used to compute parametric filters that achieve, for instance, directional filtering [10] and dereverberation [11]. These parametric filters represent spectral gains applied to a reference microphone signal and can in principle provide arbitrary directivity patterns that adapt quickly to the acoustic scene, provided that the sound field analysis is performed with a sufficiently high time-frequency resolution. For this purpose, the short-time Fourier transform (STFT) is considered a good choice, as it often offers a signal representation sparse enough to assume a single dominant directional wave in each time-frequency bin. The assumption that the source spectra are sufficiently sparse is commonly made in speech signal processing [19]. Sources that exhibit sufficiently small spectrotemporal overlap fulfill the so-called W-disjoint orthogonality condition. This assumption is, however, violated when concurrent sound sources with comparable powers are active in the same frequency band. Another family of parametric approaches emerged within the area of computational auditory scene analysis [20], where auditory cues are used, for instance, to derive time-frequency masks that separate different source signals from the captured sound.
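The W-disjoint orthogonality assumption can be illustrated with a toy example (our own sketch, not drawn from [19] or [20]): when two sources occupy disjoint sets of time-frequency bins, binary masks applied to the mixture spectrum recover each source exactly.

```python
import numpy as np

# Two toy "source spectra" with disjoint time-frequency support:
# under W-disjoint orthogonality, at most one source is active per bin.
K, N = 8, 6
S1 = np.zeros((K, N), dtype=complex)
S2 = np.zeros((K, N), dtype=complex)
S1[:4, :] = 1.0 + 0.5j   # source 1 occupies the lower frequency bands
S2[4:, :] = 2.0 - 1.0j   # source 2 occupies the upper frequency bands

X = S1 + S2  # observed mixture spectrum

# Binary masks select the bins attributed to each source. In practice
# the masks would be derived from estimated cues (e.g., per-bin DOA);
# here they are built from oracle magnitudes purely for illustration.
mask1 = np.abs(S1) > np.abs(S2)
Y1 = mask1 * X
Y2 = (~mask1) * X
```

With comparable source powers in the same bin, the disjointness breaks down and the masks introduce errors, which is exactly the limitation noted above.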
Clearly, the choice of an underlying parametric model depends on the specific application and on the way the extracted parameters and the available audio signals are used to generate the desired output. In this article, we focus on geometry-based parametric models that take both direct and diffuse sound components into account. Such models allow for high-quality spatial sound acquisition, which can subsequently serve transmission and reproduction purposes as well as the derivation of flexible parametric filters for sound scene manipulation and signal enhancement in assisted listening.
GEOMETRIC MODELS
In the following, we consider the time-frequency domain, with k and n denoting the frequency and time indices, respectively. For each (k, n), we assume that the sound field is a superposition of a single spherical wave and a diffuse sound field. The spherical wave models the direct sound of the point source in a reverberant environment, while the diffuse field models room reverberation and spatially extended sound sources. As shown in Figure 2, the