IEEE SIGNAL PROCESSING MAGAZINE [37] MARCH 2015
PARAMETER ESTIMATION
For the computation of the filters described in the previous section,
the required parameters need to be estimated. In single-channel
extraction, only one parameter needs to be estimated: either the
signal-to-diffuse ratio $\mathrm{SDR}(k,n)$ or the diffuseness
$\Psi(k,n)$. In multichannel signal extraction, the required parameters
are the DOA $\theta(k,n)$ of the direct sound, the diffuse sound power
$\phi_d(k,n)$, and the PSD matrix $\boldsymbol{\Phi}_n(k)$ of the slowly
time-varying noise. In addition, the DOA or the position of the direct
sound source is required to control the application-specific processing
and synthesis. It should be noted that the quality of the extracted and
synthesized sounds depends strongly on the accuracy of the estimated
parameters.
The estimation of the DOA of a direct sound component is a
well-studied topic in the literature, and different approaches for
this task are available. Common approaches to estimate the
DOAs in the different frequency bands are ESPRIT and root
MUSIC (cf. [21] and the references therein).
For estimating the SDR, two different approaches are com-
mon in practice, depending on which microphone array geome-
try is used. For linear microphone arrays, the SDR is typically
estimated based on the spatial coherence between the signals of
two array microphones [25]. The spatial coherence is given by the
normalized cross-correlation between two microphone signals in
the frequency domain. When the direct sound is strong compared
to the diffuse sound (i.e., the SDR is high), the microphone signals
are strongly correlated (i.e., the spatial coherence is high).
On the other hand, when the diffuse sound is strong compared to
the direct sound (i.e., the SDR is low), the microphone signals are
less correlated.
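The relation between coherence and SDR can be sketched in a few lines. The following is a minimal illustration, not the estimator of [25]: it assumes a single plane wave plus a diffuse field with no microphone noise, so that the measured coherence is the SDR-weighted mix of the direct-sound coherence and the diffuse-field coherence. It also assumes a spherically isotropic diffuse field observed by omnidirectional microphones, for which the coherence is sin(x)/x. All function names are hypothetical.

```python
import math


def spatial_coherence(x1, x2):
    """Normalized cross-correlation of two microphone spectra in one band.

    x1, x2: sequences of complex STFT coefficients over time frames.
    Expectations are approximated by sums over the given frames.
    """
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2))
    power1 = sum(abs(a) ** 2 for a in x1)
    power2 = sum(abs(b) ** 2 for b in x2)
    return cross / math.sqrt(power1 * power2)


def diffuse_coherence(freq_hz, spacing_m, speed_of_sound=343.0):
    """Coherence of an assumed spherically isotropic diffuse field: sin(x)/x."""
    x = 2.0 * math.pi * freq_hz * spacing_m / speed_of_sound
    return 1.0 if x == 0.0 else math.sin(x) / x


def sdr_from_coherence(gamma, gamma_direct, gamma_diffuse):
    """Invert gamma = (SDR * gamma_direct + gamma_diffuse) / (SDR + 1).

    Assumed model: one plane wave plus a diffuse field, no noise.
    """
    denom = gamma - gamma_direct
    if abs(denom) < 1e-12:
        return math.inf  # measured coherence equals the direct-sound coherence
    ratio = (gamma_diffuse - gamma) / denom
    return max(ratio.real, 0.0)  # estimation errors can produce negative values
```

Here `gamma_direct` is the coherence that the direct sound alone would produce (a pure phase term that depends on the DOA and the microphone spacing), so in practice it has to be known or estimated as well.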
Alternatively, when a planar microphone array is used, the SDR
can be estimated based on the so-called active sound intensity vec-
tor [26]. This vector points in the direction in which the acoustic
energy flows. When only the direct sound arriving at the array from
a specific DOA is present, the intensity vector constantly points in
this direction and does not change its direction unless the sound
source moves. In contrast, when the sound field is entirely diffuse,
the intensity vector fluctuates quickly over time and points towards
random directions as the diffuse sound is arriving from all direc-
tions. Thus, the temporal variation of the intensity vector can be
used as a measure for the SDR and diffuseness, respectively [26].
Note that, as in [1], the inverse direction of the intensity vector can
also be used to estimate the DOA of the direct sound. The intensity
vector can be determined from an omnidirectional pressure signal
and the particle velocity vector as described in [26], where the latter
signals can be computed from the planar microphone array as
explained, for instance, in [11].
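As a rough illustration of this idea, the sketch below computes an intensity vector per time frame and then measures how much it fluctuates. The diffuseness measure used here, one minus the norm of the mean vector divided by the mean of the norms, is one possible choice and is an assumption of this example, not necessarily the estimator of [26]. The intensity is computed only up to a constant scale factor, and all function names are hypothetical.

```python
import math


def active_intensity(pressure, velocity):
    """Active intensity vector (up to a constant scale) for one time-frequency bin.

    pressure: complex omnidirectional pressure coefficient.
    velocity: sequence of two complex particle-velocity components (x and y).
    """
    return [(pressure.conjugate() * u).real for u in velocity]


def diffuseness(intensity_frames):
    """1 - |mean(I)| / mean(|I|): 0 for a steady vector, close to 1 for a random one."""
    n = len(intensity_frames)
    mean_x = sum(frame[0] for frame in intensity_frames) / n
    mean_y = sum(frame[1] for frame in intensity_frames) / n
    mean_norm = sum(math.hypot(frame[0], frame[1]) for frame in intensity_frames) / n
    if mean_norm == 0.0:
        return 1.0  # no energy flow at all; treated as fully diffuse (assumption)
    return 1.0 - math.hypot(mean_x, mean_y) / mean_norm


def doa_from_intensity(intensity_frames):
    """Azimuth in radians, opposite to the time-averaged 2-D intensity vector."""
    n = len(intensity_frames)
    mean_x = sum(frame[0] for frame in intensity_frames) / n
    mean_y = sum(frame[1] for frame in intensity_frames) / n
    return math.atan2(-mean_y, -mean_x)
```

A steady plane wave gives identical intensity vectors in every frame, so the ratio is one and the diffuseness is zero. Vectors that point in opposing directions cancel in the mean, which drives the measure towards one.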
Various approaches have been described in the literature to
estimate the slowly time-varying noise PSD matrix $\boldsymbol{\Phi}_n(k)$.
Assuming that the noise is stationary, which is a reasonable
assumption in many applications (e.g., when the noise represents
microphone self-noise or a stationary background noise), the
noise PSD matrix can be estimated from the microphone signals
during periods where only the noise is present in the microphone
signals, which can be detected using a voice activity detector. To
estimate the diffuse power $\phi_d(k,n)$, we employ the spatial filter
$\mathbf{w}_d(k,n)$ in (9), which provides an estimate of the diffuse sound
$X_d(k,n,\mathbf{d}_1)$. Computing the mean power of
$\hat{X}_d(k,n,\mathbf{d}_1)$ yields an estimate of the diffuse power.
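Both estimates in this paragraph amount to averaging over time frames, which can be sketched as follows for a single frequency band. This is a simplified illustration under stated assumptions: the voice activity decisions are taken as given (the detector itself is not shown), the filter output is assumed to be formed as the inner product of the conjugated filter weights with the microphone vector, and plain averaging over a block of frames stands in for whatever averaging is used in practice. The function names and the argument `w_d` are hypothetical.

```python
def noise_psd_matrix(frames, noise_only):
    """Average the outer product x x^H over the frames flagged as noise-only.

    frames: list of microphone vectors (one complex value per microphone).
    noise_only: list of booleans from a voice activity detector (not shown).
    """
    selected = [x for x, flag in zip(frames, noise_only) if flag]
    if not selected:
        raise ValueError("no noise-only frames available")
    m = len(selected[0])
    return [[sum(x[i] * x[j].conjugate() for x in selected) / len(selected)
             for j in range(m)] for i in range(m)]


def diffuse_power(frames, w_d):
    """Mean power of the filter output w_d^H x over the given frames."""
    outputs = [sum(w.conjugate() * xi for w, xi in zip(w_d, x)) for x in frames]
    return sum(abs(y) ** 2 for y in outputs) / len(outputs)
```

The resulting matrix is Hermitian by construction. In a streaming setting, recursive averaging with a forgetting factor could replace the block averages, which is a design choice outside the scope of this sketch.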
Finally, note that for some applications, such as the virtual
classroom application described in the next section, the estimation
of the IPLS positions from which the direct sounds originate may
also be required to perform the application-specific synthesis. To
determine the IPLS positions, the DOAs at different positions in the
room are estimated using multiple distributed microphone arrays.
The IPLS position can then be determined by triangulating the
estimated DOAs, as done in [27] and illustrated in Figure 2.
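For two arrays in a plane, the triangulation step reduces to intersecting two bearing lines. The sketch below is a minimal two-dimensional illustration, not the procedure of [27]: it assumes that the array positions are known, that each DOA is given as an angle in a common global coordinate system pointing from the array towards the source, and that the estimates are noise-free. The function name `triangulate` is hypothetical.

```python
import math


def triangulate(pos_a, doa_a, pos_b, doa_b):
    """Intersect two 2-D bearing lines; angles in radians from the x-axis."""
    ax, ay = pos_a
    bx, by = pos_b
    dax, day = math.cos(doa_a), math.sin(doa_a)
    dbx, dby = math.cos(doa_b), math.sin(doa_b)
    det = dax * dby - day * dbx  # zero when the two bearings are parallel
    if abs(det) < 1e-9:
        raise ValueError("bearings are (nearly) parallel; no unique intersection")
    # Solve pos_a + t * dir_a = pos_b + s * dir_b for t using Cramer's rule.
    t = ((bx - ax) * dby - (by - ay) * dbx) / det
    return (ax + t * dax, ay + t * day)
```

For example, arrays at (0, 0) and (2, 0) that observe the source at 45 and 135 degrees, respectively, yield the position (1, 1). With noisy DOA estimates or more than two arrays, the lines generally do not meet in a single point, and a least-squares fit would be a natural extension.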
APPLICATION-SPECIFIC SYNTHESIS
The compact description of the sound field in terms of a direct sig-
nal component, a diffuse signal component, and sound field
parameters, as shown in Figure 1, can contribute to assisted listen-
ing in a variety of applications. While the spatial analysis yielding
estimates of the model parameters and the direct and diffuse signal
components at a reference microphone is application independent,
the processing and synthesis is application dependent. For this
[FIG5] Spatial audio communication application: (a) communication scenario and (b) rendering of the loudspeaker signals.
[Figure not reproduced. Block labels: Recording Side, Spatial Analysis, Processing and Synthesis, Reproduction Side, TV, DOA, Direct Sound, Diffuse Sound, Loudspeaker Gain Computation, Decorrelators, Diffuse Sound Signals.]