
IEEE SIGNAL PROCESSING MAGAZINE [37] MARCH 2015

PARAMETER ESTIMATION

For the computation of the filters described in the previous sec-

tion, the required parameters need to be estimated. In single-

channel extraction, one parameter needs to be estimated,

specifically the signal-to-diffuse ratio $\mathrm{SDR}(k,n)$ or the

diffuseness $\Psi(k,n)$. In the case of multichannel signal extraction, the

required parameters include the DOA $\theta(k,n)$ of the direct sound,

the diffuse sound power $\phi_d(k,n)$, and the PSD matrix $\Phi_u(k)$ of

the slowly time-varying noise. In addition, the DOAs or the positions of

the direct sound sources are required to control the

application-specific processing and synthesis. It should be noted

that the quality of the extracted and synthesized sounds is largely

influenced by the accuracy of the estimated parameters.

The estimation of the DOA of a direct sound component is a

well-studied topic in the literature, and different approaches for

this task are available. Common approaches to estimate the

DOAs in the different frequency bands are ESPRIT and root

MUSIC (cf. [21] and the references therein).
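As an illustration of subspace-based DOA estimation in a single frequency band, the following is a minimal sketch of standard least-squares ESPRIT. It assumes a uniform linear array with half-wavelength spacing, a known number of sources, and narrowband snapshots; the function names and the simulated test signal are hypothetical, and the variants discussed in [21] may differ in detail.

```python
import numpy as np

def esprit_doa(snapshots, num_sources, spacing_in_wavelengths=0.5):
    """Least-squares ESPRIT for a uniform linear array in one frequency band.

    snapshots: complex array of shape (num_mics, num_frames).
    Returns the estimated DOAs in radians (0 = broadside), sorted ascending.
    """
    num_frames = snapshots.shape[1]
    # Sample covariance matrix of the microphone signals.
    covariance = snapshots @ snapshots.conj().T / num_frames
    # eigh returns eigenvalues in ascending order, so the signal subspace
    # is spanned by the last num_sources eigenvectors.
    _, eigenvectors = np.linalg.eigh(covariance)
    signal_subspace = eigenvectors[:, -num_sources:]
    # Rotational invariance between the two overlapping subarrays.
    rotation = np.linalg.pinv(signal_subspace[:-1]) @ signal_subspace[1:]
    phases = np.angle(np.linalg.eigvals(rotation))
    sines = np.clip(phases / (2 * np.pi * spacing_in_wavelengths), -1.0, 1.0)
    return np.sort(np.arcsin(sines))

def simulate_ula(doas_deg, num_mics=6, num_frames=500, noise_std=0.01, seed=0):
    """Hypothetical test signal: uncorrelated sources plus weak sensor noise."""
    rng = np.random.default_rng(seed)
    doas = np.deg2rad(np.asarray(doas_deg, dtype=float))
    mic_index = np.arange(num_mics)[:, None]
    # Steering vectors for half-wavelength spacing (phase step pi * sin(doa)).
    steering = np.exp(1j * np.pi * mic_index * np.sin(doas)[None, :])
    shape_s = (len(doas), num_frames)
    sources = rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)
    shape_n = (num_mics, num_frames)
    noise = noise_std * (rng.standard_normal(shape_n) + 1j * rng.standard_normal(shape_n))
    return steering @ sources + noise

estimated_deg = np.rad2deg(esprit_doa(simulate_ula([-20.0, 35.0]), num_sources=2))
print(estimated_deg)
```

In a parametric processing chain, such an estimator would be run per frequency band on short blocks of frames, which is why the snapshot count is kept as an explicit input here.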

For estimating the SDR, two different approaches are com-

mon in practice, depending on which microphone array geome-

try is used. For linear microphone arrays, the SDR is typically

estimated based on the spatial coherence between the signals of

two array microphones [25]. The spatial coherence is given by the

normalized cross-correlation between two microphone signals in

the frequency domain. When the direct sound is strong compared

to the diffuse sound (i.e., the SDR is high), the microphone

signals are strongly correlated (i.e., the spatial coherence is high).

On the other hand, when the diffuse sound is strong compared to

the direct sound (i.e., the SDR is low), the microphone signals are

less correlated.
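The relation between coherence and SDR can be illustrated with a short sketch that computes the normalized cross-correlation of two microphone signals in the frequency domain. The expectations are approximated here by averaging over time frames, which is an assumption of this example, as are the function names and the synthetic signals. Independent noise at the two microphones is used only as a crude stand-in for a diffuse field; mapping the coherence to an actual SDR value, as in [25], is not shown.

```python
import numpy as np

def spatial_coherence(stft_a, stft_b, eps=1e-12):
    """Complex spatial coherence per frequency bin.

    stft_a, stft_b: complex STFT coefficients, shape (num_frames, num_bins).
    Expectations are approximated by averaging over the frames.
    """
    cross_psd = np.mean(stft_a * np.conj(stft_b), axis=0)
    psd_a = np.mean(np.abs(stft_a) ** 2, axis=0)
    psd_b = np.mean(np.abs(stft_b) ** 2, axis=0)
    return cross_psd / np.sqrt(psd_a * psd_b + eps)

def complex_noise(rng, shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(0)
shape = (400, 8)  # (frames, frequency bins)

# Direct sound only: the second microphone sees a delayed copy,
# i.e., a pure phase shift in each frequency bin.
direct = complex_noise(rng, shape)
delay_phase = np.exp(-1j * 0.3 * np.arange(shape[1]))
coherence_direct = spatial_coherence(direct, direct * delay_phase)

# Stand-in for a diffuse field: independent signals at the two microphones.
coherence_diffuse = spatial_coherence(complex_noise(rng, shape),
                                      complex_noise(rng, shape))

print(np.abs(coherence_direct).mean(), np.abs(coherence_diffuse).mean())
```

The magnitude of the coherence is close to one in the direct-only case and close to zero in the uncorrelated case, which mirrors the high-SDR and low-SDR situations described above.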

Alternatively, when a planar microphone array is used, the SDR

can be estimated based on the so-called active sound intensity vec-

tor [26]. This vector points in the direction in which the acoustic

energy flows. When only the direct sound arriving at the array from

a specific DOA is present, the intensity vector constantly points in

this direction and does not change its direction unless the sound

source moves. In contrast, when the sound field is entirely diffuse,

the intensity vector fluctuates quickly over time and points towards

random directions, since the diffuse sound arrives from all directions.

Thus, the temporal variation of the intensity vector can be

used as a measure of the SDR or, equivalently, the diffuseness [26].

Note that, as in [1], the inverse direction of the intensity vector can

also be used to estimate the DOA of the direct sound. The intensity

vector can be determined from an omnidirectional pressure signal

and the particle velocity vector as described in [26], where the latter

signals can be computed from the planar microphone array as

explained, for instance, in [11].
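The idea of using the temporal variation of the intensity vector can be sketched as follows. This example assumes that the pressure and the two-dimensional particle velocity are already available per time frame for one frequency band, and it drops all physical constants. The diffuseness measure used here, one minus the ratio of the norm of the mean intensity to the mean of the intensity norms, is a simple proxy chosen for illustration and is not necessarily the estimator of [26]; all function names are hypothetical.

```python
import numpy as np

def active_intensity(pressure, velocity):
    """Active intensity vector per frame, up to a constant factor.

    pressure: complex array, shape (num_frames,).
    velocity: complex array, shape (num_frames, 2).
    """
    return np.real(np.conj(pressure)[:, None] * velocity)

def diffuseness_from_intensity(intensity, eps=1e-12):
    """Proxy in [0, 1]: 0 for a steady direction, near 1 for random directions."""
    norm_of_mean = np.linalg.norm(intensity.mean(axis=0))
    mean_of_norm = np.linalg.norm(intensity, axis=1).mean()
    return float(np.clip(1.0 - norm_of_mean / (mean_of_norm + eps), 0.0, 1.0))

def doa_from_intensity(intensity):
    """DOA in radians, taken as the direction opposite to the mean energy flow."""
    mean_flow = intensity.mean(axis=0)
    return float(np.arctan2(-mean_flow[1], -mean_flow[0]))

rng = np.random.default_rng(1)
num_frames = 2000
pressure = rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)

# Plane wave travelling towards +30 degrees, i.e., arriving from -150 degrees.
flow = np.array([np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))])
intensity_direct = active_intensity(pressure, pressure[:, None] * flow)

# Crude diffuse stand-in: velocity statistically independent of the pressure.
random_velocity = (rng.standard_normal((num_frames, 2))
                   + 1j * rng.standard_normal((num_frames, 2)))
intensity_diffuse = active_intensity(pressure, random_velocity)

print(diffuseness_from_intensity(intensity_direct),
      diffuseness_from_intensity(intensity_diffuse),
      np.rad2deg(doa_from_intensity(intensity_direct)))
```

In practice the averages would be computed over short sliding windows rather than over the whole signal, so that the estimate can follow changes in the sound scene.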

Various approaches have been described in the literature to

estimate the slowly time-varying noise PSD matrix $\Phi_u(k)$.

Assuming that the noise is stationary, which is a reasonable

assumption in many applications (e.g., when the noise represents

microphone self-noise or a stationary background noise), the

noise PSD matrix can be estimated from the microphone signals

during periods where only the noise is present in the microphone

signals, which can be detected using a voice activity detector. To

estimate the diffuse power $\phi_d(k,n)$, we employ the spatial filter

$\mathbf{w}(k,n)$ in (9) that provides an estimate of the diffuse sound

$X_d(k,n,\mathbf{d}_1)$. Computing the mean power of $X_d(k,n,\mathbf{d}_1)$

yields an estimate of the diffuse power.
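Both estimation steps in this paragraph reduce to time averages, which the following sketch makes explicit. It assumes that a voice activity detector has already produced a boolean mask of noise-only frames (the detector itself is not shown), and that the spatial filter weights are given and held fixed over time for simplicity, whereas the filter in (9) varies with time. The function names, array layout, and synthetic data are hypothetical.

```python
import numpy as np

def estimate_noise_psd(stft, noise_only):
    """Average the outer products x x^H over the noise-only frames.

    stft: complex array, shape (num_frames, num_bins, num_mics).
    noise_only: boolean array, shape (num_frames,), e.g., from a VAD.
    Returns PSD matrices of shape (num_bins, num_mics, num_mics).
    """
    noise_frames = stft[noise_only]
    if noise_frames.shape[0] == 0:
        raise ValueError("no noise-only frames available")
    outer = np.einsum("tkm,tkn->kmn", noise_frames, np.conj(noise_frames))
    return outer / noise_frames.shape[0]

def estimate_diffuse_power(filter_weights, stft):
    """Mean power of the filter output w^H x in each frequency bin.

    filter_weights: complex array, shape (num_bins, num_mics).
    """
    output = np.einsum("km,tkm->tk", np.conj(filter_weights), stft)
    return np.mean(np.abs(output) ** 2, axis=0)

rng = np.random.default_rng(2)
num_frames, num_bins, num_mics = 600, 4, 3
shape = (num_frames, num_bins, num_mics)

# Stationary noise with variance 0.04 per microphone and bin.
stft = 0.2 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
# A stronger signal is active in the second half of the recording only.
stft[300:] += rng.standard_normal((300, num_bins, num_mics))

noise_only = np.arange(num_frames) < 300  # stand-in for the VAD decision
noise_psd = estimate_noise_psd(stft, noise_only)

# Trivial filter that simply selects the first microphone.
weights = np.zeros((num_bins, num_mics), dtype=complex)
weights[:, 0] = 1.0
output_power = estimate_diffuse_power(weights, stft)
```

With a real diffuse-sound filter in place of the trivial weights, `output_power` would correspond to the mean power of the estimated diffuse signal.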

Finally, note that for some applications, such as the virtual

classroom application described in the next section, the estimation

of the IPLS positions from which the direct sounds originate may

also be required to perform the application-specific synthesis. To

determine the IPLS positions, the DOAs at different positions in the

room are estimated using multiple distributed microphone arrays.

The IPLS position can then be determined by triangulating the

estimated DOAs, as done in [27] and illustrated in Figure 2.
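The triangulation step can be sketched in two dimensions as a least-squares intersection of bearing lines. The example assumes that each array reports one DOA as an angle measured from the x-axis and pointing from the array towards the source, and that the array positions are known. This is a generic formulation with hypothetical names; the procedure in [27] may differ.

```python
import numpy as np

def triangulate(array_positions, doas):
    """Point closest (in the least-squares sense) to all bearing lines.

    array_positions: shape (num_arrays, 2), array centers in meters.
    doas: shape (num_arrays,), angles in radians from the x-axis,
          pointing from each array towards the source.
    """
    positions = np.asarray(array_positions, dtype=float)
    doas = np.asarray(doas, dtype=float)
    directions = np.stack([np.cos(doas), np.sin(doas)], axis=1)
    # I - u u^T removes the component along each bearing line, leaving
    # only the perpendicular distance of a point to that line.
    projectors = np.eye(2)[None] - directions[:, :, None] * directions[:, None, :]
    lhs = projectors.sum(axis=0)
    rhs = np.einsum("aij,aj->i", projectors, positions)
    # Raises numpy.linalg.LinAlgError if all bearing lines are parallel.
    return np.linalg.solve(lhs, rhs)

# Two arrays on the x-axis looking at a source located at (2, 2).
source = triangulate([[0.0, 0.0], [4.0, 0.0]], np.deg2rad([45.0, 135.0]))
print(source)
```

With more than two arrays the same normal equations apply, and noisy DOA estimates are averaged out instead of producing inconsistent pairwise intersections.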

APPLICATION-SPECIFIC SYNTHESIS

The compact description of the sound field in terms of a direct sig-

nal component, a diffuse signal component, and sound field

parameters, as shown in Figure 1, can contribute to assisted listen-

ing in a variety of applications. While the spatial analysis yielding

estimates of the model parameters and the direct and diffuse signal

components at a reference microphone is application independent,

the processing and synthesis is application dependent. For this

[FIG5] Spatial audio communication application: (a) communication scenario and (b) rendering of the loudspeaker signals. Block labels in the figure: Recording Side, Spatial Analysis, Reproduction Side, Processing and Synthesis, DOA, Direct Sound, Diffuse Sound, Loudspeaker Gain Computation, Decorrelators, Diffuse Sound Signals.
