IEEE SIGNAL PROCESSING MAGAZINE [36] MARCH 2015
the microphone spacing and frequency [22]. Therefore,
$\boldsymbol{\Phi}_{\mathrm{d}}(k,n)$ in (7) can be computed with (8b) when the diffuse
sound power $\phi_{\mathrm{d}}(k,n)$ is known. The PSD matrix of the noise
$\boldsymbol{\Phi}_{\mathrm{n}}(k)$ in (7) is commonly estimated during silence, i.e., when
the sources are inactive, assuming that the noise is stationary.
The estimation of $\phi_{\mathrm{d}}(k,n)$ and $\boldsymbol{\Phi}_{\mathrm{n}}(k)$ is explained in more
detail in the next section. Note that the filter $\mathbf{w}_{\mathrm{s}}(k,n)$ is recomputed
for each time-frequency bin with the geometric parameters
estimated for that bin. The solution is computationally
feasible since there exists a closed-form solution to the
optimization problem in (7) [21].
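The noise PSD matrix estimation during silence described above can be sketched as an average of outer products over the frames in which the sources are inactive. This is a minimal NumPy sketch under stated assumptions, not the estimator of the next section: the function name `estimate_noise_psd`, the array layout, and the boolean `silent` flags (for example, from some activity detector) are hypothetical and do not come from the article.

```python
import numpy as np

def estimate_noise_psd(stft_frames, silent):
    """Average x x^H over frames flagged as silent (noise assumed stationary).

    stft_frames: complex array, shape (num_frames, num_bins, num_mics)
    silent:      boolean array, shape (num_frames,), True where sources are inactive
    returns:     complex array, shape (num_bins, num_mics, num_mics)
    """
    x = stft_frames[silent]  # keep only the silent frames
    if x.shape[0] == 0:
        raise ValueError("no silent frames available")
    # Outer product x x^H per frame and frequency bin, averaged over frames.
    return np.einsum("tkm,tkn->kmn", x, x.conj()) / x.shape[0]

# Illustrative use with synthetic data (assumed sizes: 50 frames, 4 bins, 3 mics).
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 4, 3)) + 1j * rng.standard_normal((50, 4, 3))
silent = np.zeros(50, dtype=bool)
silent[:20] = True  # pretend the first 20 frames contain only noise
phi_n = estimate_noise_psd(frames, silent)
```

Because the noise is assumed stationary, the result depends only on the frequency index $k$ and can be reused for every time frame $n$.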
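To estimate the diffuse sound $\hat{X}_{\mathrm{d}}(k,n,\mathbf{d}_1)$, a multichannel
filter that suppresses the direct sound and minimizes the noise
while capturing the diffuse sound can be applied. Such a filter can
be obtained by solving
$$\mathbf{w}_{\mathrm{d}}(k,n) = \arg\min_{\mathbf{w}} \; \mathbf{w}^{\mathrm{H}} \boldsymbol{\Phi}_{\mathrm{n}}(k)\, \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^{\mathrm{H}}(k,n)\,\mathbf{g}(k,\theta) = 0 \;\; \text{and} \;\; \mathbf{w}^{\mathrm{H}}(k,n)\,\mathbf{a}(k,n) = 1. \qquad (9)$$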
The first linear constraint ensures that the direct sound is
strongly suppressed by the filter. The second linear constraint
ensures that we capture the diffuse sound as desired. Note that
there exist different definitions for the vector $\mathbf{a}(k,n)$. In [23],
$\mathbf{a}(k,n)$ corresponds to the propagation vector of a notional
plane wave arriving from a direction $\theta_0(k,n)$, which is far away
from the DOA $\theta(k,n)$ of the direct sound. With this definition,
$\mathbf{w}_{\mathrm{d}}(k,n)$ represents a multichannel filter that captures the
diffuse sound mainly from direction $\theta_0(k,n)$, while attenuating
the direct sound from direction $\theta(k,n)$. In [24], $\mathbf{a}(k,n)$
corresponds to the mean relative transfer function of the diffuse
sound between the array microphones. With this approach,
$\mathbf{w}_{\mathrm{d}}(k,n)$ represents a multichannel filter that captures the
diffuse sound from all directions except for the direction $\theta(k,n)$
from which the direct sound arrives. Note that the optimization
problem (9) has a closed-form solution [21], which can be
computed when the DOA $\theta(k,n)$ of the direct sound is known.
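For a problem of the form (9), the standard closed-form LCMV solution is $\mathbf{w} = \boldsymbol{\Phi}_{\mathrm{n}}^{-1}\mathbf{C}\,(\mathbf{C}^{\mathrm{H}}\boldsymbol{\Phi}_{\mathrm{n}}^{-1}\mathbf{C})^{-1}\mathbf{f}$ with $\mathbf{C} = [\mathbf{g}, \mathbf{a}]$ and $\mathbf{f} = [0, 1]^{\mathrm{T}}$. The following NumPy sketch evaluates it for a single time-frequency bin. It is an illustration under stated assumptions rather than the implementation of [21]: the uniform linear array, the microphone spacing, the speed of sound, the phase sign convention, and the function names `steering_vector` and `lcmv_diffuse_filter` are all hypothetical, and $\mathbf{a}$ is chosen as a notional plane wave from $\theta_0$ in the spirit of the first definition above.

```python
import numpy as np

def steering_vector(freq_hz, doa_rad, num_mics, spacing_m, c=343.0):
    """Far-field plane-wave vector relative to the first microphone.

    Assumes a uniform linear array (hypothetical geometry) and a speed of
    sound c in m/s; the phase sign convention is also an assumption.
    """
    delays = np.arange(num_mics) * spacing_m * np.cos(doa_rad) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def lcmv_diffuse_filter(phi_n, g, a):
    """Minimize w^H phi_n w subject to w^H g = 0 and w^H a = 1."""
    C = np.column_stack([g, a])               # constraint matrix, shape (M, 2)
    f = np.array([0.0, 1.0])                  # desired responses for g and a
    phi_inv_C = np.linalg.solve(phi_n, C)     # Phi_n^{-1} C without explicit inverse
    return phi_inv_C @ np.linalg.solve(C.conj().T @ phi_inv_C, f)

# Illustrative use: 4 microphones, 5 cm spacing, 1 kHz, spatially white noise.
M = 4
g = steering_vector(1000.0, np.deg2rad(60.0), M, 0.05)   # direct-sound DOA theta
a = steering_vector(1000.0, np.deg2rad(150.0), M, 0.05)  # notional direction theta_0
w_d = lcmv_diffuse_filter(np.eye(M, dtype=complex), g, a)
null_response = np.vdot(w_d, g)  # w^H g, should be numerically zero
unit_response = np.vdot(w_d, a)  # w^H a, should be numerically one
```

Since the DOA changes from bin to bin, such a filter would be recomputed for every $(k,n)$. The sketch requires $\mathbf{g}$ and $\mathbf{a}$ to be linearly independent; otherwise the two constraints conflict and the small $2 \times 2$ system becomes singular.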
Figure 4(c) and (e) depict the spectrograms of the direct
sound and diffuse sound that were extracted using the
multichannel LCMV filters for the example scenario consisting of
noise, castanets, and speech. As can be observed, the direct
sound extracted using the multichannel filter is less noisy and
contains less diffuse sound than the direct sound
extracted using the single-channel filter. Moreover, the diffuse
sound extracted using the multichannel filter contains no onsets
of the direct sound (clearly visible for the onsets of the castanets
in time frames 75–150) and has a significantly reduced noise level.
As expected, the multichannel filters provide a more accurate
decomposition of the sound field into a direct and a diffuse
signal component. The estimation accuracy strongly influences the
performance of the discussed parametric processing approaches.
[FIG4] Spectrograms of (a) the input signal, (b) the direct signal estimated using a single-channel filter, (c) the direct signal estimated
using a multichannel filter, (d) the diffuse signal estimated using a single-channel filter, and (e) the diffuse signal estimated using a
multichannel filter.
[Figure 4 layout: panel (a) Input Signal; rows labeled Direct Sound (b, c) and Diffuse Sound (d, e); columns labeled Single-Channel Extraction and Multichannel Extraction. Axes: Time Frame Index (ticks at 100 and 200) versus Frequency [kHz] (0 to 6); color scale from −60 to 0.]