IEEE SIGNAL PROCESSING MAGAZINE [35] MARCH 2015
SIGNAL EXTRACTION
SINGLE-CHANNEL FILTERS
A computationally efficient estimation of the direct and the diffuse
components is possible using single-channel filters. Such process-
ing is applied for instance in DirAC [1], where the direct and dif-
fuse signals are estimated by applying a spectral gain to a single
microphone signal. The direct sound is then estimated as
$\hat{X}_{\mathrm{s}}(k,n,\mathbf{d}_1) = W_{\mathrm{s}}(k,n)\,X(k,n,\mathbf{d}_1)$, (4)
where $W_{\mathrm{s}}(k,n)$ is a single-channel filter, which is multiplied with
the reference microphone signal to obtain the direct sound at $\mathbf{d}_1$.
An optimal filter $W_{\mathrm{s}}(k,n)$ can be found, for instance, by minimiz-
ing the mean-squared error between the true and estimated direct
sound, which yields the well-known Wiener filter (WF). If we
assume no microphone noise, the WF for extracting the direct
sound is given by $W_{\mathrm{s}}(k,n) = 1 - \Psi(k,n)$. Here, $\Psi(k,n)$ is the
diffuseness, which is defined as
$\Psi(k,n) = \dfrac{1}{1 + \mathrm{SDR}(k,n)}$, (5)
where $\mathrm{SDR}(k,n)$ is the signal-to-diffuse ratio (SDR) (power
ratio of the direct sound and the diffuse sound). The diffuseness
is bounded between zero and one, and describes how diffuse the
sound field is at the recording position. For a purely diffuse field,
the SDR is zero, leading to the maximum diffuseness
$\Psi(k,n) = 1$. In this case, the WF $W_{\mathrm{s}}(k,n)$ equals zero and
thus, the estimated direct sound in (4) equals zero as well. In
contrast, when the direct sound is strong compared to the dif-
fuse sound, the SDR is high and the diffuseness in (5)
approaches zero. In this case, the WF
$W_{\mathrm{s}}(k,n)$ approaches one
and thus, the estimated direct sound in (4) is extracted as the
microphone signal. The SDR or diffuseness, required to compute
the WF, is estimated using multiple microphones as will be
explained in the section “Parameter Estimation.”
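As a sketch of (4) and (5), the spectral gain can be computed from an SDR estimate and applied bin-wise to the reference microphone signal. The STFT value and SDR below are hypothetical placeholders, not values from the article:

```python
import numpy as np

def diffuseness(sdr):
    """Diffuseness Psi(k, n) from the signal-to-diffuse ratio, eq. (5)."""
    return 1.0 / (1.0 + sdr)

def wiener_direct(sdr):
    """Single-channel WF for the direct sound: W_s = 1 - Psi."""
    return 1.0 - diffuseness(sdr)

# Apply the spectral gain to a reference-microphone STFT coefficient, eq. (4).
X = 0.8 + 0.3j                   # hypothetical STFT bin X(k, n, d1)
X_s = wiener_direct(10.0) * X    # strong direct sound: gain 10/11, close to one
```

As the text describes, an SDR of zero gives a gain of zero (only diffuse sound), while a high SDR gives a gain near one (the microphone signal passes through).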
The diffuse sound $\hat{X}_{\mathrm{d}}(k,n,\mathbf{d}_1)$ can be estimated in the same way
as the direct sound. In this case, the optimal filter is found by mini-
mizing the mean-squared error between the true and estimated dif-
fuse sound. The resulting WF is given by
$W_{\mathrm{d}}(k,n) = \Psi(k,n)$.
Instead of using the WF, the square root of the WF is often applied to
estimate the direct sound and diffuse sound (cf. [1]). In the absence
of sensor noise, the total power of the estimated direct and diffuse
sound components is then equal to the total power of the received
direct and diffuse sound components.
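The power-preservation property follows directly from the definition of the diffuseness. Writing the input power as the sum of the direct and diffuse powers, $\phi_x = \phi_{\mathrm{s}} + \phi_{\mathrm{d}}$, and noting from (5) that $\Psi = \phi_{\mathrm{d}}/(\phi_{\mathrm{s}} + \phi_{\mathrm{d}})$, the square-root gains yield

```latex
\left|\sqrt{W_{\mathrm{s}}}\right|^2 \phi_x + \left|\sqrt{W_{\mathrm{d}}}\right|^2 \phi_x
  = (1-\Psi)\,\phi_x + \Psi\,\phi_x
  = \phi_{\mathrm{s}} + \phi_{\mathrm{d}},
```

that is, the two estimated components together carry exactly the received power, whereas the squared WF gains satisfy $(1-\Psi)^2 + \Psi^2 \le 1$.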
In general, extracting the direct and diffuse signals with sin-
gle-channel filters has several limitations:
1) Although the required SDR or diffuseness is estimated
using multiple microphones (as will be discussed later), only
a single microphone signal is utilized for the filtering. Hence,
the available spatial information is not fully exploited.
2) The temporal resolution of single-channel filters may be
insufficient in practice to accurately follow rapid changes in
the sound scene. This can cause leakage of the direct sound
into the estimated diffuse sound.
3) The WFs defined earlier do not guarantee a distortionless
response for the estimated direct and diffuse sounds, i.e., they
may alter the direct and diffuse sounds, respectively.
4) Since the noise, such as the microphone self-noise or the
background noise, is typically not considered when comput-
ing the filters, it may leak into the estimated signals and dete-
riorate the sound quality.
Limitations 1 and 4 are demonstrated in Figure 4(a), (b), and (d),
where the spectrograms of the input (reference microphone) signal
and both extracted components for the noise only (before time frame
75), castanet sound (between time frame 75 and time frame 150),
and speech (latter frames) are shown. The noise is clearly visible in
the estimated diffuse sound and slightly visible in the estimated
direct sound. Furthermore, the onsets of the castanets leak into the
estimated diffuse signal, while the reverberant sound from the casta-
nets and the speech leaks into the estimated direct signal.
MULTICHANNEL FILTERS
Many limitations of single-channel filters can be overcome by
using multichannel filters. In this case, the direct and diffuse
signals are estimated via a weighted sum of multiple microphone
signals. The direct sound is estimated with
$\hat{X}_{\mathrm{s}}(k,n,\mathbf{d}_1) = \mathbf{w}_{\mathrm{s}}^{H}(k,n)\,\mathbf{x}(k,n)$, (6)
where $\mathbf{w}_{\mathrm{s}}(k,n)$ is a complex weight vector containing the filter
weights for the $M$ microphones and $(\cdot)^{H}$ denotes the conjugate
transpose. A filter $\mathbf{w}_{\mathrm{s}}(k,n)$
can be found for instance by minimiz-
ing the mean-squared error between the true and estimated direct
sound, as in the single-channel case. Alternatively, the filter
weights can be found by minimizing the diffuse sound and
noise at the filter output while providing a distortionless response
for the direct sound, which assures that the direct sound is not
altered by the filter. This filter is referred to as the linearly con-
strained minimum variance (LCMV) [21] filter, which can be
obtained by solving
$\mathbf{w}_{\mathrm{s}}(k,n) = \arg\min_{\mathbf{w}}\; \mathbf{w}^{H} \left[ \boldsymbol{\Phi}_{\mathrm{d}}(k,n) + \boldsymbol{\Phi}_{\mathrm{n}}(k) \right] \mathbf{w}$
subject to $\mathbf{w}^{H}(k,n)\,\mathbf{g}(k,\theta) = 1$, (7)
where the propagation vector $\mathbf{g}(k,\theta)$ depends on the array geom-
etry and the DOA $\theta(k,n)$ of the direct sound. Here, $\boldsymbol{\Phi}_{\mathrm{d}}(k,n)$ is the
power spectral density (PSD) matrix of the diffuse sound, which
can be written using the aforementioned assumptions as
$\boldsymbol{\Phi}_{\mathrm{d}}(k,n) = \mathrm{E}\{\mathbf{x}_{\mathrm{d}}(k,n)\,\mathbf{x}_{\mathrm{d}}^{H}(k,n)\}$ (8a)
$\phantom{\boldsymbol{\Phi}_{\mathrm{d}}(k,n)} = \phi_{\mathrm{d}}(k,n)\,\boldsymbol{\Gamma}_{\mathrm{d}}(k)$, (8b)
where $\phi_{\mathrm{d}}(k,n)$ is the power of the diffuse sound and $\boldsymbol{\Gamma}_{\mathrm{d}}(k)$ is
the diffuse sound coherence matrix. The $(m,m')$th element of
$\boldsymbol{\Gamma}_{\mathrm{d}}(k)$ is the spatial coherence between the signals received at
microphones $m$ and $m'$, which is known a priori when assum-
ing a specific diffuse field characteristic. For instance, for a
spherically isotropic diffuse field and omnidirectional micro-
phones, the spatial coherence is a sinc function depending on
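As an illustration of (7) and (8), the following minimal sketch computes an LCMV filter under additional assumptions not stated above: spatially white self-noise $\boldsymbol{\Phi}_{\mathrm{n}} = \phi_{\mathrm{n}}\mathbf{I}$, a hypothetical four-element linear array, and a far-field propagation vector. For a single distortionless constraint, the solution takes the familiar closed form $\mathbf{w} = \boldsymbol{\Phi}^{-1}\mathbf{g} / (\mathbf{g}^{H}\boldsymbol{\Phi}^{-1}\mathbf{g})$:

```python
import numpy as np

def diffuse_coherence(positions, f, c=343.0):
    """Coherence matrix Gamma_d(k) of a spherically isotropic diffuse
    field for omnidirectional microphones: sinc of (wavenumber * distance)."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return np.sinc(2.0 * f * d / c)  # np.sinc(x) = sin(pi x)/(pi x)

def lcmv_weights(phi_d, gamma_d, phi_n, g):
    """LCMV filter of eq. (7): minimize diffuse-plus-noise power subject
    to the distortionless constraint w^H g = 1 for the direct sound."""
    M = len(g)
    Phi = phi_d * gamma_d + phi_n * np.eye(M)  # Phi_d + Phi_n (white noise assumed)
    Phi_inv_g = np.linalg.solve(Phi, g)
    return Phi_inv_g / (g.conj() @ Phi_inv_g)

# Hypothetical uniform linear array (4 cm spacing), f = 2 kHz, DOA 30 degrees
pos = np.array([[0.0, 0, 0], [0.04, 0, 0], [0.08, 0, 0], [0.12, 0, 0]])
f, c = 2000.0, 343.0
theta = np.deg2rad(30.0)
delays = pos[:, 0] * np.cos(theta) / c
g = np.exp(-2j * np.pi * f * delays)  # far-field propagation vector g(k, theta)

w = lcmv_weights(phi_d=1.0, gamma_d=diffuse_coherence(pos, f), phi_n=0.01, g=g)
```

By construction, $\mathbf{w}^{H}\mathbf{g} = 1$ holds exactly, so the direct sound is passed undistorted while the diffuse and noise power at the output is minimized.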