IEEE SIGNAL PROCESSING MAGAZINE [35] MARCH 2015
SIGNAL EXTRACTION
SINGLE-CHANNEL FILTERS
A computationally efficient estimation of the direct and the diffuse
components is possible using single-channel filters. Such process-
ing is applied for instance in DirAC [1], where the direct and dif-
fuse signals are estimated by applying a spectral gain to a single
microphone signal. The direct sound is then estimated as
$\hat{X}_{\mathrm{s}}(k,n,\mathbf{d}_1) = W_{\mathrm{s}}(k,n)\,X(k,n,\mathbf{d}_1)$, (4)
where $W_{\mathrm{s}}(k,n)$ is a single-channel filter, which is multiplied with
the reference microphone signal to obtain the direct sound at $\mathbf{d}_1$.
An optimal filter $W_{\mathrm{s}}(k,n)$ can be found, for instance, by minimiz-
ing the mean-squared error between the true and estimated direct
sound, which yields the well-known Wiener filter (WF). If we
assume no microphone noise, the WF for extracting the direct
sound is given by $W_{\mathrm{s}}(k,n) = 1 - \Psi(k,n)$. Here, $\Psi(k,n)$ is the
diffuseness, which is defined as
$\Psi(k,n) = \dfrac{1}{1 + \mathrm{SDR}(k,n)}$, (5)
where $\mathrm{SDR}(k,n)$ is the signal-to-diffuse ratio (SDR) (power
ratio of the direct sound and the diffuse sound). The diffuseness
is bounded between zero and one, and describes how diffuse the
sound field is at the recording position. For a purely diffuse field,
the SDR is zero, leading to the maximum diffuseness
$\Psi(k,n) = 1$. In this case, the WF $W_{\mathrm{s}}(k,n)$ equals zero and
thus, the estimated direct sound in (4) equals zero as well. In
contrast, when the direct sound is strong compared to the dif-
fuse sound, the SDR is high and the diffuseness in (5)
approaches zero. In this case, the WF
$W_{\mathrm{s}}(k,n)$ approaches one
and thus, the estimated direct sound in (4) is extracted as the
microphone signal. The SDR or diffuseness, required to compute
the WF, is estimated using multiple microphones as will be
explained in the section “Parameter Estimation.”
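As a sketch of (4) and (5), the spectral gain can be computed from an SDR estimate and applied bin-wise to the reference microphone signal. The STFT value and SDR below are hypothetical placeholders, not values from the article:

```python
import numpy as np

def diffuseness(sdr):
    """Diffuseness Psi(k, n) from the signal-to-diffuse ratio, eq. (5)."""
    return 1.0 / (1.0 + sdr)

def wiener_direct(sdr):
    """Single-channel WF for the direct sound: W_s = 1 - Psi."""
    return 1.0 - diffuseness(sdr)

# Apply the spectral gain to a reference-microphone STFT coefficient, eq. (4).
X = 0.8 + 0.3j                   # hypothetical STFT bin X(k, n, d1)
X_s = wiener_direct(10.0) * X    # strong direct sound: gain 10/11, close to one
```

As the text describes, an SDR of zero gives a gain of zero (only diffuse sound), while a high SDR gives a gain near one (the microphone signal passes through).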
The diffuse sound $\hat{X}_{\mathrm{d}}(k,n,\mathbf{d}_1)$ can be estimated in the same way
as the direct sound. In this case, the optimal filter is found by mini-
mizing the mean-squared error between the true and estimated dif-
fuse sound. The resulting WF is given by
$W_{\mathrm{d}}(k,n) = \Psi(k,n)$.
Instead of using the WF, the square root of the WF is often applied to
estimate the direct sound and diffuse sound (cf. [1]). In the absence
of sensor noise, the total power of the estimated direct and diffuse
sound components is then equal to the total power of the received
direct and diffuse sound components.
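The power-preservation property follows directly from the definition of the diffuseness. Writing the input power as the sum of the direct and diffuse powers, $\phi_x = \phi_{\mathrm{s}} + \phi_{\mathrm{d}}$, and noting from (5) that $\Psi = \phi_{\mathrm{d}}/(\phi_{\mathrm{s}} + \phi_{\mathrm{d}})$, the square-root gains yield

```latex
\left|\sqrt{W_{\mathrm{s}}}\right|^2 \phi_x + \left|\sqrt{W_{\mathrm{d}}}\right|^2 \phi_x
  = (1-\Psi)\,\phi_x + \Psi\,\phi_x
  = \phi_{\mathrm{s}} + \phi_{\mathrm{d}},
```

that is, the two estimated components together carry exactly the received power, whereas the squared WF gains satisfy $(1-\Psi)^2 + \Psi^2 \le 1$.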
In general, extracting the direct and diffuse signals with sin-
gle-channel filters has several limitations:
1) Although the required SDR or diffuseness is estimated
using multiple microphones (as will be discussed later), only
a single microphone signal is utilized for the filtering. Hence,
the available spatial information is not fully exploited.
2) The temporal resolution of single-channel filters may be
insufficient in practice to accurately follow rapid changes in
the sound scene. This can cause leakage of the direct sound
into the estimated diffuse sound.
3) The WFs defined earlier do not guarantee a distortionless
response for the estimated direct and diffuse sounds, i.e., they
may alter the direct and diffuse sounds, respectively.
4) Since the noise, such as the microphone self-noise or the
background noise, is typically not considered when comput-
ing the filters, it may leak into the estimated signals and dete-
riorate the sound quality.
Limitations 1 and 4 are demonstrated in Figure 4(a), (b), and (d),
where the spectrograms of the input (reference microphone) signal
and both extracted components for the noise only (before time frame
75), castanet sound (between time frame 75 and time frame 150),
and speech (latter frames) are shown. The noise is clearly visible in
the estimated diffuse sound and slightly visible in the estimated
direct sound. Furthermore, the onsets of the castanets leak into the
estimated diffuse signal, while the reverberant sound from the casta-
nets and the speech leaks into the estimated direct signal.
MULTICHANNEL FILTERS
Many limitations of single-channel filters can be overcome by
using multichannel filters. In this case, the direct and diffuse
signals are estimated via a weighted sum of multiple microphone
signals. The direct sound is estimated with
$\hat{X}_{\mathrm{s}}(k,n,\mathbf{d}_1) = \mathbf{w}_{\mathrm{s}}^{H}(k,n)\,\mathbf{x}(k,n)$, (6)
where $\mathbf{w}_{\mathrm{s}}(k,n)$ is a complex weight vector containing the filter
weights for the $M$ microphones and $(\cdot)^{H}$ denotes the conjugate
transpose. A filter $\mathbf{w}_{\mathrm{s}}(k,n)$
can be found for instance by minimiz-
ing the mean-squared error between the true and estimated direct
sound, as in the single-channel case. Alternatively, the filter
weights can be found by minimizing the diffuse sound and
noise at the filter output while providing a distortionless response
for the direct sound, which assures that the direct sound is not
altered by the filter. This filter is referred to as the linearly con-
strained minimum variance (LCMV) [21] filter, which can be
obtained by solving
$\mathbf{w}_{\mathrm{s}}(k,n) = \arg\min_{\mathbf{w}}\; \mathbf{w}^{H} \left[ \boldsymbol{\Phi}_{\mathrm{d}}(k,n) + \boldsymbol{\Phi}_{\mathrm{n}}(k) \right] \mathbf{w}$
subject to $\mathbf{w}^{H}(k,n)\,\mathbf{g}(k,\theta) = 1$, (7)
where the propagation vector $\mathbf{g}(k,\theta)$ depends on the array geom-
etry and the DOA $\theta(k,n)$ of the direct sound. Here, $\boldsymbol{\Phi}_{\mathrm{d}}(k,n)$ is the
power spectral density (PSD) matrix of the diffuse sound, which
can be written using the aforementioned assumptions as
$\boldsymbol{\Phi}_{\mathrm{d}}(k,n) = \mathrm{E}\{\mathbf{x}_{\mathrm{d}}(k,n)\,\mathbf{x}_{\mathrm{d}}^{H}(k,n)\}$ (8a)
$\phantom{\boldsymbol{\Phi}_{\mathrm{d}}(k,n)} = \phi_{\mathrm{d}}(k,n)\,\boldsymbol{\Gamma}_{\mathrm{d}}(k)$, (8b)
where $\phi_{\mathrm{d}}(k,n)$ is the power of the diffuse sound and $\boldsymbol{\Gamma}_{\mathrm{d}}(k)$ is
the diffuse sound coherence matrix. The $(m,m')$th element of
$\boldsymbol{\Gamma}_{\mathrm{d}}(k)$ is the spatial coherence between the signals received at
microphones $m$ and $m'$, which is known a priori when assum-
ing a specific diffuse field characteristic. For instance, for a
spherically isotropic diffuse field and omnidirectional micro-
phones, the spatial coherence is a sinc function depending on
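As an illustration of (7) and (8), the following minimal sketch computes an LCMV filter under additional assumptions not stated above: spatially white self-noise $\boldsymbol{\Phi}_{\mathrm{n}} = \phi_{\mathrm{n}}\mathbf{I}$, a hypothetical four-element linear array, and a far-field propagation vector. For a single distortionless constraint, the solution takes the familiar closed form $\mathbf{w} = \boldsymbol{\Phi}^{-1}\mathbf{g} / (\mathbf{g}^{H}\boldsymbol{\Phi}^{-1}\mathbf{g})$:

```python
import numpy as np

def diffuse_coherence(positions, f, c=343.0):
    """Coherence matrix Gamma_d(k) of a spherically isotropic diffuse
    field for omnidirectional microphones: sinc of (wavenumber * distance)."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return np.sinc(2.0 * f * d / c)  # np.sinc(x) = sin(pi x)/(pi x)

def lcmv_weights(phi_d, gamma_d, phi_n, g):
    """LCMV filter of eq. (7): minimize diffuse-plus-noise power subject
    to the distortionless constraint w^H g = 1 for the direct sound."""
    M = len(g)
    Phi = phi_d * gamma_d + phi_n * np.eye(M)  # Phi_d + Phi_n (white noise assumed)
    Phi_inv_g = np.linalg.solve(Phi, g)
    return Phi_inv_g / (g.conj() @ Phi_inv_g)

# Hypothetical uniform linear array (4 cm spacing), f = 2 kHz, DOA 30 degrees
pos = np.array([[0.0, 0, 0], [0.04, 0, 0], [0.08, 0, 0], [0.12, 0, 0]])
f, c = 2000.0, 343.0
theta = np.deg2rad(30.0)
delays = pos[:, 0] * np.cos(theta) / c
g = np.exp(-2j * np.pi * f * delays)  # far-field propagation vector g(k, theta)

w = lcmv_weights(phi_d=1.0, gamma_d=diffuse_coherence(pos, f), phi_n=0.01, g=g)
```

By construction, $\mathbf{w}^{H}\mathbf{g} = 1$ holds exactly, so the direct sound is passed undistorted while the diffuse and noise power at the output is minimized.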