excitations, and a filter is learned to accommodate the dictionary
to a new condition.
AUDIO DEREVERBERATION
The excitation-filter model discussed in the previous section is only able to deal with filters whose length is smaller than one audio frame. Audio signals recorded in realistic indoor environments always contain some reverberation, which can have impulse response lengths much longer (typically hundreds of milliseconds to seconds) than the frame lengths appropriate for audio compositional models (tens of milliseconds). Furthermore, reverberation is a commonly used effect in music production, since a moderate amount of reverberation is found to be perceptually pleasant. However, too much reverberation decreases the intelligibility of audio and interferes with many audio analysis algorithms. Therefore, there is a need for dereverberation methods and for analysis methods that are robust to reverberation.
Reverberation can be formulated as a compositional process as a convolution between the magnitude spectrogram |S[f,t]| of a dry, unreverberant signal and the magnitude response |H[f,t]| of a filter in the magnitude spectrogram domain [59], [60]

    |Y[f,t]| ≈ Σ_{τ=0}^{M} |S[f, t − τ]| |H[f, τ]|    (20)
             = (|S[f,t]| ∗ |H[f,t]|),                 (21)

where ∗ denotes convolution along the frame index t and M is the length of the filter (in frames).
Blind estimation of dry signals and reverberation filters is not feasible since the model is ambiguous, and the roles of the source and the impulse response can end up swapped if additional restrictions are not used. Suitable a priori information to regularize the model includes, e.g., sparseness [60] or a dictionary-based model [59]. Thus, in practice, we can model |S[f,t]| using another compositional model. The model parameters can be estimated using the principles explained previously, i.e., by minimizing a divergence between an observed spectrogram and the model. Figure 11 gives an example of a reverberant speech spectrogram that is modeled as a convolution between a dry speech spectrogram and the spectrogram of a reverberation filter.
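The following is a rough NumPy sketch of how such a model can be fitted in practice: it implements the convolutive model in (20) and alternates multiplicative updates for |S[f,t]| and |H[f,τ]| that decrease the generalized KL divergence between the observed spectrogram and the model, with a simple L1 penalty standing in for the sparseness prior of [60]. The function name, update rules, and parameter values are our own illustrative choices and not the exact algorithms of [59] or [60].

```python
import numpy as np

def dereverberate(Y, M=20, n_iter=200, sparsity=0.1, eps=1e-12):
    """Illustrative sketch: estimate a dry magnitude spectrogram S (F x T) and a
    filter magnitude response H (F x M) such that
    Y[f, t] ~ sum_tau S[f, t - tau] * H[f, tau], assuming M < T."""
    F, T = Y.shape
    rng = np.random.default_rng(0)
    S = rng.random((F, T)) + eps      # estimate of the dry spectrogram |S[f, t]|
    H = rng.random((F, M)) + eps      # estimate of the filter response |H[f, tau]|
    H[:, 0] = 1.0                     # start with a strong direct path to reduce ambiguity

    def model(S, H):
        # Nonnegative convolution along time: V[f, t] = sum_tau S[f, t - tau] H[f, tau]
        V = np.zeros((F, T))
        for tau in range(M):
            V[:, tau:] += H[:, [tau]] * S[:, :T - tau]
        return V + eps

    for _ in range(n_iter):
        # Multiplicative update for S (generalized KL divergence plus L1 penalty)
        R = Y / model(S, H)
        num = np.zeros_like(S)
        den = np.zeros_like(S)
        for tau in range(M):
            num[:, :T - tau] += H[:, [tau]] * R[:, tau:]
            den[:, :T - tau] += H[:, [tau]]
        S *= num / (den + sparsity)

        # Multiplicative update for H, one lag at a time
        R = Y / model(S, H)
        for tau in range(M):
            num_h = np.sum(S[:, :T - tau] * R[:, tau:], axis=1)
            den_h = np.sum(S[:, :T - tau], axis=1) + eps
            H[:, tau] *= num_h / den_h

    return S, H
```

In a complete system, the dictionary-based regularization of [59] would replace the free matrix S with another compositional model, e.g., S ≈ AX with a fixed speech dictionary A, and the updates above would then be derived for the activations X instead.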
NONNEGATIVE MATRIX DECONVOLUTIONS
The basic unsupervised NMF model in (1) is limited in the sense that a random reordering of the frames, i.e., the columns of the observation matrix Y, does not affect the outcome of the factorization: the resulting X and A are just reordered versions of the X and A that would have been obtained without reordering Y.
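As a minimal illustration of this order blindness (a toy example of our own, using a random nonnegative matrix in place of a real spectrogram and scikit-learn's NMF), a factorization (A, X) of Y fits the frame-permuted observation exactly as well once the columns of X are permuted in the same way:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
Y = rng.random((513, 100))            # stand-in magnitude spectrogram (frequencies x frames)

nmf = NMF(n_components=10, init="random", random_state=0, max_iter=500)
A = nmf.fit_transform(Y)              # dictionary atoms, shape (frequencies, components)
X = nmf.components_                   # activations, shape (components, frames)

perm = rng.permutation(Y.shape[1])    # shuffle the frame order
err_orig = np.linalg.norm(Y - A @ X)
err_perm = np.linalg.norm(Y[:, perm] - A @ X[:, perm])
print(np.isclose(err_orig, err_perm)) # True: the model carries no notion of temporal order
```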
Let us illustrate the limitations of the model by the example
in Figure 12(a), where a few frames of the spectrogram are lost,
e.g., because of packet loss. Even though the sounds in the
example exhibit clear temporal structure that could be used to
impute the missing values, the regular NMF cannot be used for
this purpose since there is no data from which to estimate the
[FIG10] Modeling atoms with the excitation-filter model. The
filter is modeled as the sum of elementary filter atoms (upper
left), weighted by activations (upper right). The filter is
pointwise multiplied by a synthetic harmonic excitation (right)
to get an atom (bottom).
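To make this construction concrete, here is a small NumPy sketch with assumed specifics (513 frequency bins, a 16-kHz sampling rate, Gaussian-shaped elementary filter atoms, and random activations; none of these values come from the article):

```python
import numpy as np

F = 513                                    # number of frequency bins (assumed)
fs = 16000                                 # assumed sampling rate in Hz
freqs = np.linspace(0, fs / 2, F)

# Elementary filter atoms: smooth, overlapping spectral bumps (illustrative choice)
n_atoms = 20
centers = np.linspace(0, fs / 2, n_atoms)
width = fs / (2 * n_atoms)
B = np.exp(-0.5 * ((freqs[:, None] - centers[None, :]) / width) ** 2)   # F x n_atoms

# Activations of the filter atoms (learned in practice; random here)
c = np.random.default_rng(0).random(n_atoms)

# Learned filter: weighted sum of the elementary filter atoms
filt = B @ c

# Synthetic harmonic excitation with an assumed fundamental frequency f0
f0 = 220.0
excitation = np.zeros(F)
for k in range(1, int((fs / 2) // f0) + 1):
    excitation += np.exp(-0.5 * ((freqs - k * f0) / 10.0) ** 2)  # narrow peak per harmonic

# Modeled atom: pointwise product of the harmonic excitation and the learned filter
atom = excitation * filt
```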
[FIG11] (a) The magnitude spectrogram |Y[f,t]| of a reverberant signal can be approximated as (b) the convolution between the spectrogram |S[f,t]| of a dry signal and (c) the spectrogram |H[f,t]| of the impulse response of the reverberation. (Panel axes: time in seconds versus frequency in hertz.)