excitations, and a filter is learned to accommodate the dictionary
to a new condition.
AUDIO DEREVERBERATION
The excitation-filter model discussed in the previous section is only able to deal with filters whose length is smaller than one audio frame. Audio signals recorded in realistic indoor environments always contain some reverberation, which can have impulse response lengths much longer (typically hundreds of milliseconds to seconds) than the frame lengths appropriate for audio compositional models (tens of milliseconds). Furthermore, reverberation is a commonly used effect in music production, since a moderate amount of reverberation is found to be perceptually pleasant. However, too much reverberation decreases the intelligibility of audio and interferes with many audio analysis algorithms. Therefore, there is a need for dereverberation methods and for analysis methods that are robust to reverberation.
Reverberation can be formulated as a compositional process as a convolution between the magnitude spectrogram |S[f,t]| of a dry, unreverberant signal and the magnitude response |H[f,t]| of a filter in the magnitude spectrogram domain [59], [60]

    |Y[f,t]| ≈ Σ_{τ=0}^{M} |S[f, t − τ]| |H[f, τ]|    (20)
             = (|S[f,t]| ∗ |H[f,t]|),                 (21)

where ∗ denotes convolution along the frame index t and M is the length of the filter (in frames).
Blind estimation of dry signals and reverberation filters is not feasible since the model is ambiguous, and the roles of the source and the impulse response can end up swapped if additional restrictions are not used. Suitable a priori information to regularize the model includes, e.g., sparseness [60] or a dictionary-based model [59]. Thus, in practice, we can model |S[f,t]| using another compositional model. The model parameters can be estimated using the principles explained previously, i.e., by minimizing a divergence between an observed spectrogram and the model. Figure 11 gives an example of a reverberant speech spectrogram that is modeled as a convolution between a dry speech spectrogram and the spectrogram of a reverberation filter.
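The following is a rough NumPy sketch of how such a model can be fitted in practice: it implements the convolutive model in (20) and alternates multiplicative updates for |S[f,t]| and |H[f,τ]| that decrease the generalized KL divergence between the observed spectrogram and the model, with a simple L1 penalty standing in for the sparseness prior of [60]. The function name, update rules, and parameter values are our own illustrative choices and not the exact algorithms of [59] or [60].

```python
import numpy as np

def dereverberate(Y, M=20, n_iter=200, sparsity=0.1, eps=1e-12):
    """Illustrative sketch: estimate a dry magnitude spectrogram S (F x T) and a
    filter magnitude response H (F x M) such that
    Y[f, t] ~ sum_tau S[f, t - tau] * H[f, tau], assuming M < T."""
    F, T = Y.shape
    rng = np.random.default_rng(0)
    S = rng.random((F, T)) + eps      # estimate of the dry spectrogram |S[f, t]|
    H = rng.random((F, M)) + eps      # estimate of the filter response |H[f, tau]|
    H[:, 0] = 1.0                     # start with a strong direct path to reduce ambiguity

    def model(S, H):
        # Nonnegative convolution along time: V[f, t] = sum_tau S[f, t - tau] H[f, tau]
        V = np.zeros((F, T))
        for tau in range(M):
            V[:, tau:] += H[:, [tau]] * S[:, :T - tau]
        return V + eps

    for _ in range(n_iter):
        # Multiplicative update for S (generalized KL divergence plus L1 penalty)
        R = Y / model(S, H)
        num = np.zeros_like(S)
        den = np.zeros_like(S)
        for tau in range(M):
            num[:, :T - tau] += H[:, [tau]] * R[:, tau:]
            den[:, :T - tau] += H[:, [tau]]
        S *= num / (den + sparsity)

        # Multiplicative update for H, one lag at a time
        R = Y / model(S, H)
        for tau in range(M):
            num_h = np.sum(S[:, :T - tau] * R[:, tau:], axis=1)
            den_h = np.sum(S[:, :T - tau], axis=1) + eps
            H[:, tau] *= num_h / den_h

    return S, H
```

In a complete system, the dictionary-based regularization of [59] would replace the free matrix S with another compositional model, e.g., S ≈ AX with a fixed speech dictionary A, and the updates above would then be derived for the activations X instead.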
NONNEGATIVE MATRIX DECONVOLUTIONS
The basic unsupervised NMF model in (1) is limited in the sense that a random reordering of the frames, i.e., the columns of the observation matrix Y, does not affect the outcome of the factorization: the resulting X and A are just reordered versions of the X and A that would have been obtained without reordering Y.
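As a minimal illustration of this order blindness (a toy example of our own, using a random nonnegative matrix in place of a real spectrogram and scikit-learn's NMF), a factorization (A, X) of Y fits the frame-permuted observation exactly as well once the columns of X are permuted in the same way:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
Y = rng.random((513, 100))            # stand-in magnitude spectrogram (frequencies x frames)

nmf = NMF(n_components=10, init="random", random_state=0, max_iter=500)
A = nmf.fit_transform(Y)              # dictionary atoms, shape (frequencies, components)
X = nmf.components_                   # activations, shape (components, frames)

perm = rng.permutation(Y.shape[1])    # shuffle the frame order
err_orig = np.linalg.norm(Y - A @ X)
err_perm = np.linalg.norm(Y[:, perm] - A @ X[:, perm])
print(np.isclose(err_orig, err_perm)) # True: the model carries no notion of temporal order
```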
Let us illustrate the limitations of the model by the example
in Figure 12(a), where a few frames of the spectrogram are lost,
e.g., because of packet loss. Even though the sounds in the
example exhibit clear temporal structure that could be used to
impute the missing values, the regular NMF cannot be used for
this purpose since there is no data from which to estimate the
[FIG10] Modeling atoms with the excitation-filter model. The
filter is modeled as the sum of elementary filter atoms (upper
left), weighted by activations (upper right). The filter is
pointwise multiplied by a synthetic harmonic excitation (right)
to get an atom (bottom).
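To make this construction concrete, here is a small NumPy sketch with assumed specifics (513 frequency bins, a 16-kHz sampling rate, Gaussian-shaped elementary filter atoms, and random activations; none of these values come from the article):

```python
import numpy as np

F = 513                                    # number of frequency bins (assumed)
fs = 16000                                 # assumed sampling rate in Hz
freqs = np.linspace(0, fs / 2, F)

# Elementary filter atoms: smooth, overlapping spectral bumps (illustrative choice)
n_atoms = 20
centers = np.linspace(0, fs / 2, n_atoms)
width = fs / (2 * n_atoms)
B = np.exp(-0.5 * ((freqs[:, None] - centers[None, :]) / width) ** 2)   # F x n_atoms

# Activations of the filter atoms (learned in practice; random here)
c = np.random.default_rng(0).random(n_atoms)

# Learned filter: weighted sum of the elementary filter atoms
filt = B @ c

# Synthetic harmonic excitation with an assumed fundamental frequency f0
f0 = 220.0
excitation = np.zeros(F)
for k in range(1, int((fs / 2) // f0) + 1):
    excitation += np.exp(-0.5 * ((freqs - k * f0) / 10.0) ** 2)  # narrow peak per harmonic

# Modeled atom: pointwise product of the harmonic excitation and the learned filter
atom = excitation * filt
```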
[FIG11] (a) The magnitude spectrogram |Y[f,t]| of a reverberant signal can be approximated as (b) the convolution between the spectrogram |S[f,t]| of a dry signal and (c) the spectrogram |H[f,t]| of the impulse response of the reverberation. (Panel axes: time in seconds versus frequency in hertz.)