IEEE SIGNAL PROCESSING MAGAZINE [140] MARCH 2015
activations that correspond to the
missing frames.
As in the previous example, sounds typically have strong temporal
and spectral dependencies. Temporal context can be included in a
compositional model by simply concatenating a number of adjacent
observations into a long observation vector [4]. However, the
resulting increase in the dimensionality of the observations makes
the inference of atoms more difficult; e.g., in the above example,
we would need multiple atoms to represent all of the temporally
shifted variants of the bird sounds.
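The frame concatenation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the article: the function name `stack_frames` and the parameter `n_context` are hypothetical, and the spectrogram is assumed to be stored as a frequency-by-time array.

```python
import numpy as np

def stack_frames(Y, n_context):
    """Concatenate n_context adjacent frames of Y into long observation vectors.

    Y is assumed to have shape (n_freq, n_frames). Column t of the result
    holds frames t, t+1, ..., t+n_context-1 stacked on top of each other.
    """
    Y = np.asarray(Y)
    n_freq, n_frames = Y.shape
    n_out = n_frames - n_context + 1
    if n_context < 1 or n_out < 1:
        raise ValueError("n_context must be between 1 and the number of frames")
    # Each slice is the spectrogram delayed by i frames, cropped to n_out columns.
    return np.vstack([Y[:, i:i + n_out] for i in range(n_context)])
```

With `n_freq` frequency bins and `n_context` stacked frames, each observation has `n_freq * n_context` entries, which is the growth in dimensionality that the text warns about.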
The principles used to model reverberant spectrograms and
estimate reverberation responses and dry signals can be extended
to learn temporal and spectral patterns that span more than one
frame or frequency bin, respectively. These nonnegative matrix
deconvolution (NMD) [2], [33], [61] methods aim at modeling
either temporal or spectral context.
When the model is used in the time domain, it represents a
spectrogram as a sum of temporally shifted and scaled versions of
atomic spectrogram segments $\mathbf{a}_{k,\tau}$. As before, the atom
vectors are indexed by $k$, but now also with $\tau$, which is the
frame index of the short-time spectrogram segment.
An illustration of the model is given in Figure 12. Mathematically,
the model for an individual mixture spectrogram frame $\mathbf{y}_t$
is given as

$$\mathbf{y}_t \approx \hat{\mathbf{y}}_t = \sum_{k=1}^{K} \sum_{\tau=0}^{L} \mathbf{a}_{k,\tau}\, x_k[t-\tau], \qquad (22)$$

where L is the length of atomic spectrogram events. NMD gets its
name from this formulation, as the contribution of a single atom
is the convolution between the atom vectors and the activations.
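To make the convolutive structure of (22) concrete, the following sketch evaluates the model for all frames at once. It is an illustration under stated assumptions rather than the authors' implementation: the function name `nmd_reconstruct` and the array layout are hypothetical, the first axis of `A` runs over whatever range of lags is used in (22), and activations before the first frame are taken to be zero.

```python
import numpy as np

def nmd_reconstruct(A, X):
    """Evaluate the NMD model of (22) for every frame.

    A: atoms, shape (n_lags, n_freq, n_atoms); A[tau][:, k] plays the role of a_{k,tau}.
    X: activations, shape (n_atoms, n_frames); X[k, t] plays the role of x_k[t].
    Returns the modeled spectrogram, shape (n_freq, n_frames).
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    n_lags, n_freq, _ = A.shape
    n_frames = X.shape[1]
    Y_hat = np.zeros((n_freq, n_frames))
    # Lags at or beyond n_frames cannot contribute to any observed frame.
    for tau in range(min(n_lags, n_frames)):
        # x_k[t - tau]: activations delayed by tau frames, zero-padded at the start.
        X_delayed = np.zeros(X.shape)
        X_delayed[:, tau:] = X[:, :n_frames - tau]
        Y_hat += A[tau] @ X_delayed
    return Y_hat
```

A single impulse in the activations of atom `k` at frame `t` therefore places a scaled copy of that atom's spectrogram segment starting at frame `t`, which matches the impulses described for Figure 12(c).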
Again, the parameters of the model can be obtained by minimizing
a divergence between observations and the model while
constraining the model parameters to nonnegative values. In an
unsupervised scenario where both the atom vectors and their
activations are estimated, care must be taken to limit the number
of atoms and the length of events to avoid overfitting.
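One common way to carry out such a constrained minimization is with multiplicative updates for the generalized Kullback-Leibler divergence. The sketch below assumes that choice; the text above does not specify the divergence or the algorithm, so this should be read as one possible instance, not the method of the article. All names (`nmd_kl`, `n_atoms`, `n_lags`, `shift`) are hypothetical, and the random initialization is an arbitrary choice.

```python
import numpy as np

def shift(M, tau):
    """Shift the columns of M right by tau (left if tau < 0), padding with zeros."""
    out = np.zeros_like(M)
    n = M.shape[1]
    if tau >= 0:
        out[:, tau:] = M[:, :n - tau]
    else:
        out[:, :n + tau] = M[:, -tau:]
    return out

def reconstruct(A, X):
    """Model spectrogram: sum over lags of A[tau] times the delayed activations."""
    return sum(A[tau] @ shift(X, tau) for tau in range(A.shape[0]))

def kl_divergence(Y, Y_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence between Y and Y_hat."""
    return float(np.sum(Y * np.log((Y + eps) / (Y_hat + eps)) - Y + Y_hat))

def nmd_kl(Y, n_atoms, n_lags, n_iter=100, seed=0, eps=1e-12):
    """Unsupervised NMD sketch: estimate atoms A and activations X from Y."""
    Y = np.asarray(Y, dtype=float)
    n_freq, n_frames = Y.shape
    if n_lags > n_frames:
        raise ValueError("n_lags must not exceed the number of frames")
    rng = np.random.default_rng(seed)
    # Nonnegative random initialization; multiplicative updates keep the sign.
    A = rng.random((n_lags, n_freq, n_atoms)) + eps
    X = rng.random((n_atoms, n_frames)) + eps
    ones = np.ones((n_freq, n_frames))
    history = []
    for _ in range(n_iter):
        # Activation update: gather the ratio Y / Y_hat over all lags.
        R = Y / (reconstruct(A, X) + eps)
        num = sum(A[tau].T @ shift(R, -tau) for tau in range(n_lags))
        den = sum(A[tau].T @ shift(ones, -tau) for tau in range(n_lags))
        X *= num / (den + eps)
        # Atom update: every lag uses the same ratio matrix.
        R = Y / (reconstruct(A, X) + eps)
        for tau in range(n_lags):
            X_delayed = shift(X, tau)
            A[tau] *= (R @ X_delayed.T) / (ones @ X_delayed.T + eps)
        history.append(kl_divergence(Y, reconstruct(A, X), eps))
    return A, X, history
```

Here `n_atoms` and `n_lags` are the two quantities that the text says must be limited to avoid overfitting: with enough atoms or long enough events, the model can simply memorize the observed spectrogram.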
Convolution in frequency can be used to model pitch shifting
of atoms. A limitation of the linear models, at least when a
[Figure 12 panels: (a) Spectrogram matrix Y with missing frames, frequency (kHz) versus time (s); (b) components a_{1,τ} and a_{2,τ}, frequency (kHz) versus τ; (c) activations X, amplitude versus time (s).]
[FIG12] An illustration of the NMD model. (a) The magnitude spectrogram of a signal consisting of three bird sounds (Friedmann’s lark)
and background noises. The spectrogram is modeled using NMD to decompose the signal into bird sounds (component 1) and
background noises (component 2). (b) The compositional model represents the spectrogram as the weighted and delayed sum of two
short event spectrogram segments. (c) The curves show the weights for each delay. The impulses in the curves correspond to the start
times of bird sound events in the mixture. The events have been correctly found even though some of the frames in the mixture signal
are missing (black vertical bar). Since NMD models the mixture as a sum of segments longer than the missing-frame segment, the
model parameters can be used to predict the missing frames.