$\mathbf{x}_t$ is maximally sparse (contains only one nonzero entry), (17) is in fact identical to nearest-neighbor classification. For less sparse solutions, the difference is that the compositional model represents an observation as a combination of atoms, whereas k-NN represents an observation as a collection of $k$ atoms that each individually are close to $\mathbf{y}_t$.
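To make the contrast concrete, the following minimal sketch (an illustration in NumPy, not the exact formulation behind (17)) compares selecting the single closest atom with estimating a full nonnegative combination of atoms; the random dictionary, the random observation, and the use of nonnegative least squares are assumptions made only for this example.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((40, 10)))   # illustrative dictionary: 40-dimensional spectra, 10 atoms
y = np.abs(rng.standard_normal(40))         # illustrative observation y_t

# Nearest-neighbor view: the single atom closest to y_t, which is
# (loosely) what a maximally sparse activation vector corresponds to.
nearest_atom = int(np.argmin(np.linalg.norm(A - y[:, None], axis=0)))

# Compositional view: y_t modeled as a nonnegative combination of atoms.
x, _ = nnls(A, y)        # activation vector x_t >= 0
approximation = A @ x    # compositional model of the observation

print(nearest_atom, np.count_nonzero(x > 1e-9))

When the activation vector is restricted to a single nonzero entry, the two views coincide; with more nonzero entries, the compositional model explains $\mathbf{y}_t$ jointly rather than atom by atom.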
In the literature, many different types of metainformation exist. In the music transcription example of Figure 7, dictionary atoms were associated with notes. Even in the previous application, source separation, we used metainformation by labeling atoms with a source identity. In speaker identification [44], atoms are associated with speaker identities. In simple speech processing tasks, such as phone classification [45] or word recognition [46], the associated labels are simply the phones or words themselves.

In these examples, the dictionary $\mathbf{A}$ is either constructed or sampled from training data, which makes it straightforward to associate labels with atoms. When the dictionary is learned from data, however, the appropriate mapping from atoms to labels is unclear. In this scenario, the mapping can be learned by first calculating atom activations on training data for which the associated labels are known, followed by NMF or multiple regression. In [47], this approach was shown to improve performance even with a sampled dictionary. Alternatively, we can treat either $\mathbf{g}_t$ or the activations $\mathbf{x}_t$ as features for a conventional supervised machine-learning technique such as GMMs [48] or a neural network [49].
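As one possible reading of the regression-based mapping described above, the sketch below learns a nonnegative matrix that maps atom activations to class scores from labeled training data. The function names, the one-hot label matrix, and the use of per-class nonnegative least squares are illustrative assumptions, not the exact procedure of [47].

import numpy as np
from scipy.optimize import nnls

def learn_label_map(X_train, L_train):
    # X_train: (n_atoms, n_frames) activations computed on labeled training data
    # L_train: (n_classes, n_frames) one-hot (or soft) label indicators
    # Returns nonnegative B such that L_train is approximately B @ X_train.
    n_classes, n_atoms = L_train.shape[0], X_train.shape[0]
    B = np.zeros((n_classes, n_atoms))
    for c in range(n_classes):
        B[c], _ = nnls(X_train.T, L_train[c])   # nonnegative regression per class
    return B

def predict_labels(B, X_test):
    # Frame-wise class scores for new activations; argmax gives the label.
    return (B @ X_test).argmax(axis=0)

Frame-level scores can also be pooled over time before taking the argmax, and, as noted above, the activations themselves can instead be passed directly to a GMM or neural-network classifier.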
Another powerful aspect of the compositional model is that dictionary atoms can just as easily be associated with other kinds of information, e.g., audio. Consider, for example, a bandwidth extension task [9], [50], where the goal is to estimate a full-spectrum audio signal given a bandwidth-limited audio signal. This is a useful operation to perform since, in many audio transmission cases, high-frequency information is removed to reduce the amount of data that needs to be transmitted.
[Figure 7 image: (a) dictionary matrix A (frequency in kHz vs. MIDI note number), (b) spectrogram matrix Y (frequency in kHz vs. time in s), (c) reference activations (MIDI note number vs. time in s), (d) activation matrix X (MIDI note number vs. time in s).]
[FIG7] A music analysis example where a polyphonic mixture spectrogram (b) is decomposed into a set of note activations (d) using a dictionary consisting of spectra of piano notes (a). Each atom in the dictionary is associated with a MIDI note number. The reference note activations are given in (c). This example is an excerpt from Beethoven's Moonlight Sonata. Even though the activations are rather noisy and do not exactly match the reference, the structure of the music is much more clearly visible in the activation plot than in the spectrogram of the mixture signal.
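For concreteness, the kind of decomposition shown in Figure 7 can be sketched as follows: the note dictionary is kept fixed and only the activations are estimated, here with the standard multiplicative update for the generalized Kullback-Leibler divergence. The update rule, iteration count, and initialization are illustrative choices, not necessarily those used to produce the figure.

import numpy as np

def estimate_activations(Y, A, n_iter=200, eps=1e-12):
    # Y: (n_freq, n_frames) nonnegative mixture spectrogram
    # A: (n_freq, n_notes) fixed dictionary, one spectrum per piano note
    # Returns X: (n_notes, n_frames) nonnegative note activations with Y ~= A @ X.
    X = np.ones((A.shape[1], Y.shape[1]))
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        V = A @ X + eps                             # current model of the spectrogram
        X *= (A.T @ (Y / V)) / (A.T @ ones + eps)   # multiplicative KL-divergence update
    return X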
INFORMATION EXTRACTION USING THE COMPOSITIONAL MODEL WORKS BY ASSOCIATING EACH ATOM IN THE DICTIONARY WITH METAINFORMATION.
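Returning to the bandwidth-extension task described above, one common compositional strategy (sketched here under assumptions, not necessarily the exact method of [9] or [50]) is to use a coupled dictionary: activations are estimated from the band-limited observation and then applied to full-band versions of the same atoms.

from scipy.optimize import nnls

def extend_bandwidth(y_low, A_low, A_full):
    # y_low: band-limited magnitude spectrum of one frame
    # A_low: band-limited versions of the dictionary atoms, shape (n_low, n_atoms)
    # A_full: the same atoms over the full frequency range, shape (n_full, n_atoms)
    x, _ = nnls(A_low, y_low)   # activations estimated on the observed band only
    return A_full @ x           # the same activations reconstruct the full band

Because the activations are shared between the two views of each atom, whatever information is attached to an atom, here its missing high-frequency content, is carried over to the reconstruction.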