the sum of factors having a fixed spectrum $\mathbf{a}_k$ and time-varying activation $x_k[t]$. Representing the activation of the $k$th atom to all of the spectral vectors in $\mathbf{Y}$ as a vector $\mathbf{x}_k = [x_k[1]\; x_k[2]\; \cdots\; x_k[T]]^\top$, we can represent the overall contribution of $\mathbf{a}_k$ to $\mathbf{Y}$ as $\mathbf{a}_k \mathbf{x}_k^\top$. We can arrange all of the atoms $\mathbf{a}_k$, $k = 1, \ldots, K$, as columns of a matrix $\mathbf{A} \in \mathbb{R}_+^{F \times K}$. We can similarly arrange the activation vectors of the atoms, $\mathbf{x}_k$, $k = 1, \ldots, K$, as rows of a matrix $\mathbf{X} \in \mathbb{R}_+^{K \times T}$. The composition of $\mathbf{Y}$ in terms of the atoms and their activations can now be written as

$$\mathbf{Y} \approx \mathbf{A}\mathbf{X}, \qquad (1)$$

where all entries are strictly nonnegative.
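As a concrete illustration of (1), the following minimal NumPy sketch (not from the article; the dimensions and random nonnegative matrices are purely illustrative) verifies that the sum of the per-atom contributions $\mathbf{a}_k \mathbf{x}_k^\top$ equals the product $\mathbf{A}\mathbf{X}$:

```python
import numpy as np

F, K, T = 257, 4, 100          # frequency bins, atoms, time frames (illustrative)
rng = np.random.default_rng(0)

A = rng.random((F, K))         # columns a_k: nonnegative atom spectra
X = rng.random((K, T))         # rows x_k: nonnegative, time-varying activations

# Each atom contributes the outer product a_k x_k^T; the model is their sum.
Y_model = sum(np.outer(A[:, k], X[k, :]) for k in range(K))
assert np.allclose(Y_model, A @ X)  # sum of atom contributions equals A X
```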
To decompose the signal into its atomic parts, we must determine the $\mathbf{A}$ and $\mathbf{X}$ that together satisfy (1) most closely. To do so, we define a scalar-valued divergence $D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X})$ between the observed spectrogram $\mathbf{Y}$ and the decomposition $\mathbf{A}\mathbf{X}$, which characterizes the error between the two. The minimum value of the divergence is zero, which is only reached if the error is zero, i.e., $\mathbf{Y} = \mathbf{A}\mathbf{X}$. Typically, the divergence is calculated entry-wise, i.e.,

$$D(\mathbf{Y} \,\|\, \hat{\mathbf{Y}}) = \sum_{f,t} d(y_{f,t}, \hat{y}_{f,t}), \qquad (2)$$
where $y_{f,t}$ and $\hat{y}_{f,t}$ are the $(f,t)$th entries of $\mathbf{Y}$ and $\hat{\mathbf{Y}}$, respectively, and $d(\cdot, \cdot)$ is the divergence between two scalars.
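As a small sketch of (2) (the function and variable names below are illustrative, not the article's), the matrix divergence is obtained by applying a scalar divergence element-wise and summing; the squared error serves here only as an example of $d$:

```python
import numpy as np

def entrywise_divergence(Y, Y_hat, d):
    """Return the sum over (f, t) of d(y_ft, yhat_ft) for an element-wise scalar divergence d."""
    return float(np.sum(d(Y, Y_hat)))

# Example scalar divergence: squared error d(y, yhat) = (y - yhat)^2.
d_squared = lambda y, y_hat: (y - y_hat) ** 2

rng = np.random.default_rng(0)
Y = rng.random((257, 100))      # observed magnitude spectrogram (toy data)
Y_hat = rng.random((257, 100))  # model spectrogram, e.g., A @ X
print(entrywise_divergence(Y, Y_hat, d_squared))
```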
The optimal values $\mathbf{A}^*$ and $\mathbf{X}^*$ of $\mathbf{A}$ and $\mathbf{X}$ are obtained by minimizing this divergence:

$$\mathbf{A}^*, \mathbf{X}^* = \mathop{\arg\min}_{\mathbf{A}, \mathbf{X}} D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X}), \quad \mathbf{A} \geq 0,\; \mathbf{X} \geq 0. \qquad (3)$$
Here, we assume that both the atoms $\mathbf{A}^*$ and their activations $\mathbf{X}^*$ must be obtained from the decomposition. However, if the atoms $\mathbf{A}$ are prespecified, then decomposition only requires estimation of the activations

$$\mathbf{X}^* = \mathop{\arg\min}_{\mathbf{X}} D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X}), \quad \mathbf{X} \geq 0. \qquad (4)$$

A similar solution may also be defined when $\mathbf{X}$ is specified and $\mathbf{A}^*$ must be obtained.
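Problems (3) and (4) are typically solved with iterative update rules. As one hedged example (not the article's prescription; the iteration count, initialization, and small epsilon guard are illustrative choices), the sketch below estimates the activations for the fixed-atoms case (4) using the well-known multiplicative update rule for the squared-error divergence discussed next:

```python
import numpy as np

def estimate_activations(Y, A, n_iter=200, eps=1e-12):
    """Approximately solve (4) for the squared error: minimize ||Y - A X||_F^2 over X >= 0, with A fixed."""
    K, T = A.shape[1], Y.shape[1]
    rng = np.random.default_rng(0)
    X = rng.random((K, T)) + eps            # nonnegative initialization
    for _ in range(n_iter):
        # Multiplicative update: every factor is nonnegative, so X stays nonnegative.
        X *= (A.T @ Y) / (A.T @ A @ X + eps)
    return X
```

Because each factor in the update is nonnegative, the constraint $\mathbf{X} \geq 0$ is maintained automatically at every iteration.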
The most commonly used divergence in matrix decomposition problems is the squared error: $D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X}) = \|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2$. However, in the context of audio modeling, other divergence measures
have been found more appropriate [3], [19], [20]. Audio signals
typically have a large dynamic range—i.e., the energy in high-
frequency components can be tens of decibels lower than that in
low-frequency components, even when both are perceptually
equally important. The magnitude of errors in decomposition
tends to be much larger in lower frequencies than in high ones.
The squared error emphasizes the larger errors and, as a result,
decompositions that minimize the squared error emphasize the
accuracy in lower frequencies at the cost of perceptually important
higher frequencies. Divergence measures that assign greater
emphasis to low-energy components are required for audio.
For representing audio, two commonly used divergences are
the generalized Kullback–Leibler (KL) divergence
$$d_{\mathrm{KL}}(y, \hat{y}) = y \log (y / \hat{y}) - y + \hat{y} \qquad (5)$$

and the Itakura–Saito (IS) divergence

$$d_{\mathrm{IS}}(y, \hat{y}) = y / \hat{y} - \log (y / \hat{y}) - 1. \qquad (6)$$
The divergences in (5) and (6) and the squared error $d_{\mathrm{SQ}}(y, \hat{y}) = (y - \hat{y})^2$ are illustrated in Figure 4 for two values of $y$ as a function of $\hat{y}$.
[FIG4] An illustration of the typical divergence functions used in NMF (squared error, KL, and IS), plotted as $d(y, \hat{y})$. The divergences are calculated for an observation (a) $y = 1$ and (b) $y = 2$ as a function of the model $\hat{y}$. The scale of the input affects the scale of the divergence.
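The scalar divergences compared in Figure 4 are simple to compute directly. The sketch below (illustrative; it assumes strictly positive inputs so that the logarithms and ratios are defined) implements (5), (6), and the squared error, and also illustrates a known contrast in scale behavior: the squared error and KL values grow with the input scale, whereas the IS divergence is scale invariant:

```python
import numpy as np

def d_squared(y, y_hat):
    return (y - y_hat) ** 2

def d_kl(y, y_hat):      # generalized Kullback-Leibler divergence, (5)
    return y * np.log(y / y_hat) - y + y_hat

def d_is(y, y_hat):      # Itakura-Saito divergence, (6)
    return y / y_hat - np.log(y / y_hat) - 1.0

y, y_hat, c = 1.0, 0.5, 10.0
print(d_is(y, y_hat), d_is(c * y, c * y_hat))            # equal: IS is scale invariant
print(d_kl(y, y_hat), d_kl(c * y, c * y_hat))            # KL scales linearly with c
print(d_squared(y, y_hat), d_squared(c * y, c * y_hat))  # squared error scales with c^2
```

This scale invariance is one reason the IS divergence is often favored for audio spectra with a large dynamic range.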