the sum of factors having a fixed spectrum $\mathbf{a}_k$ and time-varying activation $x_k[t]$. Representing the activation of the $k$th atom to all of the spectral vectors in $\mathbf{Y}$ as a vector $\mathbf{x}_k = [x_k[1]\; x_k[2]\; \cdots\; x_k[T]]^\top$, we can represent the overall contribution of $\mathbf{a}_k$ to $\mathbf{Y}$ as $\mathbf{a}_k \mathbf{x}_k^\top$. We can arrange all of the atoms $\mathbf{a}_k$, $k = 1, \ldots, K$, as columns of a matrix $\mathbf{A} \in \mathbb{R}_+^{F \times K}$. We can similarly arrange the activation vectors of the atoms, $\mathbf{x}_k$, $k = 1, \ldots, K$, as rows of a matrix $\mathbf{X} \in \mathbb{R}_+^{K \times T}$. The composition of $\mathbf{Y}$ in terms of the atoms and their activations can now be written as

$$\mathbf{Y} \approx \mathbf{A}\mathbf{X}, \qquad (1)$$

where all entries are strictly nonnegative.
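As a concrete illustration of (1), the following minimal NumPy sketch (not from the article; the dimensions and random nonnegative matrices are purely illustrative) verifies that the sum of the per-atom contributions $\mathbf{a}_k \mathbf{x}_k^\top$ equals the product $\mathbf{A}\mathbf{X}$:

```python
import numpy as np

F, K, T = 257, 4, 100          # frequency bins, atoms, time frames (illustrative)
rng = np.random.default_rng(0)

A = rng.random((F, K))         # columns a_k: nonnegative atom spectra
X = rng.random((K, T))         # rows x_k: nonnegative, time-varying activations

# Each atom contributes the outer product a_k x_k^T; the model is their sum.
Y_model = sum(np.outer(A[:, k], X[k, :]) for k in range(K))
assert np.allclose(Y_model, A @ X)  # sum of atom contributions equals A X
```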
To decompose the signal into its atomic parts, we must determine the $\mathbf{A}$ and $\mathbf{X}$ that together satisfy (1) most closely. To do so, we define a scalar-valued divergence $D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X})$ between the observed spectrogram $\mathbf{Y}$ and the decomposition $\mathbf{A}\mathbf{X}$, which characterizes the error between the two. The minimum value of the divergence is zero, which is only reached if the error is zero, i.e., $\mathbf{Y} = \mathbf{A}\mathbf{X}$. Typically, the divergence is calculated entry-wise, i.e.,

$$D(\mathbf{Y} \,\|\, \hat{\mathbf{Y}}) = \sum_{f,t} d(y_{f,t}, \hat{y}_{f,t}), \qquad (2)$$
where $y_{f,t}$ and $\hat{y}_{f,t}$ are the $(f,t)$th entries of $\mathbf{Y}$ and $\hat{\mathbf{Y}}$, respectively, and $d(\cdot, \cdot)$ is the divergence between two scalars.
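As a small sketch of (2) (the function and variable names below are illustrative, not the article's), the matrix divergence is obtained by applying a scalar divergence element-wise and summing; the squared error serves here only as an example of $d$:

```python
import numpy as np

def entrywise_divergence(Y, Y_hat, d):
    """Return the sum over (f, t) of d(y_ft, yhat_ft) for an element-wise scalar divergence d."""
    return float(np.sum(d(Y, Y_hat)))

# Example scalar divergence: squared error d(y, yhat) = (y - yhat)^2.
d_squared = lambda y, y_hat: (y - y_hat) ** 2

rng = np.random.default_rng(0)
Y = rng.random((257, 100))      # observed magnitude spectrogram (toy data)
Y_hat = rng.random((257, 100))  # model spectrogram, e.g., A @ X
print(entrywise_divergence(Y, Y_hat, d_squared))
```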
The optimal values $\mathbf{A}^*$ and $\mathbf{X}^*$ of $\mathbf{A}$ and $\mathbf{X}$ are obtained by minimizing this divergence:

$$\mathbf{A}^*, \mathbf{X}^* = \mathop{\arg\min}_{\mathbf{A}, \mathbf{X}} D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X}), \quad \mathbf{A} \geq 0,\; \mathbf{X} \geq 0. \qquad (3)$$
Here, we assume that both the atoms $\mathbf{A}^*$ and their activations $\mathbf{X}^*$ must be obtained from the decomposition. However, if the atoms $\mathbf{A}$ are prespecified, then decomposition only requires estimation of the activations

$$\mathbf{X}^* = \mathop{\arg\min}_{\mathbf{X}} D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X}), \quad \mathbf{X} \geq 0. \qquad (4)$$

A similar solution may also be defined when $\mathbf{X}$ is specified and $\mathbf{A}^*$ must be obtained.
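Problems (3) and (4) are typically solved with iterative update rules. As one hedged example (not the article's prescription; the iteration count, initialization, and small epsilon guard are illustrative choices), the sketch below estimates the activations for the fixed-atoms case (4) using the well-known multiplicative update rule for the squared-error divergence discussed next:

```python
import numpy as np

def estimate_activations(Y, A, n_iter=200, eps=1e-12):
    """Approximately solve (4) for the squared error: minimize ||Y - A X||_F^2 over X >= 0, with A fixed."""
    K, T = A.shape[1], Y.shape[1]
    rng = np.random.default_rng(0)
    X = rng.random((K, T)) + eps            # nonnegative initialization
    for _ in range(n_iter):
        # Multiplicative update: every factor is nonnegative, so X stays nonnegative.
        X *= (A.T @ Y) / (A.T @ A @ X + eps)
    return X
```

Because each factor in the update is nonnegative, the constraint $\mathbf{X} \geq 0$ is maintained automatically at every iteration.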
The most commonly used divergence in matrix decomposition problems is the squared error: $D(\mathbf{Y} \,\|\, \mathbf{A}\mathbf{X}) = \|\mathbf{Y} - \mathbf{A}\mathbf{X}\|_F^2$. However, in the context of audio modeling, other divergence measures
have been found more appropriate [3], [19], [20]. Audio signals
typically have a large dynamic range—i.e., the energy in high-
frequency components can be tens of decibels lower than that in
low-frequency components, even when both are perceptually
equally important. The magnitude of errors in decomposition
tends to be much larger in lower frequencies than in high ones.
The squared error emphasizes the larger errors and, as a result,
decompositions that minimize the squared error emphasize the
accuracy in lower frequencies at the cost of perceptually important
higher frequencies. Divergence measures that assign greater
emphasis to low-energy components are required for audio.
For representing audio, two commonly used divergences are
the generalized Kullback–Leibler (KL) divergence
$$d_{\mathrm{KL}}(y, \hat{y}) = y \log (y / \hat{y}) - y + \hat{y} \qquad (5)$$

and the Itakura–Saito (IS) divergence

$$d_{\mathrm{IS}}(y, \hat{y}) = y / \hat{y} - \log (y / \hat{y}) - 1. \qquad (6)$$
The divergences in (5) and (6) and the squared error $d_{\mathrm{SQ}}(y, \hat{y}) = (y - \hat{y})^2$ are illustrated in Figure 4 for two values of $y$ as a function of $\hat{y}$.
[FIG4] An illustration of the typical divergence functions used in NMF (squared error, KL, and IS), plotted as $d(y, \hat{y})$. The divergences are calculated for an observation (a) $y = 1$ and (b) $y = 2$ as a function of the model $\hat{y}$. The scale of the input affects the scale of the divergence.
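The scalar divergences compared in Figure 4 are simple to compute directly. The sketch below (illustrative; it assumes strictly positive inputs so that the logarithms and ratios are defined) implements (5), (6), and the squared error, and also illustrates a known contrast in scale behavior: the squared error and KL values grow with the input scale, whereas the IS divergence is scale invariant:

```python
import numpy as np

def d_squared(y, y_hat):
    return (y - y_hat) ** 2

def d_kl(y, y_hat):      # generalized Kullback-Leibler divergence, (5)
    return y * np.log(y / y_hat) - y + y_hat

def d_is(y, y_hat):      # Itakura-Saito divergence, (6)
    return y / y_hat - np.log(y / y_hat) - 1.0

y, y_hat, c = 1.0, 0.5, 10.0
print(d_is(y, y_hat), d_is(c * y, c * y_hat))            # equal: IS is scale invariant
print(d_kl(y, y_hat), d_kl(c * y, c * y_hat))            # KL scales linearly with c
print(d_squared(y, y_hat), d_squared(c * y, c * y_hat))  # squared error scales with c^2
```

This scale invariance is one reason the IS divergence is often favored for audio spectra with a large dynamic range.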