additional requirement that atoms must be normalized after
every iteration. There also exist ways to take the normalization into account in the update, which guarantee that the update and normalization together decrease the value of the cost function [3], [31].
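To make the normalization step concrete, here is a minimal NumPy sketch of rescaling the atoms after an update while compensating in the activations so that the model output is unchanged; the toy dimensions and the choice of the $\ell_2$ norm are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 3))   # toy dictionary, one atom per column
X = rng.random((3, 8))   # toy activations
Y = A @ X

# Rescale each atom to unit l2 norm; absorb the scales into the
# corresponding activation rows so that the product AX is unchanged.
norms = np.linalg.norm(A, axis=0, keepdims=True)
A = A / norms
X = X * norms.T
assert np.allclose(A @ X, Y)  # model output preserved
```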
One of the most common constraints is that of sparsity, e.g., [4], [32], and [33]. A vector $\mathbf{x}$ is said to be sparse if the number of nonzero entries in it is fewer than the dimensionality of the vector itself, i.e., $\|\mathbf{x}\|_0 < F$. The fewer the nonzero elements, the sparser the vector is said to be. Sparsity is most commonly applied to the activations, i.e., the columns of the activation matrix $\mathbf{X}$. The sparsity constraint is typically imposed by employing the $\ell_1$ norm of the activation matrix as a regularizer, i.e., $\Phi(\mathbf{X}) = \|\mathbf{X}\|_1 = \sum_k \sum_t x_k[t]$.
This leads to the following update rule for the activations:
$$\mathbf{X} \leftarrow \mathbf{X} \otimes \frac{\mathbf{A}^{\top}\left(\dfrac{\mathbf{Y}}{\mathbf{A}\mathbf{X}}\right)}{\mathbf{A}^{\top}\mathbf{1} + \lambda}, \qquad (14)$$
where $\mathbf{1}$ is an all-ones matrix of the same size as $\mathbf{Y}$, and the multiplication $\otimes$ and the divisions are element-wise.
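As a concrete illustration of (14), a minimal NumPy sketch might look as follows; the function name, initialization, iteration count, and regularization weight lam are illustrative choices rather than anything prescribed by the article.

```python
import numpy as np

def sparse_activation_update(Y, A, n_iter=200, lam=0.1, eps=1e-12):
    """Estimate activations X for a fixed dictionary A by iterating the
    l1-regularized multiplicative update (14)."""
    F, T = Y.shape
    X = np.random.rand(A.shape[1], T)   # nonnegative initialization
    ones = np.ones((F, T))              # the all-ones matrix in (14)
    for _ in range(n_iter):
        R = Y / (A @ X + eps)           # element-wise ratio Y / (AX)
        X *= (A.T @ R) / (A.T @ ones + lam)
    return X
```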
Other constraints may be similarly applied by modifying the regularization function $\Phi(\mathbf{X})$ to favor the type of solutions desired. Similarly, regularization functions may be applied on the dictionary $\mathbf{A}$, in which case the update rule of $\mathbf{A}$ should be modified. In the context of compositional models for audio, the types of regularizations applied on the dictionary include sparsity [32] and dissimilarity between learned atoms and generic speech templates [34].
It must be noted that despite the introduction of regularization terms, both (3) and (12) are still typically biconvex, and no algorithm is guaranteed to reach the global minimum in practice. Different algorithms and initializations lead to different solutions, and any solution obtained will, at best, be a local optimum. In practice, this can result in some degree of variation in the signal processing outcomes obtained through these decompositions.
The entire discussion in this section also applies to the PLCA decompositions, although the manner in which the regularization terms are applied within the PLCA framework is different. We refer the reader to [14], [27], and [35] for additional discussion of this topic.
SOURCE SEPARATION
Sound source separation refers to the problem of extracting a single signal or several signals of interest from a mixture containing multiple signals. This operation is central to many signal processing applications because the fundamental algorithms are typically built under the assumption that we operate on a clean target signal with minimal interference. The ability to remove unwanted components from a recording allows us to perform subsequent operations that expect a clean input (e.g., speech recognition or pitch detection). We will predominantly focus on the case where we only observe a single-channel mixture and briefly discuss multichannel approaches later in the article.
The compositional model approach to the separation of signals from single-channel recordings addresses the problem in a rather simple manner. It assumes that any sound source can draw upon a characteristic set of atomic sounds to generate signals. Here, a source can refer to an actual sound source or to some other grouping of acoustic phenomena that should be jointly modeled, such as background noise or even a collection of sound classes that must be distinguished from a target class. A mixture of signals from distinct sources is composed of atoms from the individual sources. Hence, separating any particular component signal from a mixture only requires segregating the contribution of that source's atoms from the mixture.
Mathematically, we can explain this as follows. We use the NMF formulation in our explanation. Let matrix $\mathbf{A}_s$ represent the set of atoms employed by the $s$th source. We will refer to it as a dictionary of atoms for that source. Any spectrogram $\mathbf{Y}_s$ from the $s$th source is composed from the atoms in the dictionary $\mathbf{A}_s$ as $\mathbf{Y}_s = \mathbf{A}_s\mathbf{X}_s$. A mixed signal $\mathbf{Y}_{\mathrm{mix}}$ combining signals from several sources is given by
$$\mathbf{Y}_{\mathrm{mix}} = \sum_s \mathbf{Y}_s = \sum_s \mathbf{A}_s\mathbf{X}_s. \qquad (15)$$
Equation (15) can be written more compactly as follows. Let $\mathbf{A} = [\mathbf{A}_1\ \mathbf{A}_2\ \cdots]$ be a matrix composed by stacking the dictionaries for all the sources side by side. Let $\mathbf{X} = [\mathbf{X}_1^{\top}\ \mathbf{X}_2^{\top}\ \cdots]^{\top}$ be a matrix composed by stacking the activations for all the sources vertically. We can now express the mixed signal in compact form as $\mathbf{Y}_{\mathrm{mix}} = \mathbf{A}\mathbf{X}$.
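As a quick worked example of this stacking (toy dimensions and hypothetical data, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A1, A2 = rng.random((5, 3)), rng.random((5, 4))   # per-source dictionaries
X1, X2 = rng.random((3, 8)), rng.random((4, 8))   # per-source activations

A = np.hstack([A1, A2])   # dictionaries stacked side by side
X = np.vstack([X1, X2])   # activations stacked vertically

# The compact product reproduces the sum of per-source contributions (15).
assert np.allclose(A @ X, A1 @ X1 + A2 @ X2)
```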
The contribution of the $s$th source to $\mathbf{Y}_{\mathrm{mix}}$ is simply $\mathbf{Y}_s = \mathbf{A}_s\mathbf{X}_s$. In unsupervised source separation, both $\mathbf{A}$ and $\mathbf{X}$ are estimated from the observation $\mathbf{Y}_{\mathrm{mix}}$, followed by a process that identifies which source each atom is predominantly associated with. In a supervised scenario for separation, the dictionaries $\mathbf{A}_s$ for each of the sources are known a priori. We address the problem of creating these dictionaries in the next section. Thus, $\mathbf{A}_s\ \forall s$ are known, and thereby so is $\mathbf{A}$. The activation matrix $\mathbf{X}$ can now be estimated through iterations of (8).
The activations $\mathbf{X}_s^{*}$ of source $s$ can be extracted from the estimated activation matrix $\mathbf{X}^{*}$ by selecting the rows corresponding to the atoms from the $s$th source. The estimated spectrogram for the $s$th source is then simply computed as
$$\hat{\mathbf{Y}}_s = \mathbf{A}_s\mathbf{X}_s^{*}. \qquad (16)$$
An example of a source separation task using a dictionary representing isolated speech digits and a dictionary representing background noises is shown in Figure 5.
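Putting the pieces together, a sketch of supervised separation along the lines of (15) and (16) could look like the following; the unregularized KL multiplicative update plays the role of (8), and all names, dimensions, and parameter values are assumptions made for illustration.

```python
import numpy as np

def separate(Y_mix, dicts, n_iter=200, eps=1e-12):
    """Supervised separation with known per-source dictionaries.

    dicts is a list of nonnegative F x K_s dictionaries A_s; returns the
    per-source spectrogram estimates A_s X_s* of (16)."""
    A = np.hstack(dicts)                       # stacked dictionary
    X = np.random.rand(A.shape[1], Y_mix.shape[1])
    ones = np.ones_like(Y_mix)
    for _ in range(n_iter):                    # KL multiplicative updates
        X *= (A.T @ (Y_mix / (A @ X + eps))) / (A.T @ ones)
    # Split X row-wise into per-source activations and reconstruct (16).
    estimates, row = [], 0
    for A_s in dicts:
        X_s = X[row:row + A_s.shape[1]]
        estimates.append(A_s @ X_s)
        row += A_s.shape[1]
    return estimates
```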
In practice, the decomposition will not be exact and we will only achieve an approximate decomposition, i.e., $\mathbf{Y}_{\mathrm{mix}} \approx \mathbf{A}\mathbf{X}^{*}$, and, as a consequence, $\mathbf{Y}_{\mathrm{mix}}$ is not fully explained by the decomposition. Hence, the separated signal spectrograms given by (16) will not explain the mixed signal completely.
To be able to account for all the energy in the input signal, we can use an alternative method to extract the contributions of the individual sources. Although the separated signals do not completely explain the mixed signal, we assume that they do nevertheless successfully characterize the relative proportions of the individual signals in the mixture. This leads to the following estimate for the separated signals:
$$\hat{\mathbf{Y}}_s = \frac{\mathbf{A}_s\mathbf{X}_s^{*}}{\mathbf{A}\mathbf{X}^{*}} \otimes \mathbf{Y}_{\mathrm{mix}}, \qquad (17)$$
where the division is element-wise, so that each time–frequency bin of the mixture is apportioned to the sources in proportion to their modeled contributions.
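In code, this reapportioning amounts to a soft time–frequency mask applied to the mixture; a hedged sketch, reusing the stacked-dictionary conventions from the previous snippet:

```python
import numpy as np

def mask_reconstruct(Y_mix, dicts, X, eps=1e-12):
    """Reapportion the mixture among sources in proportion to each
    source's modeled contribution A_s X_s relative to the full model AX."""
    A = np.hstack(dicts)
    total = A @ X + eps                # full model AX*
    estimates, row = [], 0
    for A_s in dicts:
        X_s = X[row:row + A_s.shape[1]]
        estimates.append((A_s @ X_s) / total * Y_mix)  # soft mask
        row += A_s.shape[1]
    return estimates
```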