IEEE SIGNAL PROCESSING MAGAZINE [138] MARCH 2015
Filtering, which corresponds to convolution in the time
domain, can be expressed as a pointwise multiplication in the fre-
quency domain. In the context of compositional models, the fil-
tering can therefore be modeled in the magnitude spectral
domain by pointwise multiplication of the magnitude spectrum
of the excitation and the magnitude spectrum response of the fil-
ter. Assuming a fixed magnitude spectrum response of the filter
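that is denoted by the length-$F$ column vector $\mathbf{h}$, the model for a
filtered atom $\mathbf{a}_k$ is given as

$$\mathbf{a}_k = \mathbf{e}_k \otimes \mathbf{h}, \tag{18}$$

where $\mathbf{e}_k$ is the excitation of the $k$th atom and $\otimes$ denotes
pointwise multiplication. Here, all the atoms share the same filter, and the
model for an input spectrum $\mathbf{y}_t$ in frame $t$ is

$$\hat{\mathbf{y}}_t = \sum_{k=1}^{K} x_k[t]\,(\mathbf{e}_k \otimes \mathbf{h}). \tag{19}$$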
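The shared-filter model of (18) and (19) can be sketched in a few lines of NumPy. This is only an illustration: the function name `excitation_filter_model`, the array layout (excitations as the columns of `E`, activations as the rows of `X`), and the toy values are assumptions, not taken from the article.

```python
import numpy as np

def excitation_filter_model(E, h, X):
    """Return the filtered atoms A and the modeled spectrogram Y_hat.

    E: (F, K) excitation spectra e_k stored as columns (assumed layout)
    h: (F,)   magnitude response of the filter shared by all atoms
    X: (K, T) activations, X[k, t] = x_k[t]
    """
    A = E * h[:, None]   # a_k = e_k (pointwise) h, as in (18)
    Y_hat = A @ X        # y_hat_t = sum_k x_k[t] a_k, as in (19)
    return A, Y_hat

# Toy example: F = 4 frequency bins, K = 2 atoms, T = 3 frames.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
h = np.array([1.0, 0.5, 0.25, 0.125])
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
A, Y_hat = excitation_filter_model(E, h, X)
```

Because the filter multiplies every atom in the same way, the whole model is the matrix product of a row-scaled dictionary and the activation matrix.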
When multiple sources are modeled, the atoms of each source
can also have a separate filter [5]. The free parameters of an
excitation-filter model can be estimated using the principles
described in the previous sections, namely by iteratively applying
update rules for each of the terms that decrease the divergence
between an observed spectrogram and the model. Even for com-
plex models like this, deriving update rules is rather straightfor-
ward using the principles presented in [3], [57], and [58].
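As a minimal sketch of such alternating updates, the code below estimates a shared filter and the activations for fixed excitations. It assumes the generalized Kullback-Leibler divergence and the standard multiplicative update form; the excerpt does not state which divergence or exact rules the cited works use, so these choices, along with all function and variable names, are illustrative assumptions.

```python
import numpy as np

def kl_divergence(Y, Y_hat, eps=1e-12):
    """Generalized KL divergence between nonnegative arrays."""
    return float(np.sum(Y * np.log((Y + eps) / (Y_hat + eps)) - Y + Y_hat))

def fit_excitation_filter(Y, E, n_iter=50, eps=1e-12, seed=0):
    """Alternate multiplicative updates of activations X and filter h.

    Y: (F, T) observed magnitude spectrogram
    E: (F, K) fixed excitation spectra (columns)
    """
    rng = np.random.default_rng(seed)
    F, K = E.shape
    T = Y.shape[1]
    h = np.ones(F)                    # start from a flat filter
    X = rng.random((K, T)) + 0.1      # positive random activations
    history = []
    for _ in range(n_iter):
        # Activation update with the filter held fixed.
        A = E * h[:, None]
        R = Y / (A @ X + eps)
        X *= (A.T @ R) / (A.T @ np.ones_like(Y) + eps)
        # Filter update with the activations held fixed.
        B = E @ X                     # unfiltered model, shape (F, T)
        R = Y / (h[:, None] * B + eps)
        h *= np.sum(R * B, axis=1) / (np.sum(B, axis=1) + eps)
        history.append(kl_divergence(Y, (E * h[:, None]) @ X))
    return h, X, history

# Synthetic data generated from the model itself (hypothetical values).
rng = np.random.default_rng(1)
E = rng.random((16, 3)) + 0.01
h_true = np.linspace(1.0, 0.1, 16)
X_true = rng.random((3, 20))
Y = (E * h_true[:, None]) @ X_true
h_est, X_est, history = fit_excitation_filter(Y, E)
```

Note that the overall scale is ambiguous between the filter and the activations (scaling `h` by a constant and dividing `X` by it leaves the model unchanged), so `h_est` is comparable to `h_true` only up to a gain.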
Excitations can often be parameterized quite compactly. For example, in
music signal processing, many sources are known to be harmonic and to
produce only a distinct set of fundamental frequency values, each
corresponding to a harmonic spectrum with a different fundamental
frequency $f_0$. Therefore, many excitation-filter
models use a fixed set of harmonic excitations [5], [55], [58].
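A fixed set of harmonic excitations can be sketched as follows. The construction (a value of one at the frequency bin nearest to each harmonic, zero elsewhere), the helper name `harmonic_excitations`, and the example pitches, bin count, and sampling rate are all assumptions made for illustration; the cited works may build their excitations differently.

```python
import numpy as np

def harmonic_excitations(f0_values, n_bins, sample_rate):
    """Flat harmonic magnitude spectra, one column per fundamental (Hz).

    The n_bins frequency bins are assumed to span 0 Hz to the Nyquist
    frequency uniformly.
    """
    nyquist = sample_rate / 2
    bin_hz = nyquist / (n_bins - 1)              # spacing of the frequency grid
    E = np.zeros((n_bins, len(f0_values)))
    for k, f0 in enumerate(f0_values):
        harmonics = np.arange(f0, nyquist + 1e-9, f0)   # f0, 2*f0, ...
        bins = np.rint(harmonics / bin_hz).astype(int)  # nearest bin index
        E[bins[bins < n_bins], k] = 1.0                 # flat: all harmonics equal
    return E

# Two hypothetical pitches on a 257-bin grid at an 8 kHz sampling rate.
E = harmonic_excitations([100.0, 150.0], n_bins=257, sample_rate=8000)
```

Each column depends only on its pitch, so the excitation dictionary needs no training data.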
The filters, on the other hand, are specific to each instrument,
recording environment, or microphone. To prevent a filter that is
learned in an unsupervised manner from modeling harmonic structures,
the filter can be made smooth over frequency, e.g., by using
constraints on adjacent filter values [56], or by modeling the
filter as a sum of smooth elementary filter atoms [55].
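The second option, a filter built as a sum of smooth elementary filter atoms, can be sketched as below. The triangular shape of the filter atoms, their uniform spacing, the helper name `triangular_filter_atoms`, and the gain values are assumptions chosen for illustration; the excerpt does not specify the atoms used in [55].

```python
import numpy as np

def triangular_filter_atoms(n_bins, n_atoms):
    """Overlapping triangular bumps over frequency, one column per atom."""
    centers = np.linspace(0, n_bins - 1, n_atoms)   # atom center bins
    width = centers[1] - centers[0]                 # distance between centers
    f = np.arange(n_bins)[:, None]
    # Each atom falls linearly from 1 at its center to 0 at the neighbors.
    return np.clip(1.0 - np.abs(f - centers[None, :]) / width, 0.0, None)

G = triangular_filter_atoms(n_bins=257, n_atoms=9)             # shape (257, 9)
c = np.array([1.0, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])  # hypothetical gains
h = G @ c                                                      # smooth filter, (257,)
```

With only nine gains controlling 257 filter values, the filter cannot follow individual harmonics, which are much more closely spaced than the filter atoms.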
Figure 10 gives an example of an atom being modeled using
the excitation-filter model. The filter is modeled as the sum of
spectrally smooth filter atoms to make the filter also spectrally
smooth. The excitation is a flat harmonic spectrum. The modeled
atom can have a high frequency resolution, but it is parameterized
only by the activations of a few filter atoms and the pitch
of the harmonic excitation. The model therefore offers an
efficient way to adapt generic harmonic atoms to represent any
harmonic signal.
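The parameter count behind this efficiency can be made explicit. As a sketch, assume the filter is a weighted sum of $J$ fixed smooth filter atoms $\mathbf{g}_j$ with nonnegative activations $c_j$, and write $\mathbf{e}(f_0)$ for the flat harmonic excitation with pitch $f_0$. The symbols $J$, $\mathbf{g}_j$, $c_j$, and $\mathbf{e}(f_0)$ are introduced here for illustration only:

$$\mathbf{a} = \mathbf{e}(f_0) \otimes \sum_{j=1}^{J} c_j\,\mathbf{g}_j .$$

Under these assumptions the atom has $F$ entries but only $J + 1$ free parameters (the activations $c_1, \dots, c_J$ and the pitch $f_0$), with $J \ll F$ when only a few filter atoms are used.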
The filter part of the excitation-filter model is able to compensate
for any linear channel effects. Therefore, the excitation-filter
model can also be applied in a scenario in which the atoms in a
dictionary that are acquired in specific conditions are viewed as
[Figure 9 image. Panel (a): "Original Speech." Panel (b): "Missing Speech Reconstructed Using NMF," with regions labeled "Missing" and "Original." Axes in both panels: Time (s), 0.5 to 2.5; Frequency (kHz), 0.5 to 4.]
[FIG9] An example of bandwidth extension of the spoken sequence of digits “nine five oh.” (a) The log-scaled spectrogram of the full-
bandwidth signal. (b) The reconstruction of the top half obtained using only the 256 lowest frequency bands. For this reconstruction, an
exemplar-based, speaker-dependent dictionary of 10,000 atoms was used, randomly extracted from a nonoverlapping data set. We can
observe that although some fine detail is lost, the overall structure is captured very well.