IEEE SIGNAL PROCESSING MAGAZINE [138] MARCH 2015

Filtering, which corresponds to convolution in the time domain, can be expressed as a pointwise multiplication in the frequency domain. In the context of compositional models, the filtering can therefore be modeled in the magnitude spectral domain by pointwise multiplication of the magnitude spectrum of the excitation and the magnitude spectrum response of the filter. Assuming a fixed magnitude spectrum response of the filter, denoted by the length-$F$ column vector $\mathbf{h}$, the model for a filtered atom $\mathbf{a}_k$ is given as

$$\mathbf{a}_k = \mathbf{e}_k \otimes \mathbf{h}, \qquad (18)$$

where $\mathbf{e}_k$ is the excitation of the $k$th atom and $\otimes$ denotes pointwise multiplication. Here, all the atoms share the same filter, and the model for an input spectrum $\mathbf{y}_t$ in frame $t$ is

$$\mathbf{y}_t \approx \sum_{k} x_k[t]\,\mathbf{a}_k = \sum_{k} x_k[t]\,(\mathbf{e}_k \otimes \mathbf{h}). \qquad (19)$$
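The shared-filter model of (18) and (19) can be illustrated with a small NumPy sketch. The array names (`E`, `h`, `X`), the toy sizes, and the random values are assumptions made for illustration only; they are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
F, K, T = 6, 3, 4            # frequency bins, atoms, frames (toy sizes, assumed)

E = rng.random((F, K))       # excitations e_k stored as columns
h = rng.random(F)            # shared filter magnitude response, length F
X = rng.random((K, T))       # activations x_k[t]

# Eq. (18): each filtered atom is its excitation scaled bin by bin by the filter.
A = E * h[:, None]

# Eq. (19): each modeled frame is a weighted sum of the filtered atoms.
Y_model = A @ X

# Because all atoms share one filter, the filter can be factored out of the sum.
Y_factored = h[:, None] * (E @ X)
```

Both forms give the same modeled spectrogram, which is why a single length-$F$ filter is enough to reshape every atom in the dictionary at once.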

When multiple sources are modeled, the atoms of each source can also have a separate filter [5]. The free parameters of an excitation-filter model can be estimated using the principles described in the previous sections, i.e., by iteratively applying update rules for each of the terms that decrease the divergence between an observed spectrogram and the model. Even for complex models like this, deriving update rules is rather straightforward using the principles presented in [3], [57], and [58].
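As a rough illustration of such iterative estimation, the sketch below alternates standard multiplicative updates for the generalized Kullback-Leibler divergence between the activations and a single shared filter, with the excitations held fixed. The choice of divergence, the function names, and the toy data are assumptions; the update rules derived in the cited references may differ.

```python
import numpy as np

def kl_divergence(Y, Y_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence between nonnegative arrays."""
    return float(np.sum(Y * np.log((Y + eps) / (Y_hat + eps)) - Y + Y_hat))

def fit_excitation_filter(Y, E, n_iter=50, eps=1e-12, seed=0):
    """Estimate a shared filter h and activations X for fixed excitations E."""
    rng = np.random.default_rng(seed)
    F, T = Y.shape
    K = E.shape[1]
    h = np.ones(F)                       # start from a flat filter
    X = rng.random((K, T)) + 0.1         # positive random activations
    history = []
    for _ in range(n_iter):
        # Activation update with the filter held fixed.
        A = E * h[:, None]               # filtered atoms
        R = Y / (A @ X + eps)            # ratio of observation to model
        X *= (A.T @ R) / (A.sum(axis=0)[:, None] + eps)
        # Filter update with the activations held fixed.
        Z = E @ X                        # unfiltered model
        R = Y / (h[:, None] * Z + eps)
        h *= (R * Z).sum(axis=1) / (Z.sum(axis=1) + eps)
        history.append(kl_divergence(Y, (E * h[:, None]) @ X))
    return h, X, history

# Toy data generated from the model itself (hypothetical values).
rng = np.random.default_rng(1)
E_demo = rng.random((8, 3)) + 0.05
h_true = rng.random(8) + 0.5
X_true = rng.random((3, 20))
Y_demo = (E_demo * h_true[:, None]) @ X_true

h_est, X_est, history = fit_excitation_filter(Y_demo, E_demo)
```

The divergence recorded in `history` should not increase from one iteration to the next. Note that the filter and the activations are only identifiable up to a common scale factor, so `h_est` need not match `h_true` value for value.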

Excitations can often be parameterized quite compactly. For example, in music signal processing, it is known that many sources are harmonic, and many have a distinct set of fundamental frequency values that they can produce, each corresponding to a harmonic spectrum with a different fundamental frequency $f_0$. Therefore, many excitation-filter models use a fixed set of harmonic excitations [5], [55], [58].
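A fixed set of harmonic excitations could be generated along the following lines. This is a minimal sketch: the function name `harmonic_excitation`, the nearest-bin peak placement, the sample rate, and the semitone pitch grid are all assumptions chosen for illustration, not details given in the article.

```python
import numpy as np

def harmonic_excitation(f0_hz, n_bins, sample_rate):
    """Flat harmonic magnitude spectrum: unit peaks at multiples of f0.

    Bins are assumed to be spaced evenly from 0 Hz to the Nyquist frequency.
    """
    nyquist = sample_rate / 2
    e = np.zeros(n_bins)
    harmonics = np.arange(f0_hz, nyquist, f0_hz)           # f0, 2*f0, ...
    idx = np.rint(harmonics / nyquist * (n_bins - 1)).astype(int)
    e[idx] = 1.0                                           # nearest-bin peaks
    return e

# One excitation per semitone over two octaves (MIDI notes 48 to 72, assumed).
midi_notes = np.arange(48, 73)
f0_values = 440.0 * 2.0 ** ((midi_notes - 69) / 12.0)
E = np.stack(
    [harmonic_excitation(f0, n_bins=513, sample_rate=16000.0) for f0 in f0_values],
    axis=1,
)  # shape: (frequency bins, number of pitches)
```

Each column of `E` depends on a single number, its fundamental frequency, which is what makes this kind of excitation dictionary compact.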

The filters, on the other hand, are specific to each instrument, recording environment, or microphone. To prevent the filter from modeling harmonic structures when it is learned in an unsupervised manner, filters that are smooth over frequency can be obtained, e.g., by using constraints on two adjacent filter values [56], or by modeling filters as a sum of smooth elementary filter atoms [55].
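The second option, building the filter from smooth elementary filter atoms, can be sketched as follows. The raised-cosine (Hann-shaped) bumps, the function name `smooth_filter_atoms`, and the example weights `g` are assumptions; the cited work may use a different basis.

```python
import numpy as np

def smooth_filter_atoms(n_bins, n_atoms):
    """Overlapping raised-cosine bumps spaced evenly over frequency."""
    centres = np.linspace(0, n_bins - 1, n_atoms)
    spacing = centres[1] - centres[0]
    bins = np.arange(n_bins)[:, None]
    d = np.abs(bins - centres[None, :]) / spacing   # distance in units of spacing
    return np.where(d < 1.0, 0.5 * (1.0 + np.cos(np.pi * d)), 0.0)

B = smooth_filter_atoms(n_bins=257, n_atoms=9)      # F x J basis of filter atoms
g = np.array([1.0, 0.9, 0.6, 0.8, 0.5, 0.3, 0.2, 0.1, 0.05])  # J nonnegative weights

# The filter is a nonnegative combination of smooth atoms, so it is smooth too
# and cannot follow the narrow peaks of a harmonic spectrum.
h = B @ g
```

With this basis the 257-bin filter is controlled by only nine weights, and neighbouring bins of `h` can differ only slightly, which keeps harmonic detail in the excitation rather than in the filter.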

Figure 10 gives an example of an atom being modeled using the excitation-filter model. The filter is modeled as the sum of spectrally smooth filter atoms, which makes the filter itself spectrally smooth. The excitation is a flat harmonic spectrum. The modeled atom can have a high frequency resolution, but it is parameterized only by the activations of a few filter atoms and the pitch of the harmonic excitation. The model therefore offers an efficient way to adapt generic harmonic atoms to represent any harmonic signal.

The filter part of the excitation-filter model is able to compensate for any linear channel effects. Therefore, the excitation-filter model can also be applied in a scenario in which the atoms in a dictionary that are acquired in specific conditions are viewed as

[Figure 9 image omitted. Panel titles: (a) "Original Speech"; (b) "Missing Speech Reconstructed Using NMF," with regions labeled "Missing" and "Original." Axes: Time (s) and Frequency (kHz).]

[FIG9] An example of bandwidth extension of the spoken sequence of digits “nine five oh.” (a) The log-scaled spectrogram of the full-bandwidth signal. (b) The reconstruction of the top half obtained using only the 256 lowest frequency bands. For this reconstruction, an exemplar-based, speaker-dependent dictionary of 10,000 atoms was used, randomly extracted from a nonoverlapping data set. We can observe that although some fine detail is lost, the overall structure is captured very well.
