
information to transmit, which negatively impacts intelligibility
and the perception of quality. To use the compositional model
approach for this task, two dictionaries are first constructed: a
bandwidth-limited dictionary A and a full-bandwidth dictionary L.
The atoms in the dictionaries should be coupled, i.e., each atom in
A should represent a band-limited version of the corresponding
atom in L. This can be done through training on parallel corpora
of full-bandwidth and band-limited signals, or by calculating A
from L if the details of the band-limitation process are known
and can be modeled computationally. We then estimate the atom
activations x[t] using the limited-bandwidth observation y[t] and
the limited-bandwidth dictionary A. Finally, direct application of
(17) with the full-bandwidth dictionary L, i.e., computing Lx[t]
in place of the reconstruction Ax[t], yields a full-bandwidth
reconstruction. We illustrate this process in Figure 9. Very
similar principles underlie voice conversion, in which the
associated audio is that of another speaker [51], [52].
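As a rough illustration of this pipeline (not from the article; the dictionary contents and all sizes below are hypothetical placeholders), the following Python sketch estimates activations against a band-limited dictionary using standard KL-divergence multiplicative updates and then reconstructs with the coupled full-bandwidth dictionary:

```python
# A minimal sketch of bandwidth extension with coupled dictionaries.
# A (band-limited) and L (full-bandwidth) are assumed given, e.g.,
# learned from parallel corpora; random placeholders are used here.
import numpy as np

def estimate_activations(Y, A, n_iter=200, eps=1e-12):
    """Estimate nonnegative activations X such that Y ~ A @ X,
    keeping A fixed (KL-divergence multiplicative updates)."""
    K, T = A.shape[1], Y.shape[1]
    X = np.random.rand(K, T)
    for _ in range(n_iter):
        V = A @ X + eps                          # current approximation
        X *= (A.T @ (Y / V)) / (A.sum(axis=0)[:, None] + eps)
    return X

F_low, F_full, K, T = 64, 257, 40, 100           # hypothetical sizes
A = np.random.rand(F_low, K)                     # band-limited dictionary
L = np.random.rand(F_full, K)                    # coupled full-band dictionary
Y = np.random.rand(F_low, T)                     # band-limited magnitude spectra

X = estimate_activations(Y, A)                   # activations from narrow band
Y_full = L @ X                                   # full-bandwidth reconstruction
```

The key point is that the activations are estimated only from the band-limited observation; the coupling of the two dictionaries then carries them over to the full band.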
Missing data imputation [29], [53], [54] is closely related to
bandwidth extension in that the goal is to estimate a full-spectrum
audio signal, but with the difference that the missing data are not
a set of predetermined frequency bands but rather arbitrarily
located time–frequency entries of the spectrogram. Algorithms for
compositional models can easily be modified so that model
parameters are estimated using only the observed part of the data
(ignoring missing entries) [29], [54], while the model output can
still be calculated for the entries corresponding to the missing
data. Provided that there is a sufficient amount of observed data
to estimate the activations (and the atoms, in the case of
unsupervised processing), reasonable estimates of the missing
values can be obtained because of the dependencies between observed
and missing values. In general, the quality of a model can be
judged by its ability to make predictions, and the capability of
compositional models to predict missing data also illustrates
their effectiveness.
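Concretely, the modification amounts to weighting the estimation by a binary mask. The sketch below (again a hypothetical illustration with randomly generated data) runs KL-divergence NMF updates that use only observed spectrogram entries and then fills in the missing ones from the model output:

```python
# A minimal sketch of missing-data imputation with a compositional model.
# M is a binary mask (1 = observed, 0 = missing); parameters are estimated
# from observed entries only, and the model output A @ X fills the gaps.
import numpy as np

def masked_nmf(Y, M, K, n_iter=200, eps=1e-12):
    """KL-divergence NMF that ignores missing entries (M == 0)
    when estimating the atoms A and activations X."""
    F, T = Y.shape
    A = np.random.rand(F, K)
    X = np.random.rand(K, T)
    for _ in range(n_iter):
        R = M * (Y / (A @ X + eps))        # data/model ratio, observed only
        X *= (A.T @ R) / (A.T @ M + eps)   # update activations
        R = M * (Y / (A @ X + eps))
        A *= (R @ X.T) / (M @ X.T + eps)   # update atoms
    return A, X

Y = np.random.rand(257, 100)                        # spectrogram with gaps
M = (np.random.rand(*Y.shape) > 0.3).astype(float)  # 1 = observed, 0 = missing
A, X = masked_nmf(Y, M, K=40)
Y_imputed = np.where(M > 0, Y, A @ X)               # impute only missing entries
```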
EXCITATION-FILTER MODEL AND
CHANNEL COMPENSATION
Creating dictionaries from training data, as presented earlier,
yields accurate representations as long as the data from which the
dictionaries are learned match the observed data. In many practical
situations, this is not the case, and there is a need to adapt the
learned dictionaries. Moreover, we often have knowledge about the
types of sources to be modeled, e.g., that they are musical
instruments, but do not have suitable training data to estimate the
dictionaries in a supervised manner.
Natural sound sources can be modeled as an excitation signal being
filtered by an instrument body filter or a vocal tract filter.
These kinds of excitation-filter (or source-filter) models have
been very effective, e.g., in speech coding, where several codecs
use them. In addition to modeling the properties of a body filter,
the filter can also model the response from a source to a
microphone and can therefore also perform channel compensation.
In the context of compositional models, excitation-filter models
have been found useful in, e.g., music processing [55], [56], where
the excitations and the filters carry different types of information:
excitations typically consist of harmonic spectra with different
fundamental frequency values and are therefore useful for pitch
estimation, whereas the filters carry instrument-dependent
information that can be used for instrument recognition [5].
[Figure 8 plot: activation (0 to 0.7) for each digit label (0 to 9 and oh); the largest terms of the linear combination are 0.2 x zero + 0.1 x noise + 0.09 x zero + 0.08 x two + 0.08 x noise + ...]
[FIG8] By associating each dictionary atom from Figure 5 with a word label, the linear combination of speech atoms in Figure 5 serves
directly as evidence for the underlying word classes. We observe that the word zero, underlying the noisy observation of Figure 5,
does indeed obtain the highest score.