
IEEE SIGNAL PROCESSING MAGAZINE [142] MARCH 2015
between the 1960s and the 1980s, stereo panning was often used as a strong effect, so sources may have significantly different amplitudes. Similarly, if a signal is captured by multiple microphones that are far away from each other, the sourcewise amplitude differences between the microphones are large. When a
signal is captured by a microphone array where the micro-
phones are close to each other, the amplitude differences
between channels are typically small, but there are clear phase
differences between the signals. In this scenario, techniques
[10], [64] that model spectrogram magnitudes with the basic
NMF model and phase differences between the channels with
another model have shown potential.
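The amplitude-difference case can be handled by attaching a per-channel gain to each atom, as in the tensor factorization of [FIG14]. The following is a minimal sketch of such a nonnegative tensor factorization with multiplicative updates for the Euclidean cost; the function name, toy data, and update details are illustrative, not taken from any specific paper:

```python
import numpy as np

def ntf(V, K, n_iter=500, seed=0):
    """Factorize a multichannel magnitude spectrogram V
    (channels x freq x time) into atom spectra A (freq x K),
    channel gains G (K x channels), and activations X (K x time),
    so that V[c] ~= A @ np.diag(G[:, c]) @ X for each channel c.
    Multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    C, F, T = V.shape
    A = rng.random((F, K)) + 1e-6
    G = rng.random((K, C)) + 1e-6
    X = rng.random((K, T)) + 1e-6
    eps = 1e-12

    def model():  # current model of the whole tensor
        return np.einsum('fk,kc,kt->cft', A, G, X)

    for _ in range(n_iter):
        Vh = model()
        A *= np.einsum('cft,kc,kt->fk', V, G, X) / \
             (np.einsum('cft,kc,kt->fk', Vh, G, X) + eps)
        Vh = model()
        G *= np.einsum('cft,fk,kt->kc', V, A, X) / \
             (np.einsum('cft,fk,kt->kc', Vh, A, X) + eps)
        Vh = model()
        X *= np.einsum('cft,fk,kc->kt', V, A, G) / \
             (np.einsum('cft,fk,kc->kt', Vh, A, G) + eps)
    return A, G, X

# Toy stereo scene: two atoms panned differently across two channels.
A0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # freq x K
G0 = np.array([[0.8, 0.2], [0.3, 0.7]])               # K x channels
X0 = np.array([[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0]])
V = np.einsum('fk,kc,kt->cft', A0, G0, X0)
A, G, X = ntf(V, K=2)
```

Because the gains are scalars per atom and channel, this sketch covers only the amplitude-difference scenario; the closely spaced array case mentioned above requires the separate phase-difference model of [10], [64].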
DISCUSSION
Even though compositional models are a fairly new technique
in the context of audio signal processing, as we have shown in
this article, they are applicable to many fundamental audio pro-
cessing tasks such as source separation, classification, and dere-
verberation. The compositional nature of the model, the
modeling of a spectrogram as a nonnegative sum of atoms
having a fixed spectrum and a time-varying gain, is intuitive and
offers clear interpretations of the model parameters. This makes
it easy to analyze representations obtained with the model, both
algorithmically and manually, e.g., by visualizing the models.
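The sum-of-atoms structure described above can be made concrete with a small numerical sketch. This is a minimal illustration using the standard Lee–Seung multiplicative updates for the Euclidean cost; the function and variable names are our own, not from the article:

```python
import numpy as np

def nmf(V, K, n_iter=500, seed=0):
    """Factorize a nonnegative spectrogram V (freq x time) into atom
    spectra A (freq x K) and time-varying activations X (K x time)
    with Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    A = rng.random((F, K)) + 1e-6   # atom spectra stay nonnegative
    X = rng.random((K, T)) + 1e-6   # activations stay nonnegative
    eps = 1e-12                     # guards against division by zero
    for _ in range(n_iter):
        X *= (A.T @ V) / (A.T @ A @ X + eps)
        A *= (V @ X.T) / (A @ X @ X.T + eps)
    return A, X

# Toy "spectrogram" mixed from two known atoms with known activations.
atoms = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
activations = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]])
V = atoms @ activations
A, X = nmf(V, K=2)
print(np.linalg.norm(A @ X - V) / np.linalg.norm(V))  # relative residual
```

Because the updates are purely multiplicative, the factors remain nonnegative throughout, which is exactly what makes the learned atoms and activations interpretable as spectra and gains.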
The linear nature of the model also offers other advantages.
Even when more complex models are used that combine mul-
tiple extensions described earlier, the linearity makes it straight-
forward to derive optimization algorithms for the estimation of
the model parameters. Unlike some methods conventionally
used for modeling multiple co-occurring sources (e.g., factorial
hidden Markov models), the computational complexity of compositional model algorithms scales linearly as a function of the number of sources.
Compositional models also have some limitations. In the context of audio processing, they are mainly applied to the magnitudes of time–frequency representations and require additional phase information for signal reconstruction. Therefore, the models are mainly applicable to analyzing or processing existing signals, and their applicability to, e.g., sound synthesis is limited. Because
of the linearity of the models, compositional models are also not
[FIG14] The tensor factorization of multichannel audio. (a) Atom spectra a_k, (b) an illustration of a stereo signal’s left and right channel spectrograms factorized into an outer product of the atom spectra, (c) channel gains g_{k,c}, and (d) activations in time x_k[t]. Each atom is represented with a different color.
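Since the models above operate only on magnitudes, reconstructing time-domain sources requires phase. A common heuristic, sketched here with illustrative names and not tied to any specific paper, is to build Wiener-style soft masks from the per-source magnitude models and reuse the mixture phase:

```python
import numpy as np

def wiener_reconstruct(S_mix, mags):
    """Distribute the complex mixture STFT S_mix (freq x time) among
    sources using Wiener-style soft masks built from nonnegative
    magnitude estimates (one array per source), reusing the mixture
    phase. A common heuristic, not an exact inverse."""
    total = sum(mags) + 1e-12          # avoid division by zero
    return [(m / total) * S_mix for m in mags]

# Toy mixture STFT and two hypothetical per-source magnitude models
# (in practice these would come from, e.g., A_k @ X_k per source).
S_mix = np.array([[1 + 1j, 2j], [3 + 0j, 1 - 1j]])
mag1 = np.array([[1.0, 0.5], [2.0, 0.1]])
mag2 = np.array([[0.2, 1.5], [1.0, 0.9]])
sources = wiener_reconstruct(S_mix, [mag1, mag2])
```

The soft masks sum to one at every time–frequency point, so the source estimates add back up to the mixture; each complex source STFT can then be inverted with an ordinary inverse STFT.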