
IEEE SIGNAL PROCESSING MAGAZINE [142] MARCH 2015
between the 1960s and the 1980s, stereo panning was often used as a strong effect, so sources may have significantly different amplitudes. Similarly, if a signal is captured by multiple microphones that are far away from each other, the sourcewise amplitude differences between the microphones are large. When a
signal is captured by a microphone array where the micro-
phones are close to each other, the amplitude differences
between channels are typically small, but there are clear phase
differences between the signals. In this scenario, techniques
[10], [64] that model spectrogram magnitudes with the basic
NMF model and phase differences between the channels with
another model have shown potential.
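The amplitude-difference case can be handled by attaching a per-channel gain to each atom, as in the tensor factorization of [FIG14]. The following is a minimal sketch of such a nonnegative tensor factorization with multiplicative updates for the Euclidean cost; the function name, toy data, and update details are illustrative, not taken from any specific paper:

```python
import numpy as np

def ntf(V, K, n_iter=500, seed=0):
    """Factorize a multichannel magnitude spectrogram V
    (channels x freq x time) into atom spectra A (freq x K),
    channel gains G (K x channels), and activations X (K x time),
    so that V[c] ~= A @ np.diag(G[:, c]) @ X for each channel c.
    Multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    C, F, T = V.shape
    A = rng.random((F, K)) + 1e-6
    G = rng.random((K, C)) + 1e-6
    X = rng.random((K, T)) + 1e-6
    eps = 1e-12

    def model():  # current model of the whole tensor
        return np.einsum('fk,kc,kt->cft', A, G, X)

    for _ in range(n_iter):
        Vh = model()
        A *= np.einsum('cft,kc,kt->fk', V, G, X) / \
             (np.einsum('cft,kc,kt->fk', Vh, G, X) + eps)
        Vh = model()
        G *= np.einsum('cft,fk,kt->kc', V, A, X) / \
             (np.einsum('cft,fk,kt->kc', Vh, A, X) + eps)
        Vh = model()
        X *= np.einsum('cft,fk,kc->kt', V, A, G) / \
             (np.einsum('cft,fk,kc->kt', Vh, A, G) + eps)
    return A, G, X

# Toy stereo scene: two atoms panned differently across two channels.
A0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # freq x K
G0 = np.array([[0.8, 0.2], [0.3, 0.7]])               # K x channels
X0 = np.array([[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0]])
V = np.einsum('fk,kc,kt->cft', A0, G0, X0)
A, G, X = ntf(V, K=2)
```

Because the gains are scalars per atom and channel, this sketch covers only the amplitude-difference scenario; the closely spaced array case mentioned above requires the separate phase-difference model of [10], [64].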
DISCUSSION
Even though compositional models are a fairly new technique
in the context of audio signal processing, as we have shown in
this article, they are applicable to many fundamental audio pro-
cessing tasks such as source separation, classification, and dere-
verberation. The compositional nature of the model, the
modeling of a spectrogram as a nonnegative sum of atoms
having a fixed spectrum and a time-varying gain, is intuitive and
offers clear interpretations of the model parameters. This makes
it easy to analyze representations obtained with the model, both
algorithmically and manually, e.g., by visualizing the models.
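The sum-of-atoms structure described above can be made concrete with a small numerical sketch. This is a minimal illustration using the standard Lee–Seung multiplicative updates for the Euclidean cost; the function and variable names are our own, not from the article:

```python
import numpy as np

def nmf(V, K, n_iter=500, seed=0):
    """Factorize a nonnegative spectrogram V (freq x time) into atom
    spectra A (freq x K) and time-varying activations X (K x time)
    with Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    A = rng.random((F, K)) + 1e-6   # atom spectra stay nonnegative
    X = rng.random((K, T)) + 1e-6   # activations stay nonnegative
    eps = 1e-12                     # guards against division by zero
    for _ in range(n_iter):
        X *= (A.T @ V) / (A.T @ A @ X + eps)
        A *= (V @ X.T) / (A @ X @ X.T + eps)
    return A, X

# Toy "spectrogram" mixed from two known atoms with known activations.
atoms = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
activations = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]])
V = atoms @ activations
A, X = nmf(V, K=2)
print(np.linalg.norm(A @ X - V) / np.linalg.norm(V))  # relative residual
```

Because the updates are purely multiplicative, the factors remain nonnegative throughout, which is exactly what makes the learned atoms and activations interpretable as spectra and gains.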
The linear nature of the model also offers other advantages.
Even when more complex models are used that combine mul-
tiple extensions described earlier, the linearity makes it straight-
forward to derive optimization algorithms for the estimation of
the model parameters. Unlike some methods conventionally
used for modeling multiple co-occurring sources (e.g., factorial
hidden Markov models), the computational complexity of compositional model algorithms scales linearly as a function of the number of sources.
Compositional models also have some limitations. In the context of audio processing, they are mainly applied to the magnitudes of time–frequency representations and require additional phase information for signal reconstruction. Therefore, the models are mainly applicable to analyzing or processing existing signals, and their applicability to, e.g., sound synthesis is limited. Because
of the linearity of the models, compositional models are also not
[FIG14] The tensor factorization of multichannel audio. (a) Atom spectra a_k, (b) an illustration of a stereo signal’s left and right channel spectrograms factorized into an outer product of the atom spectra, (c) channel gains g_{k,c}, and (d) activations in time x_k[t]. Each atom is represented with a different color.
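Since the models above operate only on magnitudes, reconstructing time-domain sources requires phase. A common heuristic, sketched here with illustrative names and not tied to any specific paper, is to build Wiener-style soft masks from the per-source magnitude models and reuse the mixture phase:

```python
import numpy as np

def wiener_reconstruct(S_mix, mags):
    """Distribute the complex mixture STFT S_mix (freq x time) among
    sources using Wiener-style soft masks built from nonnegative
    magnitude estimates (one array per source), reusing the mixture
    phase. A common heuristic, not an exact inverse."""
    total = sum(mags) + 1e-12          # avoid division by zero
    return [(m / total) * S_mix for m in mags]

# Toy mixture STFT and two hypothetical per-source magnitude models
# (in practice these would come from, e.g., A_k @ X_k per source).
S_mix = np.array([[1 + 1j, 2j], [3 + 0j, 1 - 1j]])
mag1 = np.array([[1.0, 0.5], [2.0, 0.1]])
mag2 = np.array([[0.2, 1.5], [1.0, 0.9]])
sources = wiener_reconstruct(S_mix, [mag1, mag2])
```

The soft masks sum to one at every time–frequency point, so the source estimates add back up to the mixture; each complex source STFT can then be inverted with an ordinary inverse STFT.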