IEEE SIGNAL PROCESSING MAGAZINE [63] MARCH 2015

is not processed alongside. To illustrate this, let us consider a

speech signal degraded by an impulse train with a period length of $T_0$,

which is nonzero every $N_0 = T_0 f_s$ samples. In Figure 4, the

noisy signal (a) is presented together with the result obtained

when combining the true clean speech STFT magnitudes with the

noisy phase (b). Even though the clean magnitude is employed,

which represents the best possible result for phase-blind magnitude

enhancement, the time-domain signal still exhibits residual

impulses, which are caused by the noisy phase. In regions where

the enhanced spectral magnitude is close to zero, i.e., in speech

absence, the phase is not relevant and the peaks are well sup-

pressed. During speech presence, however, the spectral magnitude

is nonzero and the phase becomes important. Accordingly, the

residual impulses are most prominent in regions with some speech

energy at low local SNRs, where the noisy phase is close to the

phase of the impulsive noise.
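
The toy experiment above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the sampling rate, the period value T0 = 0.05 s, the synthetic stand-in for speech (a few harmonics plus a weak broadband component), and the STFT settings are illustrative choices, not values taken from the article. It combines the oracle clean magnitude with the noisy phase and compares the residual at click positions in speech absence and in speech presence.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
fs, nperseg = 16000, 512                # assumed sampling rate and frame length
t = np.arange(fs) / fs                  # one second of signal

# Synthetic stand-in for speech (assumption): five harmonics of 200 Hz plus
# a weak broadband component, active between 0.3 s and 0.7 s, zero elsewhere.
active = ((t > 0.3) & (t < 0.7)).astype(float)
harmonics = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
clean = active * (0.1 * harmonics + 0.01 * rng.standard_normal(fs))

# Impulse train that is nonzero every N0 = T0 * fs samples.
T0 = 0.05                               # period in seconds (assumed value)
N0 = int(round(T0 * fs))
clicks = np.zeros(fs)
clicks[N0 // 2::N0] = 1.0
noisy = clean + clicks

# STFT with scipy defaults: Hann window, 50% overlap.
_, _, S = stft(clean, fs=fs, nperseg=nperseg)
_, _, Y = stft(noisy, fs=fs, nperseg=nperseg)

# Oracle clean magnitude combined with the noisy phase.
_, enhanced = istft(np.abs(S) * np.exp(1j * np.angle(Y)), fs=fs, nperseg=nperseg)
enhanced = enhanced[:fs]

# Residual at the click positions, split by speech absence and presence.
residual = np.abs(enhanced - clean)
idx = np.flatnonzero(clicks)
res_absence = residual[idx[t[idx] < 0.2]].max()
res_presence = residual[idx[(t[idx] > 0.35) & (t[idx] < 0.65)]].mean()
print(f"residual at clicks, speech absence:  {res_absence:.2e}")
print(f"residual at clicks, speech presence: {res_presence:.2e}")
```

Under these assumptions, the residual at click positions should be negligible in speech absence, where the clean magnitude is zero, and clearly nonzero in speech presence, where the noisy phase carries the click into the reconstruction.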

Recently, Sugiyama and Miyahara proposed the concept of

phase randomization to overcome this issue; see, e.g., [27] and

references therein. First, time-frequency points that are domi-

nated by speech are identified by finding spectral peaks in the

noisy signal. These peaks are excluded from the phase randomi-

zation to avoid speech distortions. To further narrow down

time-frequency regions where randomization of the spectral

phase is sensible, phase-based transient detection can be

employed as well [27]. Then, the spectral phase in bins classified

as dominated by transient noise is randomized by adding a

phase term that is uniformly distributed between $-\pi$ and $\pi$. In

this way, the approximately linear phase of the dominant noise

component is neutralized. The effect of phase randomization is

depicted in Figure 4(c), where a perfect magnitude estimate is

combined with the modified phase for signal reconstruction. It

can be seen that the residual peaks that are present when the

noisy phase is employed are strongly attenuated, showing that

phase randomization can indeed lead to a considerable increase

in noise reduction, especially at low local SNRs. It is interesting

to note that while the previously described iterative and sinusoi-

dal model-based approaches aim at estimating the phase of the

clean speech signal, the phase randomization approach merely

aims at reducing the impact of the phase of the noise on the

enhanced speech signal. Although the presented example is just

a simple toy experiment, it still highlights the potential of phase

randomization toward an improved suppression of transient

noise, which has also been observed for real-world impulsive

noise, like tapping noise on a touchscreen [27].
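
The randomization step itself can be sketched as follows. This is not the method of [27]: the detection of speech-dominated peaks and the phase-based transient detector are replaced by an oracle mask (bins where the click magnitude exceeds the clean magnitude), which is purely an assumption for illustration. The signal parameters, the period of 0.05 s, and the synthetic stand-in for speech are likewise hypothetical. As in the toy experiment, a perfect magnitude estimate is combined once with the noisy phase and once with the randomized phase.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
fs, nperseg = 16000, 512                # assumed sampling rate and frame length
t = np.arange(fs) / fs

# Synthetic stand-in for speech (assumption): five harmonics of 200 Hz plus
# a weak broadband component, active between 0.3 s and 0.7 s only.
active = ((t > 0.3) & (t < 0.7)).astype(float)
harmonics = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
clean = active * (0.1 * harmonics + 0.01 * rng.standard_normal(fs))

# Click train with an assumed period of 0.05 s.
N0 = int(round(0.05 * fs))
clicks = np.zeros(fs)
clicks[N0 // 2::N0] = 1.0

def spec(x):
    """STFT with scipy defaults (Hann window, 50% overlap)."""
    return stft(x, fs=fs, nperseg=nperseg)[2]

def resynth(magnitude, phase):
    """Inverse STFT of magnitude * exp(j * phase), trimmed to the input length."""
    _, x = istft(magnitude * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return x[:fs]

S, N, Y = spec(clean), spec(clicks), spec(clean + clicks)

# Oracle stand-in for the detector (assumption): bins dominated by the click.
transient = np.abs(N) > np.abs(S)

# Add a phase term drawn uniformly between -pi and pi in the flagged bins only;
# speech-dominated bins keep the noisy phase.
offset = rng.uniform(-np.pi, np.pi, size=Y.shape)
phase_rand = np.angle(Y) + np.where(transient, offset, 0.0)

before = resynth(np.abs(S), np.angle(Y))    # clean magnitude, noisy phase
after = resynth(np.abs(S), phase_rand)      # clean magnitude, randomized phase

# Mean residual at click positions during speech presence.
idx = np.flatnonzero(clicks)
idx = idx[(t[idx] > 0.35) & (t[idx] < 0.65)]
peak_before = np.mean(np.abs(before[idx] - clean[idx]))
peak_after = np.mean(np.abs(after[idx] - clean[idx]))
print(f"mean residual at clicks, noisy phase:      {peak_before:.3f}")
print(f"mean residual at clicks, randomized phase: {peak_after:.3f}")
```

With the noisy phase, the flagged bins add up coherently at each click position. After randomization their contributions are spread over the frame, so the residual at the click positions should drop, while the bins left untouched by the mask still contribute a smaller residual.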

RELATION BETWEEN PHASE AND MAGNITUDE ESTIMATION

So far, we have discussed phase estimation using iterative

approaches, sinusoidal model-based approaches, and group

delay approaches; we now address the question of how STFT

phase estimation can best be employed to improve speech

enhancement. The most obvious way to do this is to combine

enhanced speech spectral magnitudes in the STFT domain with

the estimated or reconstructed STFT phases. It is interesting to

note that Wang and Lim [10] already stated that obtaining a

more accurate phase estimate than the noisy phase is not worth

the effort “if the estimate is used to reconstruct a signal by

combining it with an independently estimated magnitude [...].

However, if a significantly different approach is used to exploit

the phase information such as using the phase estimate to fur-

ther improve the magnitude estimate, then a more accurate

estimation of phase may be important” [10]. At that

point, however, it was not clear how a phase estimate could be employed

to improve magnitude estimation.

Gerkmann and Krawczyk [25] derived an MMSE estimator of

the spectral magnitude when an estimate of the clean speech

phase is available, referred to as phase-sensitive or phase-aware

magnitude estimation. They were able to show that the informa-

tion of the speech spectral phase can be employed to derive an

improved magnitude estimator that is capable of reducing noise

outliers that are not tracked by the noise PSD estimator. In babble

noise, in a blind setup, the PESQ MOS can be improved by 0.25

points in voiced speech at 0 dB input SNR [25]. Further experi-

mental results are given in the following section.

Instead of estimating phase and magnitude separately, one may

argue that they should ideally be jointly estimated. A first step in

this direction was taken by Le Roux and Vincent (see [29] and references

therein) in the context of Wiener filtering for speech

[FIG4] (a) Speech degraded by a click train. (b) Signal obtained by combination of the clean speech spectral magnitude with the noisy phase. (c) Signal after supplemental phase randomization. Samples that contain a click are highlighted in red.

Panel titles: (a) Noisy Speech; (b) Enhanced Speech Before Phase Randomization; (c) Enhanced Speech After Phase Randomization. Horizontal axis: Time (s).
