IEEE SIGNAL PROCESSING MAGAZINE [65] MARCH 2015

ee.ic.ac.uk/hp/staff/dmb/voicebox/

voicebox.html). Because [4] provides

a phase estimate only in voiced

sounds, we show the improvement in

voiced segments alongside the overall

improvement for entire utterances in

Figure 5. When the fundamental fre-

quency estimator detects unvoiced

speech segments, the estimators fall

back to a phase-blind estimation.

Thus, when evaluated over entire signals,

the results of the phase-aware estimators

move closer to those of the phase-blind approaches, while the

general trends remain.

It can be seen that employing phase information to improve

magnitude estimation [25] can indeed improve PESQ. The domi-

nant benefit of the phase-aware magnitude estimators is that the

phase provides additional information to distinguish between noise

outliers and speech. Thus, the stronger the outliers after process-

ing with phase-blind approaches, the larger the potential benefit of

phase-aware processing. While here we show the average result

over four noise types, a consistent improvement for the tested non-

stationary noise types has been observed. While in stationary pink

noise the PESQ scores are virtually unchanged, the largest

improvements are achieved in babble. This is because babble

bursts are often of high energy and may result in large outliers in

phase-blind magnitude estimation that can be reduced by exploit-

ing the additional information in the phase.

When an initial phase estimate is also employed as uncertain

prior information when improving the spectral phase as proposed

in the phase-aware complex estimator CUP [12], the performance

can be improved further. The CUP estimator [12] employs the

probability of a signal segment being voiced to control the cer-

tainty of the initial phase estimate. In unvoiced speech, the uncer-

tainty is largest, effectively resulting in a phase-blind estimator.

Therefore, again, we can only expect a PESQ improvement in

voiced speech. Compared to phase-blind magnitude estimation

[30] in voiced speech and at an input SNR of 0 dB, an improve-

ment in PESQ by 0.12 points is achieved when all parameters are

blindly estimated, while 0.18 points are gained with an oracle

fundamental frequency. Considering that the phase-blind

estimator improves PESQ by 0.46 points, the additional

improvement of 0.18 points from incorporating phase information

in voiced speech is remarkable (factor 1.4) and

demonstrates the potential of phase processing for improving

speech enhancement algorithms. While the average

improvements using phase processing are still moderate, in spe-

cific scenarios, e.g., in voiced sounds or impulsive noise, phase pro-

cessing can help to reduce noise more effectively than using

phase-blind approaches. Audio examples can be found at www.

speech.uni-oldenburg.de/pasp.html.
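
The "factor 1.4" quoted above is not derived in the text. One reading that reproduces it, stated here as an assumption, is the ratio of the total PESQ gain (phase-blind gain plus the additional phase-aware gain with an oracle fundamental frequency) to the phase-blind gain alone. The symbols below are introduced only for this illustration:

```latex
\frac{\Delta_{\text{blind}} + \Delta_{\text{phase}}}{\Delta_{\text{blind}}}
  = \frac{0.46 + 0.18}{0.46}
  \approx 1.39
  \approx 1.4
```

Under the same reading, the blindly estimated variant (0.12 points) would correspond to a factor of about 1.26.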

FUTURE DIRECTIONS

While the majority of single-channel STFT domain speech

enhancement algorithms only address the modification of STFT

magnitudes, in this article we

reviewed methods that also involve

STFT phase modifications. We

showed that phase estimation can

be based mainly on models of

the signal or on exploiting

redundancy in the STFT representation.

Examples of model-based

algorithms are sinusoidal model-based

approaches and approaches that

employ the group delay. By contrast,

iterative approaches mainly rely on

the spectrotemporal correlations introduced by the redundancy

of the STFT representation with overlapping signal segments.

While the results of the instrumental evaluations indicate that a

sophisticated utilization of phase information can lead to

improvements in speech quality, for a conclusive assessment, for-

mal listening tests are required, rendering the subjective evalu-

ation of particularly promising phase-aware algorithms a

necessity for future research.
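
To make the idea of exploiting STFT redundancy concrete, the following is a minimal sketch of the classic Griffin-Lim iteration, one well-known iterative scheme of this kind. It is an illustration only, not the specific algorithms evaluated in this article. The function name `griffin_lim`, its parameters, and the window settings are assumptions chosen for the example; it relies on `scipy.signal.stft` and `scipy.signal.istft` with their default Hann window.

```python
import numpy as np
from scipy.signal import stft, istft


def griffin_lim(magnitude, n_iter=50, nperseg=256, noverlap=192, seed=0):
    """Estimate a time signal whose STFT magnitude approximates `magnitude`.

    Hypothetical helper for illustration. `magnitude` must have
    nperseg // 2 + 1 rows (one-sided STFT, frames along the last axis).
    Returns the signal and the relative magnitude mismatch per iteration.
    """
    rng = np.random.default_rng(seed)
    # Start from a random phase: no phase knowledge is assumed.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    errors = []
    for _ in range(n_iter):
        # Go to the time domain; overlap-add averages the overlapping frames.
        _, x = istft(magnitude * phase, nperseg=nperseg, noverlap=noverlap)
        # Re-analyze: the result is a consistent STFT (it belongs to a real signal).
        _, _, X = stft(x, nperseg=nperseg, noverlap=noverlap)
        # Keep only the phase of the consistent STFT, reimpose the given magnitude.
        phase = np.exp(1j * np.angle(X))
        errors.append(np.linalg.norm(np.abs(X) - magnitude)
                      / np.linalg.norm(magnitude))
    _, x = istft(magnitude * phase, nperseg=nperseg, noverlap=noverlap)
    return x, errors


if __name__ == "__main__":
    # Synthetic two-tone signal standing in for a voiced sound (assumed test data).
    t = np.arange(4096) / 8000.0
    clean = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
    _, _, S = stft(clean, nperseg=256, noverlap=192)
    estimate, errors = griffin_lim(np.abs(S), n_iter=30)
    print(f"mismatch: first iteration {errors[0]:.3f}, last {errors[-1]:.3f}")
```

The 75% overlap used here is what provides the redundancy the iteration depends on: each sample is covered by several frames, so an arbitrary phase generally does not correspond to any time signal, and the repeated synthesis and re-analysis pulls the estimate toward a consistent spectrogram.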

Despite recent advances, there are still many open issues in

phase processing. For instance, similar to magnitude estimation,

phase estimation is still difficult in very low SNRs. A promising

approach for performance improvement is to join the different

types of phase processing approaches, such as by including more

explicit signal models into iterative phase estimation approaches or

vice versa. A first step in this direction is presented in [26]. As

another example, while the consistent Wiener filter only exploits

the phase structure of the STFT representation, an exciting chal-

lenge going forward is to integrate models of the phase structure of

the signal itself into a joint optimization framework.

Modern machine-learning approaches such as deep neural net-

works, which have proven to be very successful in improving

speech recognition performance, have recently been shown to lead

to state-of-the-art performance for speech enhancement using a

magnitude-based approach. The natural next step is to extend their

use to phase estimation to further improve performance. On top of

the fact that they are data driven, which reduces the necessity for

modeling assumptions that may be inaccurate, a great advantage

of such methods over the iterative approaches for phase estimation

presented here or approaches based on nonnegative matrix factori-

zation or Gaussian mixture models, is that they can typically be

efficiently evaluated at test time.

Indeed, striving for fast, lightweight algorithms is critical in the

context of assisted listening and speech communication devices,

where special requirements with respect to complexity and latency

persist. While more and more computational power will be availa-

ble with improved technology, for economic reasons as well as to

limit power consumption, it is always desirable to keep the

complexity as low as possible. Thus, research on reducing

complexity remains important. Complexity reduction could be

obtained, for instance, by decreasing the overlap of the STFT analy-

sis, but its impact on performance of phase estimation algorithms

is not well studied. On the other hand, the lower bound on the

latency of the algorithms is dominated by the window lengths in
