
IEEE SIGNAL PROCESSING MAGAZINE [65] MARCH 2015
ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html). Because the approach of [4]
provides a phase estimate only in voiced sounds, we show the improvement in
voiced segments alongside the overall improvement for entire utterances in
Figure 5. When the fundamental frequency estimator detects unvoiced speech
segments, the estimators fall back to phase-blind estimation. Thus, when
evaluated over entire signals, the results of the phase-aware estimators move
closer to those of the phase-blind approaches, while the general trends remain.
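The frame-wise fallback described above can be sketched as follows. This is only a minimal illustration, not the implementation evaluated in the article: the function `enhance_with_fallback`, the convention that an unvoiced frame is marked by `None` in the fundamental frequency track, and the two estimator callables are all hypothetical names introduced here.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")
Out = TypeVar("Out")


def enhance_with_fallback(
    frames: Sequence[Frame],
    f0_track: Sequence[Optional[float]],
    phase_aware: Callable[[Frame, float], Out],
    phase_blind: Callable[[Frame], Out],
) -> List[Out]:
    """Apply a phase-aware estimator in voiced frames, a phase-blind one otherwise.

    Assumption: f0_track[i] is None when the (hypothetical) fundamental
    frequency estimator labels frame i as unvoiced.
    """
    if len(frames) != len(f0_track):
        raise ValueError("frames and f0_track must have the same length")
    enhanced = []
    for frame, f0 in zip(frames, f0_track):
        if f0 is None:
            # No phase estimate available: fall back to phase-blind estimation.
            enhanced.append(phase_blind(frame))
        else:
            enhanced.append(phase_aware(frame, f0))
    return enhanced


# Toy usage with placeholder estimators that only label which branch was taken.
frames = ["frame0", "frame1", "frame2", "frame3"]
f0_track = [110.0, None, None, 118.0]
result = enhance_with_fallback(
    frames,
    f0_track,
    phase_aware=lambda frame, f0: ("phase-aware", frame),
    phase_blind=lambda frame: ("phase-blind", frame),
)
voiced_fraction = sum(f0 is not None for f0 in f0_track) / len(f0_track)
print(result)
print(voiced_fraction)
```

In this toy track only half of the frames are voiced, so only half of the output differs from a purely phase-blind system, which is why scores computed over entire utterances lie closer to the phase-blind results.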
It can be seen that employing phase information to improve
magnitude estimation [25] can indeed improve PESQ. The domi-
nant benefit of the phase-aware magnitude estimators is that the
phase provides additional information to distinguish between noise
outliers and speech. Thus, the stronger the outliers after process-
ing with phase-blind approaches, the larger the potential benefit of
phase-aware processing. Although we show here only the average result over four
noise types, we observed a consistent improvement for the tested nonstationary
noise types. In stationary pink noise the PESQ scores are virtually unchanged,
whereas the largest improvements are achieved in babble. This is because babble
bursts are often of high energy and may result in large outliers in
phase-blind magnitude estimation that can be reduced by exploit-
ing the additional information in the phase.
The performance can be improved further when an initial phase estimate is also
employed as uncertain prior information for improving the spectral phase, as
proposed in the phase-aware complex estimator CUP [12]. The CUP estimator [12] employs the
probability of a signal segment being voiced to control the cer-
tainty of the initial phase estimate. In unvoiced speech, the uncer-
tainty is largest, effectively resulting in a phase-blind estimator.
Therefore, again, we can only expect a PESQ improvement in
voiced speech. Compared to phase-blind magnitude estimation
[30] in voiced speech and at an input SNR of 0 dB, an improve-
ment in PESQ by 0.12 points is achieved when all parameters are
blindly estimated, while 0.18 points are gained with an oracle fundamental
frequency. Considering that the phase-blind estimator itself improves PESQ by
0.46 points, the additional improvement of 0.18 points obtained by incorporating
phase information in voiced speech is remarkable (a total gain of 0.64 points,
i.e., a factor of about 1.4), and demonstrates the potential of phase processing
for the improvement of speech enhancement algorithms. While the average
improvements using phase processing are still moderate, in spe-
cific scenarios, e.g., in voiced sounds or impulsive noise, phase pro-
cessing can help to reduce noise more effectively than using
phase-blind approaches. Audio examples can be found at
www.speech.uni-oldenburg.de/pasp.html.
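The factor of 1.4 quoted above can be checked with a few lines of arithmetic. The sketch assumes that the factor denotes the ratio of the total PESQ improvement (phase-blind plus phase-aware gain) to the phase-blind improvement alone, and that the 0.46-point figure refers to the same condition (voiced speech, 0 dB input SNR). The variable and function names are chosen here for illustration.

```python
# PESQ improvements quoted in the text (voiced speech, 0 dB input SNR).
phase_blind_gain = 0.46   # improvement of the phase-blind estimator
extra_blind_f0 = 0.12     # additional gain, all parameters blindly estimated
extra_oracle_f0 = 0.18    # additional gain with an oracle fundamental frequency


def total_gain_factor(base: float, extra: float) -> float:
    """Ratio of the total improvement to the phase-blind improvement alone."""
    return (base + extra) / base


factor_oracle = total_gain_factor(phase_blind_gain, extra_oracle_f0)
factor_blind = total_gain_factor(phase_blind_gain, extra_blind_f0)

print(round(factor_oracle, 2))  # 1.39, i.e., roughly the quoted factor of 1.4
print(round(factor_blind, 2))   # 1.26 with blindly estimated parameters
```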
FUTURE DIRECTIONS
While the majority of single-channel STFT domain speech enhancement algorithms
only address the modification of STFT magnitudes, in this article we reviewed
methods that also involve STFT phase modifications. We showed that phase
estimation can be performed mainly either based on models of the signal or by
exploiting the redundancy in the STFT representation. Examples of model-based
algorithms are sinusoidal model-based approaches and approaches that employ the
group delay. By contrast, iterative approaches mainly rely on the
spectrotemporal correlations introduced by the redundancy of the STFT
representation with overlapping signal segments.
The results of the instrumental evaluations indicate that a sophisticated
utilization of phase information can lead to improvements in speech quality.
For a conclusive assessment, however, formal listening tests are required, which
makes the subjective evaluation of particularly promising phase-aware algorithms
a necessity for future research.
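The redundancy that iterative approaches exploit can be made concrete with a small numerical experiment. The following is a minimal sketch under stated assumptions, not one of the algorithms reviewed in the article: the helper names `stft`, `istft`, and `inconsistency`, the Hann window, the 75% overlap, and the white-noise test signal are all choices made here for illustration. The loop at the end is a Griffin-Lim-style magnitude projection, used only to show that the redundancy of overlapping frames constrains the phase.

```python
import numpy as np


def stft(x, win, hop):
    """Window overlapping frames of x and take the real FFT of each frame."""
    n = len(win)
    return np.array([np.fft.rfft(win * x[s:s + n])
                     for s in range(0, len(x) - n + 1, hop)])


def istft(X, win, hop, length):
    """Weighted overlap-add resynthesis of a time signal from STFT frames."""
    n = len(win)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, spec in enumerate(X):
        s = i * hop
        x[s:s + n] += win * np.fft.irfft(spec, n)
        norm[s:s + n] += win ** 2
    return x / np.maximum(norm, 1e-12)


def inconsistency(X, win, hop, length):
    """Relative distance between X and the STFT of its own resynthesis."""
    Z = stft(istft(X, win, hop, length), win, hop)
    return np.linalg.norm(Z - X) / np.linalg.norm(X)


rng = np.random.default_rng(0)
n, hop = 64, 16                      # 75% overlap (assumption)
length = n + 40 * hop                # frames tile the signal exactly
x = rng.standard_normal(length)      # white noise stands in for a signal
win = np.hanning(n)

X = stft(x, win, hop)                # STFT of a real signal: consistent
Y = np.abs(X) * np.exp(1j * rng.uniform(-np.pi, np.pi, X.shape))  # random phase

err_consistent = inconsistency(X, win, hop, length)
err_random = inconsistency(Y, win, hop, length)

# Griffin-Lim-style iteration: keep the magnitudes, re-estimate the phase
# from the STFT of the resynthesized signal.
G = Y.copy()
for _ in range(20):
    Z = stft(istft(G, win, hop, length), win, hop)
    G = np.abs(X) * np.exp(1j * np.angle(Z))
err_iter = inconsistency(G, win, hop, length)

print(err_consistent, err_random, err_iter)
```

An STFT computed from an actual time signal reproduces itself after resynthesis and reanalysis, whereas the same magnitudes with arbitrary phases do not. Iterating the projection reduces this mismatch without using any model of the signal itself.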
Despite recent advances, there are still many open issues in
phase processing. For instance, similar to magnitude estimation,
phase estimation is still difficult at very low SNRs. A promising approach for
performance improvement is to combine the different types of phase processing
approaches, for example by incorporating more explicit signal models into
iterative phase estimation approaches or
vice versa. A first step in this direction is presented in [26]. As
another example, while the consistent Wiener filter only exploits
the phase structure of the STFT representation, an exciting chal-
lenge going forward is to integrate models of the phase structure of
the signal itself into a joint optimization framework.
Modern machine-learning approaches such as deep neural net-
works, which have proven to be very successful in improving
speech recognition performance, have recently been shown to lead
to state-of-the-art performance for speech enhancement using a
magnitude-based approach. The natural next step is to extend their
use to phase estimation to further improve performance. Besides being data
driven, which reduces the need for modeling assumptions that may be inaccurate,
such methods have a great advantage over the iterative phase estimation
approaches presented here and over approaches based on nonnegative matrix
factorization or Gaussian mixture models: they can typically be evaluated
efficiently at test time.
Indeed, striving for fast, lightweight algorithms is critical in the context of
assisted listening and speech communication devices, where special requirements
with respect to complexity and latency apply. Although more and more
computational power will become available with improved technology, it remains
desirable to keep the complexity as low as possible, both for economic reasons
and to limit power consumption. Thus, more research on reducing complexity is
needed. Complexity could be reduced, for instance, by decreasing the overlap of
the STFT analysis, but the impact of doing so on the performance of phase
estimation algorithms is not well studied. On the other hand, the lower bound on
the latency of the algorithms is dominated by the window lengths in