

Digital Object Identifier 10.1109/MSP.2014.2369251

Date of publication: 12 February 2015

With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and also more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or in traffic. One way to make assisted listening devices robust to noise is to apply speech enhancement algorithms. To improve the corrupted speech, one can exploit spatial diversity through a constructive combination of microphone signals (so-called beamforming), as well as the different spectrotemporal properties of speech and noise. Here, we focus on single-channel speech enhancement algorithms, which rely on spectrotemporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for using a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationary properties of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued and can therefore be represented by their absolute value (referred to in the literature both as STFT magnitude and STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude has been the center of interest in the past three decades, phase has been largely ignored.
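As a minimal sketch of this magnitude and phase representation, the following Python snippet computes a short-time DFT of a toy sinusoid and splits each complex coefficient into its absolute value and its phase. The function names (`stft`, `to_polar`, `from_polar`), the Hann window, the frame length, and the hop size are illustrative assumptions and are not taken from this article.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Short-time DFT: Hann-windowed frames, one list of complex bins per frame."""
    # Periodic Hann window (an assumed, commonly used analysis window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed segment (O(N^2), fine for a toy example).
        bins = [sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
                for k in range(frame_len)]
        frames.append(bins)
    return frames


def to_polar(frames):
    """Split complex STFT coefficients into magnitude and phase (radians)."""
    magnitude = [[abs(c) for c in bins] for bins in frames]
    phase = [[cmath.phase(c) for c in bins] for bins in frames]
    return magnitude, phase


def from_polar(magnitude, phase):
    """Recombine magnitude and phase into complex STFT coefficients."""
    return [[m * cmath.exp(1j * p) for m, p in zip(ms, ps)]
            for ms, ps in zip(magnitude, phase)]


# Toy input: a sinusoid with 2 cycles per 8-sample frame, 32 samples long.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
coeffs = stft(signal)
magnitude, phase = to_polar(coeffs)
rebuilt = from_polar(magnitude, phase)
```

In this sketch, changing `magnitude` while passing `phase` through unchanged to `from_polar` would correspond to the magnitude-only processing described above.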

In this article, we review the role of phase processing for speech enhancement in the context of assisted listening and speech communication devices. We explain why most of the research conducted in this field used to focus on estimating spectral magnitudes in the STFT domain, and why phase processing has recently been attracting increasing interest in the speech


Phase Processing for Single-Channel Speech Enhancement

History and recent advances

Timo Gerkmann, Martin Krawczyk-Becker, and Jonathan Le Roux

