IEEE SIGNAL PROCESSING MAGAZINE [66] MARCH 2015
STFT analysis and synthesis. Further research could therefore also
address phase estimation using low-latency filter banks.
After many years in the shadow of magnitude-centric speech enhancement, phase-aware signal processing is now expanding quickly. With many aspects still to explore, it is an exciting area of research that is likely to lead to important breakthroughs and push speech processing forward. Supplemental material and further references can be found at www.speech.uni-oldenburg.de/pasp.html.
ACKNOWLEDGMENT
This work was supported by grant GE2538/2-1 of the German
Research Foundation.
AUTHORS
Timo Gerkmann (timo.gerkmann@uni-oldenburg.de) received his Dipl.-Ing. and Dr.-Ing. degrees in electrical engineering and information technology in 2004 and 2010, respectively, from the Ruhr-Universität Bochum, Germany. In 2005, he spent six months with Siemens Corporate Research in Princeton, New Jersey, United States. From 2010 to 2011, he was a postdoctoral researcher at the Royal Institute of Technology, Stockholm, Sweden. Since 2011, he has been a professor of speech signal processing at the University of Oldenburg, Germany. His main research interests are digital speech and audio processing, including speech enhancement, dereverberation, modeling of speech signals, speech recognition, and hearing devices.
Martin Krawczyk-Becker (martin.krawczyk-becker@uni-oldenburg.de) studied electrical engineering and information technology at the Ruhr-Universität Bochum, Germany. His
major was communication technology with a focus on audio
processing, and he received his Dipl.-Ing. degree in August
2011. From January 2010 to July 2010, he was with Siemens
Corporate Research in Princeton, New Jersey, United States.
Since November 2011, he has been pursuing his Ph.D. degree in
the field of speech enhancement and noise reduction at the
University of Oldenburg, Germany.
Jonathan Le Roux (leroux@merl.com) completed his B.Sc. and
M.Sc. degrees in mathematics at the Ecole Normale Supérieure,
Paris, France, and his Ph.D. degree at the University of Tokyo, Japan,
and the Université Pierre et Marie Curie, Paris, France. He is a prin-
cipal research scientist at Mitsubishi Electric Research Laboratories
in Cambridge, Massachusetts, United States, and was previously a
postdoctoral researcher at Nippon Telegraph and Telephone
Communication Science Laboratories. His research interests are in
signal processing and machine learning applied to speech and audio.
He is a Senior Member of the IEEE and a member of the IEEE Audio
and Acoustic Signal Processing Technical Committee.