
IEEE SIGNAL PROCESSING MAGAZINE [66] MARCH 2015

STFT analysis and synthesis. Further research could therefore also address phase estimation using low-latency filter banks.

After many years in the shadow of magnitude-centric speech enhancement, phase-aware signal processing is now expanding quickly. With many aspects still to be explored, it is an exciting area of research that is likely to lead to important breakthroughs and push speech processing forward. Supplemental material and further references can be found at www.speech.uni-oldenburg.de/pasp.html.

ACKNOWLEDGMENT

This work was supported by grant GE2538/2-1 of the German Research Foundation.

AUTHORS

Timo Gerkmann (timo.gerkmann@uni-oldenburg.de) received his Dipl.-Ing. and Dr.-Ing. degrees in electrical engineering and information technology from the Ruhr-Universität Bochum, Germany, in 2004 and 2010, respectively. In 2005, he spent six months with Siemens Corporate Research in Princeton, New Jersey, United States. From 2010 to 2011, he was a postdoctoral researcher at the Royal Institute of Technology, Stockholm, Sweden. Since 2011, he has been a professor of speech signal processing at the University of Oldenburg, Germany. His main research interests are digital speech and audio processing, including speech enhancement, dereverberation, modeling of speech signals, speech recognition, and hearing devices.

Martin Krawczyk-Becker (martin.krawczyk-becker@uni-oldenburg.de) studied electrical engineering and information technology at the Ruhr-Universität Bochum, Germany. His major was communication technology with a focus on audio processing, and he received his Dipl.-Ing. degree in August 2011. From January 2010 to July 2010, he was with Siemens Corporate Research in Princeton, New Jersey, United States. Since November 2011, he has been pursuing his Ph.D. degree in the field of speech enhancement and noise reduction at the University of Oldenburg, Germany.

Jonathan Le Roux (leroux@merl.com) completed his B.Sc. and M.Sc. degrees in mathematics at the Ecole Normale Supérieure, Paris, France, and his Ph.D. degree at the University of Tokyo, Japan, and the Université Pierre et Marie Curie, Paris, France. He is a principal research scientist at Mitsubishi Electric Research Laboratories in Cambridge, Massachusetts, United States, and was previously a postdoctoral researcher at Nippon Telegraph and Telephone Communication Science Laboratories. His research interests are in signal processing and machine learning applied to speech and audio. He is a Senior Member of the IEEE and a member of the IEEE Audio and Acoustic Signal Processing Technical Committee.

