IEEE SIGNAL PROCESSING MAGAZINE [56] MARCH 2015
enhancement community. Furthermore, we review both early and recent methods
for phase processing in speech enhancement. We aim to show that phase
processing is an exciting field of research with the potential to make
assisted listening and speech communication devices more robust in
acoustically challenging environments.
INTRODUCTION
Let us first consider the common speech enhancement setup consisting of STFT
analysis, spectral modification, and subsequent inverse STFT (iSTFT)
resynthesis. The analyzed digital signal $x(n)$, with time index $n$, is
chopped into $L$ segments with a length of $N$ samples, overlapping by $N-R$
samples, where $R$ denotes the segment shift. Each segment $\ell$ is
multiplied with the appropriately shifted analysis window $w_a(n-\ell R)$ and
transformed into the frequency domain by applying the discrete Fourier
transform (DFT), yielding the complex-valued STFT coefficients
$X_{k,\ell} \in \mathbb{C}$ for every segment $\ell$ and frequency band $k$.
To compactly describe this procedure, we define the STFT operator:
$\mathbf{X} = \mathrm{STFT}(\mathbf{x})$. Here, $\mathbf{x}$ is a vector
containing the complete time-domain signal $x(n)$ and $\mathbf{X}$ is an
$N \times L$ matrix of all $X_{k,\ell}$, which we will refer to as the
spectrogram. Since we are interested in real-valued acoustic signals, we
consider only complex symmetric spectrograms
$\mathbf{X} \in \mathcal{S} \subset \mathbb{C}^{N \times L}$, where
$\mathcal{S}$ denotes the subset of spectrograms for which
$X_{N-k,\ell} = \overline{X}_{k,\ell}$ for all $\ell$ and $k$, with
$\overline{X}$ being the complex conjugate of $X$.
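As a rough illustration of the analysis step described above, the following NumPy sketch builds the $N \times L$ spectrogram and checks the conjugate symmetry $X_{N-k,\ell} = \overline{X}_{k,\ell}$ for a real-valued input. The function name `stft`, the random test signal, the values `N = 64` and `R = 16`, and the convention of taking the DFT relative to the start of each segment are assumptions made for this example; they are not taken from the article.

```python
import numpy as np

def stft(x, w_a, R):
    """Return an N x L matrix of STFT coefficients X[k, l] (illustrative sketch)."""
    N = len(w_a)
    L = 1 + (len(x) - N) // R          # number of complete segments
    X = np.empty((N, L), dtype=complex)
    for l in range(L):
        # Segment l starts at sample l*R and is weighted by the analysis window.
        segment = x[l * R : l * R + N] * w_a
        X[:, l] = np.fft.fft(segment)  # DFT over the N samples of the segment
    return X

# Assumed example parameters: segment length N, segment shift R.
N, R = 64, 16
n = np.arange(N)
w_a = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / N))  # periodic square-root Hann

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)          # stand-in for a real-valued signal

X = stft(x, w_a, R)

# For a real-valued signal, band N-k is the complex conjugate of band k.
k = np.arange(1, N)
symmetric = np.allclose(X[N - k, :], np.conj(X[k, :]))
```

Because of this symmetry, roughly half of the $N$ bands carry all the information, which is why practical implementations often store only the bands $k = 0, \dots, N/2$.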
After some processing, such as magnitude improvement, is applied on the STFT
coefficients, a modified spectrogram $\tilde{\mathbf{X}}$ is obtained. From
$\tilde{\mathbf{X}}$ a time-domain signal can be resynthesized through an
iSTFT operation, denoted by
$\tilde{\mathbf{x}} = \mathrm{iSTFT}(\tilde{\mathbf{X}})$. For this, the
inverse DFT of the STFT coefficients is computed and each segment is
multiplied by a synthesis window $w_s(n-\ell R)$; the windowed segments are
then overlapped and added to obtain the modified time-domain signal. A final
renormalization step is performed to ensure that, if no processing is applied
to the spectral coefficients, there is perfect reconstruction of the input
signal, i.e., $\mathrm{iSTFT}(\mathrm{STFT}(\mathbf{x})) = \mathbf{x}$. The
renormalization term, equal to
$\sum_{q=-\infty}^{+\infty} w_a(n+qR)\, w_s(n+qR)$, is $R$-periodic and can
be included in the synthesis window. A common choice for both $w_a(n)$ and
$w_s(n)$ is the square-root Hann window, which for overlaps such that
$N/R \in \mathbb{N}$ (e.g., 50%, 75%, etc.) only requires normalization by a
scalar. If the spectrogram is modified, using the same window for synthesis
as for analysis can be shown to lead to a resynthesized signal whose
spectrogram is closest to $\tilde{\mathbf{X}}$ in the least-squares sense
[1]. This fact will turn out to be important for the iterative phase
estimation approaches discussed later.
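The resynthesis and renormalization steps can be sketched in NumPy as follows. This is a minimal example under stated assumptions, not the authors' implementation: the helper names `sqrt_hann`, `stft`, and `istft`, the random test signal, and the values `N = 64`, `R = 16` (75% overlap) are all chosen for illustration. The renormalization term is accumulated explicitly as the overlap-added product of the analysis and synthesis windows.

```python
import numpy as np

def sqrt_hann(N):
    """Periodic square-root Hann window of length N."""
    n = np.arange(N)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / N))

def stft(x, w_a, R):
    """N x L matrix of STFT coefficients (windowed segments, then DFT)."""
    N = len(w_a)
    L = 1 + (len(x) - N) // R
    return np.stack(
        [np.fft.fft(x[l * R : l * R + N] * w_a) for l in range(L)], axis=1
    )

def istft(X, w_a, w_s, R):
    """Overlap-add resynthesis with explicit renormalization."""
    N, L = X.shape
    length = (L - 1) * R + N
    y = np.zeros(length)
    norm = np.zeros(length)
    for l in range(L):
        segment = np.fft.ifft(X[:, l]).real       # inverse DFT of segment l
        y[l * R : l * R + N] += segment * w_s     # synthesis window, overlap-add
        norm[l * R : l * R + N] += w_a * w_s      # renormalization term
    # Guard against division by zero where the windows vanish at the signal edges.
    return y / np.maximum(norm, 1e-12), norm

N, R = 64, 16                   # assumed values; N/R = 4, i.e. 75% overlap
w = sqrt_hann(N)                # same window for analysis and synthesis
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)   # stand-in for a real-valued signal

x_rec, norm = istft(stft(x, w, R), w, w, R)

# Compare away from the edges, where every sample is covered by N/R segments.
interior = slice(N, len(x) - N)
max_error = np.max(np.abs(x_rec[interior] - x[interior]))
```

In the fully overlapped interior, `norm` is constant and equal to $N/(2R)$ for this window, so the renormalization reduces to division by a scalar, in line with the statement above for overlaps with $N/R \in \mathbb{N}$.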
Until recently, STFT-based speech enhancement focused on modifying only the
magnitude of the STFT components, because it was generally considered that
most of the insight about the structure of the signal could be obtained from
the magnitude, while little information could be obtained from the phase
component. This would seem to be substantiated by Figure 1 when considering
only (a) and (b), where the STFT magnitude (a) and STFT phase (b) of a clean
speech excerpt are depicted. In contrast to the magnitude spectrogram, the
phase spectrogram appears to show little temporal or spectral regularity.
There are nonetheless distinct structures inherent to the spectral phase, but
they are hidden to a great extent because the phase is
[FIG1] (a) Magnitude spectrogram, (b) phase spectrogram, (c) group delay, and (d) IF deviation of the utterance “glowed jewel-bright” using a segment length of 32 ms and a shift of 4 ms.
[Figure not reproduced. Each panel plots frequency (0 to 8 kHz) against time (s). Color scales: (a) magnitude, –80 to 0 dB; (b) phase, –π to π rad; (c) group delay, 0 to 30 ms; (d) IF deviation, –100 to 100 Hz.]
WITH THE ADVANCEMENT OF
TECHNOLOGY, BOTH ASSISTED
LISTENING DEVICES AND SPEECH
COMMUNICATION DEVICES ARE
BECOMING MORE PORTABLE AND
ALSO MORE FREQUENTLY USED.