
IEEE SIGNAL PROCESSING MAGAZINE [45] MARCH 2015
is nonnegative and cannot be larger than the entropy $H(\breve{M}_T)$.
Thus, the difference $D(\breve{M}_L, \breve{M}_T) = H(\breve{M}_T) - I(\breve{M}_L; \breve{M}_T)$ is nonnegative
and can be interpreted as a distortion. It can be written
as a general distortion measure operating on $p_{L|T}$ for a given
talker message distribution $p_T$:
$$D(\breve{M}_L, \breve{M}_T) = \sum_{M_T} \sum_{M_L} p_T(M_T)\, d\big(p_{L|T}(M_L | M_T)\big), \tag{2}$$
where $d$ is a nonnegative function of $p_{L|T}(M_L | M_T)$.
For the mutual information based distortion measure,
$d\big(p_{L|T}(M_L | M_T)\big) = p_{L|T}(M_L | M_T) \log\big(p_L(M_L) / p_{L,T}(M_L, M_T)\big)$,
where we note that the argument of the logarithm can be written in terms of
$p_{L|T}(M_L | M_T)$ and the given $p_T(M_T)$ only. The intelligibility enhancement
problem is to find the $p_{L|T}$ that minimizes the distortion (2)
subject to the constraints set by the scenario.
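As a small numerical illustration, the sketch below evaluates distortion (2) with the mutual information based choice of d for a discrete message set and compares it with the entropy of the talker message minus the mutual information. The three-message distribution `p_T`, the confusion matrix `channel`, and all function names are hypothetical and chosen only for this example; natural logarithms are assumed.

```python
import math


def marginal_listener(p_T, p_L_given_T):
    """p_L(l) = sum_t p_T(t) * p_{L|T}(l | t)."""
    n_L = len(p_L_given_T[0])
    return [sum(p_T[t] * row[l] for t, row in enumerate(p_L_given_T))
            for l in range(n_L)]


def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def mutual_information(p_T, p_L_given_T):
    """I(M_L; M_T) in nats."""
    p_L = marginal_listener(p_T, p_L_given_T)
    total = 0.0
    for t, row in enumerate(p_L_given_T):
        for l, cond in enumerate(row):
            if p_T[t] > 0.0 and cond > 0.0:
                total += p_T[t] * cond * math.log(cond / p_L[l])
    return total


def mi_distortion(p_T, p_L_given_T):
    """Distortion (2) with d = p_{L|T} * log(p_L / p_{L,T})."""
    p_L = marginal_listener(p_T, p_L_given_T)
    total = 0.0
    for t, row in enumerate(p_L_given_T):
        for l, cond in enumerate(row):
            if p_T[t] > 0.0 and cond > 0.0:
                joint = p_T[t] * cond  # p_{L,T}(l, t)
                total += p_T[t] * cond * math.log(p_L[l] / joint)
    return total


# Hypothetical toy scenario: rows are talker messages, columns listener messages.
p_T = [0.5, 0.3, 0.2]
channel = [[0.90, 0.05, 0.05],
           [0.10, 0.80, 0.10],
           [0.20, 0.20, 0.60]]

print(mi_distortion(p_T, channel))
print(entropy(p_T) - mutual_information(p_T, channel))
```

The two printed values should agree up to rounding, and a confusion-free (identity) channel gives zero distortion.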
An alternative to the mutual information based distortion
measure can be based on the hit-or-miss distortion,
$d\big(p_{L|T}(M_L | M_T)\big) = p_{L|T}(M_L | M_T)\,(1 - \delta_{M_L, M_T})$,
where $\delta_{M_L, M_T}$ is a Kronecker delta function. In this case (2) becomes
$$D(\breve{M}_L, \breve{M}_T) = 1 - \sum_{M_T} p_{L,T}(M_T, M_T) = 1 - \mathrm{E}\big[p_{L|T}(\breve{M}_T | \breve{M}_T)\big]. \tag{3}$$
The conditional probability $p_{L|T}(M_T | M_T)$ in (3) corresponds to
the probability that the message is interpreted correctly. Thus, an
alternative to maximizing the mutual information of the conveyed
and received message is to maximize the expected probability of
correct message interpretation, $\mathrm{E}\big[p_{L|T}(\breve{M}_T | \breve{M}_T)\big]$, where the
expectation is over the conveyed messages, $\breve{M}_T$. We will discuss
the practical use of this high-level measure in the section
“Measures Operating on a Word Sequence.”
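The hit-or-miss case can be checked numerically in the same spirit. The sketch below evaluates the general double sum (2) with the hit-or-miss d and compares it with one minus the expected probability of correct interpretation, as in (3). The message distribution, confusion matrix, and function names are hypothetical values made up for illustration.

```python
def hit_or_miss_distortion(p_T, p_L_given_T):
    """Distortion (2) with d = p_{L|T}(l | t) * (1 - delta_{l, t})."""
    total = 0.0
    for t, row in enumerate(p_L_given_T):
        for l, cond in enumerate(row):
            delta = 1.0 if l == t else 0.0  # Kronecker delta
            total += p_T[t] * cond * (1.0 - delta)
    return total


def expected_correct(p_T, p_L_given_T):
    """Expected probability that the conveyed message is interpreted correctly."""
    return sum(p_T[t] * p_L_given_T[t][t] for t in range(len(p_T)))


# Hypothetical toy scenario: rows are talker messages, columns listener messages.
p_T = [0.5, 0.3, 0.2]
channel = [[0.90, 0.05, 0.05],
           [0.10, 0.80, 0.10],
           [0.20, 0.20, 0.60]]

print(hit_or_miss_distortion(p_T, channel))
print(1.0 - expected_correct(p_T, channel))
```

Minimizing the first quantity is the same as maximizing the expected probability of correct interpretation, which is why either can serve as the objective.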
While the measures (1) and (3) are general, they cannot be
used directly. Either the description of the message or the
human cognitive system must be approximated such that the
measures can be applied to observable signals. The paradigm
shows where such approximations are made, but it does not
show their quantitative impact. Thus, experiments must be used
to verify the validity of the resulting system.
Next, we consider how to derive a low-level, acoustics-based
measure from a high-level, message-based measure. For this it is
convenient to consider the message as a continuous variable. A
conveyed speech message $M_T$ is rendered in the form of an acoustic
signal, which we represent by an acoustic sequence $a_T$. The
sequence $a_T$ can, for example, consist of signal samples or short-term
spectral descriptions, such as cepstral vectors. This sequence
is rendered in a noisy environment and the listener observes a corrupted
sequence $a_L$, which is then interpreted as a message $M_L$.
The communication process thus forms a Markov chain
$M_T \to a_T \to a_L \to M_L$. It is natural that environmental noise
makes the mapping $a_T \to a_L$ stochastic.
Upon reflection, it is clear that the mappings $M_T \to a_T$ and
$a_L \to M_L$ are also stochastic: a message is generally not formulated
and never articulated in precisely the same manner, and the interpretation
of the acoustic sequence $a_L$ is subject to random variations
during the human cognitive process. Anticipating the
discussions in the section “Measures Operating on a Word
Sequence,” it can be argued that these variations are captured by
the statistical modeling of modern automatic speech recognition
(ASR) algorithms. If we assume the message formulation is perfect,
a simple but effective model of the production and interpretation
processes is that they are subject to additive noise components [15],
which we will refer to as, respectively, production noise and inter-
pretation noise. For example, variability in articulation across differ-
ent persons may be approximated as additive noise in a
representation based on cepstral or log spectral vectors.
For convenience let us define auxiliary bijective mappings
$M_T \leftrightarrow s_T$ and $M_L \leftrightarrow s_L$, where $s_T$ and $s_L$ are realizations of random
acoustic sequences. We have
$$\begin{aligned} a_T &= s_T + v_T \\ a_L &= a_T + v_E \\ s_L &= a_L + v_L, \end{aligned} \tag{4}$$
where $v_T$, $v_E$, and $v_L$ are additive noise processes, modeling the
production noise, environmental noise, and interpretation noise,
respectively. Note that the system model differs from the standard
system model in communication theory, which does not
include production noise and interpretation noise.
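To make the additive model (4) concrete, the following sketch draws scalar samples for the chain and measures sample correlations between its stages. It assumes, purely for illustration, independent zero-mean white Gaussian sequences with hypothetical variances; the function names and the seed are likewise not from the article.

```python
import math
import random


def simulate_chain(n, var_s, var_vT, var_vE, var_vL, seed=0):
    """Draw n samples of (s_T, a_T, a_L, s_L) following the additive model (4)."""
    rng = random.Random(seed)
    s_T, a_T, a_L, s_L = [], [], [], []
    for _ in range(n):
        s = rng.gauss(0.0, math.sqrt(var_s))
        aT = s + rng.gauss(0.0, math.sqrt(var_vT))   # production noise v_T
        aL = aT + rng.gauss(0.0, math.sqrt(var_vE))  # environmental noise v_E
        sL = aL + rng.gauss(0.0, math.sqrt(var_vL))  # interpretation noise v_L
        s_T.append(s)
        a_T.append(aT)
        a_L.append(aL)
        s_L.append(sL)
    return s_T, a_T, a_L, s_L


def correlation(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Hypothetical unit variances for the message signal and all three noises.
s_T, a_T, a_L, s_L = simulate_chain(20000, 1.0, 1.0, 1.0, 1.0, seed=1)
end_to_end = correlation(s_T, s_L)
stagewise = correlation(s_T, a_T) * correlation(a_T, a_L) * correlation(a_L, s_L)
print(end_to_end, stagewise)
```

Under these assumptions the end-to-end correlation is approximately the product of the stage-wise correlations, so each noise source, including production and interpretation noise, lowers the correlation between what the talker intends and what the listener interprets.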
To facilitate analysis, let us assume the sequences $s_T$, $v_T$, $v_E$,
and $v_L$ to be jointly Gaussian processes. Furthermore, we denote
by $\rho_{sa}$ the correlation coefficient of (the samples of the) processes
$s$ and $a$ and write $\rho_0 = \rho_{s_T a_T}\,\rho_{a_L s_L}$. Let us first consider the case
where the signals are white. Exploiting that mutual information is
invariant under reparametrization of the marginal variables, it is
then easy to see that [15]
$$I(M_L; M_T) = I(s_T; s_L) = -\frac{1}{2} \log \frac{(1 - \rho_0^2)\,\xi + 1}{\xi + 1}, \tag{5}$$
where $\xi = \sigma_{a_T}^2 / \sigma_{v_E}^2$ is the signal-to-noise ratio (SNR) of the
acoustic channel $a_T \to a_L$, and $\sigma_{a_T}^2$ and $\sigma_{v_E}^2$ are the variances of
processes $a_T$ and $v_E$, respectively. An important and intuitive conclusion
that can be drawn from (5) is that if the environmental
noise variance is small compared to the production and interpretation
noise variances, then the mutual information between talker
and listener is not affected significantly by the environmental noise.
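The saturation behavior described above can be seen by evaluating (5) directly. The sketch below is a plain transcription of the formula with natural logarithms; the function names and the example values of the combined correlation and the SNR are hypothetical.

```python
import math


def mutual_information_white(rho0, xi):
    """Mutual information (5) in nats for white, jointly Gaussian signals.

    rho0: product of the production and interpretation correlation coefficients.
    xi:   SNR of the acoustic channel (variance ratio, not dB).
    """
    if abs(rho0) > 1.0 or xi < 0.0:
        raise ValueError("require |rho0| <= 1 and xi >= 0")
    return -0.5 * math.log(((1.0 - rho0 ** 2) * xi + 1.0) / (xi + 1.0))


def ceiling(rho0):
    """Limit of (5) for xi -> infinity, i.e., no environmental noise (|rho0| < 1)."""
    return -0.5 * math.log(1.0 - rho0 ** 2)


# Hypothetical example: strong production/interpretation noise (rho0 = 0.5).
for snr_db in (0, 10, 20, 30):
    xi = 10.0 ** (snr_db / 10.0)
    print(snr_db, mutual_information_white(0.5, xi), ceiling(0.5))
```

For a small combined correlation the value is already close to its ceiling at moderate SNR, whereas for a combined correlation equal to one the expression reduces to the familiar (1/2) log(1 + SNR) of a Gaussian channel and keeps growing with SNR.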
The spectral coloring of the acoustic content can be accounted
for by splitting the signal into spectral bands such that each band
can be approximated as white. If we assume the signals to be sta-
tionary, the frequency bands are independent and the mutual
information can be written as the sum of the mutual informations
in the bands
$$I(M_L; M_T) = -\sum_i \frac{1}{2} \log \frac{(1 - \rho_{0,i}^2)\,\xi_i + 1}{\xi_i + 1}, \tag{6}$$
where $i$ is the band index and where $\xi_i = \sigma_{a_{T,i}}^2 / \sigma_{v_{E,i}}^2$ is the SNR
of the acoustic channel in band $i$. Note that the SNR in (6) is computed
on whichever representation is used for the acoustic features.
Also note that the variances $\sigma_{a_{T,i}}^2$ and $\sigma_{v_{E,i}}^2$ are generally
unknown and must be estimated in practice. For example, if the