IEEE SIGNAL PROCESSING MAGAZINE [44] MARCH 2015
consideration of the rendering environment. It is also a major fac-
tor in human-to-human communications as communication
technology degrades or severs the auditory and visual links
between the speaker and the environment. For example, an
announcer at a railway station generally receives little visual or
auditory feedback. Similarly, a phone user lacks information about
the rendering environment, even less so if effective noise-suppres-
sion technology is used.
The lack of feedback, together with the recent ability to commu-
nicate from anywhere to anywhere, often leads to low intelligibility.
Phone booths are a relic of the past:
the mobile phone is expected to func-
tion in any environment, whether it
is a car, a cafeteria, or a windstorm.
Thus, there is a strong motivation for
algorithms that can improve the in-
telligibility of speech rendered in a
noisy environment.
Ever since the early work of
Griffiths [2] and Niederjohn and Gro-
telueschen [3], researchers have
attempted to create processing meth-
ods that increase the intelligibility of speech in a noisy environ-
ment. Driven by the rapid growth of mobile telephony, research
efforts on intelligibility in noise have increased significantly in the
last five years. As a result, it is now possible to increase the
intelligibility of speech in noise substantially, e.g., [4]–[11].
Approaches to intelligibility enhancement are increasingly based
on the mathematical optimization of quantitative measures that
are hypothesized to represent intelligibility accurately. First intro-
duced by [2], the optimization approach has been used in numer-
ous recent studies, starting with [12]. The optimization criteria
vary widely as the signal processing algorithms are derived from
different viewpoints and with different computational and delay
constraints. Criteria used include the probability of correct pho-
neme recognition [11], auditory models [6], [13], [14], the articu-
lation index [2], the SII [4], [8], mutual information [15], and
sound-field distortion [16].
In this tutorial, we describe a range of methods for intelligibil-
ity enhancement from a unified vantage point, delineating the
similarities and dissimilarities between the various approaches. In
contrast to the broad overview of human and algorithmic modifi-
cations that affect intelligibility in [7], our discussion focuses on
the definition and use of quantitative measures of intelligibility,
showing that many of these measures can be derived from the
same basic principle.
MEASURES OF INTELLIGIBILITY
In this section, we first discuss how to define a quantitative mea-
sure of intelligibility. We then discuss practical measures of
intelligibility.
DEFINING INTELLIGIBILITY
The word intelligibility expresses a qualitative measure of whether
a conveyed message is interpreted correctly by a human listener.
To define quantitative instrumental measures of intelligibility, we
must select a level of abstraction. That is, we must decide if we
measure intelligibility on the sequence of words spoken, on the
sequence of sounds, on a sequence of states of the auditory system,
or on the acoustic signal waveform. A word sequence is an exam-
ple of a description at a high level of abstraction, whereas a signal
waveform is a description at a low level of abstraction.
The higher the level of abstraction, the more fundamental the
measure of intelligibility: the objective of speech is to convey a mes-
sage and not to convey a sequence of sounds. A particular measure
will be useful for enhancement at its
own level of abstraction and below.
Consider an intelligibility measure
operating at the word sequence level.
It can be used to evaluate which of a
set of sentence formulations with
similar meaning is more intelligible.
It can also be used to evaluate if a par-
ticular spectral modification (e.g., a
particular filtering operation) makes
speech more intelligible.
The generality of high-level mea-
sures has a cost: we must map the observations into a sequence at
that high abstraction level. For acoustic observations and a mea-
sure operating at the word-sequence level, this requires a robust
model of hearing that maps the observed acoustic signal into a
word sequence. Therefore, although it cannot optimize linguistic
formulations, an intelligibility measure operating on a sequence of
auditory states may be attractive when optimizing a spectral modi-
fication of the signal.
While elusive in practical measurements, the message itself, a
random variable that we denote as $\mathbf{M}$, can be used to define the
most basic measure of intelligibility. (To aid clarity, we will write
random variables as bold-face characters and their realizations
as regular characters.) In the following, we will show how such
a basic measure can be used to derive measures that have been
derived earlier on a heuristic basis. To facilitate our reasoning,
we will be opportunistic and sometimes describe the messages
as countable, which is consistent with the notion that a mes-
sage is a discrete word sequence, and at other times as continu-
ous, which is consistent with the notion that articulation is
continuously variable. To avoid confusion, we add a breve, as in
$\breve{\mathbf{M}}$, whenever messages are considered countable.
A natural measure of intelligibility is the mutual information
between the message conveyed by the talker $\breve{\mathbf{M}}_T$ and the
message interpreted by the listener $\breve{\mathbf{M}}_L$:

$$
I(\breve{\mathbf{M}}_T; \breve{\mathbf{M}}_L)
  = \sum_{\breve{M}_T, \breve{M}_L} p_{TL}(\breve{M}_T, \breve{M}_L)
    \log \frac{p_{L|T}(\breve{M}_L \mid \breve{M}_T)}{p_L(\breve{M}_L)},
\tag{1}
$$

where we used the simplified notation $p_{TL} = p_{\breve{\mathbf{M}}_T \breve{\mathbf{M}}_L}$ and
$p_{L|T} = p_{\breve{\mathbf{M}}_L | \breve{\mathbf{M}}_T}$ for the joint and conditional probabilities and use
the same convention for the marginal probabilities of the conveyed
and received messages, $p_T$ and $p_L$.
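As a rough numerical illustration of the mutual information in (1), the following Python sketch evaluates it for a small, countable message set. The function names, the four-message set, and the symmetric confusion model are hypothetical choices made only for this example, and the base-2 logarithm (bits) is an assumption, since the base is not fixed in the text.

```python
import math


def mutual_information(p_joint):
    """Mutual information, in bits, between talker and listener messages.

    p_joint[t][l] is the joint probability that the talker conveys
    message t and the listener interprets message l.
    """
    # Marginals: p_T by summing over rows, p_L by summing over columns.
    p_t = [sum(row) for row in p_joint]
    p_l = [sum(col) for col in zip(*p_joint)]

    mi = 0.0
    for t, row in enumerate(p_joint):
        for l, p_tl in enumerate(row):
            if p_tl > 0.0:  # terms with zero probability contribute nothing
                p_l_given_t = p_tl / p_t[t]
                mi += p_tl * math.log2(p_l_given_t / p_l[l])
    return mi


def toy_confusion_joint(n_messages, p_correct):
    """Hypothetical toy channel, not taken from the article.

    Messages are equiprobable; the listener interprets the message
    correctly with probability p_correct and otherwise picks one of
    the remaining messages uniformly at random.
    """
    p_error = (1.0 - p_correct) / (n_messages - 1)
    return [
        [(p_correct if t == l else p_error) / n_messages
         for l in range(n_messages)]
        for t in range(n_messages)
    ]


for p_correct in (1.0, 0.7, 0.25):
    joint = toy_confusion_joint(4, p_correct)
    print(f"p_correct = {p_correct:.2f}: I = {mutual_information(joint):.3f} bits")
```

Under these assumptions, perfect interpretation of four equiprobable messages yields the full 2 bits, chance-level interpretation yields 0 bits, and intermediate confusion rates fall in between, which is the behavior one would expect of an intelligibility measure.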
We can reformulate the criterion (1) as a measure of distortion
$D(\breve{\mathbf{M}}_T, \breve{\mathbf{M}}_L)$ that is a functional of $p_{L|T}$.
Mutual information
IN RECENT YEARS,
A RANGE OF ALGORITHMS HAS
BEEN DEVELOPED TO ENHANCE
THE INTELLIGIBILITY OF SPEECH
RENDERED IN A NOISY ENVIRONMENT.
WE DESCRIBE METHODS FOR
INTELLIGIBILITY ENHANCEMENT
FROM A UNIFIED VANTAGE POINT.