

IEEE SIGNAL PROCESSING MAGAZINE [44] MARCH 2015

consideration of the rendering environment. It is also a major factor in human-to-human communications because communication technology degrades or severs the auditory and visual links between the speaker and the environment. For example, an announcer at a railway station generally receives little visual or auditory feedback. Similarly, a phone user lacks information about the rendering environment, all the more so if effective noise-suppression technology is used.

The lack of feedback, together with the recent ability to communicate from anywhere to anywhere, often leads to low intelligibility. Phone booths are a relic of the past: the mobile phone is expected to function in any environment, whether in a car, a cafeteria, or a windstorm. Thus, there is a strong motivation for algorithms that can improve the intelligibility of speech rendered in a noisy environment.

Ever since the early work of Griffiths [2] and Niederjohn and Grotelueschen [3], researchers have attempted to create processing methods that increase the intelligibility of speech in a noisy environment. Driven by the rapid growth of mobile telephony, research efforts on intelligibility in noise have increased significantly in the last five years. The result is that it is now possible to significantly increase the intelligibility of speech in noise, e.g., [4]–[11].

Approaches to intelligibility enhancement are increasingly based on the mathematical optimization of quantitative measures that are hypothesized to represent intelligibility accurately. First introduced by [2], the optimization approach has been used in numerous recent studies, starting with [12]. The optimization criteria vary widely because the signal processing algorithms are derived from different viewpoints and under different computational and delay constraints. Criteria used include the probability of correct phoneme recognition [11], auditory models [6], [13], [14], the articulation index [2], the SII [4], [8], mutual information [15], and sound-field distortion [16].

In this tutorial, we describe a range of methods for intelligibility enhancement from a unified vantage point, delineating the similarities and dissimilarities between the various approaches. In contrast to the broad overview of human and algorithmic modifications that affect intelligibility in [7], our discussion focuses on the definition and use of quantitative measures of intelligibility, showing that many of these measures can be derived from the same basic principle.

MEASURES OF INTELLIGIBILITY

In this section, we first discuss how to define a quantitative measure of intelligibility. We then discuss practical measures of intelligibility.

DEFINING INTELLIGIBILITY

The word "intelligibility" expresses a qualitative measure of whether a conveyed message is interpreted correctly by a human listener.

To define quantitative instrumental measures of intelligibility, we must select a level of abstraction. That is, we must decide whether to measure intelligibility on the sequence of words spoken, on the sequence of sounds, on a sequence of states of the auditory system, or on the acoustic signal waveform. A word sequence is an example of a description at a high level of abstraction, whereas a signal waveform is a description at a low level of abstraction.

The higher the level of abstraction, the more fundamental the measure of intelligibility: the objective of speech is to convey a message and not to convey a sequence of sounds. A particular measure will be useful for enhancement at its own level of abstraction and below. Consider an intelligibility measure operating at the word-sequence level. It can be used to evaluate which of a set of sentence formulations with similar meaning is more intelligible. It can also be used to evaluate whether a particular spectral modification (e.g., a particular filtering operation) makes speech more intelligible.

The generality of high-level measures has a cost: we must map the observations into a sequence at that high abstraction level. For acoustic observations and a measure operating at the word-sequence level, this requires a robust model of hearing that maps the observed acoustic signal into a word sequence. Therefore, although it cannot optimize linguistic formulations, an intelligibility measure operating on a sequence of auditory states may be attractive when optimizing a spectral modification of the signal.

While elusive in practical measurements, the message itself, a random variable that we denote as $\mathbf{M}$, can be used to define the most basic measure of intelligibility. (To aid clarity, we will write random variables as bold-face characters and their realizations as regular characters.) In the following, we will show how such a basic measure can be used to derive measures that were previously obtained on a heuristic basis. To facilitate our reasoning, we will be opportunistic and sometimes describe the messages as countable, which is consistent with the notion that a message is a discrete word sequence, and at other times as continuous, which is consistent with the notion that articulation is continuously variable. To avoid confusion, we add a breve, as in $\breve{\mathbf{M}}$, whenever messages are considered countable.

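A natural measure of intelligibility is the mutual information between the message conveyed by the talker, $\breve{\mathbf{M}}_T$, and the message interpreted by the listener, $\breve{\mathbf{M}}_L$:

$$
I(\breve{\mathbf{M}}_L;\breve{\mathbf{M}}_T) = \sum_{\breve{M}_L}\sum_{\breve{M}_T} p_{LT}(\breve{M}_L,\breve{M}_T)\,\log\frac{p_{L|T}(\breve{M}_L \mid \breve{M}_T)}{p_L(\breve{M}_L)} \tag{1}
$$

where we used the simplified notation $p_{LT} = p_{\breve{\mathbf{M}}_L\breve{\mathbf{M}}_T}$ and $p_{L|T} = p_{\breve{\mathbf{M}}_L|\breve{\mathbf{M}}_T}$ for the joint and conditional probabilities and use the same convention for the marginal probabilities of the conveyed and received messages, $p_T$ and $p_L$.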
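As a minimal sketch (not taken from the article), (1) can be evaluated directly for a countable message set when the joint pmf $p_{LT}$ is available as a table, for example a normalized confusion matrix from a listening test. The function name `mutual_information` and the table layout (rows index the listener's interpretation $\breve{M}_L$, columns the talker's message $\breve{M}_T$) are our assumptions; the logarithm is taken base 2, so the result is in bits.

```python
import math


def mutual_information(p_lt):
    """Evaluate Eq. (1) for a discrete joint pmf (hypothetical helper).

    p_lt[i][j] is the joint probability p_LT(M_L = i, M_T = j) that the
    listener interprets message i when the talker conveyed message j.
    Returns I(M_L; M_T) in bits.
    """
    # Marginals: p_L sums each row over talker messages,
    # p_T sums each column over listener interpretations.
    p_l = [sum(row) for row in p_lt]
    p_t = [sum(col) for col in zip(*p_lt)]

    info = 0.0
    for i, row in enumerate(p_lt):
        for j, p_joint in enumerate(row):
            if p_joint > 0.0:  # zero-probability pairs contribute nothing
                # p_{L|T}(i | j) = p_LT(i, j) / p_T(j)
                p_cond = p_joint / p_t[j]
                info += p_joint * math.log2(p_cond / p_l[i])
    return info


if __name__ == "__main__":
    # Four equiprobable messages, always interpreted correctly.
    perfect = [[0.25 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(mutual_information(perfect))  # prints 2.0
```

When the listener always recovers one of four equiprobable messages, the measure equals the full 2 bits carried by the message; when the interpretation is statistically independent of what was said, it drops to zero.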

We can reformulate the criterion (1) as a measure of distortion $D(\breve{\mathbf{M}}_L,\breve{\mathbf{M}}_T)$ that is a functional of $p_{L|T}$.
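For a fixed talker prior $p_T$, the remaining terms in (1) follow from $p_{LT}(\breve{M}_L,\breve{M}_T) = p_{L|T}(\breve{M}_L \mid \breve{M}_T)\,p_T(\breve{M}_T)$ and $p_L(\breve{M}_L) = \sum_{\breve{M}_T} p_{L|T}(\breve{M}_L \mid \breve{M}_T)\,p_T(\breve{M}_T)$. The listening condition therefore enters the criterion only through $p_{L|T}$. The sketch below illustrates this with a hypothetical listener model of our own, `symmetric_confusion`: a message is understood with probability $1-\epsilon$ and is otherwise confused uniformly with one of the other $K-1$ messages. Neither the model nor the function names come from the article.

```python
import math


def mi_from_channel(p_t, p_l_given_t):
    """I(M_L; M_T) in bits as a functional of p_{L|T}, with p_T held fixed.

    p_t[j] is the talker prior; p_l_given_t[i][j] = p_{L|T}(i | j), so each
    column j sums to one. (Hypothetical helper for illustration.)
    """
    n_l, n_t = len(p_l_given_t), len(p_t)
    # p_L(i) = sum_j p_{L|T}(i | j) p_T(j)
    p_l = [sum(p_l_given_t[i][j] * p_t[j] for j in range(n_t)) for i in range(n_l)]

    info = 0.0
    for i in range(n_l):
        for j in range(n_t):
            p_joint = p_l_given_t[i][j] * p_t[j]  # p_LT(i, j)
            if p_joint > 0.0:
                info += p_joint * math.log2(p_l_given_t[i][j] / p_l[i])
    return info


def symmetric_confusion(k, eps):
    """Hypothetical listener model for k >= 2 messages: correct with
    probability 1 - eps, otherwise uniformly confused among the other k - 1."""
    off = eps / (k - 1)
    return [[1.0 - eps if i == j else off for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    prior = [0.25] * 4  # four equiprobable messages
    for eps in (0.0, 0.1, 0.3, 0.5, 0.75):
        info = mi_from_channel(prior, symmetric_confusion(4, eps))
        print(f"eps={eps:.2f}  I={info:.3f} bits")
```

With four equiprobable messages, the mutual information falls monotonically from 2 bits at $\epsilon = 0$ to 0 bits at $\epsilon = 0.75$, where the listener's interpretation no longer depends on what was said. For this symmetric model the result matches the closed form $\log_2 K - H_b(\epsilon) - \epsilon\log_2(K-1)$, with $H_b$ the binary entropy.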

Mutual information

