
Digital Object Identifier 10.1109/MSP.2014.2365594

Date of publication: 12 February 2015

Modern communication technology facilitates communication from anywhere to anywhere. As a result, low speech intelligibility has become a common problem, which is exacerbated by the lack of feedback to the talker about the rendering environment. In recent years, a range of algorithms has been developed to enhance the intelligibility of speech rendered in a noisy environment. We describe methods for intelligibility enhancement from a unified vantage point. Before one defines a measure of intelligibility, the level of abstraction of the representation must be selected. For example, intelligibility can be measured on the message, the sequence of words spoken, the sequence of sounds, or a sequence of states of the auditory system. Natural measures of intelligibility defined at the message level are mutual information and the hit-or-miss criterion. The direct evaluation of high-level measures requires quantitative knowledge of human cognitive processing. Lower-level measures can be derived from higher-level measures by making restrictive assumptions. We discuss the implementation and performance of some specific enhancement systems in detail, including speech intelligibility index (SII)-based systems and systems aimed at enhancing the sound field where it is perceived by the listener. We conclude with a discussion of the current state of the field and open problems.
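As a rough illustration of the two message-level measures named above, the following sketch computes the hit probability (the complement of the hit-or-miss error) and the mutual information between the sent and the received message. It assumes a small discrete message set and a known joint probability table; the tables `quiet` and `noisy` and the function names are hypothetical and are not taken from the article.

```python
import math


def hit_probability(joint):
    """Probability that the received message equals the sent message.

    joint[i][j] is the assumed joint probability of sending message i
    and the listener decoding message j.
    """
    return sum(joint[i][i] for i in range(len(joint)))


def mutual_information(joint):
    """Mutual information (in bits) between sent and received message."""
    p_sent = [sum(row) for row in joint]        # marginal over received
    p_recv = [sum(col) for col in zip(*joint)]  # marginal over sent
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # terms with zero probability contribute nothing
                mi += p * math.log2(p / (p_sent[i] * p_recv[j]))
    return mi


# Hypothetical joint tables for two equally likely messages.
quiet = [[0.49, 0.01], [0.01, 0.49]]  # few confusions
noisy = [[0.30, 0.20], [0.20, 0.30]]  # many confusions

if __name__ == "__main__":
    for name, table in (("quiet", quiet), ("noisy", noisy)):
        print(name, hit_probability(table), mutual_information(table))
```

In this toy setting both measures fall as confusions become more frequent, but they are not interchangeable in general: the hit probability only counts exact matches, whereas mutual information also credits systematic (and hence recoverable) confusions.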

INTRODUCTION

Humans adapt their speech to the physical environment. Based on the facial expression of a listener, a talker may repeat or reformulate the message. A noisy environment gives rise to the Lombard effect, e.g., [1], an involuntary change in the speech characteristics that makes speech more intelligible.

In modern communication systems, the speaker often has little or no awareness of the physical environment in which the speech is rendered. This is perhaps most obvious for current-generation speech synthesis, which produces speech without

[W. Bastiaan Kleijn, João B. Crespo, Richard C. Hendriks, Petko N. Petkov, Bastian Sauert, and Peter Vary]


Optimizing Speech Intelligibility in a Noisy Environment

[A unified view]
