sounds can be extracted from a specific direction or from an arbitrary
two-dimensional or even three-dimensional region of interest.
Furthermore, the sound scene can be manipulated to create an
acoustic zoom effect in which direct sounds within the listening
angular range are amplified depending on the zoom factor, while
other sounds are suppressed. In addition, the signals and parameters
can be used to create surround sound signals. As the manipulation
and synthesis are highly application dependent, we focus in this arti-
cle on three illustrative assisted listening applications: spatial audio
communication, virtual classroom, and binaural hearing aids.
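
To make the acoustic zoom concrete, the following Python sketch computes a direction-dependent gain for the direct sound in one time-frequency bin. The function name, the square-root gain law, and the out-of-range roll-off are illustrative assumptions, not the specific design used here.

    import numpy as np

    def acoustic_zoom_gain(doa_deg, zoom, beam_width_deg=60.0, floor=0.1):
        # Direction-dependent gain for the direct sound in one bin.
        # doa_deg: estimated DOA relative to the look direction (0 = ahead).
        # zoom: zoom factor >= 1 (1 means no zoom).
        # beam_width_deg: angular range kept at zoom = 1 (narrows with zoom).
        # floor: minimum gain for out-of-range sounds (assumed safeguard).
        half_width = (beam_width_deg / 2.0) / zoom
        if abs(doa_deg) <= half_width:
            # Direct sounds inside the listening range are amplified.
            return float(np.sqrt(zoom))
        # Sounds outside the range are suppressed, down to a floor
        # that limits audible artifacts.
        rolloff = half_width / abs(doa_deg)
        return max(floor, float(np.sqrt(zoom)) * rolloff)

    # Example: a talker 40 degrees off-axis under a 2x zoom is attenuated,
    # while a talker at 10 degrees would be amplified by sqrt(2).
    print(acoustic_zoom_gain(doa_deg=40.0, zoom=2.0))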
INTRODUCTION
Communication and assisted listening devices commonly use
multiple microphones to create one or more signals, the content
of which highly depends on the application. For example, when
smart glasses are used to record a video, the microphones can be
used to create a surround sound recording that consists of mul-
tiple audio signals. A compact yet accurate representation of the
sound field at the recording position makes it possible to render
the sound field on an arbitrary reproduction setup in a different
location. On the other hand, when the device is used in hands-free
or speech recognition mode, the microphones can be used to
extract the user’s speech while reducing background noise and
interfering sounds. Over the last few decades, sophisticated solutions
for these applications have been developed.
Spatial recordings are commonly made using specific micro-
phone setups. For instance, several stereo recording techniques
exploit different placements of microphones of the same or different
types (e.g., cardioid or omnidirectional) to make a stereo recording that can be
reproduced using loudspeakers. When more loudspeakers are
available for spatial sound rendering, the microphone recordings
are often specifically mixed for a given reproduction setup. These
classical techniques do not provide the flexibility required in many
modern applications where the reproduction setup is not known
in advance. Signal enhancement, on the other hand, is commonly
achieved by filtering and subsequently summing the available
microphone signals. Classical spatial filters often require information
on the second-order statistics (SOS) of the desired and undesired
signals (cf. [4] and [5]); a minimal example of such a filter is sketched
after the following list. For real-time applications, the SOS
need to be estimated online, and the quality of the output signal
depends heavily on the accuracy of these estimates. To date, major
challenges remain, such as:
1) achieving a sufficiently fast response to changes in the
sound scene (such as moving and emerging sources) and to
changes in the acoustic conditions
2) providing sufficient flexibility in terms of spatial selectivity
3) ensuring a high-quality output signal at all times
4) providing solutions with a manageable computational
complexity.
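
As a minimal example of such an SOS-based spatial filter, the sketch below computes minimum variance distortionless response (MVDR) weights for a single frequency bin. MVDR is one classical choice among many; the uniform linear array, the half-wavelength spacing, and the identity noise covariance are assumptions made purely for this illustration.

    import numpy as np

    def mvdr_weights(noise_cov, steering):
        # MVDR filter weights for one frequency bin:
        # w = R^{-1} a / (a^H R^{-1} a), where R is the noise-plus-
        # interference covariance (the SOS estimate) and a is the
        # steering vector toward the desired source.
        M = noise_cov.shape[0]
        R_inv = np.linalg.inv(noise_cov + 1e-6 * np.eye(M))  # regularized
        num = R_inv @ steering
        return num / (steering.conj() @ num)

    # Uniform linear array: M = 4 microphones at half-wavelength spacing,
    # desired source at 20 degrees, spatially white noise as SOS estimate.
    M = 4
    theta = np.deg2rad(20.0)
    steering = np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))
    w = mvdr_weights(np.eye(M, dtype=complex), steering)
    # The enhanced bin is y = w.conj() @ x, with x the M microphone signals.

The quality of the output hinges on how well noise_cov tracks the true SOS, which is exactly the online estimation challenge listed above.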
Although the use of multiple microphones provides, at least in
theory, a major advantage over a single microphone, the adoption
of multimicrophone techniques in practical systems has not been
particularly popular until very recently. Possible reasons for this
could be that in real-life scenarios, these techniques provided
insufficient improvement over single-microphone techniques,
while significantly increasing the computational complexity, the
system calibration effort, and the manufacturing costs. In the last
few years, however, the smartphone and hearing aid industries have
taken a significant step forward in using multiple microphones,
which have recently become standard in these devices.
Parametric spatial sound processing provides a unified solution
to both the spatial recording and signal enhancement problems,
as well as to other challenging sound processing tasks such as add-
ing virtual sound sources to the sound scene. As illustrated in
Figure 1, the parametric processing is performed in two successive
steps that can be completed on the same device or on different
devices. In the first step, the sound field is analyzed in narrow fre-
quency bands using multiple microphones to obtain a compact
and perceptually meaningful description of the sound field in
terms of direct and diffuse sound components and some paramet-
ric information (e.g., DOAs and positions). In the second step, the
input signals and possibly the parameters are modified, and one or
more output signals are synthesized. The modification and synthe-
sis can be user, application, or scenario dependent. Parametric
spatial sound processing is also common in audio coding (cf. [6])
where parametric information is extracted directly from the loud-
speaker channels instead of the microphone signals.
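
As a toy illustration of the two-step scheme in Figure 1, the Python sketch below decomposes one time-frequency bin into direct and diffuse components (step 1) and resynthesizes it with a user-dependent gain (step 2). The decomposition rule, the function names, and the externally supplied DOA and diffuseness are simplifying assumptions; a real system estimates these quantities from the microphone signals.

    import numpy as np

    def analyze_bin(x, steering, diffuseness):
        # Step 1 (analysis): split one time-frequency bin into direct
        # and diffuse parts, given a DOA-matched steering vector and a
        # diffuseness estimate in [0, 1] (both assumed known here).
        ref = (steering.conj() @ x) / len(x)   # direct-sound reference
        direct = np.sqrt(1.0 - diffuseness) * ref
        diffuse = np.sqrt(diffuseness) * ref
        return direct, diffuse

    def synthesize_bin(direct, diffuse, doa_deg, direct_gain, diffuse_gain=1.0):
        # Step 2 (synthesis): recombine the components after applying
        # user- or application-dependent gains (e.g., an acoustic zoom).
        return direct_gain(doa_deg) * direct + diffuse_gain * diffuse

    # Toy usage for one bin: M = 3 microphones, source DOA of 10 degrees.
    M = 3
    doa = 10.0
    a = np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(doa)))
    x = a * (0.5 + 0.2j)                       # noiseless direct sound
    d, s = analyze_bin(x, a, diffuseness=0.3)
    y = synthesize_bin(d, s, doa, lambda ang: 2.0 if abs(ang) < 30 else 0.5)

Because the two steps only exchange a few signal components and parameters per bin, they can run on the same device or on different devices, as noted above.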
The described scheme also allows for an efficient transmission
of sound scenes to the far-end side [1], [7] for loudspeaker reproduction.
[FIG1] A high-level overview of the parametric spatial sound processing scheme. (Block diagram: microphone signals enter a spatial analysis stage, which outputs direct and diffuse signal components along with parameters; after optional storage or transmission, a processing and synthesis stage, controlled by user settings, produces the output signal(s).)